docs+code: rename tool labels everywhere

Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:

- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
  design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
  and gui/components/_legacy.py, core module headers, every tool
  page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs

Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.

All 2033 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 19:50:09 +00:00
parent 93e43fc0d9
commit db5ec084da
57 changed files with 205 additions and 205 deletions

View File

@@ -34,8 +34,8 @@ src/
normalizers.py # Per-column normalizers for dedup matching
text_clean.py # clean_dataframe + smart_title_case
_constants.py # Shared USPS abbrevs + state names
cli.py # Deduplicator CLI (Typer)
cli_text_clean.py # Text Cleaner CLI
cli.py # Find Duplicates CLI (Typer)
cli_text_clean.py # Clean Text CLI
cli_analyze.py # Analyzer CLI (--json)
gui/
app.py # Streamlit entry point
@@ -192,7 +192,7 @@ GUI / CLI handlers use `format_for_user()` so the user always sees: file path, o
| Bundle | Status |
|--------|--------|
| Data Cleaning Mastery | 3/9 tools Ready (Dedup, Text Cleaner, Format Standardizer); 6 stubs |
| Data Cleaning Mastery | 3/9 tools Ready (Find Duplicates, Clean Text, Standardize Formats); 6 stubs |
| Automated Business Reporting | Not started |
| Ecommerce Data Pipeline | Not started |
| Small Business Finance | Not started |
@@ -214,12 +214,12 @@ Deliberately separate. Confluent original spec was wrong.
| Script | Owns |
|--------|------|
| 04 Missing Value Handler | "What's not there." Disguised nulls (`N/A`, `-`, sentinel codes), missingness patterns, imputation, drop-by-threshold. |
| 06 Outlier Detector | "What shouldn't be there." z-score / IQR / modified-z, multivariate (Isolation Forest, Mahalanobis), domain rules, winsorization. |
| 04 Fix Missing Values | "What's not there." Disguised nulls (`N/A`, `-`, sentinel codes), missingness patterns, imputation, drop-by-threshold. |
| 06 Find Unusual Values | "What shouldn't be there." z-score / IQR / modified-z, multivariate (Isolation Forest, Mahalanobis), domain rules, winsorization. |
**Run order**: 04 before 06. Outlier stats on data with `NaN` / sentinels are mathematically poisoned (means dragged, IQR widens, false negatives).
**Pipeline order** (Pipeline Runner enforces): 02 → 03 → 04 → 05 → 06 → 07 → 08. 01 is order-flexible.
**Pipeline order** (Automated Workflows enforces): 02 → 03 → 04 → 05 → 06 → 07 → 08. 01 is order-flexible.
**Contested cases**:
- Whitespace-only cell — 02 trims to empty; 04 then flags empty as null.