docs+code: rename tool labels everywhere

Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.

Five parallel surfaces touched:
- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
  design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
  and gui/components/_legacy.py, core module headers, every tool
  page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs

Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.

All 2033 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
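
The split between renamed display labels and untouched identifiers might look roughly like the sketch below. It assumes `_TOOL_DISPLAY` is a plain dict keyed by `tool_id`; the real dicts in cli_analyze.py and gui/components/_legacy.py may be shaped differently. The `display_label` helper and the `02_text_cleaner` key are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: the real _TOOL_DISPLAY dicts may have a different shape.
# Keys are stable tool_id strings (left intact by this commit);
# values are the user-facing display labels (renamed by this commit).
_TOOL_DISPLAY = {
    "01_deduplicator": "Find Duplicates",  # tool_id quoted in the commit message
    "02_text_cleaner": "Clean Text",       # assumed tool_id, illustration only
}


def display_label(tool_id: str) -> str:
    """Return the user-facing label, falling back to the raw tool_id."""
    return _TOOL_DISPLAY.get(tool_id, tool_id)


print(display_label("01_deduplicator"))  # Find Duplicates
```

Keeping the keys stable means a label rename like this one only touches dict values and prose, not slugs, URLs, or test class names.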
@@ -34,8 +34,8 @@ src/
 normalizers.py # Per-column normalizers for dedup matching
 text_clean.py # clean_dataframe + smart_title_case
 _constants.py # Shared USPS abbrevs + state names
-cli.py # Deduplicator CLI (Typer)
-cli_text_clean.py # Text Cleaner CLI
+cli.py # Find Duplicates CLI (Typer)
+cli_text_clean.py # Clean Text CLI
 cli_analyze.py # Analyzer CLI (--json)
 gui/
 app.py # Streamlit entry point
@@ -192,7 +192,7 @@ GUI / CLI handlers use `format_for_user()` so the user always sees: file path, o
 
 | Bundle | Status |
 |--------|--------|
-| Data Cleaning Mastery | 3/9 tools Ready (Dedup, Text Cleaner, Format Standardizer); 6 stubs |
+| Data Cleaning Mastery | 3/9 tools Ready (Find Duplicates, Clean Text, Standardize Formats); 6 stubs |
 | Automated Business Reporting | Not started |
 | Ecommerce Data Pipeline | Not started |
 | Small Business Finance | Not started |
@@ -214,12 +214,12 @@ Deliberately separate. Confluent original spec was wrong.
 
 | Script | Owns |
 |--------|------|
-| 04 Missing Value Handler | "What's not there." Disguised nulls (`N/A`, `-`, sentinel codes), missingness patterns, imputation, drop-by-threshold. |
-| 06 Outlier Detector | "What shouldn't be there." z-score / IQR / modified-z, multivariate (Isolation Forest, Mahalanobis), domain rules, winsorization. |
+| 04 Fix Missing Values | "What's not there." Disguised nulls (`N/A`, `-`, sentinel codes), missingness patterns, imputation, drop-by-threshold. |
+| 06 Find Unusual Values | "What shouldn't be there." z-score / IQR / modified-z, multivariate (Isolation Forest, Mahalanobis), domain rules, winsorization. |
 
 **Run order**: 04 before 06. Outlier stats on data with `NaN` / sentinels are mathematically poisoned (means dragged, IQR widens, false negatives).
 
-**Pipeline order** (Pipeline Runner enforces): 02 → 03 → 04 → 05 → 06 → 07 → 08. 01 is order-flexible.
+**Pipeline order** (Automated Workflows enforces): 02 → 03 → 04 → 05 → 06 → 07 → 08. 01 is order-flexible.
 
 **Contested cases**:
 - Whitespace-only cell — 02 trims to empty; 04 then flags empty as null.
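
The run-order rule and the whitespace-only contested case in the hunk above can be illustrated with a small stdlib-only sketch. It is not the project's implementation: the three functions are stand-ins for tools 02, 04, and 06, the `-999` sentinel code and the 2.5 threshold are assumptions, and the sample data is invented for the demonstration.

```python
import statistics

# Assumed disguised-null tokens; "-999" is a hypothetical sentinel code.
DISGUISED_NULLS = {"", "N/A", "-", "-999"}


def trim_cells(cells):
    """Stand-in for tool 02: strip surrounding whitespace from every cell."""
    return [c.strip() for c in cells]


def null_out(cells):
    """Stand-in for tool 04: map disguised nulls (including empty) to None."""
    return [None if c in DISGUISED_NULLS else c for c in cells]


def zscore_outliers(values, threshold=2.5):
    """Stand-in for tool 06: flag values whose |z| exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]


raw = ["10", "12", "11", "13", "12", "11", "10", "12", "13", "11",
       "95", "-999", "-999", "   "]

trimmed = trim_cells(raw)  # the whitespace-only cell becomes ""

# Skipping 04: sentinels stay in as numbers and drag the mean and spread.
without_04 = [float(c) for c in trimmed if c != ""]
flagged_without_04 = zscore_outliers(without_04)

# Running 04 first: sentinels and the now-empty cell become None and are dropped.
with_04 = [float(c) for c in null_out(trimmed) if c is not None]
flagged_with_04 = zscore_outliers(with_04)

print("95 flagged without 04:", 95.0 in flagged_without_04)
print("95 flagged with 04:", 95.0 in flagged_with_04)
```

With the sentinels left in, the genuine outlier 95 sits well inside the inflated spread and goes unflagged, which is the false negative the run-order note warns about.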