docs+code: rename tool labels everywhere

Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:

- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
  design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
  and gui/components/_legacy.py, core module headers, every tool
  page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs

Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.

All 2033 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 19:50:09 +00:00
parent 93e43fc0d9
commit db5ec084da
57 changed files with 205 additions and 205 deletions

View File

@@ -76,7 +76,7 @@ Sample size: 1,000 rows (configurable).
- Full-DataFrame `auto_fix`: ~5 min (~30 µs/cell).
- Output write: ~10 s.
- Recommended RAM: 34× input size for the full-Apply path.
- **Format standardizer** (`standardize_dataframe`): ~2.7M rows/sec on
- **Standardize Formats** (`standardize_dataframe`): ~2.7M rows/sec on
cache-warm repetition-heavy columns (synthetic 1M-row in-memory
benchmark, 2 typed columns); the fused single-pass loop replaced a
3-pass ``.tolist()`` cycle, so per-call overhead is now dominated by
@@ -87,20 +87,20 @@ Sample size: 1,000 rows (configurable).
thread-pool scaffolding; on CPython 3.12 with the GIL it's
roughly neutral, but the API is ready for the free-threaded
(PEP 703) Python 3.13+ build where it will help.
- **Text cleaner** (`clean_dataframe`): ~1M rows/sec on
- **Clean Text** (`clean_dataframe`): ~1M rows/sec on
repetition-heavy columns (per-call string cache: the pipeline runs
once per *unique* cell value, not once per row).
- **Missing handler** (`handle_missing`): lazy-copy — when sentinel
- **Fix Missing Values** (`handle_missing`): lazy-copy — when sentinel
standardization runs but finds nothing, AND no drops AND no fills
apply, the input frame is returned as-is. On a clean 1 GB file this
saves the 1 GB allocation that the unconditional upfront copy used
to take.
- **Column mapper** (`map_columns`): rename + drop both already
- **Map Columns** (`map_columns`): rename + drop both already
return fresh frames; the explicit upfront `df.copy()` is now
removed and downstream mutating steps (schema-add, coerce) copy on
demand via `_ensure_owned()`. Rename-only and identity-mapping
paths run with zero explicit copies.
- **Deduplicator**:
- **Find Duplicates**:
- **Exact-only strategies** (every column uses `Algorithm.EXACT` at
threshold 100 — covers strong-key dedup like email/phone, the
fallback drop-duplicates path, and explicit "match on this exact
@@ -117,19 +117,19 @@ Sample size: 1,000 rows (configurable).
(the common dedup workload) skip re-parsing.
## 11. Tools
1. Deduplicator — Ready
2. Text Cleaner — Ready
3. Format Standardizer — Ready
4. Missing Value Handler — Ready
5. Column Mapper — Ready
6. Outlier Detector — Coming Soon
7. Multi-File Merger — Coming Soon
8. Validator & Reporter — Coming Soon
9. Pipeline Runner — Ready
1. Find Duplicates — Ready
2. Clean Text — Ready
3. Standardize Formats — Ready
4. Fix Missing Values — Ready
5. Map Columns — Ready
6. Find Unusual Values — Coming Soon
7. Combine Files — Coming Soon
8. Quality Check — Coming Soon
9. Automated Workflows — Ready
### 11.a Recommended pipeline order (soft, not enforced)
The Pipeline Runner ships with a `SOFT_DEPENDENCIES` table; the
Automated Workflows ships with a `SOFT_DEPENDENCIES` table; the
following ordering is the default and the basis of the warning
surface. Re-ordering is allowed; the runner emits a warning string
and proceeds.
@@ -214,7 +214,7 @@ and proceeds.
fresh blob without losing the embedded buyer identity. Tier may
change during renewal (Lite → Core upgrade path).
- **Tiers**:
- ``lite`` — Deduplicator + Text Cleaner + Format Standardizer.
- ``lite`` — Find Duplicates + Clean Text + Standardize Formats.
Buyer pays once, gets the three universally-useful tools.
- ``core`` — every Ready tool (all 9 in v1.6).
- ``pro``, ``enterprise`` — scaffolded for future SKUs; currently