
Clean Text Test Corpus

Test fixtures for 02_text_cleaner.py (Excel & CSV Data Cleaning Mastery Bundle).

Layout

text_cleaner_test_corpus/
├── README.md                # This file
├── TEST-CASES.md            # Full taxonomy and expected behavior per test
├── generate_test_data.py    # Regenerates the 20 CSV inputs and expected outputs
├── generate_xlsx.py         # Regenerates the multi-sheet XLSX fixture
├── test_data/               # Inputs (21 fixtures: 20 CSV + 1 XLSX)
└── expected/                # Expected outputs (with default and flag variants)

Quick start

Read TEST-CASES.md from top to bottom. Sections 1 (scope boundary) and 2 (default config assumed) are load-bearing; the per-test details in Section 4 don't make sense without them.

To regenerate the test files (e.g., after editing the generator):

python generate_test_data.py
python generate_xlsx.py
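
After regenerating, a quick sanity check can confirm that the input directory still matches the counts given in the layout above (20 CSV + 1 XLSX). The following is a minimal sketch; the helper names `count_fixtures` and `check_fixture_counts` are hypothetical and not part of the corpus, and the check assumes all inputs sit directly in test_data/ with no subdirectories.

```python
from collections import Counter
from pathlib import Path

# Expected counts taken from the Layout section: 20 CSV + 1 XLSX.
EXPECTED_COUNTS = {".csv": 20, ".xlsx": 1}


def count_fixtures(test_data_dir):
    """Count files in test_data_dir by lowercase extension (non-recursive)."""
    return Counter(
        p.suffix.lower() for p in Path(test_data_dir).iterdir() if p.is_file()
    )


def check_fixture_counts(test_data_dir="test_data"):
    """Return a list of human-readable mismatches; an empty list means OK."""
    counts = count_fixtures(test_data_dir)
    problems = []
    for ext, expected in EXPECTED_COUNTS.items():
        actual = counts.get(ext, 0)
        if actual != expected:
            problems.append(f"{ext}: expected {expected}, found {actual}")
    return problems
```

Calling `check_fixture_counts()` from the corpus root and printing the result is enough to spot a generator edit that silently dropped or added an input file.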

To use as pytest fixtures: see Section 6 of TEST-CASES.md.
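
Section 6 of TEST-CASES.md is the authoritative guide. As a rough illustration of the general shape only, the sketch below pairs each input with an expected file of the same name and compares raw bytes. The same-name pairing is an assumption (the flag-variant outputs in expected/ presumably follow a different naming scheme, which this sketch ignores), and `pair_fixtures` and `assert_bytes_equal` are hypothetical helpers.

```python
from pathlib import Path


def pair_fixtures(test_data_dir="test_data", expected_dir="expected"):
    """Pair each input file with an expected file of the same name.

    Assumption: default-config expected outputs reuse the input filename.
    Inputs without a same-named expected file are skipped, not failed.
    """
    expected_root = Path(expected_dir)
    pairs = []
    for src in sorted(Path(test_data_dir).iterdir()):
        want = expected_root / src.name
        if src.is_file() and want.is_file():
            pairs.append((src, want))
    return pairs


def assert_bytes_equal(actual: bytes, want_path):
    """Compare raw bytes so BOM and line-ending differences are not hidden."""
    want = Path(want_path).read_bytes()
    assert actual == want, f"output differs from {want_path}"
```

With pytest, the pairs can be fed to `@pytest.mark.parametrize("src, want", pair_fixtures())`. Comparing bytes rather than decoded text is deliberate here: opening files in text mode would normalize line endings and could mask exactly the BOM and line-ending cases this corpus covers.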

Coverage summary

Category                                  Fixtures
Whitespace (ASCII + Unicode)              01, 02
Smart punctuation                         03
Unicode normalization                     04
Invisible / zero-width / control          05, 06
BOM                                       07
Line endings (file-level + embedded)      08, 09, 10, 11
Case operations (opt-in)                  12
International script preservation         13
Mojibake                                  14
Boundary with script 04 (missing values)  15
Headers                                   16, 19
Negative tests (must NOT touch)           17
File-level edge cases                     18, 19
Integration                               20
Excel-specific (multi-sheet, Alt+Enter)   21
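
To make a few of these categories concrete, the sketch below labels the kinds of characters they involve, using only the standard library. It is an inspection aid, not a description of how 02_text_cleaner.py works: the function name, the label strings, and the choice of NFC as the normalization form are all assumptions made for illustration.

```python
import unicodedata

BOM = "\ufeff"


def describe_text_issues(value: str) -> list[str]:
    """Return labels for character classes that some corpus categories target."""
    issues = []
    has_bom = value.startswith(BOM)
    if has_bom:
        issues.append("bom")
    # Category Zs = space separators; anything besides U+0020 is a Unicode space.
    if any(unicodedata.category(ch) == "Zs" and ch != " " for ch in value):
        issues.append("unicode_whitespace")
    # Category Cf = format characters (zero-width space, joiners, ...).
    # A leading BOM is also Cf, so it is skipped here to avoid double-reporting.
    body = value[1:] if has_bom else value
    if any(unicodedata.category(ch) == "Cf" for ch in body):
        issues.append("invisible_format_char")
    # Category Cc = control characters; tab, CR, and LF are reported separately.
    if any(unicodedata.category(ch) == "Cc" and ch not in "\t\r\n" for ch in value):
        issues.append("control_char")
    if "\r" in value:
        issues.append("cr_line_ending")
    # NFC is assumed here; the corpus may target a different form.
    if unicodedata.normalize("NFC", value) != value:
        issues.append("not_nfc")
    return issues
```

For example, a no-break space (U+00A0) is reported as `unicode_whitespace`, while a zero-width space (U+200B) is reported as `invisible_format_char`, which mirrors the split between the whitespace and invisible-character rows in the table.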

Out of scope

The following are out of scope for this corpus and are documented in TEST-CASES.md Section 5: encoding detection, large-file performance, GUI behavior, file locking, and CLI argument parsing. Each needs its own test layer.