Builds 02_text_cleaner.py from stub to working: character-level hygiene for CSV/Excel inputs covering trim, whitespace collapse, smart-character folding, Unicode NFC/NFKC, BOM strip, zero-width strip, control-char strip, line-ending normalization, and per-column case conversion. Three presets (minimal/excel-hygiene/paranoid) keep the buyer surface small.
- src/core/text_clean.py: pure helpers + CleanOptions/CleanResult + clean_dataframe with dtype-safe column selection
- src/cli_text_clean.py: Typer CLI mirroring the dedup CLI shape (dry-run by default, --apply writes cleaned + changes audit, JSON config save/load)
- src/gui/pages/2_Text_Cleaner.py: real Streamlit page with preset picker, advanced toggles, preview, before/after metrics, and three download buttons
- tests/test_text_clean.py + test_cli_text_clean.py: 92 new tests covering edge cases E1-E50 from the spec
- samples/messy_text.csv: demo dataset surfacing UC1, UC3, UC6, UC10 in 10 rows
- test-cases/uc16-uc26 + ec05-ec09: per-use-case and per-edge-case fixtures
Docs: TECHNICAL.md §10.2 (full Tier 1/2/3 spec), DECISIONS.md v1.7 entry locking the spec, CLI-REFERENCE.md gains the text cleaner section, README.md gains a top-level Text Cleaner block, USER-GUIDE.md status row 02 promoted Skeleton -> Working.
200/200 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
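The character-level steps named in the commit message above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the code in src/core/text_clean.py: the function name `clean_text`, the `SMART_CHARS` fold table, the chosen zero-width set, and the step order are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical fold table: a small, assumed subset of "smart" characters.
SMART_CHARS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # non-breaking space
}
# Assumed set of zero-width characters to drop, plus the BOM (U+FEFF).
ZERO_WIDTH = "\u200b\u200c\u200d\u2060"
_TABLE = str.maketrans(
    {**SMART_CHARS, **{ch: None for ch in ZERO_WIDTH}, "\ufeff": None}
)
# C0 control characters and DEL, excluding tab, LF, and CR.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(value: str, form: str = "NFC") -> str:
    """Apply the hygiene steps to one cell value (sketch, assumed order)."""
    # 1. Normalize line endings to LF.
    text = value.replace("\r\n", "\n").replace("\r", "\n")
    # 2. Fold smart characters; strip zero-width characters and the BOM.
    text = text.translate(_TABLE)
    # 3. Strip remaining control characters.
    text = _CONTROL.sub("", text)
    # 4. Unicode normalization ("NFC" or "NFKC").
    text = unicodedata.normalize(form, text)
    # 5. Collapse runs of spaces and tabs, then trim both ends.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_text("\ufeff  \u201cHello\u201d\u200b   world\u2026\r\n"))
```

Keeping the helper a pure string-to-string function means it can be applied cell by cell to whichever text columns are selected, leaving numeric and date columns untouched.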
Excel & CSV Data Cleaning Mastery Bundle
Ready-to-sell Python automation product. 9 scripts for data cleaning, deduplication, text hygiene, formatting, merging, validation, and reporting.
Each script ships with both a GUI (runs in your browser locally, no internet needed) and a CLI.
Cross-platform: Windows, macOS, Linux.
Quick Start (for buyers)
- Download the installer for your operating system.
- Run the installer. No Python knowledge required.
- Launch via the desktop shortcut "Launch Bundle" (or the app icon on macOS, or the AppImage on Linux).
- Your default browser opens to a local page where the data tool runs. Your data never leaves your computer.
Full instructions: see USER-GUIDE.md.
Documentation Index
Ships with the product (buyer-facing)
- USER-GUIDE.md - Installation, script reference, usage examples for both GUI and CLI.
Creator-only (do not ship to buyers)
- BUSINESS.md - Business case, market analysis, pricing, marketing strategy (including the hosted browser demo as a conversion lever).
- TECHNICAL.md - Architecture (dual CLI + Streamlit GUI), build pipeline, dev standards.
- DECISIONS.md - Locked criteria, scoring rubric, decisions log, rationale for product choices including the GUI framework decision.
- RECOVERY.md - How to rebuild the entire project from scratch if lost.
Version: 1.6
Last updated: April 28, 2026
Owner: Michael