Michael 54f92ae47e feat: implement text cleaner (script 02) with CLI, GUI, and tests
Builds 02_text_cleaner.py from stub to working: character-level hygiene
for CSV/Excel inputs covering trim, whitespace collapse, smart-character
folding, Unicode NFC/NFKC, BOM strip, zero-width strip, control-char
strip, line-ending normalization, and per-column case conversion. Three
presets (minimal/excel-hygiene/paranoid) keep the buyer surface small.
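
As an illustration of the character-level steps listed above, here is a minimal sketch using only the standard library. It is not the code in src/core/text_clean.py: the function name clean_text, the SMART_CHARS table, the exact zero-width and control-character sets, and the order of the steps are all assumptions made for this example.

```python
import re
import unicodedata

# Hypothetical folding table; the real script's mapping is not shown in the source.
SMART_CHARS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u00a0": " ",                  # non-breaking space
}
SMART_TABLE = str.maketrans(SMART_CHARS)

# Assumed sets: common zero-width characters, and C0 controls except tab/LF/CR plus DEL.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060]")
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(value: str, form: str = "NFC") -> str:
    """Apply the hygiene steps to one cell value (illustrative order)."""
    text = value.lstrip("\ufeff")                           # BOM strip
    text = ZERO_WIDTH.sub("", text)                         # zero-width strip
    text = CONTROL.sub("", text)                            # control-char strip
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # line-ending normalization
    text = text.translate(SMART_TABLE)                      # smart-character folding
    text = unicodedata.normalize(form, text)                # Unicode NFC or NFKC
    text = re.sub(r"[ \t]+", " ", text)                     # whitespace collapse
    return text.strip()                                     # trim
```

Passing form="NFKC" additionally folds compatibility characters such as ligatures, which is why NFC is the gentler default in this sketch.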

- src/core/text_clean.py: pure helpers + CleanOptions/CleanResult +
  clean_dataframe with dtype-safe column selection
- src/cli_text_clean.py: Typer CLI mirroring the dedup CLI shape
  (dry-run by default, --apply writes cleaned + changes audit, JSON
  config save/load)
- src/gui/pages/2_Text_Cleaner.py: real Streamlit page with preset
  picker, advanced toggles, preview, before/after metrics, and three
  download buttons
- tests/test_text_clean.py + test_cli_text_clean.py: 92 new tests
  covering edge cases E1-E50 from the spec
- samples/messy_text.csv: demo dataset surfacing UC1, UC3, UC6, UC10
  in 10 rows
- test-cases/uc16-uc26 + ec05-ec09: per-use-case and per-edge-case
  fixtures
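
The "dry-run by default, --apply writes" shape mentioned for the CLI can be sketched as follows. This uses argparse from the standard library in place of Typer so the example has no third-party dependency; the run function, the per-line strip used as a stand-in for cleaning, and the "_cleaned" output name are hypothetical and not taken from src/cli_text_clean.py.

```python
import argparse
from pathlib import Path


def run(argv: list[str]) -> str:
    """Report what would change; write output only when --apply is given."""
    parser = argparse.ArgumentParser(prog="text-clean-sketch")
    parser.add_argument("input", type=Path)
    parser.add_argument("--apply", action="store_true",
                        help="write the cleaned file (default is a dry run)")
    args = parser.parse_args(argv)

    lines = args.input.read_text(encoding="utf-8").splitlines()
    cleaned = [line.strip() for line in lines]   # stand-in for real cleaning
    changed = sum(a != b for a, b in zip(lines, cleaned))

    if not args.apply:
        # Dry run: nothing is written to disk.
        return f"dry-run: {changed} line(s) would change"

    # Hypothetical naming: <stem>_cleaned<suffix> next to the input file.
    out_path = args.input.with_name(args.input.stem + "_cleaned" + args.input.suffix)
    out_path.write_text("\n".join(cleaned), encoding="utf-8")
    return f"applied: {changed} line(s) changed, wrote {out_path.name}"
```

Making the write path opt-in means a mistyped command can only produce a report, never overwrite or create files.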

Docs: TECHNICAL.md §10.2 (full Tier 1/2/3 spec), DECISIONS.md v1.7
entry locking the spec, CLI-REFERENCE.md gains the text cleaner
section, README.md gains a top-level Text Cleaner block, USER-GUIDE.md
status row 02 promoted Skeleton -> Working.

200/200 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:14:15 +00:00

Excel & CSV Data Cleaning Mastery Bundle

A ready-to-sell Python automation product: 9 scripts covering data cleaning, deduplication, text hygiene, formatting, merging, validation, and reporting.

Each script ships with both a GUI (runs in your browser locally, no internet needed) and a CLI.

Cross-platform: Windows, macOS, Linux.


Quick Start (for buyers)

  1. Download the installer for your operating system.
  2. Run the installer. No Python knowledge required.
  3. Launch via the desktop shortcut "Launch Bundle" (or the app icon on macOS, or the AppImage on Linux).
  4. Your default browser opens to a local page where the data tool runs. Your data never leaves your computer.

Full instructions: see USER-GUIDE.md.


Documentation Index

Ships with the product (buyer-facing)

  • USER-GUIDE.md - Installation, script reference, usage examples for both GUI and CLI.

Creator-only (do not ship to buyers)

  • BUSINESS.md - Business case, market analysis, pricing, marketing strategy (including the hosted browser demo as a conversion lever).
  • TECHNICAL.md - Architecture (dual CLI + Streamlit GUI), build pipeline, dev standards.
  • DECISIONS.md - Locked criteria, scoring rubric, decisions log, rationale for product choices including the GUI framework decision.
  • RECOVERY.md - How to rebuild the entire project from scratch if lost.

Version: 1.6
Last updated: April 28, 2026
Owner: Michael