DataTools
Local CSV / Excel cleaning. CLI + browser GUI, no cloud, no install ceremony.
Tools
| # | Tool | Status |
|---|---|---|
| 01 | Deduplicator — exact + fuzzy match, 5 normalizers, survivor rules, audit | Ready |
| 02 | Text Cleaner — whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Format Standardizer — dates, phones, emails, addresses, names, currencies, booleans | Ready |
| 04 | Missing Value Handler — disguised-null detection, profile, mean/median/mode/ffill/bfill/interpolate, drop strategies | Ready |
| 05 | Column Mapper — fuzzy auto-rename, target schema with type coercion, required fields with defaults, drop/reorder | Ready |
| 06 | Outlier Detector | Coming Soon |
| 07 | Multi-File Merger | Coming Soon |
| 08 | Validator & Reporter | Coming Soon |
| 09 | Pipeline Runner — chain tools with recommended (not forced) order, save/load JSON, automate weekly cleanups | Ready |
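As a rough illustration of what "exact + fuzzy match" with a normalizer and a survivor rule can look like, here is a minimal sketch. It uses the standard library's difflib as a stand-in for rapidfuzz so that it runs without extra installs. The names normalize and find_duplicate_groups, the 0.85 threshold, and the keep-first survivor rule are all assumptions made for this example, not the Deduplicator's actual code.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Hypothetical normalizer: trim, collapse inner whitespace, lowercase."""
    return " ".join(value.split()).lower()


def find_duplicate_groups(values: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group row indexes whose normalized values match exactly or fuzzily.

    Greedy single pass: each row joins the first group whose representative
    (the group's first row) is similar enough, otherwise it starts a new group.
    """
    groups: list[list[int]] = []
    for idx, raw in enumerate(values):
        key = normalize(raw)
        for group in groups:
            rep = normalize(values[group[0]])
            if key == rep or SequenceMatcher(None, key, rep).ratio() >= threshold:
                group.append(idx)
                break
        else:
            # No existing group was close enough, so this row starts its own.
            groups.append([idx])
    return groups


names = ["Acme Corp", "  acme   corp ", "Acme Corp.", "Globex"]
groups = find_duplicate_groups(names)
# Survivor rule assumed here: keep the first row of each group.
survivors = [names[g[0]] for g in groups]
```

Comparing each row only against a group's first member keeps the sketch short. A real deduplicator would likely need blocking or indexing to avoid quadratic comparisons on large files.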
Install
pip install -r requirements.txt
Python 3.10+ required.
Run
GUI (recommended):
streamlit run src/gui/app.py
CLI — seven entry points:
python -m src.cli customers.csv [--apply] # dedup
python -m src.cli_text_clean messy.csv [--apply] # text clean
python -m src.cli_format intl.csv [--apply] # format standardize (auto-streams >100 MB)
python -m src.cli_missing holes.csv [--apply] # missing values
python -m src.cli_column_map vendor.csv [--apply] # column mapper
python -m src.cli_pipeline any_file.csv [--apply] # chain tools end-to-end
python -m src.cli_analyze any_file.csv [--json] # scan only
Every CLI runs preview-only by default; add --apply to write output.
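The preview-by-default behavior can be sketched roughly as follows. This is an illustration only: it uses argparse rather than typer to stay dependency-free, and clean_rows, run, and the _cleaned suffix are hypothetical names, not the project's real entry points.

```python
import argparse
import csv
import tempfile
from pathlib import Path
from typing import Optional


def clean_rows(rows: list[list[str]]) -> list[list[str]]:
    """Placeholder cleaning step: strip surrounding whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def run(argv: list[str]) -> Optional[Path]:
    """Preview by default; write a sibling output file only when --apply is given."""
    parser = argparse.ArgumentParser(description="preview-by-default sketch")
    parser.add_argument("input", type=Path)
    parser.add_argument("--apply", action="store_true", help="write the cleaned file")
    args = parser.parse_args(argv)

    with args.input.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh))
    cleaned = clean_rows(rows)

    changed = sum(a != b for a, b in zip(rows, cleaned))
    print(f"{changed} of {len(rows)} rows would change")
    if not args.apply:
        return None  # preview only: nothing is written

    # Hypothetical output name; the input file itself is never overwritten.
    out_path = args.input.with_name(f"{args.input.stem}_cleaned.csv")
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(cleaned)
    return out_path


# Demo against a throwaway file in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
demo_csv = demo_dir / "messy.csv"
demo_csv.write_text("name,city\n Ada ,London\nBob,Paris\n", encoding="utf-8")
preview_result = run([str(demo_csv)])
apply_result = run([str(demo_csv), "--apply"])
```

Making the write path opt-in means a mistyped command can at worst print a misleading preview; it cannot produce an unwanted file.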
Review & Normalize gate
Every uploaded file passes through a CSV-normalization gate before any tool sees it. The analyzer flags ~15 issue types (whitespace, NBSP / zero-width chars, BOM, encoding, smart punctuation, dirty headers, null sentinels, mojibake, …) and tags each finding with a confidence level (high / medium / low) and a fix action. The GUI shows each finding with Auto-fix / Skip / Customize options, a live before/after preview, and an encoding-override picker. Tool pages refuse to load until the gate passes.
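To make the idea of findings tagged by confidence and fix action more concrete, here is a minimal sketch covering four of the issue types. The Finding fields, the DETECTORS table, the fix-action labels, and the rule that the gate passes once every finding has been resolved are assumptions made for this example; the real analyzer and fix registry may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue: str        # e.g. "bom", "nbsp"
    confidence: str   # "high", "medium", or "low"
    fix_action: str   # label of the suggested fix
    count: int        # number of occurrences in the scanned text


# Hypothetical detector table: (issue, confidence, fix action, needle).
DETECTORS = [
    ("bom", "high", "strip_bom", "\ufeff"),
    ("nbsp", "high", "replace_with_space", "\u00a0"),
    ("zero_width", "high", "remove", "\u200b"),
    ("smart_quotes", "medium", "replace_with_ascii", "\u201c"),
]


def analyze(text: str) -> list[Finding]:
    """Return one Finding per issue type that occurs at least once in the text."""
    findings = []
    for issue, confidence, fix_action, needle in DETECTORS:
        hits = text.count(needle)
        if hits:
            findings.append(Finding(issue, confidence, fix_action, hits))
    return findings


def gate_passes(findings: list[Finding], resolved: set[str]) -> bool:
    """Assumed rule: the gate passes once every finding is auto-fixed or skipped."""
    return all(f.issue in resolved for f in findings)


sample = "\ufeffname,city\nAda\u00a0Lovelace,London\u200b\n"
findings = analyze(sample)
```

Keeping detectors in a plain table makes it cheap to add an issue type without touching the scanning loop.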
Output
Every run writes:
- {input}_<tool>.csv: the cleaned data
- {input}_changes.csv (text cleaner) or {input}_match_groups.csv (dedup): the audit trail
- logs/<tool>_YYYYMMDD_HHMMSS.log: debug-level run log
The original input file is never modified.
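The naming scheme above can be expressed as a small helper. This is a sketch under several assumptions: that {input} means the input file's stem, that outputs land next to the input, that logs/ is relative to the working directory, and that "dedup" is a valid tool token. The function name output_paths is hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_paths(input_file: Path, tool: str, audit_suffix: str,
                 now: Optional[datetime] = None) -> dict[str, Path]:
    """Derive the three per-run artifacts next to (never over) the input file."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    stem = input_file.stem
    return {
        "cleaned": input_file.with_name(f"{stem}_{tool}.csv"),
        "audit": input_file.with_name(f"{stem}_{audit_suffix}.csv"),
        "log": Path("logs") / f"{tool}_{stamp}.log",
    }


# A fixed timestamp keeps the example reproducible.
paths = output_paths(Path("data/customers.csv"), tool="dedup",
                     audit_suffix="match_groups",
                     now=datetime(2024, 1, 2, 3, 4, 5))
```

Because every derived name carries a suffix, none of the outputs can collide with the input path, which is one simple way to guarantee the source file is left alone.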
Docs
- User Guide — install, GUI workflow, gate
- CLI Reference — every flag with recipes
- Requirements — file sizes, encodings, detectors, perf targets
- Technical — architecture, gate internals, fix registry
- Developer Guide — adding fixes / detectors / standardizers
Dependencies
pandas, openpyxl, rapidfuzz, phonenumbers, typer, loguru, charset-normalizer, streamlit. Optional: ftfy for mojibake repair.
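An optional dependency such as ftfy is usually handled with a guarded import. The sketch below shows one common pattern. The wrapper name repair_mojibake is hypothetical and how DataTools actually wires this in is not shown here; ftfy.fix_text is ftfy's own entry point.

```python
try:
    import ftfy  # optional dependency
except ImportError:
    ftfy = None


def repair_mojibake(text: str) -> str:
    """Repair mojibake when ftfy is installed; otherwise return the text unchanged."""
    if ftfy is None:
        return text
    return ftfy.fix_text(text)


# Lets callers report whether mojibake repair is available in this environment.
HAS_FTFY = ftfy is not None
```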
License
Proprietary.