🌐 Language: English · Español
DataTools
Local CSV / Excel cleaning. CLI + browser GUI, no cloud, no install ceremony. GUI ships with English and Spanish language packs.
Tools
| # | Tool | Status |
|---|---|---|
| 01 | Find Duplicates — exact + fuzzy match, 5 normalizers, survivor rules, audit | Ready |
| 02 | Clean Text — whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats — dates, phones, emails, addresses, names, currencies, booleans | Ready |
| 04 | Fix Missing Values — disguised-null detection, profile, mean/median/mode/ffill/bfill/interpolate, drop strategies | Ready |
| 05 | Map Columns — fuzzy auto-rename, target schema with type coercion, required fields with defaults, drop/reorder | Ready |
| 06 | Find Unusual Values | Coming Soon |
| 07 | Combine Files | Coming Soon |
| 08 | Quality Check | Coming Soon |
| 09 | Automated Workflows — chain tools with recommended (not forced) order, save/load JSON, automate weekly cleanups | Ready |
Download (non-technical users)
Pre-built bundles — no Python install, no admin rights, no internet at runtime. Each release ships two flavors per OS: an installer that wires up Desktop + Start Menu / Launchpad shortcuts, and a portable .zip you unzip and double-click. Pick whichever your IT policy allows.
| Platform | Installer (recommended) | Portable (no install) |
|---|---|---|
| macOS | DataTools-X.Y.Z-mac.dmg: open, drag DataTools.app into /Applications, launch from Launchpad. | DataTools-X.Y.Z-mac-portable.zip: unzip anywhere, double-click DataTools.app. |
| Windows | DataTools-X.Y.Z-win-setup.exe: run installer (per-user, no admin). Desktop shortcut + Start Menu entry created. | DataTools-X.Y.Z-win-portable.zip: unzip anywhere, double-click DataTools.exe. |
| Linux | DataTools-X.Y.Z-linux-x86_64.AppImage: chmod +x, double-click. | The AppImage is already portable. |
Latest release: see GitHub Releases (or the Gumroad listing). Each bundle is ~200 MB unpacked; on first launch the app starts a local server at http://127.0.0.1:8501 and opens your default browser. Nothing leaves your machine — installers and portables are byte-identical inside.
First-launch warnings (one-time):
- macOS unsigned builds: right-click → Open → confirm. (Signed builds skip this.)
- Windows SmartScreen: click More info → Run anyway.
Detailed install + troubleshooting walkthrough: User Guide §1.
Install from source (developers)
pip install -r requirements.txt
Python 3.10+ required.
Run
GUI (recommended):
streamlit run src/gui/app.py
CLI — seven entry points:
python -m src.cli customers.csv [--apply] # dedup
python -m src.cli_text_clean messy.csv [--apply] # text clean
python -m src.cli_format intl.csv [--apply] # format standardize (auto-streams >100 MB)
python -m src.cli_missing holes.csv [--apply] # missing values
python -m src.cli_column_map vendor.csv [--apply] # column mapper
python -m src.cli_pipeline any_file.csv [--apply] # chain tools end-to-end
python -m src.cli_analyze any_file.csv [--json] # scan only
Every CLI runs preview-only by default; add --apply to write output.
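The preview-by-default behaviour can be pictured with a minimal sketch. This is not the project's code: it uses `argparse` from the standard library rather than typer, and `clean_rows`, `run`, and the `_clean` output suffix are hypothetical names chosen for illustration.

```python
import argparse
import csv
from pathlib import Path


def clean_rows(rows):
    """Stand-in cleaning step: trim surrounding whitespace in every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def run(argv):
    parser = argparse.ArgumentParser(description="Preview-by-default sketch")
    parser.add_argument("input", type=Path)
    parser.add_argument("--apply", action="store_true",
                        help="write output; without it, only report")
    args = parser.parse_args(argv)

    with args.input.open(newline="", encoding="utf-8") as fh:
        original = list(csv.reader(fh))
    cleaned = clean_rows(original)
    changed = sum(1 for before, after in zip(original, cleaned) if before != after)

    if not args.apply:
        # Preview mode: report what would change, touch nothing on disk.
        print(f"{changed} row(s) would change; re-run with --apply to write.")
        return None

    # Apply mode: write a sibling file (hypothetical suffix), never the input.
    out_path = args.input.with_name(f"{args.input.stem}_clean.csv")
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(cleaned)
    print(f"{changed} row(s) changed; wrote {out_path}")
    return out_path
```

Calling `run(["messy.csv"])` only prints a summary, while `run(["messy.csv", "--apply"])` also writes the cleaned copy next to the input.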
Language
The GUI sidebar has a language picker. Packs ship for English and Español (src/i18n/packs/); the choice persists for the session. Adding a language: drop a <code>.json next to en.json mirroring its key tree, then list it in LANGUAGES. See Developer Guide §i18n.
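When adding a pack, it may help to check that the new file really mirrors the key tree of en.json. The sketch below is a standalone helper, not part of the project; `key_paths`, `compare_packs`, and the sample keys are hypothetical, and it assumes packs are nested JSON objects with string leaves.

```python
import json
from pathlib import Path


def key_paths(tree, prefix=""):
    """Flatten a nested dict into a set of dotted key paths."""
    paths = set()
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= key_paths(value, path)
        else:
            paths.add(path)
    return paths


def compare_packs(reference, candidate):
    """Return (missing, extra) key paths of candidate relative to reference."""
    ref, cand = key_paths(reference), key_paths(candidate)
    return sorted(ref - cand), sorted(cand - ref)


def compare_pack_files(reference_path, candidate_path):
    """Same comparison, reading both packs from disk as UTF-8 JSON."""
    reference = json.loads(Path(reference_path).read_text(encoding="utf-8"))
    candidate = json.loads(Path(candidate_path).read_text(encoding="utf-8"))
    return compare_packs(reference, candidate)


# Made-up sample packs; the real key names live in src/i18n/packs/en.json.
en = {"sidebar": {"language": "Language", "upload": "Upload"}}
fr = {"sidebar": {"language": "Langue"}, "footer": {"note": "Note"}}
print(compare_packs(en, fr))  # (['sidebar.upload'], ['footer.note'])
```

An empty pair of lists means the candidate pack has exactly the same keys as the reference.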
Review & Normalize gate
Every uploaded file passes through a CSV-normalization gate before any tool sees it. The analyzer flags ~15 issue types (whitespace, NBSP / zero-width chars, BOM, encoding, smart punct, dirty headers, null sentinels, mojibake, …) tagged by confidence (high / medium / low) and fix action. The GUI shows each finding with Auto-fix / Skip / Customize, a live before/after preview, and an encoding-override picker. Tool pages refuse to load until the gate passes.
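As a rough illustration of the idea, the sketch below detects three of the issue types and tags each finding with a confidence and a fix action. It is not the project's analyzer: `Finding`, `analyze`, `gate_passes`, the sentinel list, and the confidence levels are all assumptions, and "the gate passes once every finding has a decision" is one guess at the rule.

```python
import csv
import io
from dataclasses import dataclass

# Illustrative values only; the real analyzer's lists are not given here.
NULL_SENTINELS = {"n/a", "na", "null", "none", "-"}
INVISIBLE = {"\u00a0": "NBSP", "\u200b": "zero-width space"}


@dataclass(frozen=True)
class Finding:
    issue: str
    confidence: str  # "high", "medium", or "low"
    fix: str
    count: int


def analyze(text):
    """Scan raw CSV text and return findings; the text itself is not changed."""
    findings = []
    if text.startswith("\ufeff"):
        findings.append(Finding("BOM", "high", "strip leading U+FEFF", 1))
    for char, name in INVISIBLE.items():
        hits = text.count(char)
        if hits:
            findings.append(Finding(name, "high", "replace or remove", hits))
    cells = [cell.strip().lower()
             for row in csv.reader(io.StringIO(text)) for cell in row]
    sentinels = sum(1 for cell in cells if cell in NULL_SENTINELS)
    if sentinels:
        # Assumed "medium": a lone "-" can be legitimate data.
        findings.append(Finding("null sentinel", "medium", "convert to empty", sentinels))
    return findings


def gate_passes(findings, decisions):
    """Assumed rule: the gate passes once every finding has a decision."""
    return all(f.issue in decisions for f in findings)
```

Here `decisions` would map an issue name to a choice such as "auto-fix" or "skip", mirroring the per-finding buttons in the GUI.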
Output
Every run writes:
- {input}_<tool>.csv: the cleaned data
- {input}_changes.csv (text cleaner) or {input}_match_groups.csv (dedup): audit trail
- logs/<tool>_YYYYMMDD_HHMMSS.log: debug-level run log
The original input file is never modified.
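The naming scheme above can be sketched with `pathlib`. This is an illustration under assumptions, not the project's code: it treats `{input}` as the input file's stem, places outputs next to the input, and uses a hypothetical `output_paths` helper with made-up tool names.

```python
from datetime import datetime
from pathlib import Path


def output_paths(input_path, tool, audit_suffix, now=None):
    """Build the three per-run paths; none of them equals the input path."""
    src = Path(input_path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return {
        "data": src.with_name(f"{src.stem}_{tool}.csv"),
        "audit": src.with_name(f"{src.stem}_{audit_suffix}.csv"),
        "log": Path("logs") / f"{tool}_{stamp}.log",
    }


paths = output_paths("data/customers.csv", "dedup", "match_groups",
                     now=datetime(2025, 1, 2, 3, 4, 5))
for kind, path in paths.items():
    print(kind, path.as_posix())
```

Because every output name adds a suffix to the stem, a run cannot overwrite the file it read.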
Docs
- User Guide — install, GUI workflow, gate
- CLI Reference — every flag with recipes
- Requirements — file sizes, encodings, detectors, perf targets
- Technical — architecture, gate internals, fix registry
- Developer Guide — adding fixes / detectors / standardizers
Dependencies
pandas, openpyxl, rapidfuzz, phonenumbers, typer, loguru, charset-normalizer, streamlit. Optional: ftfy for mojibake repair.
License
Proprietary.