Michael fe4b5dc755 fix(sidebar): correct testid + JS swap so +/− actually renders
The prior attempt used data-testid=stSidebarNavSectionHeader, which is
not what Streamlit 1.57 emits — the correct testid is stNavSectionHeader
(verified against the bundled JS in streamlit/static/static/js/).
The section header is also a <div> with onClick, not a <button>, and
the React component keeps the expanded state in a prop without
surfacing aria-expanded on the DOM. Pure CSS can therefore neither
locate the header nor switch the glyph by state, which is why the
chevron was unchanged in the rendered UI.

Switch strategies:
- CSS now targets the correct stNavSectionHeader / stIconMaterial
  selectors, drops the Material Symbols font from the icon span, and
  restyles it so a plain ascii character reads as proper typography
  (size, weight, color, hover).
- Add _SWAP_NAV_SECTION_INDICATOR_JS — small inline script that
  rewrites the icon's text node from "expand_more"/"expand_less" to
  "+"/"−" (U+2212), throttled via requestAnimationFrame, re-applied
  on every DOM mutation by a MutationObserver. Bundled into the same
  iframe injection as the existing brand/upload/findings scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 17:52:47 +00:00

🌐 Language: English · Español

DataTools

Local CSV / Excel cleaning. CLI + browser GUI, no cloud, no install ceremony. GUI ships with English and Spanish language packs.

Tools

# Tool Status
01 Find Duplicates — exact + fuzzy match, 5 normalizers, survivor rules, audit Ready
02 Clean Text — whitespace, smart chars, BOM, line endings, case ops Ready
03 Standardize Formats — dates, phones, emails, addresses, names, currencies, booleans Ready
04 Fix Missing Values — disguised-null detection, profile, mean/median/mode/ffill/bfill/interpolate, drop strategies Ready
05 Map Columns — fuzzy auto-rename, target schema with type coercion, required fields with defaults, drop/reorder Ready
06 Find Unusual Values Coming Soon
07 Combine Files Coming Soon
08 Quality Check Coming Soon
09 Automated Workflows — chain tools with recommended (not forced) order, save/load JSON, automate weekly cleanups Ready

Download (non-technical users)

Pre-built bundles — no Python install, no admin rights, no internet at runtime. Each release ships two flavors per OS: an installer that wires up Desktop + Start Menu / Launchpad shortcuts, and a portable .zip you unzip and double-click. Pick whichever your IT policy allows.

Platform Installer (recommended) Portable (no install)
macOS DataTools-X.Y.Z-mac.dmg — open, drag DataTools.app into /Applications, launch from Launchpad. DataTools-X.Y.Z-mac-portable.zip — unzip anywhere, double-click DataTools.app.
Windows DataTools-X.Y.Z-win-setup.exe — run installer (per-user, no admin). Desktop shortcut + Start Menu entry created. DataTools-X.Y.Z-win-portable.zip — unzip anywhere, double-click DataTools.exe.
Linux DataTools-X.Y.Z-linux-x86_64.AppImagechmod +x, double-click. The AppImage is already portable.

Latest release: see GitHub Releases (or the Gumroad listing). Each bundle is ~200 MB unpacked; on first launch the app starts a local server at http://127.0.0.1:8501 and opens your default browser. Nothing leaves your machine — installers and portables are byte-identical inside.

First-launch warnings (one-time):

  • macOS unsigned builds: right-click → Open → confirm. (Signed builds skip this.)
  • Windows SmartScreen: click More infoRun anyway.

Detailed install + troubleshooting walkthrough: User Guide §1.

Install from source (developers)

pip install -r requirements.txt

Python 3.10+ required.

Run

GUI (recommended):

streamlit run src/gui/app.py

CLI — seven entry points:

python -m src.cli            customers.csv [--apply]   # dedup
python -m src.cli_text_clean messy.csv     [--apply]   # text clean
python -m src.cli_format     intl.csv      [--apply]   # format standardize (auto-streams >100 MB)
python -m src.cli_missing    holes.csv     [--apply]   # missing values
python -m src.cli_column_map vendor.csv    [--apply]   # column mapper
python -m src.cli_pipeline   any_file.csv  [--apply]   # chain tools end-to-end
python -m src.cli_analyze    any_file.csv  [--json]    # scan only

Every CLI runs preview-only by default; add --apply to write output.

Language

The GUI sidebar has a language picker. Packs ship for English and Español (src/i18n/packs/); the choice persists for the session. Adding a language: drop a <code>.json next to en.json mirroring its key tree, then list it in LANGUAGES. See Developer Guide §i18n.

Review & Normalize gate

Every uploaded file passes through a CSV-normalization gate before any tool sees it. The analyzer flags ~15 issue types (whitespace, NBSP / zero-width chars, BOM, encoding, smart punct, dirty headers, null sentinels, mojibake, …) tagged by confidence (high / medium / low) and fix action. The GUI shows each finding with Auto-fix / Skip / Customize, a live before/after preview, and an encoding-override picker. Tool pages refuse to load until the gate passes.

Output

Every run writes:

  • {input}_<tool>.csv — the cleaned data
  • {input}_changes.csv (text cleaner) or {input}_match_groups.csv (dedup) — audit trail
  • logs/<tool>_YYYYMMDD_HHMMSS.log — debug-level run log

Original input file is never modified.

Docs

Dependencies

pandas, openpyxl, rapidfuzz, phonenumbers, typer, loguru, charset-normalizer, streamlit. Optional: ftfy for mojibake repair.

License

Proprietary.

Description
Data tools development
Readme 7.7 MiB
Languages
Python 87.3%
HTML 10%
CSS 1.8%
Shell 0.4%
JavaScript 0.2%
Other 0.2%