Michael 59c6d0f914 fix(audit): defensive wrap so audit failures can never blank the GUI
Reported: after pulling commit c73d716 (audit log) the main body of
every page showed empty. Sidebar nav still worked.

Diagnosis: the most likely path is that something inside the audit
calls — ``_render_diagnostics_sidebar()`` calling ``audit_log_path()``,
or ``log_session_start()`` itself — raises during ``hide_streamlit_chrome``
on the user's environment (Python 3.14 on Windows, a less-tested
combo than the test environment). Streamlit's script runner sees the
exception, and on some chrome paths it eats it without surfacing an
error block, leaving the page body empty.

The audit log is best-effort by design. Make that contract real:

1. ``hide_streamlit_chrome`` now wraps both ``log_session_start()``
   and ``_render_diagnostics_sidebar()`` in try/except. Errors print
   to stderr (so the developer running ``python -m src.gui`` sees
   them in the launcher's console) but never bubble up to kill the
   page render.

2. ``audit_log_path()`` already had a tempdir fallback for the
   primary mkdir failure, but the second mkdir (on the tempdir) was
   not protected. Restructured to a two-level fallback: configured dir →
   tempdir → ``/dev/null`` (or ``NUL`` on Windows). The last fallback
   ensures the function never raises; ``log_event``'s own try/except
   handles the eventual unwritable-file case.

3. ``log_page_open(slug)`` now has an outer try/except so it cannot
   raise either — protecting every tool page's render path.
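
A minimal sketch of the fallback chain described in item 2, not the repository's actual code. The function name ``resolve_audit_log_path``, its parameters, and the ``datatools_audit`` subdirectory are hypothetical; only the order (configured dir, then tempdir, then the null device) comes from the description above.

```python
import os
import tempfile
from pathlib import Path


def resolve_audit_log_path(configured_dir: str, filename: str = "audit.log") -> Path:
    """Return a path for the audit log without ever raising.

    Hypothetical helper: tries the configured directory, then a
    subdirectory of the system temp dir, then the null device.
    """
    # Each candidate is built lazily so that even constructing it
    # (e.g. tempfile.gettempdir() failing) happens inside the try.
    candidates = (
        lambda: Path(configured_dir),
        lambda: Path(tempfile.gettempdir()) / "datatools_audit",
    )
    for make_dir in candidates:
        try:
            directory = make_dir()
            directory.mkdir(parents=True, exist_ok=True)
            return directory / filename
        except Exception:
            continue  # fall through to the next candidate
    # os.devnull is "/dev/null" on POSIX and "nul" on Windows.
    return Path(os.devnull)
```

Writing to the returned path can still fail (for example on a read-only directory), which is why the caller keeps its own try/except around the write.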

If a user reports the same symptom again, the launcher terminal will
now show a real traceback explaining what's wrong, and the GUI will
still render normally.
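
For illustration, the "print to stderr, never bubble up" contract could look roughly like this. ``best_effort`` is a hypothetical helper name, not a function from the repository; the real change wraps the calls inline in ``hide_streamlit_chrome``.

```python
import sys
import traceback
from typing import Any, Callable, Optional


def best_effort(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
    """Call fn and return its result; on any exception, log and return None.

    The traceback goes to stderr so it shows up in the launcher's
    console, while the caller (the page render) keeps going.
    """
    try:
        return fn(*args, **kwargs)
    except Exception:
        name = getattr(fn, "__name__", repr(fn))
        print(f"[audit] {name} failed; continuing without it", file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        return None
```

Under this sketch a call site would read ``best_effort(log_session_start)``. Catching ``Exception`` rather than ``BaseException`` leaves ``KeyboardInterrupt`` and ``SystemExit`` alone.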

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:00:31 +00:00

🌐 Language: English · Español

DataTools

Local CSV / Excel cleaning. CLI + browser GUI, no cloud, no install ceremony. GUI ships with English and Spanish language packs.

Tools

| #  | Tool | Status |
|----|------|--------|
| 01 | Find Duplicates: exact + fuzzy match, 5 normalizers, survivor rules, audit | Ready |
| 02 | Clean Text: whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats: dates, phones, emails, addresses, names, currencies, booleans | Ready |
| 04 | Fix Missing Values: disguised-null detection, profile, mean/median/mode/ffill/bfill/interpolate, drop strategies | Ready |
| 05 | Map Columns: fuzzy auto-rename, target schema with type coercion, required fields with defaults, drop/reorder | Ready |
| 06 | Find Unusual Values | Coming Soon |
| 07 | Combine Files | Coming Soon |
| 08 | Quality Check | Coming Soon |
| 09 | Automated Workflows: chain tools with recommended (not forced) order, save/load JSON, automate weekly cleanups | Ready |

Download (non-technical users)

Pre-built installers — no Python required:

| Platform | Download | First-launch note |
|----------|----------|-------------------|
| macOS | DataTools-X.Y.Z-mac.dmg | Drag DataTools.app into /Applications, then double-click. |
| Windows | DataTools-X.Y.Z-win-setup.exe | Run the installer; launches from Start Menu. |
| Linux | DataTools-X.Y.Z-linux-x86_64.AppImage | chmod +x the file, then double-click. |

Latest release: see GitHub Releases (or the Gumroad listing). The installers are ~150-200 MB; the launcher boots a local server at http://127.0.0.1:8501 and opens your browser. Nothing is sent to the cloud.
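
The launcher's source is not shown here, so the following is only a sketch of how a "boot a local server, then open the browser" step might be written. The function names are hypothetical, and the app path is borrowed from the Run section below; the Streamlit options used (``--server.address``, ``--server.port``, ``--server.headless``) are standard Streamlit config flags.

```python
import socket
import subprocess
import sys
import time
import webbrowser

HOST, PORT = "127.0.0.1", 8501


def build_command(app_path: str = "src/gui/app.py") -> list[str]:
    """Command line that serves the app on the loopback interface only."""
    return [
        sys.executable, "-m", "streamlit", "run", app_path,
        "--server.address", HOST,
        "--server.port", str(PORT),
        "--server.headless", "true",  # we open the browser ourselves
    ]


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.2)
    return False


def launch() -> int:
    """Start the server, open the browser once it is up, wait for exit."""
    proc = subprocess.Popen(build_command())
    if wait_for_port(HOST, PORT):
        webbrowser.open(f"http://{HOST}:{PORT}")
    return proc.wait()
```

Binding to ``127.0.0.1`` rather than ``0.0.0.0`` keeps the server unreachable from other machines, which matches the "nothing is sent to the cloud" promise.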

Install from source (developers)

pip install -r requirements.txt

Python 3.10+ required.

Run

GUI (recommended):

streamlit run src/gui/app.py

CLI — seven entry points:

python -m src.cli            customers.csv [--apply]   # dedup
python -m src.cli_text_clean messy.csv     [--apply]   # text clean
python -m src.cli_format     intl.csv      [--apply]   # format standardize (auto-streams >100 MB)
python -m src.cli_missing    holes.csv     [--apply]   # missing values
python -m src.cli_column_map vendor.csv    [--apply]   # column mapper
python -m src.cli_pipeline   any_file.csv  [--apply]   # chain tools end-to-end
python -m src.cli_analyze    any_file.csv  [--json]    # scan only

Every CLI runs preview-only by default; add --apply to write output.
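
A rough sketch of the preview-by-default pattern, for readers adding a new entry point. It uses ``argparse`` from the standard library to stay self-contained (the project lists typer as a dependency, so the real CLIs likely look different), and the ``_clean`` output suffix and the whitespace-stripping step are placeholders, not the project's behavior.

```python
import argparse
import csv
from pathlib import Path


def clean_rows(rows):
    """Placeholder transform: strip surrounding whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Preview-by-default cleaner (sketch)")
    parser.add_argument("input", type=Path, help="CSV file to clean")
    parser.add_argument("--apply", action="store_true",
                        help="write output instead of only previewing")
    args = parser.parse_args(argv)

    with args.input.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh))
    cleaned = clean_rows(rows)
    changed = sum(1 for before, after in zip(rows, cleaned) if before != after)

    if not args.apply:
        # Default path: report what would happen and touch nothing on disk.
        print(f"preview: {changed} of {len(rows)} rows would change; "
              "re-run with --apply to write")
        return 0

    # Write next to the input under a new name; the input is never overwritten.
    out_path = args.input.with_name(f"{args.input.stem}_clean.csv")
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(cleaned)
    print(f"wrote {out_path}")
    return 0
```

Making the destructive path opt-in means a mistyped command can at worst print a report.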

Language

The GUI sidebar has a language picker. Packs ship for English and Español (src/i18n/packs/); the choice persists for the session. Adding a language: drop a <code>.json next to en.json mirroring its key tree, then list it in LANGUAGES. See Developer Guide §i18n.
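
When adding a pack, a quick check that the new file mirrors the key tree of en.json can catch missing strings early. The helper below is a hypothetical sketch and assumes packs are nested JSON objects with string leaves; the actual pack schema may differ.

```python
import json
from pathlib import Path


def load_pack(path) -> dict:
    """Read a language pack from disk (assumed to be UTF-8 JSON)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def key_paths(node: dict, prefix: str = "") -> set:
    """Return the dotted path of every leaf key in a nested dict."""
    paths = set()
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= key_paths(value, path)
        else:
            paths.add(path)
    return paths


def compare_packs(reference: dict, candidate: dict):
    """Return (missing, extra) key paths of candidate relative to reference."""
    ref, cand = key_paths(reference), key_paths(candidate)
    return sorted(ref - cand), sorted(cand - ref)
```

Usage would be along the lines of ``compare_packs(load_pack("src/i18n/packs/en.json"), load_pack(new_pack_path))``; two empty lists mean the trees match.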

Review & Normalize gate

Every uploaded file passes through a CSV-normalization gate before any tool sees it. The analyzer flags ~15 issue types (whitespace, NBSP / zero-width chars, BOM, encoding, smart punct, dirty headers, null sentinels, mojibake, …) tagged by confidence (high / medium / low) and fix action. The GUI shows each finding with Auto-fix / Skip / Customize, a live before/after preview, and an encoding-override picker. Tool pages refuse to load until the gate passes.
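
To make the idea of confidence-tagged findings concrete, here is a small sketch covering four of the issue types. The ``Finding`` shape, the confidence assigned to each issue, the sentinel list, and the naive comma split are all assumptions for illustration; the real analyzer is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue: str
    confidence: str  # "high", "medium", or "low"
    action: str      # suggested fix
    count: int       # number of occurrences


ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\u2060")
NULL_SENTINELS = {"n/a", "na", "null", "none", "-", "?"}  # illustrative list


def analyze(text: str) -> list[Finding]:
    """Scan raw CSV text and return one Finding per detected issue type."""
    findings = []
    if text.startswith("\ufeff"):
        findings.append(Finding("bom", "high", "strip BOM", 1))
    nbsp = text.count("\u00a0")
    if nbsp:
        findings.append(Finding("nbsp", "high", "replace with space", nbsp))
    zero_width = sum(text.count(ch) for ch in ZERO_WIDTH)
    if zero_width:
        findings.append(Finding("zero_width", "high", "remove", zero_width))
    # Naive split: ignores quoted commas, which is fine for a sketch only.
    cells = [cell for line in text.splitlines() for cell in line.split(",")]
    sentinels = sum(1 for cell in cells if cell.strip().lower() in NULL_SENTINELS)
    if sentinels:
        # Medium confidence: "-" or "NA" can be legitimate data in some columns.
        findings.append(Finding("null_sentinel", "medium", "convert to empty", sentinels))
    return findings
```

A gate built on this would let high-confidence findings default to Auto-fix and ask the user about the rest.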

Output

Every run writes:

  • {input}_<tool>.csv — the cleaned data
  • {input}_changes.csv (text cleaner) or {input}_match_groups.csv (dedup) — audit trail
  • logs/<tool>_YYYYMMDD_HHMMSS.log — debug-level run log

Original input file is never modified.
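
The naming scheme above can be expressed as a small helper. This is a sketch only: ``output_paths`` is a hypothetical name, and it assumes ``{input}`` means the input file's stem, that the cleaned file is written next to the input, and that ``logs/`` is relative to the working directory.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def output_paths(input_path, tool: str, now: Optional[datetime] = None) -> dict:
    """Derive the cleaned-data and log paths for one run of a tool."""
    src = Path(input_path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    paths = {
        "data": src.with_name(f"{src.stem}_{tool}.csv"),
        "log": Path("logs") / f"{tool}_{stamp}.log",
    }
    # Guard the "never modify the input" rule explicitly.
    if paths["data"] == src:
        raise ValueError("output path would overwrite the input file")
    return paths
```

For example, ``output_paths("data/customers.csv", "dedup")`` yields ``data/customers_dedup.csv`` plus a timestamped log path; ``dedup`` here is just an example tool label.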

Docs

Dependencies

pandas, openpyxl, rapidfuzz, phonenumbers, typer, loguru, charset-normalizer, streamlit. Optional: ftfy for mojibake repair.

License

Proprietary.
