Michael a9788ba712 feat(ui): page header + files card + action bar + findings cards (mockup 2)
Closes the remaining gaps between the live home page and the
``datatools_layout_redesign2.html`` mockup. Four pieces land
together because they all consume the same new CSS scaffold:

1. Page header (§page-header)
   ``st.title`` + ``st.caption`` + ``st.divider`` collapse into one
   flex header: h1 + body subtitle on the left, ``Runs 100% locally``
   privacy pill (success-fill + lock SVG) on the right, soft border
   below. The "Runs 100% locally" phrase moved out of
   ``home.caption`` into the new ``home.privacy_pill`` i18n key
   (en + es).

2. Files card (§files-card)
   The "Imported files" list is now a single bordered card with a
   section head (count + KB total on the right, mockup §section-head).
   Each row renders a 28px accent-fill chip carrying the inline
   document SVG, a mono filename, a right-aligned mono size, and a
   compact ``✕`` button. The word-button ``Remove`` is gone —
   replaced by an icon-only tertiary button styled via a new CSS
   rule that goes transparent → danger-fill on hover (mockup
   §file-remove).

3. Action bar (§action-bar)
   Three buttons in one row: ``Run analysis`` (primary ink), a new
   disabled ``Export report`` (secondary; coming soon, tooltip), and
   ``Clear results``. New i18n key ``upload.export_report``.

4. Findings — per-file group cards (§finding-group)
   ``render_findings_panel`` rewritten end-to-end. Output is now:
     • A head row (``dt-finding-group-head``) bleeding to the card
       edges: worst-severity dot · mono filename · count pills
       enumerating non-zero severities (e.g. ``2 info`` blue,
       ``1 warning`` amber, ``1 error`` rose).
     • A flat list of finding rows sorted error → warn → info.
       Each row: tinted Material-icon chip + title (description
       with optional ``<code>`` column chip) + mono meta line
       (rows affected, samples captured) + tertiary
       ``Open <Tool> →`` action button that ``st.switch_page``s
       to the relevant tool.
   The previous tool-grouped expander stack is dropped — the new
   layout is denser and matches the mockup's single-card-per-file
   structure.

   ``_render_one_finding`` (the old per-finding helper that emitted
   markdown lines + sample tables) remains in the file but is no
   longer called from the home flow; left in place for any other
   surface that still depends on the markdown style.

   The "no issues" success state renders a green dot + mono
   filename + ``no issues`` success pill in the same card chrome,
   so empty-result files visually match the rest of the panel
   rather than getting a generic ``st.success`` callout.

CSS additions (``_DESIGN_TOKENS_CSS``):
  ``.dt-page-header / .dt-page-subtitle / .dt-privacy-pill``
  ``.dt-files-section-head / .dt-section-meta``
  ``.dt-file-row / .dt-file-icon-chip / .dt-file-name / .dt-file-size``
  ``.dt-finding-group-head / .dt-severity-dot{.warn,.info,.error,.success}``
  ``.dt-group-filename / .dt-group-counts``
  ``.dt-count-pill{.warn,.info,.error,.success}``
  ``.dt-finding-row / .dt-finding-icon{.warn,.info,.error}``
  ``.dt-finding-title / .dt-finding-meta``
  Tertiary button rule (transparent → danger-fill on hover) for
  the X button and the ``Open Tool →`` row action.

theme.py:
  Explicitly loads Material Symbols Outlined alongside Geist —
  the severity-chip ligatures (``info`` / ``warning`` / ``error``)
  need the font present even when no ``:material/`` token has been
  emitted yet on the page. Tightened ``.dt-finding-icon .dt-mui``
  selector with ``[data-testid="stMarkdownContainer"]``-scoped
  variant so the Material font wins over theme.py's base
  ``var(--font-sans) !important`` on markdown descendants.

Leading section-heading emojis stripped from i18n
(``upload.heading``) for parity with the mockup's clean ``Files``
/ ``Findings`` h2s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 00:43:42 +00:00

🌐 Language: English · Español

DataTools

Local CSV / Excel cleaning. CLI + browser GUI, no cloud, no install ceremony. GUI ships with English and Spanish language packs.

Tools

# Tool Status
01 Find Duplicates — exact + fuzzy match, 5 normalizers, survivor rules, audit Ready
02 Clean Text — whitespace, smart chars, BOM, line endings, case ops Ready
03 Standardize Formats — dates, phones, emails, addresses, names, currencies, booleans Ready
04 Fix Missing Values — disguised-null detection, profile, mean/median/mode/ffill/bfill/interpolate, drop strategies Ready
05 Map Columns — fuzzy auto-rename, target schema with type coercion, required fields with defaults, drop/reorder Ready
06 Find Unusual Values Coming Soon
07 Combine Files Coming Soon
08 Quality Check Coming Soon
09 Automated Workflows — chain tools with recommended (not forced) order, save/load JSON, automate weekly cleanups Ready

Download (non-technical users)

Pre-built installers — no Python required:

Platform Download First-launch note
macOS DataTools-X.Y.Z-mac.dmg Drag DataTools.app into /Applications, then double-click.
Windows DataTools-X.Y.Z-win-setup.exe Run the installer; launches from Start Menu.
Linux DataTools-X.Y.Z-linux-x86_64.AppImage chmod +x the file, then double-click.

Latest release: see GitHub Releases (or the Gumroad listing). The installers are ~150200 MB; the launcher boots a local server at http://127.0.0.1:8501 and opens your browser. Nothing is sent to the cloud.

Install from source (developers)

pip install -r requirements.txt

Python 3.10+ required.

Run

GUI (recommended):

streamlit run src/gui/app.py

CLI — seven entry points:

python -m src.cli            customers.csv [--apply]   # dedup
python -m src.cli_text_clean messy.csv     [--apply]   # text clean
python -m src.cli_format     intl.csv      [--apply]   # format standardize (auto-streams >100 MB)
python -m src.cli_missing    holes.csv     [--apply]   # missing values
python -m src.cli_column_map vendor.csv    [--apply]   # column mapper
python -m src.cli_pipeline   any_file.csv  [--apply]   # chain tools end-to-end
python -m src.cli_analyze    any_file.csv  [--json]    # scan only

Every CLI runs preview-only by default; add --apply to write output.

Language

The GUI sidebar has a language picker. Packs ship for English and Español (src/i18n/packs/); the choice persists for the session. Adding a language: drop a <code>.json next to en.json mirroring its key tree, then list it in LANGUAGES. See Developer Guide §i18n.

Review & Normalize gate

Every uploaded file passes through a CSV-normalization gate before any tool sees it. The analyzer flags ~15 issue types (whitespace, NBSP / zero-width chars, BOM, encoding, smart punct, dirty headers, null sentinels, mojibake, …) tagged by confidence (high / medium / low) and fix action. The GUI shows each finding with Auto-fix / Skip / Customize, a live before/after preview, and an encoding-override picker. Tool pages refuse to load until the gate passes.

Output

Every run writes:

  • {input}_<tool>.csv — the cleaned data
  • {input}_changes.csv (text cleaner) or {input}_match_groups.csv (dedup) — audit trail
  • logs/<tool>_YYYYMMDD_HHMMSS.log — debug-level run log

Original input file is never modified.

Docs

Dependencies

pandas, openpyxl, rapidfuzz, phonenumbers, typer, loguru, charset-normalizer, streamlit. Optional: ftfy for mojibake repair.

License

Proprietary.

Description
Data tools development
Readme 7.7 MiB
Languages
Python 87.3%
HTML 10%
CSS 1.8%
Shell 0.4%
JavaScript 0.2%
Other 0.2%