Commit Graph

82 Commits

Author SHA1 Message Date
36510eee7b fix(findings): namespace per-tool button keys so multi-file render works
Reported: uploading multiple files on the home page and clicking Run
analysis blew up with

    StreamlitDuplicateElementKey: key='_findings_open_02_text_cleaner'

when two uploaded files both had Clean Text findings.

Root cause: ``render_findings_panel`` is invoked once per uploaded
file from ``_home.py``, but the per-tool jump button used a
filename-agnostic key:

    key=f"_findings_open_{tool_id}"

Two files both flagging Clean Text → two buttons with identical keys
→ Streamlit rejects the second one.

Fix:

- Add ``key_namespace: str = ""`` to ``render_findings_panel``. The
  helper hashes it (sha1 truncated to 8 chars) and appends to every
  button key, so different namespaces produce different keys but the
  same namespace stays stable across reruns.
- The home page now passes the filename:
  ``render_findings_panel(findings, header=f"📄 {name}", key_namespace=name)``.
- The single-call site in ``upload_and_analyze_section`` (the legacy
  helper, only used outside the new home-page path) keeps the default
  empty namespace, which is fine because that path renders findings
  for ONE file at a time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:17:03 +00:00
c0bfd4dbc9 bisect: temporarily disable new chrome additions to diagnose blank pages
Reported: every page renders empty in the main body even after the
audit-log defensive-wrap commit (59c6d0f). Close button also doesn't
trigger shutdown — that page is blank too. Sidebar nav still renders,
so the chrome path that runs on every page is the suspect.

Three chrome additions land all at once and are temporarily turned
off so the user can see whether bare chrome restores rendering:

1. **Sticky footer (``render_sticky_footer``)**: short-circuited with
   ``return`` at the top of the function. The CSS-injection +
   components-html iframe mechanic is the highest-suspicion item —
   if the iframe script throws or the CSS interacts badly with the
   user's Streamlit / Python build, the side effects can be
   page-killing on theirs while invisible on ours. The original body
   is preserved as ``_render_sticky_footer_DISABLED`` so re-enabling
   is a one-line change.

2. **Diagnostics sidebar (``_render_diagnostics_sidebar``)**: call
   site in ``hide_streamlit_chrome`` is gated by ``if False:``.
   Wrapping in try/except (the previous commit) caught exceptions
   but didn't help — silent partial renders inside
   ``with st.sidebar: with st.expander: ...`` can still leave the
   render stack in a bad state on some Streamlit versions.

3. **Compact-spacing CSS layer**: the ``gap: 0.5rem !important;`` on
   ``stVerticalBlock`` / ``stHorizontalBlock``, the slim heading
   margins, the slim hr / caption / expander / button / metric
   rules — all stripped back to the pre-compact ``_HIDE_CHROME_CSS``.
   The ``gap`` rule in particular is a suspect: if the user's
   Streamlit version doesn't render stVerticalBlock as a flex
   container, the rule is harmless; if it does and interacts badly
   with overflow, content could be clipped.

What's deliberately KEPT enabled:

- The audit-log calls (already wrapped from 59c6d0f).
- ``log_page_open`` calls in tool pages (already wrapped internally).
- All UI changes pre-compact (the unified tool-page layout, the
  download-button helper, etc.).

If pages render after this commit, we know it's one of the three
disabled items above and can bisect further. If they still don't
render, the cause is in code that pre-dated the audit-log work and
the bisection has to keep going.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:09:23 +00:00
59c6d0f914 fix(audit): defensive wrap so audit failures can never blank the GUI
Reported: after pulling commit c73d716 (audit log) the main body of
every page showed empty. Sidebar nav still worked.

Diagnosis: the most likely path is that something inside the audit
calls — ``_render_diagnostics_sidebar()`` calling ``audit_log_path()``,
or ``log_session_start()`` itself — raises during ``hide_streamlit_chrome``
on the user's environment (Python 3.14 on Windows, a less-tested
combo than the test environment). Streamlit's script runner sees the
exception, and on some chrome paths it eats it without surfacing an
error block, leaving the page body empty.

The audit log is best-effort by design. Make that contract real:

1. ``hide_streamlit_chrome`` now wraps both ``log_session_start()``
   and ``_render_diagnostics_sidebar()`` in try/except. Errors print
   to stderr (so the developer running ``python -m src.gui`` sees
   them in the launcher's console) but never bubble up to kill the
   page render.

2. ``audit_log_path()`` already had a tempdir fallback for the
   primary mkdir failure, but the SECOND mkdir wasn't protected
   either. Restructured to a two-level fallback: configured dir →
   tempdir → ``/dev/null`` (or ``NUL`` on Windows). The last fallback
   ensures the function never raises; ``log_event``'s own try/except
   handles the eventual unwritable-file case.

3. ``log_page_open(slug)`` now has an outer try/except so it cannot
   raise either — protecting every tool page's render path.

If a user reports the same symptom again, the launcher terminal will
now show a real traceback explaining what's wrong, and the GUI will
still render normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:00:31 +00:00
c73d716d06 feat(audit): JSONL audit log for support diagnostics
New ``src/audit.py`` module records GUI actions to a per-session
JSONL file under ``~/.datatools/logs/`` (overrideable via
``DATATOOLS_AUDIT_DIR``). The file is human-readable (one JSON
object per line, each with a ``message`` field) AND trivially
machine-parseable — the support flow is "client mails the file,
we read it and explain what went wrong."

Format example::

    {"ts":"2026-05-17T05:30:00.123+00:00","level":"info","category":"session",
     "session":"a1b2c3d4","message":"Session started",
     "platform":"Windows 11","python":"3.14.0","user":"Michael Dombaugh",
     "log_file":"C:\\Users\\Michael Dombaugh\\.datatools\\logs\\datatools-...jsonl"}
    {"ts":"...","category":"upload","message":"Uploaded customers.csv",
     "filename":"customers.csv","bytes":24813}
    {"ts":"...","category":"analyze","message":"Analyzed customers.csv (3 findings)",
     "filename":"customers.csv","findings":3,"rows":120,"cols":8}
    {"ts":"...","category":"tool_run","message":"Clean Text run",
     "page":"2_Text_Cleaner"}
    {"ts":"...","category":"error","level":"error",
     "message":"analyze(weird.csv): EmptyDataError: No columns to parse",
     "filename":"weird.csv","outcome":"empty_after_repair"}

Public API:

- ``log_event(category, message, **extra)``
- ``log_session_start()`` — idempotent banner with platform info
- ``log_page_open(slug)`` — emit a ``nav`` event, deduplicated per
  Streamlit session so reruns don't spam the log
- ``log_exception(where, exc, **extra)`` — convenience wrapper
- ``audit_log_path()`` / ``audit_log_dir()`` — for the UI

Wired in at:

- ``hide_streamlit_chrome``: stamps session start, mounts a small
  "🩺  Diagnostics" expander in the sidebar with the log path and
  an "Open log folder" button so the user can grab the file to
  attach to a support email.
- Home page: ``upload`` event on every new file, ``upload`` event
  on per-file remove, ``analyze`` event with file count when
  Run-analysis fires.
- ``_run_analysis_on_upload``: ``analyze`` event with rows / cols /
  findings count per file, plus ``error`` events on every caught
  exception (empty upload, empty after repair, pandas EmptyDataError,
  generic Exception).
- Every Ready tool page (1, 2, 3, 4, 5, 9): ``tool_run`` event
  immediately after the primary action stashes its result.
- Every tool page (1-9): ``log_page_open(slug)`` on render — deduped
  via session state so we don't get one event per Streamlit rerun.

Safety:

- ``log_event`` wraps every write in try/except. A broken audit
  log must NOT crash the GUI.
- Non-JSON-serializable extras are ``str()``-coerced before writing.
- File CONTENTS are never logged. We capture filename, byte count,
  and (in the analyzer) a 12-char sha1 fingerprint of the bytes so
  the same file re-uploaded gets the same trace.
- License keys, session cookies, etc. are not logged.
- ``DATATOOLS_AUDIT_DIR`` env var lets tests redirect writes into a
  tmp dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:36:35 +00:00
f0885aeb1e feat(analyze,ui): recommend Standardize Formats + bold red Open buttons
Two reported issues addressed together because they're the same UX
flow (home findings panel → jump to relevant tool).

(1) Format-Standardizer recommendations weren't firing.

Reported: uploading a file from the format-cleaner test corpus
(``24_format_dates.csv``, ``25_format_phones.csv``,
``29_format_currencies.csv``, ``30_format_integration.csv``) showed
zero "Standardize Formats" recommendations even though the columns
clearly mixed multiple date / phone / currency formats.

Two underlying causes:

- ``_detect_inconsistent_date_format`` required two MATCHES per
  distinct format. A test column with N rows each in a different
  format had ≤1 match per format and was silently passed over.
  Loosened to "≥1 match per format" — the inconsistency signal is
  the presence of ≥2 distinct formats, not their volume.
- Only date inconsistency was detected. Phones, currency, and
  booleans (the other format-standardizer fix categories) had no
  detector at all.

Added three new detectors:

- ``_detect_inconsistent_phone_format``: nine phone-format regexes
  (plain-10, US paren / dash / dot / space, +country, extension,
  intl plus). Fires when a column is ≥35% phone-shaped AND mixes
  ≥2 formats.
- ``_detect_inconsistent_currency_format``: thirteen currency regexes
  covering US ($1,234.56 / $1234.56), EU (€1.234,56), India lakh
  notation, Swiss apostrophe, trailing-symbol, parens-negative,
  prefix-currency-code, suffix-currency-code, and negative variants.
  Same fire criteria as phone.
- ``_detect_inconsistent_boolean_format``: column is ≥80% boolean
  tokens (yes/no/y/n/true/false/1/0) AND uses ≥3 distinct surface
  forms (e.g. yes / Y / true / 1 mixed together).

Verified on every file in ``test-cases/format-cleaner-corpus/``:
24_format_dates, 25_format_phones, 29_format_currencies all now
produce a format-standardizer Finding. The integration test file
flags all three.

The threshold loosening (from 50% to 35% of values format-shaped) is
still strict enough to avoid false-positives on free-text comment
columns where a few cells happen to look phone- or date-shaped.

(2) The "Open <Tool>" jump links blended into the page.

Reported: the per-tool jump links inside the home findings panel
were too subtle to notice.

Replaced ``st.page_link`` with ``st.button(type="primary")`` so the
buttons render in Streamlit's primary-action red colour, matching the
"Clean Text" / "Find Duplicates" / etc. run buttons. Click handler
delegates to ``st.switch_page(page_slug)`` so it's still a soft
in-app navigation (no full reload).

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:54:31 +00:00
229e1afd45 fix(footer): mount Back-to-Home outside Streamlit's container tree
Reported: the sticky footer rendered, but the Back to Home button
inside it wasn't visible.

Likely cause: ``st.markdown`` inserts the footer div inside Streamlit's
content tree, which sits under ``.stApp { zoom: 0.85 }`` (our compact
scaler) and several nested padding/positioning contexts. Streamlit's
own ``<a>`` styling rules can also colour-collide with our anchor.

Switch the mount strategy. Two passes:

1. CSS rules go to the parent document via ``st.markdown`` as before,
   but every property carries ``!important`` and the selectors key on
   ``#datatools-sticky-footer`` (id, not class) plus a dedicated
   ``.datatools-sticky-footer-link`` class on the anchor — so
   Streamlit's default ``<a>`` styles can't override colour or
   padding. ``z-index: 2147483646`` keeps the footer above
   anything else in the page.

2. The footer DOM node itself is created by a script inside a
   zero-height ``streamlit.components.v1.html`` iframe. The script
   does ``window.parent.document.body.appendChild(...)`` so the div
   lives as a direct child of ``<body>`` — outside ``.stApp``,
   outside every Streamlit container, free of every parent's
   ``zoom`` / ``transform`` / ``overflow`` rules.

   If the cross-frame access ever fails (Streamlit sandbox config
   change), the script falls through to appending inside the
   iframe's own document — degraded but still visible.

Each rerun replaces any prior ``#datatools-sticky-footer`` so we
don't accumulate stacked footers on every script pass.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:47:44 +00:00
7ad19ac7f4 feat(nav,i18n): sticky footer with Back-to-Home + localized tool headers
Two unrelated UX issues addressed in one sweep across all nine tool
pages because they share the same edit surface.

(1) Sticky footer replaces the top + bottom back-link buttons.

Reported: a big white empty footer space at the bottom of every page;
the Back to Home button at the top scrolled out of view on long pages.

New ``render_sticky_footer()`` helper in ``components/_legacy.py``
injects a fixed-position bar at ``bottom: 0`` of the viewport with:

- A border-top so it visually reads as a non-movable bar.
- A semi-transparent background (rgba 0.96 + ``backdrop-filter: blur``)
  so content underneath shows through faintly when the user scrolls.
- A styled ``<a href="home">`` anchor (not an ``st.button``) because
  Streamlit widgets can't be CSS-positioned reliably — Streamlit owns
  the widget's DOM container and re-mounts it on every rerun. A real
  anchor sits exactly where the CSS puts it and triggers Streamlit's
  URL routing to the home page.
- ``padding-bottom: 3.5rem`` on the main container so the last widget
  isn't hidden behind the bar.

Called once per tool page, immediately after ``hide_streamlit_chrome()``
so it renders even on pages that ``st.stop()`` early before any other
content runs. The old top-and-bottom ``back_to_home_link()`` calls are
removed from every tool page; their entry/exit points were dropping
the button when the script short-circuited.

(2) Tool-page headers now localize.

Reported: switching the sidebar language picker to Spanish left the
tool page's title + caption in English. Root cause: every page had
hard-coded ``st.title("✂️ Clean Text")`` / ``st.caption("Trim
whitespace...")`` strings.

Added per-tool ``tools.<id>.page_title`` and
``tools.<id>.page_caption`` keys to ``en.json`` and ``es.json`` for
all nine tools. Routed each page's title/caption call through ``t()``.
Verified: with ``ui_lang=es`` set, the Clean Text page now renders
"✂️ Limpiar texto" + the Spanish caption.

Updated ``tests/gui/test_smoke.py::EXPECTED_SUBSTRINGS`` so the
``es`` column for each tool page asserts the actual Spanish string
(was a duplicate of the English string back when the page bodies
were English-only).

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:42:45 +00:00
4685bb4289 style(chrome): tighter vertical rhythm — less whitespace across screens
Reported: too much whitespace between widgets, dividers, and headings.

Compact-spacing CSS layer added to ``_HIDE_CHROME_CSS`` (so it applies
on every page that calls ``hide_streamlit_chrome``):

- ``[data-testid="stVerticalBlock"]`` and ``stHorizontalBlock`` gap
  trimmed from Streamlit's default ~1rem to 0.5rem.
- Heading margins (h1-h4) tightened — h1/h2/h3 used to leave 1-1.5rem
  above; now 0.25-0.5rem.
- ``hr`` (``st.divider()``) drops from 1rem above+below to 0.4rem.
- Markdown paragraphs and captions: 0.25rem bottom margin instead of
  the default 1rem.
- Expander summary padding reduced (0.35rem top/bottom).
- File-uploader, button, and metric tiles: trimmed internal padding.

Also slimmed the main-container padding from 1rem top / Streamlit
default bottom (~6rem) to 0.5rem top / 0.75rem bottom.

The existing ``zoom: 0.85`` on ``.stApp`` is kept — the user wanted
*less white space*, not *smaller content*, and dropping zoom would
shrink type alongside everything else.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:28:58 +00:00
e96d5901f4 fix(close): graceful about:blank fallback + display-mode aware hint
Reported: user asked whether we can send Alt+F4 / Ctrl+W to the
browser from JavaScript to force-close a tab.

Honest answer that's now baked into the hint message: NO. Synthesized
keyboard events from page JS only reach DOM event listeners, not the
browser chrome or the OS. There is no flag, API, or trick that lets
a page close a tab the user opened themselves. The page CAN close a
window it opened (window.opener trail) or one whose display-mode is
``standalone`` (Chrome/Edge ``--app=URL``) — that's what
``python -m src.gui`` arranges, and that's the path that actually
closes the window without a manual Ctrl+W.

Improvements landed:

1. ``isStandalone(win)`` detects Chrome --app windows up front
   (``matchMedia('(display-mode: standalone)').matches``). In a
   regular tab the manual hint surfaces immediately on the
   "Close this window" click; in --app mode we only show it if the
   close attempt actually fails.

2. ``fallbackToBlank(win)`` navigates the tab to ``about:blank``
   via ``location.replace`` (no history pollution) so the user
   sees a clean empty tab instead of the farewell overlay frozen
   over Streamlit's connection-error banner. They still have to
   Ctrl+W the blank tab, but the screen is no longer a misleading
   "did it close or not?" mess. Fires 250 ms after a failed close
   in --app mode (very rare path), or 1.5 s in a regular tab so
   the user has time to read the hint.

3. Hint message rewritten in en + es to explain WHY the close is
   blocked (browser security — not something we can override), to
   acknowledge the Alt+F4 / Ctrl+W question directly (those don't
   work either, for the same reason), and to point at
   ``python -m src.gui`` as the path that gives a clean auto-close.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:07:51 +00:00
21fd8a4cd7 fix(nav): switch_page resolves correctly + bottom-of-page back link
Two issues, same fix surface.

(1) Reported crash on Back-to-Home:

    StreamlitAPIException: Could not find page: app.py.

``st.switch_page("app.py")`` doesn't work under ``st.navigation`` —
the entry script is the nav manager itself and is not a registered
page. The fix needs to pass an ``st.Page`` object whose script
identity matches one registered in the nav.

First-pass attempt (``from src.gui.app import _home_page``) hit a
worse failure: importing ``app.py`` from inside a tool-page render
re-executes the nav setup with the WRONG "main script" context, so
every ``st.Page("pages/N_foo.py", ...)`` call in ``_build_navigation``
fails with "file could not be found".

Extract the home renderer into its own module ``src/gui/_home.py``
which has no top-level Streamlit side effects. Both the nav manager
and the back-link helper import ``_home_page`` from there. The Page
object built at click time has the same callable identity as the one
registered, so ``st.switch_page`` resolves it.

(2) Reported UX: the back button scrolled out of view on long pages.

Add a second ``back_to_home_link(key="_back_to_home_link_bottom")``
call near the footer of every tool page (1-9). The unique key avoids
widget-id collision with the top instance. Coming-Soon stubs get it
unconditionally; Ready tools render it only after a result exists
because the page short-circuits with ``st.stop()`` before then —
when no result is on screen the page is short enough that the top
link is sufficient.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:58:33 +00:00
42f8d78dd5 fix(downloads): drop /select on Windows — opens wrong folder
Reported: clicking "Open Downloads folder" was opening the Documents
folder instead of Downloads. Root cause is the classic Windows
gotcha: when the path contains a space (e.g.
``C:\Users\Michael Dombaugh\Downloads``), Python's
``subprocess.Popen`` packs the ``/select,...`` argument into a single
quoted token, and Explorer's ``/select`` argument parser does NOT
accept that form — it silently falls back to whatever the user's
default Explorer view is (typically Documents).

Resolution paths considered:

- ``shell=True`` with a hand-built command string — works but opens
  the door to shell-injection if a file_name ever contained a quote
  or special char.
- ``cmd /c start "" explorer /select,...`` — same parsing issue.
- ctypes ShellExecuteW — pulls in a Windows-only dependency.
- **Skip /select. Open the folder directly.** ✓

Going with the last. ``explorer <folder>`` reliably opens the folder
regardless of spaces in the path; the user finds the freshly-saved
file by its name. The previous "highlight the file" nicety wasn't
worth the path-parsing fragility — every user folder on Windows is
``C:\Users\<name>`` and every Windows username can contain a space.

macOS keeps the ``open -R <file>`` reveal-in-Finder path because
macOS argument parsing is sane and that's a strict UX win.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:45:47 +00:00
0f89d7ba66 fix(downloads): use explorer /select on Windows + show open feedback
Reported: clicking "Open Downloads folder" did nothing visible. The
previous implementation called ``os.startfile(folder)`` on Windows,
which is known to silently no-op or open Explorer behind the active
window in some configurations (Streamlit running headless, no
foreground rights inherited by the click handler thread, etc.).

Switch to the more reliable ``explorer /select,<file>`` form:

- Opens Explorer with the just-saved file pre-highlighted instead of
  just navigating to the folder — better UX than the old behavior.
- explorer.exe is a real GUI process that's spawned in the user's
  session with foreground rights, so it shows up on top.
- Fallback chain on Windows: ``/select`` first, then plain
  ``explorer <folder>``, then ``os.startfile`` as a last resort.

macOS upgraded the same way: ``open -R <file>`` reveals in Finder
rather than opening the directory.

Linux: no reliable cross-distro reveal, so ``xdg-open <folder>``.

Plus user feedback at the call site:

- On successful dispatch: ``st.toast("Opening <folder>", icon="📂")``
  — confirms we tried, in case the window comes up behind the
  browser.
- On dispatch failure: ``st.warning`` with the full path the user
  can copy/paste into their file manager manually.

2220 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:25:06 +00:00
b9147f3b66 fix(downloads): save server-side to ~/Downloads + open-folder link
Switch the download mechanic from "browser <a download> with a data:
URL" to "write the bytes directly to the user's Downloads folder and
show them the exact path". DataTools runs as a local Streamlit app,
so the "server" IS the user's machine — there's no reason to go
through the browser save dialog at all.

Flow:

1. Click "Download <something>" button (rendered as a regular
   ``st.button``, so no widget-collision issues).
2. Bytes are written to ``Path.home() / "Downloads" / file_name``
   (overwriting any same-named file).
3. The page reruns and renders a success caption with the absolute
   path the file landed at.
4. An "📂 Open Downloads folder" button appears. Clicking it pops the
   OS file manager via ``os.startfile`` (Windows), ``open`` (macOS),
   or ``xdg-open`` (Linux).

Why this is better than the previous HTML-data-URL helper:

- Unambiguous about where the file went — user sees the full path,
  not "wherever your browser was configured to save".
- The data: URL approach base64-inflated the page payload by 33% and
  bloated for large outputs; server-side write is byte-for-byte.
- No more browser-side widget collision class of bug.
- The save action is a real Streamlit button, so the existing widget
  semantics (disabled, help tooltip, key isolation) work without
  workarounds.

API surface unchanged. New canonical name ``local_download_button``;
``html_download_button`` is kept as a back-compat alias that points
at the same implementation — every existing call site continues to
work without edits.

Tests are protected from polluting the developer's home dir via a
``DATATOOLS_DOWNLOADS_DIR`` env var override returned by the new
``_downloads_dir()`` helper. Smoke verified end-to-end via AppTest:
click → file appears in tmp dir → success banner shows path →
open-folder button renders.

2220 tests pass, 91 skipped, 35 s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:48:28 +00:00
ae9d4a2db5 fix(home): defensive analysis errors don't crash the whole page
Reported: uploading 13_non_latin_scripts.csv made the home page bubble
a ``pandas.errors.EmptyDataError`` traceback up through the page
chrome instead of surfacing as a per-file error. In a multi-file
analysis run that kills every other file's results too, which is
worse than the symptom itself.

Wrap ``_run_analysis_on_upload`` in proper error handling:

- Empty bytes ``getvalue() == b""`` short-circuits with a synthetic
  error Finding telling the user the upload was zero-byte and to
  re-upload.
- Empty ``repair.repaired_bytes`` (file was all NULs / BOM / stripped
  to nothing) likewise surfaces as a synthetic Finding rather than
  reaching pd.read_csv.
- ``pd.errors.EmptyDataError`` from pandas is caught and rendered as
  a Finding that names the file, its byte size, and suggests opening
  it in a text editor to verify the header row matches the data row
  delimiter.
- Any other exception during read/analyze is caught and surfaces as
  a Finding via ``format_for_user`` so the user gets a clean message,
  not a Python traceback.

Each file in a multi-file run now stands alone: a bad file produces
one red banner in its own card, every other file analyzes normally.

The 13_non_latin_scripts.csv corpus file is 249 bytes of valid UTF-8
on disk and parses cleanly under the same code path locally — the
user's specific symptom is likely a zero-byte upload (browser /
network / Python 3.14 + Streamlit edge case). The new ``empty_upload``
finding will name the bytes count so they can confirm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:22:10 +00:00
ef9f8b5de4 fix(close): Edge fallback + better tryClose + honest hint
There is no JavaScript override for browser tab-close security:
``window.close()`` only succeeds on windows JS opened (Chrome --app
windows qualify; a regular browser tab does not). What we can do is
make the --app path easier to hit and the failure case more
actionable.

Three changes:

1. ``src/gui/__main__.py`` — extend browser detection. PATH lookup
   now also looks for ``msedge`` / ``microsoft-edge``; Windows install
   candidates include the Edge install path; macOS candidates include
   Edge and Chromium. Edge is Chromium-based, supports ``--app``, and
   ships on every Windows 10+ machine — so users without Chrome no
   longer fall through to the regular browser tab. When the fallback
   IS hit, print a warning to stderr explaining why Close-from-page
   will require Ctrl+W. Renamed ``_find_chrome`` to
   ``_find_app_browser`` to reflect the broader scope.

2. ``_FAREWELL_SCRIPT_TEMPLATE`` in ``components/_legacy.py`` —
   factor close attempts into a ``tryClose`` helper that runs three
   escalating tries: standard ``win.close()``, the
   ``win.open('', '_self')`` history-rewrite trick (no-op in modern
   Chrome but free), and ``win.top.close()``. Auto-close on paint AND
   the manual button now both call this helper. Skip the manual hint
   if the close eventually succeeded between the click and the 250 ms
   timeout.

3. ``quit.close_hint`` in en/es i18n packs — rewrite the message to
   tell the user honestly that this is a browser security restriction,
   tell them the Ctrl+W keystroke that works, and point them at
   ``python -m src.gui`` for the auto-closing app-mode experience.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:17:18 +00:00
aeead05e4c fix(downloads): swap st.download_button for an HTML <a download> helper
Reported symptom: only the FIRST download button in a multi-button
row pops the browser save dialog. The second and third do nothing on
click. Affects every tool page that exposes (cleaned + audit + config)
downloads.

Root cause is ``st.download_button`` itself — when several render in
the same script pass, the click-to-bytes wiring on the browser side
mis-routes and only one button's data is actually exposed. Explicit
``key`` arguments don't fix it; ``use_container_width=True`` doesn't
help either; we confirmed this in the Text Cleaner reverts.

Replace the widget with a real ``<a download="file" href="data:...">``
anchor rendered via ``st.markdown(..., unsafe_allow_html=True)``.
Bypasses Streamlit's widget machinery entirely; behaves identically to
a native browser download. Side benefit: clicking it does NOT trigger
a script rerun, so other in-flight UI state survives.

New helper ``html_download_button`` lives in
``src/gui/components/_legacy.py`` (exported from ``components``). API:

    html_download_button(
        label, data,
        *, file_name, mime="application/octet-stream",
        disabled=False, help=None, use_container_width=True,
    )

Translation pattern applied across every tool page (and shared
``results_summary`` / ``config_panel`` widgets in ``_legacy.py``):

- ``st.download_button(`` -> ``html_download_button(``
- ``data=foo_bytes`` kwarg -> positional second arg
- ``key="..."`` -> dropped (helper has no widget identity)
- ``use_container_width=True`` -> dropped (default)
- ``disabled=`` and ``help=`` pass through unchanged
- Pre-computed byte buffers kept where they were

Total: 17 sites replaced (3 in Text Cleaner, 3 in Format
Standardizer, 3 in Fix Missing Values, 3 in Map Columns, 3 in
Automated Workflows, 2 in Find Duplicates page + 4 in shared
_legacy.py widgets used by Find Duplicates).

Caveat: data: URLs balloon by 33% (base64). Fine for tool output
sizes we ship; if a future result topped a few hundred MB we'd want a
Blob-URL fallback.

The marketing demo at src/gui/app_demo.py keeps its single
st.download_button — single button, no collision, no need to switch.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:13:41 +00:00
d1aaf3c2b9 feat(quit): close-window button + manual hint on the farewell overlay
The farewell overlay already attempted ``window.top.close()`` after a
Close click — but browsers only honour that for tabs that JS opened
(Chrome --app windows qualify; a regular browser tab does not). For
users whose Chrome wasn't auto-detected and who fall back to
``webbrowser.open``, the overlay stays put and they had no in-page
way to close.

Add to the overlay HTML:
- A "Close this window" button (uses the user-gesture path, which has
  slightly looser browser rules than auto-close).
- A hidden hint paragraph that reveals itself 250 ms after the
  button is clicked IF the window is still here, telling the user to
  press Ctrl+W (⌘W on Mac).

Wired through the existing _farewell_script template + ``_js_html_safe``
escaping so neither label can break out of the JS string literal.

New i18n keys (en + es): ``quit.close_window_button`` and
``quit.close_hint``.

The existing auto-close attempt remains — Chrome --app users still get
their window closed without touching the button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:59:17 +00:00
502a72cd46 feat(nav): ← Back to Home link on every tool page
Multi-file workflow: a user uploads several files on Home, clicks
"Open <Tool>" on one file's findings, lands on a tool page. The
sidebar lets them get back to Home, but a top-of-page back affordance
is more discoverable and keeps the hand in the same screen region as
the upload list they're working through.

- New ``back_to_home_link()`` helper in components/_legacy.py renders
  a secondary button that calls ``st.switch_page("app.py")`` — under
  ``st.navigation`` that routes to the default (Home) page.
- Wired into every tool page (1-9) directly after
  ``hide_streamlit_chrome()`` and BEFORE the license gate so a Lite
  user who lands on a locked tool can navigate away without paying.
- New i18n key ``nav.back_to_home`` ("← Back to Home" /
  "← Volver al inicio") in en/es packs.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:38:01 +00:00
c568aec8a7 feat(gui): one-click Close in its own bottom sidebar section
Close is now a direct shutdown trigger: visiting the Close page (the
sidebar entry) fires shutdown_app() immediately — no confirm step, no
intermediate body. The farewell overlay paints and os._exit(0) lands
~1s later from a daemon thread.

Layout: Close moved into its own bottom-of-sidebar section so the
destructive action is visually separated from Account/Activate.

- New shutdown_app() in components/_legacy.py replaces quit_button.
  os._exit thread is skipped when "pytest" is in sys.modules so the
  test suite doesn't suicide on rendering 99_Close.
- pages/99_Close.py shrinks to set_page_config + chrome + shutdown_app.
- app.py nav grows a new "Close" section header (new
  nav.section_close key in en/es packs) pinned at the bottom of the
  navigation dict.

Tests updated:
- TestQuitButtonRenders → TestClosePageShutsDownImmediately.
  Assert the shutdown caption renders + no confirm button exists.
- test_smoke EXPECTED_SUBSTRINGS["99_Close"] now pins
  "Shutting down" / "Cerrando" (the visible page body) instead of
  the removed page title.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:17:14 +00:00
dad744f17f refactor(gui): drop Review page + normalization gate
Home is now the only entry point: the "Run analysis" button on the
upload section IS the review step (findings render inline via
render_findings_panel). Tool pages no longer gate on a passed
normalization — running the analyzer is sufficient context.

Removed:
- src/gui/pages/0_Review.py
- src/gui/components/gate.py (re-export seam)
- require_normalization_gate() in src/gui/components/_legacy.py
- "review" section enum in tools_registry.py
- Data Review entry in app.py navigation
- require_normalization_gate() calls + imports in all nine tool pages
- tests/gui/test_gate.py (whole file)
- TestReviewWorkflow in tests/gui/test_workflows.py
- 0_Review entry in tests/gui/test_smoke.py PAGE_SLUGS
- stash_upload's normalization_result+normalization_for stashing
- stash_upload_without_gate (was the gate's negative-path helper)

2017 tests pass (16 retired with the gate flow).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:04:33 +00:00
db5ec084da docs+code: rename tool labels everywhere
Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:

- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
  design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
  and gui/components/_legacy.py, core module headers, every tool
  page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs

Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.

All 2033 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:50:09 +00:00
e534fb4989 sec(license): Ed25519 sigs + production-safe tripwire
Two coupled hardening upgrades.

1. Asymmetric signatures (HMAC → Ed25519)

The previous HMAC scheme used a symmetric secret that any motivated
reverse engineer could pull out of the shipped binary and use to
mint blobs for any tier / name / email. With Ed25519, the binary
ships only the public verification key; the signing key never
leaves the seller's environment, so binary compromise no longer
yields forgery.

- src/license/crypto.py rewritten around
  cryptography.hazmat.primitives.asymmetric.ed25519. Same public
  API surface (sign/verify/encode_blob/decode_blob), same canonical
  JSON encoding — drop-in for the manager / cli / GUI layers.
- DATATOOLS_LICENSE_PRIVKEY (seller-side) and
  DATATOOLS_LICENSE_PUBKEY (build-time) env vars supply the keys;
  the in-source dev keypair (src/license/_dev_keypair.py)
  deterministically derives from a seed phrase for repro builds and
  tests.
- Blob prefix bumped DTLIC1: → DTLIC2:. Decoding a DTLIC1 blob
  surfaces a clear "old format" error rather than a confusing
  signature mismatch.
- scripts/generate_keypair.py mints fresh production keypairs for
  the seller (run once, stash the private key offline). Adds
  cryptography>=41,<46 to requirements.txt (was an undeclared
  transitive dep).

2. Production-safe tripwire

assert_production_safe() refuses to boot a frozen / shipped build
when either:

- DATATOOLS_DEV_MODE=1 is set (would unconditionally bypass every
  license check — fine in source/test but catastrophic in a buyer
  install).
- The active verification key is still the embedded dev key (the
  build pipeline forgot to set DATATOOLS_LICENSE_PUBKEY).

No-op in source / pytest runs (sys.frozen is unset) so test
fixtures and dev workflows keep working without ceremony. Called
from src/cli_license_guard.guard() and from hide_streamlit_chrome
— so it fires on every CLI invocation and every GUI page load.

Tests: 49 license-layer unit tests (was 40); added Ed25519
wrong-key rejection, dev-keypair seed pin, blob v2 prefix, v1
rejection with clear message, and four production-safe scenarios
(no-op in source, fires on DEV_MODE in frozen, fires on dev key in
frozen, passes in frozen with prod pubkey). Total: 2024 → 2033.

Docs (REQUIREMENTS §17a, DEVELOPER licensing recipe, DECISIONS
§9b + decision log) updated with the new threat-model write-up,
key-storage workflow, and tripwire behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:34:48 +00:00
e435103113 feat(license): registration + 1-year licenses + tier scaffolding
A complete offline licensing layer (no internet at any step):

Core
- src/license/ — schema (License, Tier, FeatureFlag), HMAC crypto,
  JSON storage, LicenseManager singleton with activate/renew/
  deactivate/issue_trial. Tier-scaffolded so future SKUs can carve
  per-tool feature sets without consumer-code edits.
- scripts/generate_license.py — creator-only key generator. Mints a
  DTLIC1: blob the buyer pastes into the activation page.

GUI
- New activation form component (src/gui/components/activation.py).
- hide_streamlit_chrome() now inline-renders the activation form when
  no valid license is present (every page short-circuits to the form
  until activated).
- Sidebar shows tier + days remaining; renewal warning under 30 days.
- New pages/_Activate.py for revisiting the form after activation.

CLI
- src/license_cli.py — activate / renew / status / trial / deactivate
  commands. Exempt from the guard.
- src/cli_license_guard.py — drop-in guard call added to every tool
  CLI's main(). Lets --help through; respects DATATOOLS_DEV_MODE.

i18n
- New activation.* and license.* keys in en.json + es.json
  (page title, form labels, status badges, renewal warnings, error
  messages). Pack parity test stays green.

Test infrastructure
- tests/conftest.py autouse fixture sets DATATOOLS_DEV_MODE=1 so the
  existing 1916 tests continue to pass.
- isolated_license_path / activated_license_manager /
  unactivated_license_manager fixtures for tests that want to drive
  the real check.

Tests (+79)
- tests/test_license.py (40): schema, crypto roundtrip, blob
  encode/decode, tier→feature mapping, activation flow, name/email
  mismatch rejection, tamper detection, expiration, renewal,
  dev-mode bypass.
- tests/test_license_cli.py (26): every license_cli command +
  subprocess tests confirming every tool CLI refuses to run without
  a license, --help always works, DEV_MODE bypasses.
- tests/gui/test_activation.py (13): gate blocks without license,
  passes with trial, activation form submission unlocks the gate,
  sidebar status, renewal warning, i18n.

Total: 1916 → 1995 tests. All pass under the strict warning filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:54:23 +00:00
c4ce86bd64 feat(i18n): add language-pack scaffold with English and Spanish
Introduces ``src/i18n`` with a tiny JSON-backed t() lookup, an in-session
language preference, and a sidebar selector wired through
``hide_streamlit_chrome`` so every page picks up the same picker. Covers
home, tool cards, findings panel, gate, shutdown, and pickup banner
strings. Tests pin pack parity and the farewell-overlay JS escape so
future packs can't silently regress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 15:11:30 +00:00
ea89c4d399 ui(gui): say 'window' instead of 'browser tab' in shutdown copy
Update the Close page intro, the shutdown overlay, and the toast so
they all read "you can close this window" — clearer for users running
the app in a dedicated browser window rather than a tab.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:51:32 +00:00
701108c9d5 fix(gui): inject farewell overlay into parent DOM on shutdown
Replaces the data:-URL navigation (blocked by Chrome since v60 for
top-frame navigation) with a direct DOM-append of a full-screen
overlay onto the parent document. Uses z-index 2147483647 so it sits
above Streamlit's connection-error banner when the websocket drops.

Note: still doesn't fully suppress the connection-error banner in
testing — the next iteration will render the overlay through
Streamlit's own page rather than via a component iframe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:49:48 +00:00
340614e642 feat(gui): promote Quit to a 'Close' menu item in the sidebar nav
Move the shutdown control out of the inline sidebar widget and into
its own page (pages/99_Close.py), so it appears in the sidebar nav
alongside the tool pages. An explicit confirm button on the page
prevents accidental nav clicks from killing a live session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:38:02 +00:00
58c0195def fix(gui): make Quit button actually terminate the server
Signalling the process with SIGTERM/SIGINT didn't reliably shut Streamlit
down — its tornado/asyncio loop swallowed or deferred the signal, so the
browser saw the websocket drop ("Connection error") while the python
process kept running. Replace the signal with a daemon-thread
``os._exit(0)`` after a short delay so the current rerun can paint the
"shutting down" message before the process is hard-killed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:36:36 +00:00
30e257cc44 fix(gui): move Quit button to sidebar so it shows on every page
The footer placement was easy to miss (below all tool cards) and only
rendered on the home page. Hook the button into hide_streamlit_chrome()
so every page that hides default chrome — home + all 9 tool pages — gets
the Quit button at the bottom of the sidebar without per-page edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:33:32 +00:00
0c25d80146 fix(gui): keep sidebar reopenable + add clean Quit button
The chrome-hiding CSS was removing the Streamlit header wholesale,
which also took the sidebar's expand chevron with it — a collapsed
sidebar became unreopenable. Make the header transparent instead and
explicitly preserve the sidebar collapsed-control.

Also add a Quit button in the app footer that signals the Streamlit
server (SIGTERM, falling back to SIGINT) so closing the GUI returns
the shell prompt cleanly instead of leaving Python hung.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:30:10 +00:00
3f007ef3d6 feat(gui): 1 GB upload cap + delimiter / encoding diversity caption
Streamlit's default file_uploader footer reads "Limit 200MB per file —
CSV, TSV, XLSX, XLS" which contradicts the 1 GB efficiency target shipped
in 438bc0f and codified in docs/REQUIREMENTS.md §1.1.

Three changes:

1. .streamlit/config.toml — set [server] maxUploadSize = 1024. Footer
   now reads "Limit 1024MB per file".

2. upload_and_analyze_section (home page) — adds an explicit caption
   above the uploader stating size limit, supported formats, the four
   auto-detected delimiters, and the 13 auto-detected encodings (with
   the Review-page override as the safety net).

3. pickup_or_upload (every tool page that falls back to its own
   uploader when no home-page upload is present) — same caption,
   only rendered when the upload accepts CSV/TSV/XLSX/XLS so JSON
   schema / config uploaders aren't decorated.

Test suite: 765 passed, 17 xfailed (no regressions). Home + Review +
Deduplicator pages all serve HTTP 200 under the new config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:23:21 +00:00
f891c6116d refactor(gui): tool registry + components package for per-tool builds
Two low-risk seam moves to enable selling per-tool subsets without
breaking the existing all-in-one bundle. Behaviour identical; every
existing import still resolves; full pytest suite + every page returns
HTTP 200.

1. **Tool registry** (src/gui/tools_registry.py) — replaces the
   inline dict-of-dicts in app.py with a Tool dataclass and a TOOLS
   list. Adds a tier field ("core" today, "pro" / "enterprise" later)
   and tools_for_tier() / tool_by_id() / display_name() helpers. A
   per-tool build slices TOOLS at import time without code changes.

2. **components package** (src/gui/components/) — converts the former
   single components.py into a package with:
     _legacy.py        — original file, unchanged.
     __init__.py       — re-exports the legacy surface; existing
                         "from src.gui.components import …" calls
                         continue to work.
     shared.py         — hide_streamlit_chrome, pickup_or_upload
                         (every build needs these).
     gate.py           — require_normalization_gate (Pro / Suite SKUs).
     findings.py       — analyzer-finding widgets (drops out of a
                         standalone-Dedup build).
     dedup_review.py   — match-group cards + apply pipeline (drops out
                         of a non-dedup build).

   The seam modules are narrow re-exports today. As code migrates out
   of _legacy.py into the focused modules, the public import path
   stays stable via the shim.

E2E: 765 passed, 17 xfailed (unchanged); home page + all 9 tool pages
+ Review page render HTTP 200; full pipeline (analyze → auto_fix →
apply_decisions → output bytes) round-trips on the kitchen-sink
fixture with zero high-confidence findings remaining post-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:56:21 +00:00