datatools-dev

Author	SHA1	Message	Date
Michael	f106275643	test(home): replace clutter outliner with click-to-inspect User reported the previous diagnostic was too cluttered to read, and the white bar showed no outline anyway — meaning the flat ``querySelectorAll('body *')`` walker missed it (likely inside an iframe's contentDocument, which the script didn't recurse into). New approach: a single red button "CLAUDE: click here, then click the white bar" in the top-right. Clicking the button arms an inspect handler. The next click anywhere on the page reports the full element stack at that point via ``elementsFromPoint`` AND recursively descends into any same-origin iframe at the click location, so iframe contents are no longer invisible. A black report panel lists every element in the stack with its tag/id/testid/class, position, z-index, background color, and bounding rect — TOP element highlighted in red. User clicks the white bar exactly once and we know what it is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:23:35 +00:00
Michael	8232ab1ca7	test(home): broader diagnostic — outline anything near viewport bottom Previous diagnostic only outlined fixed/sticky elements; user confirmed the offending white bar isn't one of those. Cast a much wider net: - Outline every element whose visible rect intersects the bottom 200px of the viewport, regardless of position. - Border style encodes position: solid=fixed, dashed=sticky, dotted=absolute, thin=static/relative. - Render a readable list in a top-right panel showing each element's tag/id/testid/class, position, z-index, height, and background. - Skip fully transparent + un-positioned elements (those can't actually overlay anything). With this, scroll to the bottom and the panel + colored outlines will identify exactly which element is the white bar — fixed or not. The user can paste the panel list (or just name the colored box) so we know what to remove. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:18:56 +00:00
Michael	4c8e1199a4	test(home): outline every fixed/sticky element to find the white bar User reports: TEST #3 marker sits at the true bottom of the home page's main content, but when scrolled the test text "goes behind" an opaque white bar — confirming the bar is fixed/sticky (overlays scrolling content). Our CSS only declares ONE fixed element near the bottom (``#datatools-sticky-footer``), which the user already ruled out. So something else — Streamlit native chrome, a third-party widget, or a fixed element we haven't enumerated — is overlaying the content. Inject a small diagnostic iframe whose JS, running against the parent document, walks every element on the page and outlines each ``position: fixed`` or ``position: sticky`` node with a distinct color + a top-left label showing ``tagName#id[data-testid] pos=… h=…px bg=…``. Re-runs after initial paint, on a couple of delays (for late-mounting components), and on every scroll. This is read-only — no DOM mutations beyond outline styles and labels — so it's safe to ship even if I miss removing it. The user can now visually identify which colored box is the offending white bar and report its label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:15:19 +00:00
Michael	e282f061dc	test(home): move marker to true bottom of main content User reported the previous TEST #2 banner appeared at the top of the main content area instead of the bottom. Root cause: on the home page, ``render_sticky_footer()`` is called at line 107 — before ``st.title()`` — so anything that function injects in document flow lands at the top of ``stAppViewBlockContainer``. Other pages call ``render_sticky_footer()`` at the end of their script, so the flow content lands at the bottom there. Remove the marker from ``render_sticky_footer`` and add it directly at the very end of ``_home._home_page()`` — after the findings panels. If this banner lines up with the offending white strip when scrolled to the bottom, the strip is something rendered at the tail of the page (likely an iframe wrapper from ``render_findings_panel`` or the block container's ``padding-bottom``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:11:24 +00:00
Michael	5daae9e5fa	test(footer): move marker out of footer into main content flow User confirmed the previous marker landed inside the Help/Close sticky footer — which is NOT the offending white bar. They want the sticky footer kept; the white strip they're trying to remove sits above the footer in the main content area. Move the marker out of ``#datatools-sticky-footer`` and render it via ``st.markdown`` immediately before the ``st.iframe`` call that injects the footer. That places it at the very bottom of ``stAppViewBlockContainer`` — exactly where the iframe wrapper (``stElementContainer``) and the block container's ``padding-bottom: 3rem`` reservation live. Styled as a red dashed banner so it's unmistakable. If it lines up with the white strip clipping text on scroll, one of those two is the culprit and the next commit can target it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:09:21 +00:00
Michael	48cb802dfb	test(footer): inject visible marker into #datatools-sticky-footer The user reports a "white bar/box" at the bottom of the main content area that clips text when scrolling. The DOM inspector found only one fixed-position white element near the viewport bottom — ``#datatools-sticky-footer`` (bg ``rgba(255,255,255,0.97)``, ~33px tall) — so this is my best candidate for what they're seeing. Append a red marker span "◀ CLAUDE TEST: is this the white bar you want removed? ▶" inside the footer div so the user can visually confirm. If the text shows up where they see the offending white bar, the footer is the right target; if the bar is somewhere else, this confirms it's a different element. Temporary — to be reverted in the next commit either way. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:06:56 +00:00
Michael	d022167ba2	fix(home): widget's "✕" Remove now actually removes the file Reported: on the Home page after uploading data files, the Remove buttons "on the right side" did nothing — the file kept showing up in the list. That was the file_uploader widget's BUILT-IN ✕ icons (the ones inside the uploader's chrome, on the right of each file row), not our custom "Remove" buttons further down — the custom ones have worked correctly since `84e4665`. Cause: ``_home_page`` deliberately treated the widget as add-only and never honored widget-side removals. The reasoning, per the prior comment, was that navigation can remount the widget with value ``[]`` — a render-time sync would then wipe ``home_uploads``. Real, but the side effect was that the widget's own ✕ appeared to do nothing: the file vanished from the widget chrome, stayed in ``home_uploads``, and re-rendered immediately in the custom list below. Fix: hook the file_uploader's ``on_change`` callback to reconcile ``home_uploads`` against the widget's current value. Streamlit's ``on_change`` fires ONLY on user-initiated value changes; the remount-induced ``[]`` reset doesn't trigger it, so the stash still survives navigation. Removals from the callback also drop the file's findings entry and clear the singular ``home_uploaded_*`` keys when the active upload was removed — matching the custom-button path. The custom "Remove" buttons further down keep working unchanged; the existing AppTest path through ``_home_remove_<sha1>`` still removes exactly the file clicked. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 20:52:20 +00:00
Michael	24ee021314	fix(footer): hide the helper page_link row that was leaking into pages Same wrong-testid bug as the Close click handler: the CSS rule that's supposed to position the hidden ``st.page_link`` off-screen was selecting ``a[data-testid="stPageLink"]``, but the bare ``stPageLink`` testid is on the OUTER wrapper div — the anchor uses ``stPageLink-NavLink``. ``:has(a[data-testid="stPageLink"]...)`` matched nothing, so the helper rendered as a full-size visible row at the bottom of every page (the "large white bar blocking content" the user reported). Fix: switch both the ``:has()`` rule and the no-:has() fallback to ``a[data-testid="stPageLink-NavLink"][href="close"]``. The ``href="close"`` form also works for base-path deployments (``/myapp/close``), matching the click handler's selector. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:07:07 +00:00
Michael	add3b866ee	fix(footer): Close button now actually fires — wrong testid + bad fallback Two bugs combined to make the footer Close a no-op: 1. The helper page_link's anchor carries ``data-testid="stPageLink-NavLink"`` — the bare ``stPageLink`` testid is on the OUTER WRAPPER div, not the anchor. The old selector ``a[data-testid="stPageLink"]`` matched nothing, so ``helper`` was always ``null``. 2. The fallback ``window.location.href = './close'`` ran inside the component iframe, so it only navigated the (invisible) srcdoc iframe. The main app stayed put. End result: click → nothing visible → shutdown_app never runs → farewell-script's ``window.close()`` attempt never happens → user sees the Close button as broken. Fixes: - Selector → ``a[data-testid="stPageLink-NavLink"][href="close"]``. ``href="close"`` covers both root (/close) and base-path (/myapp/close) deployments. - Fallback → resolve the parent window via ``doc.defaultView`` (the parent doc's window) with a ``window.top`` fallback, so the hard-nav navigates the whole app instead of just the iframe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:02:46 +00:00
Michael	b568773a1f	chore(streamlit): migrate components.v1.html → st.iframe (deprecation) Streamlit logs a deprecation notice on every render: Please replace ``st.components.v1.html`` with ``st.iframe``. ``st.components.v1.html`` will be removed after 2026-06-01. Replace all 9 call sites (6 tool pages + 3 in ``_legacy.py``). Both APIs feed ``srcdoc`` to the underlying iframe so the HTML/JS payload and the cross-frame DOM access pattern (``window.parent.document``) are unchanged. ``st.iframe`` rejects ``height=0`` (raises ``StreamlitInvalidHeightError``), so bump every zero-height call to ``height=1``. 1px is effectively invisible — these are script-only iframes, no visible payload — and avoids the validator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:57:40 +00:00
Michael	4a7f99f0ec	fix(footer): restore soft-nav for Close (no page reload on shutdown) Footer Close was using ``<a href="./close">`` which triggers a browser hard-nav. That's a visible page-reload flash, websocket churn, and slower shutdown than the previous sidebar Close — which used ``st.navigation``'s soft nav. Restore the soft-nav path: - ``render_sticky_footer`` now renders a hidden ``st.page_link`` pointing at ``pages/99_Close.py``. Positioned off-screen via CSS (``stElementContainer:has(a[data-testid=stPageLink] [href$=/close])``) so it occupies no layout space but stays in the DOM, reachable + clickable. - Footer's Close <button> click handler now dispatches a programmatic click on that hidden page_link. Streamlit's React handler picks it up and runs the soft nav (same code path the old sidebar entry used). Falls back to ``window.location.href`` if the helper link hasn't rendered yet so the button is never a no-op. - The page_link call is wrapped in try/except: ``AppTest`` doesn't populate the page-nav session keys it needs and raises ``KeyError('url_pathname')``. Failure costs only the soft-nav optimization — Close still works via the hard-nav fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:52:00 +00:00
Michael	b2449d3139	fix(nav,footer): drop orphan _hidden section header, show footer on Activate Two follow-ups to the prior sidebar/footer cleanup: - The "_hidden" section header was still visible in the sidebar because Streamlit renders ``stNavSectionHeader`` as a sibling of ``stNavSection``, not a child — so the ``:has()`` rule on the section was hiding the items list but leaving the header (and its collapse/drilldown marker) behind. Move Activate + Close into the unlabeled section (key ``""``) alongside Home so there is no header to leak in the first place, then hide just the two links via ``stSidebarNavLinkContainer:has(...)`` (with a defensive ``a[href$=...]`` fallback for browsers without ``:has()`` support). - The sticky footer was missing on ``pages/_Activate.py`` because the page never called ``render_sticky_footer`` — added the call so the Help / Close bar persists when the user follows the popover's Activate / Manage link. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:45:22 +00:00
Michael	d840230e48	fix(nav,footer): hide Activate from sidebar, surface it in Help popover - Collapse the Account section: Activate now lives in the same hidden sidebar section as Close (single ``_hidden`` group). Both pages stay registered with ``st.navigation`` so /activate and /close remain URL-routable for the Help-popover / Close-button links — only the sidebar entries + their section header are hidden via CSS. - Help popover always exposes a license-management link now: ``Activate now →`` when the license is inactive, ``Manage license →`` when it is active and valid. Both point at ``./activate``. - Extend the sidebar-hide CSS to also match ``a[href$="/activate"]`` and the section that contains it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:39:14 +00:00
Michael	9e8b4b2ca9	feat(footer): help popover shows license state + Activate link - Bump version to 3.0 (src/__init__.py). - Switch support address to support@unalogix.com. - Help popover now includes a License section that reads ``src.license.current_state()``: * When activated + valid: name + expiry date + days remaining. * Otherwise: "Not activated" + an ``Activate now →`` link pointing at ``./activate``. License-state queries are wrapped so a corrupted license file can't take the footer down — it falls through to the inactive branch. - Popover HTML is now built in Python (so the license branch lives in one place) and passed to the JS as a single string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:35:47 +00:00
Michael	dd231f5a38	fix(footer): render sticky Close+Help footer on the home page too The sticky footer was only wired into the 9 tool pages — the home page (``_home.py``) called ``hide_streamlit_chrome`` but never ``render_sticky_footer``, so the app-level Close+Help bar was missing whenever the user was on the home page. Add the call. Also drop the home page's now-redundant trailing ``st.divider() + st.caption(t("chrome.footer"))`` block — same "blank white bar above the sticky footer" symptom that motivated removing the per-page version from the tool pages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:32:16 +00:00
Michael	143c775cdf	fix(footer,nav): left-justify buttons, drop per-page caption bar, hide sidebar Close Three small follow-ups to the sticky-footer rework: - Left-justify the footer buttons (and reposition the Help popover to anchor at the left edge so it lines up with its trigger). - Remove the per-page ``st.divider() + st.caption("Runs locally…")`` trailing block from all 9 tool pages. The new sticky footer covers that text, so it was rendering as an empty white bar at the bottom of each tool page. - Hide the Close entry from the sidebar nav via CSS. The page stays registered with st.navigation so /close is still routable for the sticky-footer Close button — only the sidebar link + its section header are hidden (via :has() on stNavSection). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 15:04:12 +00:00
Michael	d1b9f642e2	feat(footer): slim sticky footer with Close + Help, drop bottom Back-to-Home The duplicate full-width Back-to-Home button at the bottom of every tool page was reading as a "huge footer." Replace it with a real slim sticky footer holding two controls: - Close: <a href="./close"> to the Close page (which shuts down). Full-page nav is fine here — the process is terminating, so the session-state-loss concern that retired the previous sticky footer doesn't apply. - Help: JS-toggled popover showing version + support@datatools.app. No navigation, no state loss. Top-of-page Back-to-Home stays (uses st.switch_page, preserves state). Add footer.* i18n keys for en + es. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:56:02 +00:00
Michael	65c85107b6	revert: restore audit-log kill switch — async redesign didn't help User pulled `d9e32e5` (async-writer audit log + re-enabled diagnostics sidebar) and still sees blank pages. The synchronous-write theory from the previous round was at most a partial explanation; something ELSE in the audit-log code path is also taking the page render down on the user's machine. Restore the kill switch so the user has a working app while we diagnose: - ``src/audit.py``: ``_DISABLED = True`` re-introduced at module top, each of ``log_event`` / ``log_session_start`` / ``log_page_open`` / ``flush_audit_log`` early-returns. The async writer thread is never started. - ``hide_streamlit_chrome``: ``_render_diagnostics_sidebar()`` call re-gated behind ``if False:``. The async writer code stays in place — easier to flip the flag back when we identify the real cause than to rewrite a third time. The shutdown-flush call in ``shutdown_app`` also stays; it early-returns on the kill switch and is harmless. Diagnostic plan for the next session: ask the user for the launcher terminal output (the new stderr "DataTools audit: writes failing..." message would tell us if the writer thread DID start and DID fail), and whether ``~/.datatools/logs/`` is being created at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:44:23 +00:00
Michael	d9e32e578b	feat(audit): async writer thread — safe to re-enable Reported earlier: synchronous file writes in ``log_event`` blocked the GUI render thread on hostile filesystems (Windows antivirus on ``~/.datatools/logs/`` is the prime suspect). A blocking ``open`` call doesn't raise — try/except can't recover from it — so the only safe re-enable is to take file I/O off the render path. Refactor: - ``log_event`` and friends push events onto a ``deque(maxlen=5000)`` via ``put_nowait`` and return in microseconds. - A single daemon thread (``datatools-audit-writer``) drains the queue and writes batches. Holds the queue lock only long enough to snapshot + clear, then does I/O outside the lock so producers can keep enqueueing. - ``audit_log_path()`` is now pure path arithmetic — no ``mkdir`` no ``open``. The writer thread does the directory creation off the request path, so any hang there only affects the writer. - Bounded queue means an unwritable disk doesn't unbounded-grow memory; the queue caps at 5000 and overflow drops OLDEST events so the most-recent (most-diagnostic) ones survive. - First write failure prints once to stderr; subsequent failures are silent so logs don't drown the launcher terminal. - ``flush_audit_log(timeout_s=0.5)`` drains the queue and signals the writer to exit; bounded so a stuck disk can't delay shutdown. Other changes in this commit: - ``shutdown_app`` now emits a "Session ending" event and calls ``flush_audit_log`` before kicking the os._exit timer, so the closing session's events make it to disk. - The Diagnostics sidebar in ``hide_streamlit_chrome`` is re-enabled (the ``if False:`` gate is removed). Wrapped in try/except defensively — render errors print to stderr, never blank the page. - ``_DISABLED`` kill-switch is gone. The async design IS the safety mechanism now. Tests in ``tests/test_audit.py``: - log_event burst of 1000 events completes in well under 1s (proves non-blocking). 
- Events queued before flush land on disk with the expected JSON shape; session_start renders; idempotent. - Pointing the audit dir at a file (so mkdir fails) doesn't hang or crash the producer. - Non-JSON extras are str()-coerced rather than dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:39:48 +00:00
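The async-writer design described in `d9e32e578b` can be sketched roughly as follows. This is a minimal illustration, not the code in ``src/audit.py``: the class name ``AuditQueue``, the ``sink`` callable, and the thread name are hypothetical. Note that ``collections.deque`` exposes ``append`` rather than ``put_nowait``, so the sketch uses ``append``.

```python
import json
import threading
from collections import deque


class AuditQueue:
    """Bounded, non-blocking event buffer drained by one background writer."""

    def __init__(self, sink, maxlen=5000):
        # A full deque silently drops its OLDEST entry on append.
        self._events = deque(maxlen=maxlen)
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stop = False
        self._sink = sink  # hypothetical: callable that receives a list of JSON lines
        self._thread = threading.Thread(
            target=self._run, name="audit-writer-sketch", daemon=True
        )
        self._thread.start()

    def log_event(self, category, message, **extra):
        # Producer side: an in-memory append only, never file I/O.
        with self._lock:
            self._events.append({"category": category, "message": message, **extra})
        self._wake.set()

    def _drain(self):
        # Hold the lock only long enough to snapshot and clear.
        with self._lock:
            batch = list(self._events)
            self._events.clear()
        if batch:
            # I/O runs outside the lock; default=str coerces non-JSON extras.
            self._sink([json.dumps(event, default=str) for event in batch])

    def _run(self):
        while True:
            self._wake.wait(timeout=0.1)
            self._wake.clear()
            self._drain()
            if self._stop:
                self._drain()  # final pass for events queued just before the stop
                return

    def flush(self, timeout_s=0.5):
        # Bounded wait, so a stuck sink cannot delay shutdown indefinitely.
        self._stop = True
        self._wake.set()
        self._thread.join(timeout_s)
```

Passing ``sink=some_list.extend`` keeps the sketch testable without touching disk; a real writer would append the lines to the session's JSONL file instead.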
Michael	7cb1bc922d	fix(nav): restore real Streamlit Back-to-Home button — preserves state Reported: after the sticky-footer href fix (`be7191a`) the back-to-home click worked but the home-page upload list disappeared. Full-page navigation via ``<a href>`` doesn't preserve ``st.session_state`` on the user's Streamlit build. Trade-off forced: pick visible-from-anywhere sticky footer OR state preservation. Can't have both because ``st.switch_page`` (soft nav, preserves state) needs a real Streamlit button widget, and Streamlit widgets can't be reliably CSS-positioned to the viewport bottom — Streamlit owns the widget DOM and remounts it on every rerun. State preservation wins. Going back to the pre-sticky design: - ``render_sticky_footer()`` becomes a no-op shim. Kept as a callable so the call sites in every tool page don't have to be touched in this commit; the original implementation is preserved as ``_render_sticky_footer_DISABLED`` if we ever decide to revisit. - Every Ready/Coming-Soon tool page (1-9) gets ``back_to_home_link()`` reinstated near the top of the page (visible at scroll-top) AND ``back_to_home_link(key="_back_to_home_link_bottom")`` reinstated near the bottom of the page (visible at scroll-bottom). Both instances call ``st.switch_page`` via the existing helper — soft nav, no full reload, ``st.session_state["home_uploads"]`` and every other session-state key survive. User trades the "always-visible while scrolling" sticky behavior for the upload-list-survives-navigation behavior. The two-button pattern (top + bottom) was what we had before the sticky-footer experiment; on short pages both are visible at once, on long pages the user has one in reach at either end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:31:50 +00:00
Michael	be7191a5d1	fix(footer): navigate to / instead of /home on Back to Home Reported: clicking Back to Home in the sticky footer surfaced Streamlit's "Page not found — Running the app's main page" message in the user's build. Root cause: ``url_path="home"`` on the home page's ``st.Page`` registration is treated as an alias for the default page in some Streamlit minor versions, but the user's build doesn't honour the alias for the page that ALSO has ``default=True``. The default page is served at the root URL ``/``; ``/home`` is treated as a missing page on that build. Switch the footer anchor's href from ``"home"`` (which resolved to ``/home`` from any tool-page URL) to ``"./"`` (resolves to the current document's directory, which on a single-segment URL is the server root → default page → Home). Robust across Streamlit minor versions regardless of how the url_path alias is interpreted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:25:57 +00:00
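The href change in `be7191a5d1` comes down to standard relative-URL resolution, which ``urllib.parse.urljoin`` reproduces. The host, port, and ``text_cleaner`` slug below are made-up examples, and ``resolve_href`` is a hypothetical helper used only for illustration.

```python
from urllib.parse import urljoin


def resolve_href(page_url: str, href: str) -> str:
    """Resolve an anchor's href the way a browser would from page_url."""
    return urljoin(page_url, href)


# Assumed example URL of a tool page served at the root.
tool_page = "http://localhost:8501/text_cleaner"

old_target = resolve_href(tool_page, "home")  # lands on /home
new_target = resolve_href(tool_page, "./")    # lands on the server root

# The same "./" href under an assumed base-path deployment.
base_path_target = resolve_href("http://localhost:8501/myapp/text_cleaner", "./")
```

Here ``old_target`` is ``http://localhost:8501/home`` and ``new_target`` is ``http://localhost:8501/``, so the footer link no longer depends on how a given Streamlit build treats the ``url_path`` alias.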
Michael	2d2ff43754	re-enable sticky footer + compact CSS — the audit-log I/O was the hang User confirmed: with the audit-log kill switch (`1caedbb`) in place, pages render. So the hang was 100% in the audit-log file writes — ``open()`` blocking on Windows somewhere — not in the chrome additions disabled during bisection. Two of those three additions are pure UI and have no filesystem exposure, so they're safe to re-enable now: - Sticky footer: pure CSS + a components-html iframe whose JS appends a div to ``parent.document.body``. No disk touch. The user just reported losing the Back-to-Home button to the bisection commit — restoring this brings it back. - Compact-spacing CSS layer: gap reductions on stVerticalBlock / stHorizontalBlock, slim heading margins, slim hr / caption / expander / button / metric padding. Pure CSS. What stays disabled: - Audit-log writes (``src/audit.py:_DISABLED = True``). Any resumption needs an async-write design with a hard timeout so a stuck filesystem can't hang the GUI render. - Diagnostics sidebar: it calls ``audit_log_path()`` which itself does a ``mkdir()`` — and a hanging mkdir would re-introduce the same blank-pages symptom. Will re-enable once the audit log is rewritten not to block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:22:55 +00:00
Michael	36510eee7b	fix(findings): namespace per-tool button keys so multi-file render works Reported: uploading multiple files on the home page and clicking Run analysis blew up with StreamlitDuplicateElementKey: key='_findings_open_02_text_cleaner' when two uploaded files both had Clean Text findings. Root cause: ``render_findings_panel`` is invoked once per uploaded file from ``_home.py``, but the per-tool jump button used a filename-agnostic key: key=f"_findings_open_{tool_id}" Two files both flagging Clean Text → two buttons with identical keys → Streamlit rejects the second one. Fix: - Add ``key_namespace: str = ""`` to ``render_findings_panel``. The helper hashes it (sha1 truncated to 8 chars) and appends to every button key, so different namespaces produce different keys but the same namespace stays stable across reruns. - The home page now passes the filename: ``render_findings_panel(findings, header=f"📄 {name}", key_namespace=name)``. - The single-call site in ``upload_and_analyze_section`` (the legacy helper, only used outside the new home-page path) keeps the default empty namespace, which is fine because that path renders findings for ONE file at a time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:17:03 +00:00
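The key-namespacing fix in `36510eee7b` can be illustrated with a small helper. This is a sketch under assumptions: ``namespaced_key`` is a hypothetical name, the underscore separator is a guess, and leaving the key untouched for an empty namespace is one possible reading of "keeps the default empty namespace".

```python
import hashlib


def namespaced_key(base_key: str, key_namespace: str = "") -> str:
    """Append a short, stable sha1 digest of the namespace to a widget key."""
    if not key_namespace:
        # Assumption: an empty namespace leaves the key as it was.
        return base_key
    digest = hashlib.sha1(key_namespace.encode("utf-8")).hexdigest()[:8]
    return f"{base_key}_{digest}"


# Two uploaded files that both flag the same tool now get distinct button keys.
key_a = namespaced_key("_findings_open_02_text_cleaner", "customers.csv")
key_b = namespaced_key("_findings_open_02_text_cleaner", "orders.csv")
```

Hashing the filename rather than embedding it keeps the key short and free of awkward characters, while the same filename still maps to the same key on every rerun.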
Michael	1caedbbbc7	bisect: kill-switch every audit-log write Reported: bisection commit `c0bfd4d` that disabled the sticky footer, diagnostics sidebar, and compact-CSS didn't fix the blank-page symptom. User adds that Ctrl+C also can't kill the launcher. Ctrl+C-doesn't-work + every-page-blank together points at a hang in the Python process, not an exception. The most likely hang point in the chrome path is the audit log's file I/O — ``open()`` inside the ``with`` block in ``log_event`` blocks on a stuck filesystem (Windows antivirus quarantining ``~/.datatools/logs/datatools-*.jsonl`` on every write is a plausible culprit on the user's machine). A blocking ``open`` call does NOT raise — try/except can't recover from it — which is why our prior defensive wrap didn't help. Add a module-level ``_DISABLED = True`` kill switch. ``log_event``, ``log_session_start``, and ``log_page_open`` each early-return at the very top of the function when the flag is set, before any file-system call. Path resolution (``audit_log_path``) still works since it's needed for the diagnostics sidebar (still disabled in `c0bfd4d`, but kept harmless). If pages render after this commit, file I/O from the audit log is confirmed as the culprit; we'll redesign with an async writer queue and a tighter timeout. If they still don't, the cause is somewhere we haven't bisected yet and we move to a hard revert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:14:29 +00:00
Michael	c0bfd4dbc9	bisect: temporarily disable new chrome additions to diagnose blank pages Reported: every page renders empty in the main body even after the audit-log defensive-wrap commit (`59c6d0f`). Close button also doesn't trigger shutdown — that page is blank too. Sidebar nav still renders, so the chrome path that runs on every page is the suspect. Three chrome additions land all at once and are temporarily turned off so the user can see whether bare chrome restores rendering: 1. Sticky footer (``render_sticky_footer``): short-circuited with ``return`` at the top of the function. The CSS-injection + components-html iframe mechanic is the highest-suspicion item — if the iframe script throws or the CSS interacts badly with the user's Streamlit / Python build, the side effects can be page-killing on theirs while invisible on ours. The original body is preserved as ``_render_sticky_footer_DISABLED`` so re-enabling is a one-line change. 2. Diagnostics sidebar (``_render_diagnostics_sidebar``): call site in ``hide_streamlit_chrome`` is gated by ``if False:``. Wrapping in try/except (the previous commit) caught exceptions but didn't help — silent partial renders inside ``with st.sidebar: with st.expander: ...`` can still leave the render stack in a bad state on some Streamlit versions. 3. Compact-spacing CSS layer: the ``gap: 0.5rem !important;`` on ``stVerticalBlock`` / ``stHorizontalBlock``, the slim heading margins, the slim hr / caption / expander / button / metric rules — all stripped back to the pre-compact ``_HIDE_CHROME_CSS``. The ``gap`` rule in particular is a suspect: if the user's Streamlit version doesn't render stVerticalBlock as a flex container, the rule is harmless; if it does and interacts badly with overflow, content could be clipped. What's deliberately KEPT enabled: - The audit-log calls (already wrapped from `59c6d0f`). - ``log_page_open`` calls in tool pages (already wrapped internally). 
- All UI changes pre-compact (the unified tool-page layout, the download-button helper, etc.). If pages render after this commit, we know it's one of the three disabled items above and can bisect further. If they still don't render, the cause is in code that pre-dated the audit-log work and the bisection has to keep going. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:09:23 +00:00
Michael	59c6d0f914	fix(audit): defensive wrap so audit failures can never blank the GUI Reported: after pulling commit `c73d716` (audit log) the main body of every page showed empty. Sidebar nav still worked. Diagnosis: the most likely path is that something inside the audit calls — ``_render_diagnostics_sidebar()`` calling ``audit_log_path()``, or ``log_session_start()`` itself — raises during ``hide_streamlit_chrome`` on the user's environment (Python 3.14 on Windows, a less-tested combo than the test environment). Streamlit's script runner sees the exception, and on some chrome paths it eats it without surfacing an error block, leaving the page body empty. The audit log is best-effort by design. Make that contract real: 1. ``hide_streamlit_chrome`` now wraps both ``log_session_start()`` and ``_render_diagnostics_sidebar()`` in try/except. Errors print to stderr (so the developer running ``python -m src.gui`` sees them in the launcher's console) but never bubble up to kill the page render. 2. ``audit_log_path()`` already had a tempdir fallback for the primary mkdir failure, but the SECOND mkdir wasn't protected either. Restructured to a two-level fallback: configured dir → tempdir → ``/dev/null`` (or ``NUL`` on Windows). The last fallback ensures the function never raises; ``log_event``'s own try/except handles the eventual unwritable-file case. 3. ``log_page_open(slug)`` now has an outer try/except so it cannot raise either — protecting every tool page's render path. If a user reports the same symptom again, the launcher terminal will now show a real traceback explaining what's wrong, and the GUI will still render normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:00:31 +00:00
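The two-level fallback described for ``audit_log_path()`` in `59c6d0f914` (configured dir → tempdir → null device) might look roughly like this. It is a sketch only: ``resolve_log_path``, the default filename, and the ``datatools-sketch-logs`` tempdir name are hypothetical. Note that a later commit in this log (`d9e32e578b`) moves the ``mkdir`` off the render path entirely.

```python
import os
import tempfile
from pathlib import Path


def resolve_log_path(configured_dir, filename="datatools-sketch.jsonl") -> Path:
    """Return a log path without ever raising: configured dir, tempdir, null device."""
    candidates = (
        Path(configured_dir),
        Path(tempfile.gettempdir()) / "datatools-sketch-logs",  # hypothetical name
    )
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            return candidate / filename
        except OSError:
            # Covers permission errors and "a file is in the way"; try the next level.
            continue
    # Last resort: os.devnull is "/dev/null" on POSIX and "nul" on Windows.
    return Path(os.devnull)
```

Catching ``OSError`` only guards against failures that raise; as the later bisect commits note, a call that blocks indefinitely is not recoverable this way.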
Michael	ee0b1f6f6b	docs: design notes for future PDF→CSV tool New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a consistent shape — What / Why / Can we ship now / Approach / GUI sketch / Effort / Risks / Ship criteria. Resting place for things the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that keep coming up. First entry: #10 PDF → CSV extractor (bank statements et al.). Key facts captured: - Current state: no PDF infrastructure exists. Zero PDF dependencies in requirements.txt; zero PDF-touching code under ``src/``. The only "PDF" string in the codebase is the planned-output copy for the Quality Check tool, unrelated to extraction. - Library picks: pdfplumber as the extraction core (BSD-3, no native compiler, gives coordinate-aware text), Tesseract via pytesseract as the OCR fallback for scanned PDFs, streamlit-drawable-canvas as the region-picker component. - GUI sketch: user draws a header strip + a row template on a rendered page; the tool applies that template across N pages, saves the template by layout fingerprint for next month's statement, emits CSV. - Effort phased A–E: 3–4 weeks for a text-only MVP; 6–10 weeks for a polished version with multi-page template recall; +2–3 weeks if scanned-PDF OCR is required. - Difficulty: medium-hard. The pieces are well-trodden; the combination (region selection that persists across pages and across documents with similar layouts) is where the engineering goes. - Ship criteria: ≥1 paying customer + ≥3 paid or ≥5 demo emails asking for PDF extraction + the bookkeeper niche converting at least one customer first. None have fired. Cross-references added: - ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for parked tool ideas, with a one-paragraph summary of #10. - ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools in FUTURE-TOOLS.md and explicitly names #10 as the current highest-pressure entry. 
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new row for the PDF tool tied to the same ship-trigger language. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:52:42 +00:00
Michael	c73d716d06	feat(audit): JSONL audit log for support diagnostics New ``src/audit.py`` module records GUI actions to a per-session JSONL file under ``~/.datatools/logs/`` (overrideable via ``DATATOOLS_AUDIT_DIR``). The file is human-readable (one JSON object per line, each with a ``message`` field) AND trivially machine-parseable — the support flow is "client mails the file, we read it and explain what went wrong." Format example:: {"ts":"2026-05-17T05:30:00.123+00:00","level":"info","category":"session", "session":"a1b2c3d4","message":"Session started", "platform":"Windows 11","python":"3.14.0","user":"Michael Dombaugh", "log_file":"C:\\Users\\Michael Dombaugh\\.datatools\\logs\\datatools-...jsonl"} {"ts":"...","category":"upload","message":"Uploaded customers.csv", "filename":"customers.csv","bytes":24813} {"ts":"...","category":"analyze","message":"Analyzed customers.csv (3 findings)", "filename":"customers.csv","findings":3,"rows":120,"cols":8} {"ts":"...","category":"tool_run","message":"Clean Text run", "page":"2_Text_Cleaner"} {"ts":"...","category":"error","level":"error", "message":"analyze(weird.csv): EmptyDataError: No columns to parse", "filename":"weird.csv","outcome":"empty_after_repair"} Public API: - ``log_event(category, message, extra)`` - ``log_session_start()`` — idempotent banner with platform info - ``log_page_open(slug)`` — emit a ``nav`` event, deduplicated per Streamlit session so reruns don't spam the log - ``log_exception(where, exc, extra)`` — convenience wrapper - ``audit_log_path()`` / ``audit_log_dir()`` — for the UI Wired in at: - ``hide_streamlit_chrome``: stamps session start, mounts a small "🩺 Diagnostics" expander in the sidebar with the log path and an "Open log folder" button so the user can grab the file to attach to a support email. - Home page: ``upload`` event on every new file, ``upload`` event on per-file remove, ``analyze`` event with file count when Run-analysis fires. 
- ``_run_analysis_on_upload``: ``analyze`` event with rows / cols / findings count per file, plus ``error`` events on every caught exception (empty upload, empty after repair, pandas EmptyDataError, generic Exception). - Every Ready tool page (1, 2, 3, 4, 5, 9): ``tool_run`` event immediately after the primary action stashes its result. - Every tool page (1-9): ``log_page_open(slug)`` on render — deduped via session state so we don't get one event per Streamlit rerun. Safety: - ``log_event`` wraps every write in try/except. A broken audit log must NOT crash the GUI. - Non-JSON-serializable extras are ``str()``-coerced before writing. - File CONTENTS are never logged. We capture filename, byte count, and (in the analyzer) a 12-char sha1 fingerprint of the bytes so the same file re-uploaded gets the same trace. - License keys, session cookies, etc. are not logged. - ``DATATOOLS_AUDIT_DIR`` env var lets tests redirect writes into a tmp dir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:36:35 +00:00
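The safety rules listed in `c73d716d06` (one JSON object per line, every write wrapped in try/except, non-serializable extras ``str()``-coerced) can be sketched as below. `write_event` and its explicit `path` parameter are hypothetical; the commit's ``log_event(category, message, extra)`` resolves the path itself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_event(path, category, message, extra=None):
    """Append one JSONL record; return False instead of raising on failure."""
    try:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "category": category,
            "message": message,
        }
        record.update(extra or {})
        # default=str coerces values json cannot serialize (Path, set, ...).
        line = json.dumps(record, default=str, ensure_ascii=False)
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
        return True
    except Exception:
        # A broken audit log must not crash the caller.
        return False
```

Returning a boolean rather than raising keeps the "best-effort" contract visible at the call site without forcing callers to handle errors.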
Michael	f0885aeb1e	feat(analyze,ui): recommend Standardize Formats + bold red Open buttons Two reported issues addressed together because they're the same UX flow (home findings panel → jump to relevant tool). (1) Format-Standardizer recommendations weren't firing. Reported: uploading a file from the format-cleaner test corpus (``24_format_dates.csv``, ``25_format_phones.csv``, ``29_format_currencies.csv``, ``30_format_integration.csv``) showed zero "Standardize Formats" recommendations even though the columns clearly mixed multiple date / phone / currency formats. Two underlying causes: - ``_detect_inconsistent_date_format`` required two MATCHES per distinct format. A test column with N rows each in a different format had ≤1 match per format and was silently passed over. Loosened to "≥1 match per format" — the inconsistency signal is the presence of ≥2 distinct formats, not their volume. - Only date inconsistency was detected. Phones, currency, and booleans (the other format-standardizer fix categories) had no detector at all. Added three new detectors: - ``_detect_inconsistent_phone_format``: nine phone-format regexes (plain-10, US paren / dash / dot / space, +country, extension, intl plus). Fires when a column is ≥35% phone-shaped AND mixes ≥2 formats. - ``_detect_inconsistent_currency_format``: thirteen currency regexes covering US ($1,234.56 / $1234.56), EU (€1.234,56), India lakh notation, Swiss apostrophe, trailing-symbol, parens-negative, prefix-currency-code, suffix-currency-code, and negative variants. Same fire criteria as phone. - ``_detect_inconsistent_boolean_format``: column is ≥80% boolean tokens (yes/no/y/n/true/false/1/0) AND uses ≥3 distinct surface forms (e.g. yes / Y / true / 1 mixed together). Verified on every file in ``test-cases/format-cleaner-corpus/``: 24_format_dates, 25_format_phones, 29_format_currencies all now produce a format-standardizer Finding. The integration test file flags all three. 
The threshold loosening (from 50% to 35% of values format-shaped) is still strict enough to avoid false-positives on free-text comment columns where a few cells happen to look phone- or date-shaped. (2) The "Open <Tool>" jump links blended into the page. Reported: the per-tool jump links inside the home findings panel were too subtle to notice. Replaced ``st.page_link`` with ``st.button(type="primary")`` so the buttons render in Streamlit's primary-action red colour, matching the "Clean Text" / "Find Duplicates" / etc. run buttons. Click handler delegates to ``st.switch_page(page_slug)`` so it's still a soft in-app navigation (no full reload). 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:54:31 +00:00
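The firing rule from `f0885aeb1e` (a column is at least 35% phone-shaped AND mixes at least two formats) can be illustrated with a reduced detector. The four regexes and the name `mixed_phone_formats` are assumptions for this sketch; the commit describes nine patterns that are not reproduced here.

```python
import re

# Illustrative subset only; the real detector is described as having nine patterns.
PHONE_FORMATS = {
    "plain10": re.compile(r"^\d{10}$"),
    "paren": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    "dash": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "dot": re.compile(r"^\d{3}\.\d{3}\.\d{4}$"),
}


def mixed_phone_formats(values, min_share=0.35):
    """True when enough cells look like phones and at least 2 formats appear."""
    cells = [str(v).strip() for v in values if str(v).strip()]
    if not cells:
        return False
    seen = set()
    shaped = 0
    for cell in cells:
        for name, pattern in PHONE_FORMATS.items():
            if pattern.match(cell):
                seen.add(name)   # one match per format is enough
                shaped += 1
                break
    return shaped / len(cells) >= min_share and len(seen) >= 2
```

The share threshold guards against free-text columns where a few cells happen to look like phone numbers; the distinct-format count is what signals the inconsistency.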
Michael	229e1afd45	fix(footer): mount Back-to-Home outside Streamlit's container tree Reported: the sticky footer rendered, but the Back to Home button inside it wasn't visible. Likely cause: ``st.markdown`` inserts the footer div inside Streamlit's content tree, which sits under ``.stApp { zoom: 0.85 }`` (our compact scaler) and several nested padding/positioning contexts. Streamlit's own ``<a>`` styling rules can also colour-collide with our anchor. Switch the mount strategy. Two passes: 1. CSS rules go to the parent document via ``st.markdown`` as before, but every property carries ``!important`` and the selectors key on ``#datatools-sticky-footer`` (id, not class) plus a dedicated ``.datatools-sticky-footer-link`` class on the anchor — so Streamlit's default ``<a>`` styles can't override colour or padding. ``z-index: 2147483646`` keeps the footer above anything else in the page. 2. The footer DOM node itself is created by a script inside a zero-height ``streamlit.components.v1.html`` iframe. The script does ``window.parent.document.body.appendChild(...)`` so the div lives as a direct child of ``<body>`` — outside ``.stApp``, outside every Streamlit container, free of every parent's ``zoom`` / ``transform`` / ``overflow`` rules. If the cross-frame access ever fails (Streamlit sandbox config change), the script falls through to appending inside the iframe's own document — degraded but still visible. Each rerun replaces any prior ``#datatools-sticky-footer`` so we don't accumulate stacked footers on every script pass. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:47:44 +00:00
Michael	7ad19ac7f4	feat(nav,i18n): sticky footer with Back-to-Home + localized tool headers Two unrelated UX issues addressed in one sweep across all nine tool pages because they share the same edit surface. (1) Sticky footer replaces the top + bottom back-link buttons. Reported: a big white empty footer space at the bottom of every page; the Back to Home button at the top scrolled out of view on long pages. New ``render_sticky_footer()`` helper in ``components/_legacy.py`` injects a fixed-position bar at ``bottom: 0`` of the viewport with: - A border-top so it visually reads as a non-movable bar. - A semi-transparent background (rgba 0.96 + ``backdrop-filter: blur``) so content underneath shows through faintly when the user scrolls. - A styled ``<a href="home">`` anchor (not an ``st.button``) because Streamlit widgets can't be CSS-positioned reliably — Streamlit owns the widget's DOM container and re-mounts it on every rerun. A real anchor sits exactly where the CSS puts it and triggers Streamlit's URL routing to the home page. - ``padding-bottom: 3.5rem`` on the main container so the last widget isn't hidden behind the bar. Called once per tool page, immediately after ``hide_streamlit_chrome()`` so it renders even on pages that ``st.stop()`` early before any other content runs. The old top-and-bottom ``back_to_home_link()`` calls are removed from every tool page; their entry/exit points were dropping the button when the script short-circuited. (2) Tool-page headers now localize. Reported: switching the sidebar language picker to Spanish left the tool page's title + caption in English. Root cause: every page had hard-coded ``st.title("✂️ Clean Text")`` / ``st.caption("Trim whitespace...")`` strings. Added per-tool ``tools.<id>.page_title`` and ``tools.<id>.page_caption`` keys to ``en.json`` and ``es.json`` for all nine tools. Routed each page's title/caption call through ``t()``. 
Verified: with ``ui_lang=es`` set, the Clean Text page now renders "✂️ Limpiar texto" + the Spanish caption. Updated ``tests/gui/test_smoke.py::EXPECTED_SUBSTRINGS`` so the ``es`` column for each tool page asserts the actual Spanish string (was a duplicate of the English string back when the page bodies were English-only). 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:42:45 +00:00
Michael	84e4665ab0	fix(home): make per-file Remove button reliable Reported: the "✕" buttons on the uploaded file list removed files inconsistently — some clicks took, some didn't. Two compounding causes: 1. ``key=f"_home_remove_{name}"`` embedded the raw filename in the Streamlit widget key. Streamlit's widget-identity machinery normalizes keys differently across reruns when they contain spaces, dots, brackets, or non-ASCII characters, so a button's identity could shift between the render where the user clicked it and the rerun that should have processed the click. The click was registered, but the post-rerun render produced a new widget under a new effective key, and the original click was "lost". 2. The handler mutated ``home_uploads`` mid-loop while subsequent iterations were still creating buttons. ``st.rerun()`` raises synchronously, but if ANOTHER button's state changed in the same pass (e.g. a stale click held over from a fast double-tap), the ordering of state-mutation vs widget-key-update vs rerun could race. Fixes: - Stable widget keys: ``f"_home_remove_{sha1(name)[:10]}"``. The hash is identifier-safe regardless of spaces / dots / Unicode in the filename. Verified across "sample with spaces.csv", "sample.csv", and "日本語.csv" — three sequential Remove clicks each remove exactly one file with no clicks lost. - Two-phase capture: the loop collects the target ``to_remove`` filename, finishes rendering every other row at consistent widget identity, THEN mutates state once and reruns. No more mid-loop ``del`` racing other widgets' click handlers. - Wider click target: column ratio ``[8, 1]`` (was ``[12, 1]``) and ``use_container_width=True`` on the Remove button so the click surface fills the entire column. Label changed to "Remove" for the same reason — "✕" is a thin glyph that compressed the hit-test region. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:34:20 +00:00
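The two fixes in `84e4665ab0` (hash-based widget keys and two-phase capture) can be sketched without Streamlit. `remove_key` and `process_clicks` are hypothetical names, and `clicked_keys` stands in for whatever the widget layer reports as clicked during a render pass.

```python
import hashlib


def remove_key(name):
    """Identifier-safe key: only [0-9a-f] after the prefix, whatever the filename."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"_home_remove_{digest[:10]}"


def process_clicks(uploads, clicked_keys):
    """Phase 1: visit every row and note the target. Phase 2: mutate once."""
    to_remove = None
    for name in uploads:
        # Every row is still visited (rendered) even after a target is found.
        if to_remove is None and remove_key(name) in clicked_keys:
            to_remove = name
    if to_remove is not None:
        uploads = {k: v for k, v in uploads.items() if k != to_remove}
    return uploads, to_remove
```

Deferring the mutation until after the loop means no row is created against a half-updated dict, which is the property the commit relies on.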
Michael	4685bb4289	style(chrome): tighter vertical rhythm — less whitespace across screens Reported: too much whitespace between widgets, dividers, and headings. Compact-spacing CSS layer added to ``_HIDE_CHROME_CSS`` (so it applies on every page that calls ``hide_streamlit_chrome``): - ``[data-testid="stVerticalBlock"]`` and ``stHorizontalBlock`` gap trimmed from Streamlit's default ~1rem to 0.5rem. - Heading margins (h1-h4) tightened — h1/h2/h3 used to leave 1-1.5rem above; now 0.25-0.5rem. - ``hr`` (``st.divider()``) drops from 1rem above+below to 0.4rem. - Markdown paragraphs and captions: 0.25rem bottom margin instead of the default 1rem. - Expander summary padding reduced (0.35rem top/bottom). - File-uploader, button, and metric tiles: trimmed internal padding. Also slimmed the main-container padding from 1rem top / Streamlit default bottom (~6rem) to 0.5rem top / 0.75rem bottom. The existing ``zoom: 0.85`` on ``.stApp`` is kept — the user wanted less white space, not smaller content, and dropping zoom would shrink type alongside everything else. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:28:58 +00:00
Michael	e96d5901f4	fix(close): graceful about:blank fallback + display-mode aware hint Reported: user asked whether we can send Alt+F4 / Ctrl+W to the browser from JavaScript to force-close a tab. Honest answer that's now baked into the hint message: NO. Synthesized keyboard events from page JS only reach DOM event listeners, not the browser chrome or the OS. There is no flag, API, or trick that lets a page close a tab the user opened themselves. The page CAN close a window it opened (window.opener trail) or one whose display-mode is ``standalone`` (Chrome/Edge ``--app=URL``) — that's what ``python -m src.gui`` arranges, and that's the path that actually closes the window without a manual Ctrl+W. Improvements landed: 1. ``isStandalone(win)`` detects Chrome --app windows up front (``matchMedia('(display-mode: standalone)').matches``). In a regular tab the manual hint surfaces immediately on the "Close this window" click; in --app mode we only show it if the close attempt actually fails. 2. ``fallbackToBlank(win)`` navigates the tab to ``about:blank`` via ``location.replace`` (no history pollution) so the user sees a clean empty tab instead of the farewell overlay frozen over Streamlit's connection-error banner. They still have to Ctrl+W the blank tab, but the screen is no longer a misleading "did it close or not?" mess. Fires 250 ms after a failed close in --app mode (very rare path), or 1.5 s in a regular tab so the user has time to read the hint. 3. Hint message rewritten in en + es to explain WHY the close is blocked (browser security — not something we can override), to acknowledge the Alt+F4 / Ctrl+W question directly (those don't work either, for the same reason), and to point at ``python -m src.gui`` as the path that gives a clean auto-close. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:07:51 +00:00
Michael	ecfc52499f	fix(home): persist upload list across page navigation Reported: clicking "Back to Home" from a tool page returned the user to an empty home — their previously-uploaded files were gone. Root cause: Streamlit's ``st.file_uploader`` widget state does not reliably survive ``st.switch_page``. The widget gets unmounted on navigation, and its ``UploadedFile`` objects don't always re-attach on remount. The home page was treating the widget's return value as the source of truth, so after navigation the list was empty. Fix: introduce a session-state stash keyed by filename (``home_uploads: dict[str, {"bytes": bytes, "size": int}]``) and treat it as the source of truth for everything downstream — the active-file pickup keys for tool pages, the per-file findings cache, and the rendered file list. The widget is reduced to its narrow role of capturing NEW uploads, which we merge into the stash without ever removing. Per-file remove: a "✕" button next to each filename drops just that file (and its findings). The widget's own "✕" is bypassed by our rendering, since trusting it would let the widget's state diverge from the stash. Clear-results button is unchanged: it wipes only the analysis cache, leaving uploaded files intact (per the user's "persistent until cleared" requirement — removal is per-file via "✕"). Tool-page compatibility: the singular ``home_uploaded_{name,size, bytes}`` keys still get populated from the first entry in the stash on every render, so ``pickup_or_upload`` on a tool page keeps finding the active upload. When the user removes the active file, those keys are cleared so the next render repopulates from whatever file is now first. ``_StashedUpload`` is a small duck type ( ``.name``, ``.size``, ``.getvalue()`` ) so ``_run_analysis_on_upload`` accepts entries restored from the stash without changes. 2220 tests pass. 
Smoke-verified via AppTest: pre-stashed ``home_uploads`` renders the file list with per-file remove buttons, and the persistent state survives a simulated navigation round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:04:12 +00:00
Michael	21fd8a4cd7	fix(nav): switch_page resolves correctly + bottom-of-page back link Two issues, same fix surface. (1) Reported crash on Back-to-Home: StreamlitAPIException: Could not find page: app.py. ``st.switch_page("app.py")`` doesn't work under ``st.navigation`` — the entry script is the nav manager itself and is not a registered page. The fix needs to pass an ``st.Page`` object whose script identity matches one registered in the nav. First-pass attempt (``from src.gui.app import _home_page``) hit a worse failure: importing ``app.py`` from inside a tool-page render re-executes the nav setup with the WRONG "main script" context, so every ``st.Page("pages/N_foo.py", ...)`` call in ``_build_navigation`` fails with "file could not be found". Extract the home renderer into its own module ``src/gui/_home.py`` which has no top-level Streamlit side effects. Both the nav manager and the back-link helper import ``_home_page`` from there. The Page object built at click time has the same callable identity as the one registered, so ``st.switch_page`` resolves it. (2) Reported UX: the back button scrolled out of view on long pages. Add a second ``back_to_home_link(key="_back_to_home_link_bottom")`` call near the footer of every tool page (1-9). The unique key avoids widget-id collision with the top instance. Coming-Soon stubs get it unconditionally; Ready tools render it only after a result exists because the page short-circuits with ``st.stop()`` before then — when no result is on screen the page is short enough that the top link is sufficient. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:58:33 +00:00
Michael	42f8d78dd5	fix(downloads): drop /select on Windows — opens wrong folder Reported: clicking "Open Downloads folder" was opening the Documents folder instead of Downloads. Root cause is the classic Windows gotcha: when the path contains a space (e.g. ``C:\Users\Michael Dombaugh\Downloads``), Python's ``subprocess.Popen`` packs the ``/select,...`` argument into a single quoted token, and Explorer's ``/select`` argument parser does NOT accept that form — it silently falls back to whatever the user's default Explorer view is (typically Documents). Resolution paths considered: - ``shell=True`` with a hand-built command string — works but opens the door to shell-injection if a file_name ever contained a quote or special char. - ``cmd /c start "" explorer /select,...`` — same parsing issue. - ctypes ShellExecuteW — pulls in a Windows-only dependency. - Skip /select. Open the folder directly. ✓ Going with the last. ``explorer <folder>`` reliably opens the folder regardless of spaces in the path; the user finds the freshly-saved file by its name. The previous "highlight the file" nicety wasn't worth the path-parsing fragility — every user folder on Windows is ``C:\Users\<name>`` and every Windows username can contain a space. macOS keeps the ``open -R <file>`` reveal-in-Finder path because macOS argument parsing is sane and that's a strict UX win. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:45:47 +00:00
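The per-platform choice that `42f8d78dd5` settles on (plain ``explorer <folder>`` on Windows, ``open -R <file>`` on macOS, ``xdg-open <folder>`` on Linux) might look roughly like this. `reveal_command` is a hypothetical helper that only builds the argument list; dispatching it is left to the caller.

```python
import sys
from pathlib import Path


def reveal_command(file_path, platform=sys.platform):
    """Build the argv for opening the folder that holds file_path."""
    file_path = Path(file_path)
    folder = str(file_path.parent)
    if platform.startswith("win"):
        # No /select: pass the folder as its own argv element.
        return ["explorer", folder]
    if platform == "darwin":
        # -R reveals the file in Finder instead of opening it.
        return ["open", "-R", str(file_path)]
    return ["xdg-open", folder]
```

The list could then be handed to ``subprocess.Popen`` as is; keeping the folder as a single list element avoids building a command string by hand.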
Michael	0f89d7ba66	fix(downloads): use explorer /select on Windows + show open feedback Reported: clicking "Open Downloads folder" did nothing visible. The previous implementation called ``os.startfile(folder)`` on Windows, which is known to silently no-op or open Explorer behind the active window in some configurations (Streamlit running headless, no foreground rights inherited by the click handler thread, etc.). Switch to the more reliable ``explorer /select,<file>`` form: - Opens Explorer with the just-saved file pre-highlighted instead of just navigating to the folder — better UX than the old behavior. - explorer.exe is a real GUI process that's spawned in the user's session with foreground rights, so it shows up on top. - Fallback chain on Windows: ``/select`` first, then plain ``explorer <folder>``, then ``os.startfile`` as a last resort. macOS upgraded the same way: ``open -R <file>`` reveals in Finder rather than opening the directory. Linux: no reliable cross-distro reveal, so ``xdg-open <folder>``. Plus user feedback at the call site: - On successful dispatch: ``st.toast("Opening <folder>", icon="📂")`` — confirms we tried, in case the window comes up behind the browser. - On dispatch failure: ``st.warning`` with the full path the user can copy/paste into their file manager manually. 2220 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:25:06 +00:00
Michael	b9147f3b66	fix(downloads): save server-side to ~/Downloads + open-folder link Switch the download mechanic from "browser <a download> with a data: URL" to "write the bytes directly to the user's Downloads folder and show them the exact path". DataTools runs as a local Streamlit app, so the "server" IS the user's machine — there's no reason to go through the browser save dialog at all. Flow: 1. Click "Download <something>" button (rendered as a regular ``st.button``, so no widget-collision issues). 2. Bytes are written to ``Path.home() / "Downloads" / file_name`` (overwriting any same-named file). 3. The page reruns and renders a success caption with the absolute path the file landed at. 4. An "📂 Open Downloads folder" button appears. Clicking it pops the OS file manager via ``os.startfile`` (Windows), ``open`` (macOS), or ``xdg-open`` (Linux). Why this is better than the previous HTML-data-URL helper: - Unambiguous about where the file went — user sees the full path, not "wherever your browser was configured to save". - The data: URL approach base64-inflated the payload by about 33%, which bloated the page for large outputs; the server-side write is byte-for-byte. - No more browser-side widget collision class of bug. - The save action is a real Streamlit button, so the existing widget semantics (disabled, help tooltip, key isolation) work without workarounds. API surface unchanged. New canonical name ``local_download_button``; ``html_download_button`` is kept as a back-compat alias that points at the same implementation — every existing call site continues to work without edits. Tests are protected from polluting the developer's home dir via a ``DATATOOLS_DOWNLOADS_DIR`` env var override returned by the new ``_downloads_dir()`` helper. Smoke verified end-to-end via AppTest: click → file appears in tmp dir → success banner shows path → open-folder button renders. 2220 tests pass, 91 skipped, 35 s. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:48:28 +00:00
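A rough sketch of the server-side save from `b9147f3b66`, including the ``DATATOOLS_DOWNLOADS_DIR`` override that keeps tests out of the real home directory. `downloads_dir` and `save_download` are illustrative names, and stripping directory components from `file_name` is an extra precaution added here, not something the commit states.

```python
import os
from pathlib import Path


def downloads_dir():
    """Env override first (for tests), else the user's Downloads folder."""
    override = os.environ.get("DATATOOLS_DOWNLOADS_DIR")
    return Path(override) if override else Path.home() / "Downloads"


def save_download(data, file_name):
    """Write bytes into the downloads folder and return the absolute path."""
    # Assumption: keep only the final path component of file_name.
    target = downloads_dir() / Path(file_name).name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)  # overwrites any same-named file
    return target.resolve()
```

Returning the resolved path gives the page something concrete to show in the success caption.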
Michael	5128d35961	fix(text-cleaner): hoist show_hidden + stress-test all tool pages Reported crash: clicking "Clean Text" with mojibake.csv (a junk corpus file that the cleaner ran on but produced zero changes) blew up the results render with NameError: name 'show_hidden' is not defined at the cleaned-preview block. ``show_hidden`` was defined inside ``if result.cells_changed:`` and referenced unconditionally below. Fix on the page itself: hoist the ``show_hidden = st.toggle(...)`` declaration out of the conditional so it's always in scope for the downstream cleaned-preview render. One toggle now drives both the Examples table (which only renders when there are changes) AND the cleaned preview (which always renders). Generalized regression net: ``tests/test_junk_corpus_tool_pages.py``. For nine representative junk files (empty, only_nul, mojibake, invalid_utf8, utf16_le_no_bom, mismatched_columns, all_nulls, corrupt_xlsx, single_column) and every Ready/Coming-Soon tool page, the test: 1. Stashes the junk bytes as the home upload via session_state. 2. Runs the page through AppTest, asserts ``app.exception`` is empty. 3. If the page exposes a deterministic primary-action button label, clicks it and asserts no exception on the post-click render. Pages that catch a bad file at read time and short-circuit via ``st.error`` + ``st.stop`` are correctly skipped from the primary-action half (the button isn't rendered). A genuine crash shows up as ``app.exception`` carrying a Python traceback — exactly what the user reported, exactly what we now catch. 162 tests collected, 102 passed, 60 skipped. 4 seconds. Full suite: 2220 passed, 91 skipped, 35 s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:41:14 +00:00
Michael	696996c119	test(junk-corpus): pathological-input stress suite for the analyzer Build a corpus of 35 deliberately-broken files (empty bytes, NUL bytes, mojibake, UTF-16 without BOM, mismatched columns, unescaped quotes, corrupt zip, etc.) and pin the analyzer's stability contract against them. Files land in ``test-cases/junk-corpus/test_data/``. The generator ``make_junk_corpus.py`` produces them deterministically (one random sample uses ``secrets.token_bytes`` — committed bytes are stable across regenerations because the byte stream is captured at commit time). README documents the categories and how to add new shapes. ``tests/test_junk_corpus.py`` parametrizes over every file in the corpus and asserts: 1. ``_run_analysis_on_upload`` never raises — exceptions must be caught and surfaced as a synthetic ``Finding`` with severity="error". This was the user-reported crash for 13_non_latin_scripts.csv that the previous fix in `ae9d4a2` defensively wrapped; the corpus now stops the regression from re-landing on a different shape. 2. Every Finding in the result list is well-formed (string id, valid severity, non-empty description). 3. A high-risk subset (empty.csv, only_bom.csv, only_nul.csv, corrupt_xlsx.xlsx) MUST surface at least one error-level Finding — otherwise the GUI would render "no issues found" for a structurally broken file. 4. Error-level Finding descriptions are at least 20 chars so the UI banner gives the user something to act on. Also exclude ``junk-corpus`` from ``tests/test_fixtures_sweep.py`` since that sweep is happy-path (round-trip the text cleaner) and fights with files designed to break it. The contract is enforced by the dedicated junk-corpus test, not the sweep. Runtime: 12 s for the junk-corpus tests, 30 s for the full project suite (was 19 s without these). 2118 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:35:22 +00:00
Michael	ae9d4a2db5	fix(home): defensive analysis errors don't crash the whole page Reported: uploading 13_non_latin_scripts.csv made the home page bubble a ``pandas.errors.EmptyDataError`` traceback up through the page chrome instead of surfacing as a per-file error. In a multi-file analysis run that kills every other file's results too, which is worse than the symptom itself. Wrap ``_run_analysis_on_upload`` in proper error handling: - Empty bytes ``getvalue() == b""`` short-circuits with a synthetic error Finding telling the user the upload was zero-byte and to re-upload. - Empty ``repair.repaired_bytes`` (file was all NULs / BOM / stripped to nothing) likewise surfaces as a synthetic Finding rather than reaching pd.read_csv. - ``pd.errors.EmptyDataError`` from pandas is caught and rendered as a Finding that names the file, its byte size, and suggests opening it in a text editor to verify the header row matches the data row delimiter. - Any other exception during read/analyze is caught and surfaces as a Finding via ``format_for_user`` so the user gets a clean message, not a Python traceback. Each file in a multi-file run now stands alone: a bad file produces one red banner in its own card, every other file analyzes normally. The 13_non_latin_scripts.csv corpus file is 249 bytes of valid UTF-8 on disk and parses cleanly under the same code path locally — the user's specific symptom is likely a zero-byte upload (browser / network / Python 3.14 + Streamlit edge case). The new ``empty_upload`` finding will name the byte count so they can confirm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:22:10 +00:00
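The "each file stands alone" behaviour from `ae9d4a2db5` can be sketched as a loop that converts failures into per-file error findings. Representing a Finding as a plain dict, the `analysis_failed` id, and the `analyze` callback are assumptions for illustration; only the ``empty_upload`` name comes from the commit.

```python
def analyze_all(files, analyze):
    """Run analyze(name, data) per file; one bad file never affects the others."""
    results = {}
    for name, data in files.items():
        if data == b"":
            # Zero-byte upload: short-circuit before any parser sees it.
            results[name] = [{
                "id": "empty_upload",
                "severity": "error",
                "description": f"{name} was uploaded with 0 bytes; please re-upload it.",
            }]
            continue
        try:
            results[name] = analyze(name, data)
        except Exception as exc:
            # Surface a readable message instead of a traceback.
            results[name] = [{
                "id": "analysis_failed",
                "severity": "error",
                "description": (
                    f"{name} ({len(data)} bytes) could not be analyzed: "
                    f"{type(exc).__name__}: {exc}"
                ),
            }]
    return results
```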
Michael	ef9f8b5de4	fix(close): Edge fallback + better tryClose + honest hint There is no JavaScript override for browser tab-close security: ``window.close()`` only succeeds on windows JS opened (Chrome --app windows qualify; a regular browser tab does not). What we can do is make the --app path easier to hit and the failure case more actionable. Three changes: 1. ``src/gui/__main__.py`` — extend browser detection. PATH lookup now also looks for ``msedge`` / ``microsoft-edge``; Windows install candidates include the Edge install path; macOS candidates include Edge and Chromium. Edge is Chromium-based, supports ``--app``, and ships on every Windows 10+ machine — so users without Chrome no longer fall through to the regular browser tab. When the fallback IS hit, print a warning to stderr explaining why Close-from-page will require Ctrl+W. Renamed ``_find_chrome`` to ``_find_app_browser`` to reflect the broader scope. 2. ``_FAREWELL_SCRIPT_TEMPLATE`` in ``components/_legacy.py`` — factor close attempts into a ``tryClose`` helper that runs three escalating tries: standard ``win.close()``, the ``win.open('', '_self')`` history-rewrite trick (no-op in modern Chrome but free), and ``win.top.close()``. Auto-close on paint AND the manual button now both call this helper. Skip the manual hint if the close eventually succeeded between the click and the 250 ms timeout. 3. ``quit.close_hint`` in en/es i18n packs — rewrite the message to tell the user honestly that this is a browser security restriction, tell them the Ctrl+W keystroke that works, and point them at ``python -m src.gui`` for the auto-closing app-mode experience. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:18 +00:00
Michael	aeead05e4c	fix(downloads): swap st.download_button for an HTML <a download> helper Reported symptom: only the FIRST download button in a multi-button row pops the browser save dialog. The second and third do nothing on click. Affects every tool page that exposes (cleaned + audit + config) downloads. Root cause is ``st.download_button`` itself — when several render in the same script pass, the click-to-bytes wiring on the browser side mis-routes and only one button's data is actually exposed. Explicit ``key`` arguments don't fix it; ``use_container_width=True`` doesn't help either; we confirmed this in the Text Cleaner reverts. Replace the widget with a real ``<a download="file" href="data:...">`` anchor rendered via ``st.markdown(..., unsafe_allow_html=True)``. Bypasses Streamlit's widget machinery entirely; behaves identically to a native browser download. Side benefit: clicking it does NOT trigger a script rerun, so other in-flight UI state survives. New helper ``html_download_button`` lives in ``src/gui/components/_legacy.py`` (exported from ``components``). API: html_download_button( label, data, *, file_name, mime="application/octet-stream", disabled=False, help=None, use_container_width=True, ) Translation pattern applied across every tool page (and shared ``results_summary`` / ``config_panel`` widgets in ``_legacy.py``): - ``st.download_button(`` -> ``html_download_button(`` - ``data=foo_bytes`` kwarg -> positional second arg - ``key="..."`` -> dropped (helper has no widget identity) - ``use_container_width=True`` -> dropped (default) - ``disabled=`` and ``help=`` pass through unchanged - Pre-computed byte buffers kept where they were Total: 17 sites replaced (3 in Text Cleaner, 3 in Format Standardizer, 3 in Fix Missing Values, 3 in Map Columns, 3 in Automated Workflows, 2 in Find Duplicates page + 4 in shared _legacy.py widgets used by Find Duplicates). Caveat: data: URLs balloon by 33% (base64). 
Fine for tool output sizes we ship; if a future result topped a few hundred MB we'd want a Blob-URL fallback. The marketing demo at src/gui/app_demo.py keeps its single st.download_button — single button, no collision, no need to switch. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:13:41 +00:00
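Note on the download change above: a minimal sketch of how an ``<a download>`` anchor with a base64 ``data:`` URL could be built, plus the arithmetic behind the 33% size caveat. ``build_download_anchor`` and ``base64_length`` are hypothetical names, not the repository's ``html_download_button``, and the styling, ``disabled`` and ``help`` handling of the real helper are omitted.

```python
import base64
import html


def build_download_anchor(
    label: str,
    data: bytes,
    *,
    file_name: str,
    mime: str = "application/octet-stream",
) -> str:
    """Return an HTML anchor whose href embeds `data` as a base64 data: URL."""
    payload = base64.b64encode(data).decode("ascii")
    href = f"data:{mime};base64,{payload}"
    # Escape every value that lands inside an attribute or as text content.
    return (
        f'<a download="{html.escape(file_name, quote=True)}" '
        f'href="{html.escape(href, quote=True)}">{html.escape(label)}</a>'
    )


def base64_length(n_bytes: int) -> int:
    """Length of the base64 text for n_bytes of input: 4 chars per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)
```

The commit describes rendering such markup through ``st.markdown(..., unsafe_allow_html=True)``. Since every 3 input bytes become 4 output characters, a 3,000,000-byte result turns into a 4,000,000-character URL, which is the 33% growth mentioned in the caveat.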
Michael	6415be8bf4	feat(tools): unified post-run UX across all Ready tool pages Apply the Clean Text page's post-run UX pattern to every other Ready tool page (Find Duplicates, Standardize Formats, Fix Missing Values, Map Columns, Automated Workflows) for consistency and ease of use. Per page: 1. Preview wrapped in ``st.expander(f"Preview: {filename}", expanded=not _has_result)``. Open before a result exists, folded afterwards. 2. Options / configuration controls wrapped in ``st.expander("Options", expanded=not _has_result)``. Inner sub-expanders preserved (Streamlit 1.36+ supports nesting). 3. After the primary action stashes the result, set a one-shot ``_<tool>_scroll_to_results`` flag in session state and call ``st.rerun()`` so the preview + options expanders see the new state on the next pass and collapse themselves. 4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed immediately before the Results subheader. 5. End-of-page: pop the scroll flag and inject a tiny ``streamlit.components.v1.html`` iframe whose ``<script>`` calls ``scrollIntoView`` on the parent document's anchor. One-shot, so unrelated reruns (toggling Show-hidden, etc.) don't yank the viewport. 6. Download buttons hardened against the multi-button Streamlit footgun: byte buffers pre-computed outside the column scopes, explicit unique ``key="<tool>_dl_<purpose>"`` per button, ``use_container_width=True``, and previously-conditional buttons now render unconditionally with ``disabled=True`` + a help tooltip when the underlying data is empty so layout stays steady. Per-page judgment calls (already noted in agent reports): - Find Duplicates: sheet picker and delimiter selector kept OUTSIDE expanders (the user still needs to see them when a file fails to parse). - Fix Missing Values: missingness profile wrapped INSIDE the Options expander together with Strategy — the Results section already shows a before/after missingness comparison that supersedes the static input profile. 
- Map Columns: all three subsections (Target schema, Strategy, Mapping) wrapped under one outer Options expander, matching the Text Cleaner pattern. - Automated Workflows: inner "Recommended tool order" expander stays nested inside the outer Options wrap; Run button stays outside Options so the user can re-run after tweaking the (collapsed) editor. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:37 +00:00
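Note on steps 3 to 5 above: a minimal sketch of the one-shot scroll flag, using a plain mapping in place of Streamlit's session state so it runs on its own. ``scroll_script_once`` is a hypothetical name, and the exact script body is an assumption; only the flag and anchor naming patterns come from the commit message.

```python
from typing import MutableMapping, Optional


def scroll_script_once(state: MutableMapping[str, object], tool: str) -> Optional[str]:
    """Pop the one-shot flag; return the scroll <script> only if it was set."""
    flag = f"_{tool}_scroll_to_results"
    # pop() removes the flag, so unrelated reruns get None and do not scroll.
    if not state.pop(flag, False):
        return None
    anchor = f"{tool}-results-anchor"
    return (
        "<script>"
        f"const el = window.parent.document.getElementById('{anchor}');"
        "if (el) { el.scrollIntoView({behavior: 'smooth', block: 'start'}); }"
        "</script>"
    )
```

In the page itself the returned string would be handed to the ``streamlit.components.v1.html`` iframe described in step 5. Popping the flag instead of reading it is what keeps later reruns from moving the viewport again.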
Michael	d1aaf3c2b9	feat(quit): close-window button + manual hint on the farewell overlay The farewell overlay already attempted ``window.top.close()`` after a Close click — but browsers only honour that for tabs that JS opened (Chrome --app windows qualify; a regular browser tab does not). For users whose Chrome wasn't auto-detected and who fall back to ``webbrowser.open``, the overlay stays put and they had no in-page way to close. Add to the overlay HTML: - A "Close this window" button (uses the user-gesture path, which has slightly looser browser rules than auto-close). - A hidden hint paragraph that reveals itself 250 ms after the button is clicked IF the window is still here, telling the user to press Ctrl+W (⌘W on Mac). Wired through the existing _farewell_script template + ``_js_html_safe`` escaping so neither label can break out of the JS string literal. New i18n keys (en + es): ``quit.close_window_button`` and ``quit.close_hint``. The existing auto-close attempt remains — Chrome --app users still get their window closed without touching the button. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:59:17 +00:00
Michael	27f0648093	fix(text-cleaner): make all three download buttons actually fire Only "Download cleaned CSV" was working; "Download changes audit" and "Download config JSON" did nothing on click. The symptom is the classic Streamlit footgun for multiple ``st.download_button`` widgets in adjacent columns: without an explicit ``key`` argument the auto-derived widget IDs can collide, especially when one button is conditionally rendered, and only the first button in source order actually fires on click. Same goes for unstable ``data`` bytes recomputed inside the ``with col:`` block — the widget identity can drift between renders. Robustness pattern applied: - Compute all three byte buffers up front, outside the columns, so the ``data`` parameter is the same object across reruns. - Pass an explicit unique ``key`` ("textclean_dl_cleaned" / "textclean_dl_changes" / "textclean_dl_config") to each button. - Render the changes button unconditionally with ``disabled=True`` and a help tooltip when ``result.changes.empty`` — instead of hiding it. Layout stays steady and the empty case is self-explanatory. - ``use_container_width=True`` so the three buttons size identically inside their columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:56:52 +00:00
Michael	0a61d52200	feat(text-cleaner): collapse options + auto-scroll to Results on run After clicking Clean Text the user was left at the bottom of the script with the Options block still expanded and no viewport movement — they had to scroll to find the Results. - Wrap the whole Options block in an outer ``st.expander("Options", expanded=not _has_result)``. After the Clean Text rerun, both Preview AND Options collapse, leaving the primary action button + Results as the only prominent elements above the fold. The inner Advanced-options expander is preserved as a nested expander (supported in Streamlit 1.36+; this repo pins 1.35+). - Add a 1px anchor div ``#textclean-results-anchor`` immediately before the Results subheader. - On Clean Text click, set a one-shot ``_textclean_scroll_to_results`` flag in session state; on the next render, pop the flag and inject a tiny ``st.components.v1.html`` iframe whose ``<script>`` calls ``scrollIntoView`` on the parent document's anchor. One-shot so re-renders triggered by other widgets (Show-hidden toggle, etc.) don't jerk the viewport back to the top of Results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:50:43 +00:00
Michael	ca14ce2952	feat(text-cleaner): collapse preview on run + full hidden-char audit Two small UX fixes on the Clean Text page: 1. The input preview is now wrapped in an ``st.expander`` whose default-expanded state is ``not has_result``. Clicking the "Clean Text" primary button stashes the result and calls ``st.rerun()`` so the next pass sees the result in session state and the expander folds — the Results section becomes the primary visual focus. User can re-expand manually to re-inspect the source. 2. The Examples (changes audit) table's Before/After columns were calling ``visualize_hidden_html`` WITHOUT ``mark_outer_whitespace``, so leading/trailing whitespace — which is exactly what the cleaner most often removes — was invisible. Pass ``mark_outer_whitespace=True`` to match the input-preview rendering. Column-name cell now mirrors that flag too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:43:52 +00:00
Michael	502a72cd46	feat(nav): ← Back to Home link on every tool page Multi-file workflow: a user uploads several files on Home, clicks "Open <Tool>" on one file's findings, lands on a tool page. The sidebar lets them get back to Home, but a top-of-page back affordance is more discoverable and keeps the hand in the same screen region as the upload list they're working through. - New ``back_to_home_link()`` helper in components/_legacy.py renders a secondary button that calls ``st.switch_page("app.py")`` — under ``st.navigation`` that routes to the default (Home) page. - Wired into every tool page (1-9) directly after ``hide_streamlit_chrome()`` and BEFORE the license gate so a Lite user who lands on a locked tool can navigate away without paying. - New i18n key ``nav.back_to_home`` ("← Back to Home" / "← Volver al inicio") in en/es packs. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:38:01 +00:00


243 Commits