Two visual cleanups:
1. The block-container "claim padding" rule was a no-op — it targets
the legacy ``stAppViewBlockContainer`` testid; Streamlit renamed
it to ``stMainBlockContainer`` in the current release. Updated the
selector list to match both, so the page title now sits close to
the top edge again (~0.5rem from the hidden header) instead of
inheriting Streamlit's default ~6rem header reservation.
2. ``.dt-finding-group-head`` margin tightened to ``margin: -1rem
-1rem 0.75rem``: -1rem on top/sides still bleeds the head to the
card edges, but +0.75rem on the bottom is breathing room between
the head's bottom border and the first finding row, which were
abutting before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-row file sizes and the Files-card total-size meta both read as
human-readable units now. Smallest unit is KB even for sub-kilobyte
files (so ``538 B`` → ``0.5 KB``, ``4914 B`` → ``4.8 KB``), steps up
to MB at 1 MiB and GB at 1 GiB. Always one decimal place.
New module-level helper ``_format_size(int) -> str`` in ``_home.py``;
both the section meta (``1 file · 4.8 KB total``) and the per-row
``dt-file-size`` cell call it instead of the previous ad-hoc
``f"{n:,} B"`` formatter. Keeps the display consistent regardless of
file size — and keeps the GUI free of raw byte counts that nobody
needs to read.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mockup §file-add lands as the canonical import affordance:
- Streamlit's ``st.file_uploader`` widget is still mounted (only path
that actually receives browser file events), but parked off-screen
via a new ``[data-testid="stFileUploader"] { position:absolute;
left:-10000px; … pointer-events:none }`` rule. Its hidden
``<input type="file">`` stays reachable to JavaScript.
- The Files card is now always rendered (header + bordered body).
The bottom row of the card is a ``button.dt-file-add`` styled per
mockup §file-add: dashed top border bleeding to the card edges,
surface-hover background, ``+ Add more files`` text in
``--ink-secondary``, accent-fill on hover.
- A small ``<script>`` shipped through ``st.iframe`` wires the
button: ``click → input.click()`` on the off-screen
``stFileUploaderDropzoneInput``. Streamlit's HTML sanitizer
strips inline ``onclick`` from ``unsafe_allow_html`` content, so
the binding has to come from a real script element — same pattern
the sticky footer and Upload→Import rewriter use. A
``MutationObserver`` re-wires the button when Streamlit remounts
it across reruns. The ``dataset.dtWired`` guard prevents double
binding.
Section structure also tightened to match the mockup:
- Section heading is now ``<h2>Files</h2>`` (was ``### Import one
or more files to start``) with the count + total size on the
right of the same flex row. When no files: ``No files imported
yet``. When files exist: ``1 file · 4.8 KB total``.
- Dropped the ``upload.intro_multi`` caption and the
``upload.empty_state`` info banner — the card itself plus the
in-card Add button cover both prompts.
- Empty state now ends after the Files card (no stats / no action
bar / no findings rendered) — matches mockup's single-section
empty view.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the remaining gaps between the live home page and the
``datatools_layout_redesign2.html`` mockup. Four pieces land
together because they all consume the same new CSS scaffold:
1. Page header (§page-header)
``st.title`` + ``st.caption`` + ``st.divider`` collapse into one
flex header: h1 + body subtitle on the left, ``Runs 100% locally``
privacy pill (success-fill + lock SVG) on the right, soft border
below. The "Runs 100% locally" phrase moved out of
``home.caption`` into the new ``home.privacy_pill`` i18n key
(en + es).
2. Files card (§files-card)
The "Imported files" list is now a single bordered card with a
section head (count + KB total on the right, mockup §section-head).
Each row renders a 28px accent-fill chip carrying the inline
document SVG, a mono filename, a right-aligned mono size, and a
compact ``✕`` button. The word-button ``Remove`` is gone —
replaced by an icon-only tertiary button styled via a new CSS
rule that goes transparent → danger-fill on hover (mockup
§file-remove).
3. Action bar (§action-bar)
Three buttons in one row: ``Run analysis`` (primary ink), a new
disabled ``Export report`` (secondary; coming soon, tooltip), and
``Clear results``. New i18n key ``upload.export_report``.
4. Findings — per-file group cards (§finding-group)
``render_findings_panel`` rewritten end-to-end. Output is now:
• A head row (``dt-finding-group-head``) bleeding to the card
edges: worst-severity dot · mono filename · count pills
enumerating non-zero severities (e.g. ``2 info`` blue,
``1 warning`` amber, ``1 error`` rose).
• A flat list of finding rows sorted error → warn → info.
Each row: tinted Material-icon chip + title (description
with optional ``<code>`` column chip) + mono meta line
(rows affected, samples captured) + tertiary
``Open <Tool> →`` action button that ``st.switch_page``s
to the relevant tool.
The previous tool-grouped expander stack is dropped — the new
layout is denser and matches the mockup's single-card-per-file
structure.
``_render_one_finding`` (the old per-finding helper that emitted
markdown lines + sample tables) remains in the file but is no
longer called from the home flow; left in place for any other
surface that still depends on the markdown style.
The "no issues" success state renders a green dot + mono
filename + ``no issues`` success pill in the same card chrome,
so empty-result files visually match the rest of the panel
rather than getting a generic ``st.success`` callout.
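
The per-file grouping in item 4 can be sketched as a pure function. This is an illustration, not the actual ``render_findings_panel`` code: the ``severity`` dict key, the function name, and the pill order (taken from the ``2 info`` / ``1 warning`` / ``1 error`` example) are assumptions.

```python
from collections import Counter

# Assumed ordering for rows: error is worst, info is mildest.
SEVERITY_RANK = {"error": 0, "warn": 1, "info": 2}
# Pill order and labels mirror the example in the text; pluralization is not handled.
PILL_ORDER = ("info", "warn", "error")
PILL_LABEL = {"info": "info", "warn": "warning", "error": "error"}


def summarize_group(findings):
    """Return (worst severity, count pills, rows sorted error -> warn -> info)."""
    rows = sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
    counts = Counter(f["severity"] for f in findings)
    # Only non-zero severities get a pill.
    pills = [f"{counts[s]} {PILL_LABEL[s]}" for s in PILL_ORDER if counts[s]]
    # A file with no findings gets the green "success" dot.
    worst = rows[0]["severity"] if rows else "success"
    return worst, pills, rows
```

``sorted`` is stable, so findings of equal severity keep the order in which they were produced.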
CSS additions (``_DESIGN_TOKENS_CSS``):
``.dt-page-header / .dt-page-subtitle / .dt-privacy-pill``
``.dt-files-section-head / .dt-section-meta``
``.dt-file-row / .dt-file-icon-chip / .dt-file-name / .dt-file-size``
``.dt-finding-group-head / .dt-severity-dot{.warn,.info,.error,.success}``
``.dt-group-filename / .dt-group-counts``
``.dt-count-pill{.warn,.info,.error,.success}``
``.dt-finding-row / .dt-finding-icon{.warn,.info,.error}``
``.dt-finding-title / .dt-finding-meta``
Tertiary button rule (transparent → danger-fill on hover) for
the X button and the ``Open Tool →`` row action.
theme.py:
Explicitly loads Material Symbols Outlined alongside Geist —
the severity-chip ligatures (``info`` / ``warning`` / ``error``)
need the font present even when no ``:material/`` token has been
emitted yet on the page. Tightened ``.dt-finding-icon .dt-mui``
selector with ``[data-testid="stMarkdownContainer"]``-scoped
variant so the Material font wins over theme.py's base
``var(--font-sans) !important`` on markdown descendants.
Leading section-heading emojis stripped from i18n
(``upload.heading``) for parity with the mockup's clean ``Files``
/ ``Findings`` h2s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pieces of the mockup 2 layout that hadn't landed yet:
1. Sidebar nav icons — emoji glyphs (🧹✂️🔍 …) swapped for
Streamlit's ``:material/<name>:`` syntax, picking the outline
Material Symbol that best matches each mockup SVG:
Home → :material/home:
Fix Missing Values → :material/help_outline:
Find Unusual Vals → :material/insights:
Clean Text → :material/text_format:
Standardize Fmts → :material/format_list_bulleted:
Find Duplicates → :material/search:
Quality Check → :material/check_circle:
Map Columns → :material/view_column:
Combine Files → :material/account_tree:
Auto Workflows → :material/auto_awesome:
Activate → :material/key:
Close → :material/close:
Streamlit injects the icon name as a literal ligature inside a
first-child ``<span>`` of the nav anchor, expected to render
through the Material Symbols font. theme.py's base rule was
forcing Geist on every span under ``stSidebarNav``, turning the
ligatures back into plain text labels — added a structural
exception that targets ``[data-testid="stSidebarNavLink"] >
span:first-child`` (and any descendant), restoring the Material
font family, neutralizing the inherited ``ss01/cv01/cv11``
feature settings, and sizing to 18px.
Also stripped the leading emojis from every page title in the
en/es i18n packs (``home.title``, ``close_page.title``,
``activation.title``, ``tools.*.page_title``) — the icons live
in the sidebar now, the page H1 no longer needs to carry one.
2. Stats overview on home — new ``_render_stats_overview`` in
_home.py emits a 4-card grid above the per-file findings panels:
Files analyzed, Total findings, Warnings (severity ``warn`` ∪
``error``), Info (severity ``info``). Card layout follows the
mockup §stats verbatim — Geist 28px / 600 / -0.03em for the
numeric value (the "Display number" row in spec §4), tiny
uppercase tracked label, paper-surface card with the standard
warm border + faint shadow. The Warnings / Info cards tint the
number with ``--warn`` / ``--info`` when the count is non-zero.
CSS for ``.dt-stats / .dt-stat / .dt-stat-label / .dt-stat-value /
.dt-stat-unit`` added to ``_DESIGN_TOKENS_CSS``; falls to a
2-column grid below 900px viewport, matching the mockup's media
query.
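
The four card values can be computed as below. This is a sketch only: the input shape (a ``{filename: [finding, ...]}`` mapping with a ``severity`` key per finding) and the function name are assumptions, and the real ``_render_stats_overview`` also emits the HTML.

```python
def stats_overview(findings_by_file):
    """Compute the four card values from a {filename: [finding, ...]} mapping."""
    all_findings = [f for fs in findings_by_file.values() for f in fs]
    return {
        "files_analyzed": len(findings_by_file),
        "total_findings": len(all_findings),
        # The Warnings card counts warn and error together.
        "warnings": sum(f["severity"] in ("warn", "error") for f in all_findings),
        "info": sum(f["severity"] == "info" for f in all_findings),
    }
```

A file with an empty findings list still counts toward ``files_analyzed`` in this sketch.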
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the type system to the single-family Geist spec referenced
in ``Business/DataTools/geist_spec.md`` and the matching
``datatools_layout_redesign2.html`` mockup. Editorial-serif headings
are out; the product now reads as modern SaaS-tool typography per
the spec's positioning note (§10).
src/gui/theme.py (new)
Implements geist_spec.md §3 verbatim — preconnect + Google Fonts
link for Geist (400/500/600/700) and Geist Mono (400/500), the
canonical ``:root`` token table (§7) plus severity extensions,
and the type scale (§4): h1 32/600/-0.035em, h2 22/600/-0.025em,
h3 18/500/-0.018em, h4 15/500/-0.012em, body 14/400, caption
12.5/400, mono 0.92× ss02. ``apply_theme()`` is the single entry
point.
Two deviations from the spec, both anticipated by spec §6.1:
- ``font-family: var(--font-sans) !important`` on the base rule.
Streamlit applies ``font-family: "Source Sans"`` directly to
``[data-testid="stMarkdownContainer"]`` and a few widget
wrappers at equal-or-higher specificity than the spec's
selector list, so plain inheritance loses the cascade.
- The base selector list explicitly enumerates
``stSidebarNav``, ``stMarkdownContainer``, ``stVerticalBlock``
and a few siblings so Streamlit's per-widget font reset
doesn't reach descendant text.
src/gui/components/_legacy.py
- ``_DESIGN_TOKENS_CSS`` no longer redeclares fonts or the
heading rules — those are theme.py's job (spec §9 says the
spec is type-only; everything below is component chrome).
- Token references switched from ``--dt-*`` to the spec names
(``--ink``, ``--bg``, ``--surface``, ``--border``, ``--accent``,
``--font-sans``, ``--font-mono``, …).
- Sidebar section-label rule tightened to 11.5px / 500 to match
the "Eyebrow" row in spec §4.
- Primary-button text color now also targets every descendant
(``button[kind="primary"] *``) so the inner
``stMarkdownContainer > p`` doesn't pick up
``color: var(--ink)`` from the base rule and render
near-invisible ink-on-ink.
- ``hide_streamlit_chrome`` now calls ``apply_theme`` before
injecting component CSS so the base tokens are defined first.
Acceptance criteria from spec §8 verified at 1920×1050:
- h1 computes ``font-family: Geist``, ``font-weight: 600``,
``letter-spacing: -1.12px`` (= 32px × -0.035em), size ``32px``.
- Body ``<p>`` inside ``stMarkdownContainer``: Geist 400 / 14px.
- Caption: Geist 400 / 12.5px.
- Inline mono filenames: Geist Mono in accent-fill chip.
- No Source Sans Pro leaks into any text the user reads.
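
The ``-1.12px`` figure for h1 is just the em-based tracking multiplied by the font size. A small sketch of that arithmetic for the heading rows of the type scale follows. Only the h1 result is stated in this message; the other values are derived, and a browser may round its computed value differently.

```python
# (font-size in px, letter-spacing in em) for the headings in the type scale above.
TYPE_SCALE = {
    "h1": (32, -0.035),
    "h2": (22, -0.025),
    "h3": (18, -0.018),
    "h4": (15, -0.012),
}


def letter_spacing_px(size_px, tracking_em):
    """Convert an em-based letter-spacing to px, rounded to 3 decimals."""
    return round(size_px * tracking_em, 3)


# Derived px values per heading level, e.g. for comparing against devtools output.
EXPECTED_PX = {tag: letter_spacing_px(*spec) for tag, spec in TYPE_SCALE.items()}
```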
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DataTools is local-first — "Upload" reads like "send data somewhere
remote", which contradicts the product positioning. Sweep replaces
the user-visible term throughout the UI:
- ``src/i18n/packs/en.json`` + ``es.json``: all ``upload.*`` strings
(heading, intro, uploader labels, empty state, switch-back, etc.)
and ``gate.default_name``. The ``intro_multi`` "no upload anywhere"
phrasing dropped the verb entirely — now reads "nothing leaves
this computer".
- All 9 tool pages: ``st.file_uploader(label="Upload …")`` →
``"Import …"``; matching ``st.info("Upload a …")`` empty-state
banners; ``help="Upload …"`` strings on disabled uploaders.
- ``9_Pipeline_Runner`` + ``5_Column_Mapper``: radio-option text
``"Upload schema/pipeline JSON"`` → ``"Import …"`` plus the
``.startswith("Upload")`` branch guards that read those values.
- ``_home.py``: "**Uploaded files**" → "**Imported files**".
- ``app_demo.py``: "Uploaded file is …" → "Imported file is …".
Internal identifiers left untouched: function names
(``pickup_or_upload``, ``_StashedUpload``), session-state keys
(``home_upload``, ``home_uploads``, ``home_uploaded_*``,
``merger_file_upload``), audit-log event category (``"upload"``),
Streamlit testid CSS selectors. None of those are visible to the
user.
The file_uploader's dropzone button text is a baked-in React
literal that Streamlit's ``label=`` doesn't reach; rewritten at the
DOM level with a small ``_RENAME_UPLOAD_BUTTON_JS`` snippet shipped
through ``st.iframe`` (same pattern the sticky footer uses to mount
on ``<body>``). A ``MutationObserver`` on the parent document
re-applies the swap when Streamlit remounts the dropzone after file
add/remove or page navigation, throttled via ``requestAnimationFrame``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After upload, two near-identical file lists were shown stacked:
Streamlit's built-in compact chip row inside the dropzone (icon +
``messy_sales.csv`` + size) and the home page's own "Uploaded files"
section beneath it (filename + Remove button). User flagged the
duplication.
Hide ``[data-testid="stFileChip"]`` and its first-child wrapper so
the chip row collapses; the dropzone's borderless ``+`` button is
preserved as the "add more files" affordance, and our "Uploaded
files" list is now the single source of truth visually.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts ideas from the ``datatools_layout_redesign.html`` mockup
(artistic licence, not literal). Two changes:
1. ``.streamlit/config.toml`` ``[theme]`` block — cream paper bg
(#fafaf7), warm sidebar (#f5f4ef), stone ink (#1c1917), burnt
orange primary (#c2410c). Streamlit threads these through its
chrome (focus rings, file-uploader accents, link colors).
2. ``_DESIGN_TOKENS_CSS`` injected by ``hide_streamlit_chrome`` on
every page. Imports Fraunces (display serif), Geist (body sans),
Geist Mono. Restyles, scoped through ``--dt-*`` custom properties:
- Page surface + sidebar — warm cream backgrounds, soft warm
borders, no harsh white.
- Sidebar nav — section labels in tiny uppercase tracking, nav
items with soft hover, active item as a white pill with subtle
shadow.
- Typography — H1/H2/H3 in Fraunces with tightened tracking;
body Geist; inline code Geist Mono with orange-on-cream chip.
- Buttons — primary = dark ink (``#1c1917``) with white text;
secondary = paper surface with warm border; disabled = muted
cream.
- Containers / expanders — editorial cards: 14px radius, 1px
warm border, faint shadow, warm-cream summary headers.
- File uploader — cream dropzone with dashed border + per-file
paper chips.
- Alerts — soft tinted fills (info=sky, success=mint, warn=amber,
error=rose) over the kind-specific palette.
- Inputs, tabs, dataframes — paper surfaces with rounded warm
borders.
Verified at 1920x1050 + 1400x900 on home page (empty + with file
uploaded + with findings rendered) and Clean Text tool page; no
regressions in the white-bar fix from 65b663b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User screenshot pinned the actual culprit: a horizontal white band
across the FULL viewport width (including over the sidebar) above
the Help/Close footer. Diagnosis:
- ``.stApp`` carries ``zoom: 0.85``, so any descendant sized at
``100vh`` only renders at ~85vh visually.
- At 1920x1050 the visual end of ``.stApp`` is around y=893; the
fixed footer overlays y=1017..1050; the strip in between (124px
at this resolution) is ``body`` painting white through, because
``.stApp``, ``stSidebar`` and ``stMain`` are all shorter than
the viewport.
- The previous "min-height: 100vh/0.85" rule targeted the legacy
``data-testid="stAppViewBlockContainer"``. The current Streamlit
release renamed that testid to ``stMainBlockContainer`` — so the
rule was a no-op for months. Verified the new testid by walking
the live DOM.
Fix: stretch ``.stApp``, ``[data-testid="stSidebar"]`` and
``[data-testid="stMain"]`` with ``min-height: calc(100vh / 0.85)``
so they fill the visible viewport. Keep the block-container's 2rem
``padding-bottom`` (now matching both the new and legacy testids in
case Streamlit rolls it back).
Verified at 1920x1050: sidebar gray extends to y=1050, content area
extends to y=1050, footer overlays the bottom 33px, no white band
between content and footer.
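
The numbers in the diagnosis follow from the zoom factor alone. A small sketch of the arithmetic, assuming the only scaling in play is ``.stApp``'s ``zoom: 0.85`` (the function names are illustrative):

```python
ZOOM = 0.85  # the .stApp zoom factor quoted above


def visual_height(layout_px, zoom=ZOOM):
    """Height an element sized in layout px occupies on screen after zoom."""
    return layout_px * zoom


def compensated_min_height(viewport_px, zoom=ZOOM):
    """Layout height needed so the zoomed element fills the whole viewport."""
    return viewport_px / zoom
```

At a 1050px viewport, ``visual_height(1050)`` is about 892.5, which is where ``.stApp`` visually ends, leaving roughly 124px above the footer's top edge at y=1017. ``compensated_min_height(1050)`` is about 1235.3 layout px, which is what ``calc(100vh / 0.85)`` resolves to and which renders as the full 1050px once zoomed.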
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "white bar" was the footer's near-white background painting
over the bottom of the sidebar. The footer is fixed at body level
with ``left: 0; right: 0`` so it spans the full viewport — its
``rgba(255, 255, 255, 0.97)`` background renders as essentially
white over the sidebar's ``rgb(240, 242, 246)`` gray, producing a
visibly different strip at the bottom of the sidebar (this is what
the diagnostic GREEN tint marked as ``stAppViewContainer``-shaped
because that is the element directly behind it).
Pixel-sampled the bottom row to confirm:
y=860 over sidebar → (240, 242, 246) (gray)
y=870 over sidebar → (255, 255, 255) (footer-painted white)
Fix: in the iframe JS that mounts the footer on ``<body>``, measure
``[data-testid="stSidebar"].getBoundingClientRect().right`` and set
the footer's (and help popover's) ``left`` to that offset with
``setProperty(..., 'important')`` so it beats the ``left:0!important``
fallback in CSS. A ``ResizeObserver`` on the sidebar plus a
``window.resize`` listener keep the offset in sync when the sidebar
collapses or expands.
When the sidebar is collapsed (width 0 or off-screen) the offset clamps
to 0, so the footer goes flush-left as before. Also dropped the no-op
``min-height`` on the
view container from the previous attempt; ``stAppViewContainer`` is
transparent, so stretching it never painted anything.
Verified by injecting the same offset on the live page: bottom row
at y=890 is now ``(240,242,246)`` over the sidebar and only turns
white at x=255 where the content area begins.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``use_container_width`` is being removed after 2025-12-31, and
Streamlit was flooding the terminal with the deprecation notice on
every rerun. Mechanical sweep:
use_container_width=True → width="stretch"
use_container_width=False → width="content"
51 call sites across 11 page files + ``app_demo.py``. Also renamed
the ``local_download_button`` helper's ``use_container_width`` kwarg
to ``width`` (default ``"stretch"``); it has no external callers
passing the old name, so this is a safe rename.
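
A sweep like this can be scripted. The sketch below is one way to do it with a regex and is not necessarily how the 51 call sites were actually changed. It only rewrites literal ``True`` / ``False`` values and leaves anything else for manual review.

```python
import re

# Mapping taken from the two substitutions listed above.
_WIDTH_MAP = {"True": "stretch", "False": "content"}
_PATTERN = re.compile(r"\buse_container_width\s*=\s*(True|False)\b")


def migrate_width_kwarg(source: str) -> str:
    """Rewrite literal use_container_width=True/False kwargs to width="..."."""
    return _PATTERN.sub(lambda m: f'width="{_WIDTH_MAP[m.group(1)]}"', source)
```

Calls that pass a variable (for example ``use_container_width=flag``) are left untouched by this pattern and would need a hand edit.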
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Color-tag diagnostic confirmed the bottom-of-viewport strip was
painted by ``stAppViewContainer`` (it showed GREEN), not by the
block container as the previous two attempts assumed. ``.stApp``
has ``zoom: 0.85`` so 100vh visually renders at 85% — apply
``min-height: calc(100vh / 0.85)`` to the view container itself so
it spans the full visible viewport and there is no gap for its own
background to leak through as a "white bar". Reverts the diagnostic
tints (RED/BLUE/GREEN/GOLD); keeps the 2rem block-container
padding-bottom that reserves room for the fixed footer overlay.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Option 2 (stretching the block container with ``min-height``) did
not close the white gap. Either the rule isn't applying, or the
block container isn't the element that fills the visible bottom of
the page. Tint every plausible container so the eye can tell us
instantly which one paints the bar:
- RED ``stAppViewBlockContainer`` (still has min-height applied)
- BLUE ``stMain`` / ``section[stMain]`` (with its own min-height)
- GREEN ``stAppViewContainer``
- GOLD ``.stApp`` (zoomed)
The user reloads and reports which color shows where the "white bar"
previously was; that names the target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Option 1 (tightening ``padding-bottom`` from 3rem to 2rem) did not
eliminate the gap. The remaining gap is ``.stApp``'s solid white
background showing through the area below the block container's
natural (content-sized) bottom edge — visible because the home
page's content is shorter than the viewport.
Stretch the block container with ``min-height: calc(100vh / 0.85)``
so the container itself fills the visible viewport. Now the area
between the last finding card and the fixed footer is the block
container's own background, not ``.stApp`` showing through —
visually continuous with the content above.
The ``/0.85`` compensates for ``.stApp { zoom: 0.85 }`` (defined in
``_HIDE_CHROME_CSS``): inside a zoomed container, ``100vh`` renders
at 85% of true viewport height, leaving a 15% gap if used raw.
``box-sizing: border-box`` keeps the 2rem padding part of the
total height instead of stacking onto it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnostics confirmed the "white bar" the user has been describing is
not a separate element — it's ``[data-testid=stApp]``'s solid white
background (``rgb(255,255,255)``, viewport-locked) showing through the
gap between where page content ends and where the fixed Help/Close
footer overlay begins. ``stApp`` stays put while content scrolls
inside it, which is why the bar "doesn't change when scrolling".
The gap exists because ``render_sticky_footer`` overrides the block
container's ``padding-bottom`` to ``3rem`` (48px) to reserve clear
room for the fixed footer. The footer is only ~32-33px tall (min-
height 32px + 0.25rem top/bottom padding), so ~16px of that reserve
was pure visible white space sitting above the buttons.
Reduce ``padding-bottom`` to ``2rem`` (~32px) — just enough to
prevent content from rendering under the footer overlay, no more.
Eliminates the visible gap without exposing text to clipping.
Also remove the diagnostic banner + click-to-inspect iframe from
the home page now that the bar is identified.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported the previous diagnostic was too cluttered to read,
and the white bar showed no outline anyway — meaning the flat
``querySelectorAll('body *')`` walker missed it (likely inside an
iframe's contentDocument, which the script didn't recurse into).
New approach: a single red button "CLAUDE: click here, then click
the white bar" in the top-right. Clicking the button arms an
inspect handler. The next click anywhere on the page reports the
full element stack at that point via ``elementsFromPoint`` AND
recursively descends into any same-origin iframe at the click
location, so iframe contents are no longer invisible.
A black report panel lists every element in the stack with its
tag/id/testid/class, position, z-index, background color, and
bounding rect — TOP element highlighted in red. User clicks the
white bar exactly once and we know what it is.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous diagnostic only outlined fixed/sticky elements; user
confirmed the offending white bar isn't one of those. Cast a much
wider net:
- Outline every element whose visible rect intersects the bottom
200px of the viewport, regardless of position.
- Border style encodes position: solid=fixed, dashed=sticky,
dotted=absolute, thin=static/relative.
- Render a readable list in a top-right panel showing each element's
tag/id/testid/class, position, z-index, height, and background.
- Skip fully transparent + un-positioned elements (those can't
actually overlay anything).
With this, scroll to the bottom and the panel + colored outlines
will identify exactly which element is the white bar — fixed or
not. The user can paste the panel list (or just name the colored
box) so we know what to remove.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reports: TEST #3 marker sits at the true bottom of the home
page's main content, but when scrolled the test text "goes behind"
an opaque white bar — confirming the bar is fixed/sticky (overlays
scrolling content). Our CSS only declares ONE fixed element near
the bottom (``#datatools-sticky-footer``), which the user already
ruled out. So something else (Streamlit native chrome, a third-party
widget, or a fixed element we haven't enumerated) is overlaying the
content.
Inject a small diagnostic iframe whose JS, running against the
parent document, walks every element on the page and outlines each
``position: fixed`` or ``position: sticky`` node with a distinct
color + a top-left label showing ``tagName#id[data-testid] pos=…
h=…px bg=…``. Re-runs after initial paint, on a couple of delays
(for late-mounting components), and on every scroll.
This is read-only — no DOM mutations beyond outline styles and
labels — so it's safe to ship even if I miss removing it.
The user can now visually identify which colored box is the
offending white bar and report its label.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported the previous TEST #2 banner appeared at the *top* of
the main content area instead of the bottom. Root cause: on the home
page, ``render_sticky_footer()`` is called at line 107 — before
``st.title()`` — so anything that function injects in document flow
lands at the top of ``stAppViewBlockContainer``. Other pages call
``render_sticky_footer()`` at the end of their script, so the flow
content lands at the bottom there.
Remove the marker from ``render_sticky_footer`` and add it directly
at the very end of ``_home._home_page()`` — after the findings
panels. If this banner lines up with the offending white strip when
scrolled to the bottom, the strip is something rendered at the tail
of the page (likely an iframe wrapper from ``render_findings_panel``
or the block container's ``padding-bottom``).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User confirmed the previous marker landed inside the Help/Close
sticky footer — which is NOT the offending white bar. They want the
sticky footer kept; the white strip they're trying to remove sits
*above* the footer in the main content area.
Move the marker out of ``#datatools-sticky-footer`` and render it
via ``st.markdown`` immediately before the ``st.iframe`` call that
injects the footer. That places it at the very bottom of
``stAppViewBlockContainer`` — exactly where the iframe wrapper
(``stElementContainer``) and the block container's
``padding-bottom: 3rem`` reservation live.
Styled as a red dashed banner so it's unmistakable. If it lines up
with the white strip clipping text on scroll, one of those two is
the culprit and the next commit can target it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user reports a "white bar/box" at the bottom of the main content
area that clips text when scrolling. The DOM inspector found only one
fixed-position white element near the viewport bottom —
``#datatools-sticky-footer`` (bg ``rgba(255,255,255,0.97)``,
~33px tall) — so this is my best candidate for what they're seeing.
Append a red marker span "◀ CLAUDE TEST: is this the white bar you
want removed? ▶" inside the footer div so the user can visually
confirm. If the text shows up where they see the offending white
bar, the footer is the right target; if the bar is somewhere else,
this confirms it's a different element.
Temporary — to be reverted in the next commit either way.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: on the Home page after uploading data files, the Remove
buttons "on the right side" did nothing — the file kept showing up in
the list. That was the file_uploader widget's BUILT-IN ✕ icons (the
ones inside the uploader's chrome, on the right of each file row),
not our custom "Remove" buttons further down — the custom ones have
worked correctly since 84e4665.
Cause: ``_home_page`` deliberately treated the widget as add-only and
never honored widget-side removals. The reasoning, per the prior
comment, was that navigation can remount the widget with value ``[]``
— a render-time sync would then wipe ``home_uploads``. Real, but the
side effect was that the widget's own ✕ appeared to do nothing: the
file vanished from the widget chrome, stayed in ``home_uploads``, and
re-rendered immediately in the custom list below.
Fix: hook the file_uploader's ``on_change`` callback to reconcile
``home_uploads`` against the widget's current value. Streamlit's
``on_change`` fires ONLY on user-initiated value changes; the
remount-induced ``[]`` reset doesn't trigger it, so the stash still
survives navigation. Removals from the callback also drop the file's
findings entry and clear the singular ``home_uploaded_*`` keys when
the active upload was removed — matching the custom-button path.
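
The reconciliation step can be sketched as a pure function over plain dicts. The shapes used here (a ``{filename: bytes}`` stash standing in for ``home_uploads`` and a ``{filename: findings}`` mapping) are assumptions for illustration; the real callback works on ``st.session_state`` and also clears the singular ``home_uploaded_*`` keys.

```python
def reconcile_uploads(stash, findings, widget_names):
    """Drop stashed files the widget no longer holds; return the removed names.

    stash:        {filename: file bytes}, stand-in for home_uploads
    findings:     {filename: findings}, stand-in for the per-file results
    widget_names: filenames currently present in the uploader widget
    """
    keep = set(widget_names)
    removed = [name for name in stash if name not in keep]
    for name in removed:
        del stash[name]
        findings.pop(name, None)  # a removed file takes its findings with it
    return removed
```

Because this only runs from ``on_change``, a remount that resets the widget to ``[]`` never reaches it, and the stash is left alone.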
The custom "Remove" buttons further down keep working unchanged; the
existing AppTest path through ``_home_remove_<sha1>`` still removes
exactly the file clicked. 2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same wrong-testid bug as the Close click handler: the CSS rule
that's supposed to position the hidden ``st.page_link`` off-screen
was selecting ``a[data-testid="stPageLink"]``, but the bare
``stPageLink`` testid is on the OUTER wrapper div — the anchor
uses ``stPageLink-NavLink``. ``:has(a[data-testid="stPageLink"]...)``
matched nothing, so the helper rendered as a full-size visible
row at the bottom of every page (the "large white bar blocking
content" the user reported).
Fix: switch both the ``:has()`` rule and the no-:has() fallback
to ``a[data-testid="stPageLink-NavLink"][href*="close"]``. The
``href*="close"`` form also works for base-path deployments
(``/myapp/close``), matching the click handler's selector.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs combined to make the footer Close a no-op:
1. The helper page_link's anchor carries
``data-testid="stPageLink-NavLink"`` — the bare
``stPageLink`` testid is on the OUTER WRAPPER div, not the
anchor. The old selector ``a[data-testid="stPageLink"]``
matched nothing, so ``helper`` was always ``null``.
2. The fallback ``window.location.href = './close'`` ran inside
the component iframe, so it only navigated the (invisible)
srcdoc iframe. The main app stayed put.
End result: click → nothing visible → shutdown_app never runs →
farewell-script's ``window.close()`` attempt never happens →
user sees the Close button as broken.
Fixes:
- Selector → ``a[data-testid="stPageLink-NavLink"][href*="close"]``.
``href*="close"`` covers both root (/close) and base-path
(/myapp/close) deployments.
- Fallback → resolve the parent window via
``doc.defaultView`` (the parent doc's window) with a
``window.top`` fallback, so the hard-nav navigates the whole
app instead of just the iframe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Streamlit logs a deprecation notice on every render:
Please replace ``st.components.v1.html`` with ``st.iframe``.
``st.components.v1.html`` will be removed after 2026-06-01.
Replace all 9 call sites (6 tool pages + 3 in ``_legacy.py``).
Both APIs feed ``srcdoc`` to the underlying iframe so the
HTML/JS payload and the cross-frame DOM access pattern
(``window.parent.document``) are unchanged.
``st.iframe`` rejects ``height=0`` (raises
``StreamlitInvalidHeightError``), so bump every zero-height call to ``height=1``.
1px is effectively invisible — these are script-only iframes, no
visible payload — and avoids the validator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Footer Close was using ``<a href="./close">`` which triggers a
browser hard-nav. That's a visible page-reload flash, websocket
churn, and slower shutdown than the previous sidebar Close —
which used ``st.navigation``'s soft nav.
Restore the soft-nav path:
- ``render_sticky_footer`` now renders a hidden ``st.page_link``
pointing at ``pages/99_Close.py``. Positioned off-screen via
CSS (``stElementContainer:has(a[data-testid=stPageLink]
[href$=/close])``) so it occupies no layout space but stays in
the DOM, reachable + clickable.
- Footer's Close <button> click handler now dispatches a
programmatic click on that hidden page_link. Streamlit's React
handler picks it up and runs the soft nav (same code path the
old sidebar entry used). Falls back to ``window.location.href``
if the helper link hasn't rendered yet so the button is never
a no-op.
- The page_link call is wrapped in try/except: ``AppTest`` doesn't
populate the page-nav session keys it needs and raises
``KeyError('url_pathname')``. Failure costs only the soft-nav
optimization — Close still works via the hard-nav fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups to the prior sidebar/footer cleanup:
- The "_hidden" section header was still visible in the sidebar
because Streamlit renders ``stNavSectionHeader`` as a sibling of
``stNavSection``, not a child — so the ``:has()`` rule on the
section was hiding the items list but leaving the header
(and its collapse/drilldown marker) behind. Move Activate +
Close into the unlabeled section (key ``""``) alongside Home so
there is no header to leak in the first place, then hide just
the two links via ``stSidebarNavLinkContainer:has(...)`` (with
a defensive ``a[href$=...]`` fallback for browsers without
``:has()`` support).
- The sticky footer was missing on ``pages/_Activate.py`` because
the page never called ``render_sticky_footer`` — added the
call so the Help / Close bar persists when the user follows
the popover's Activate / Manage link.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Collapse the Account section: Activate now lives in the same
hidden sidebar section as Close (single ``_hidden`` group). Both
pages stay registered with ``st.navigation`` so /activate and
/close remain URL-routable for the Help-popover / Close-button
links — only the sidebar entries + their section header are
hidden via CSS.
- Help popover always exposes a license-management link now:
``Activate now →`` when the license is inactive, ``Manage
license →`` when it is active and valid. Both point at
``./activate``.
- Extend the sidebar-hide CSS to also match ``a[href$="/activate"]``
and the section that contains it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump version to 3.0 (src/__init__.py).
- Switch support address to support@unalogix.com.
- Help popover now includes a License section that reads
``src.license.current_state()``:
* When activated + valid: name + expiry date + days remaining.
* Otherwise: "Not activated" + an ``Activate now →`` link
pointing at ``./activate``.
License-state queries are wrapped so a corrupted license file
can't take the footer down — it falls through to the inactive
branch.
- Popover HTML is now built in Python (so the license branch
lives in one place) and passed to the JS as a single string.
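
A minimal sketch of the license branch described above, assuming a
hypothetical ``LicenseState`` dataclass and a ``get_state`` callable
standing in for ``src.license.current_state()`` (the real return type is
not shown in this log). The point is only that a failing license query
falls through to the inactive branch instead of raising:

```python
import html
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class LicenseState:  # hypothetical shape, for illustration only
    active: bool
    valid: bool
    name: str = ""
    expires: Optional[date] = None


def license_section_html(get_state: Callable[[], LicenseState], today: date) -> str:
    """Build the License section of the Help popover as one HTML string."""
    try:
        state = get_state()
    except Exception:
        state = None  # e.g. corrupted license file: treat as inactive

    if state and state.active and state.valid and state.expires:
        days_left = (state.expires - today).days
        return (
            f"<div>{html.escape(state.name)}</div>"
            f"<div>Expires {state.expires.isoformat()} "
            f"({days_left} days remaining)</div>"
        )
    # Inactive, invalid, or unreadable license: offer the activation link.
    return '<div>Not activated</div><a href="./activate">Activate now →</a>'
```

Escaping the license name matters here because the string is injected
into the parent document by the footer script.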
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sticky footer was only wired into the 9 tool pages — the home
page (``_home.py``) called ``hide_streamlit_chrome`` but never
``render_sticky_footer``, so the app-level Close+Help bar was
missing whenever the user was on the home page. Add the call.
Also drop the home page's now-redundant trailing
``st.divider() + st.caption(t("chrome.footer"))`` block — same
"blank white bar above the sticky footer" symptom that motivated
removing the per-page version from the tool pages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small follow-ups to the sticky-footer rework:
- Left-justify the footer buttons (and reposition the Help popover
to anchor at the left edge so it lines up with its trigger).
- Remove the per-page ``st.divider() + st.caption("Runs locally…")``
trailing block from all 9 tool pages. The new sticky footer
covers that text, so it was rendering as an empty white bar at
the bottom of each tool page.
- Hide the Close entry from the sidebar nav via CSS. The page stays
registered with st.navigation so /close is still routable for the
sticky-footer Close button — only the sidebar link + its section
header are hidden (via :has() on stNavSection).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The duplicate full-width Back-to-Home button at the bottom of every
tool page was reading as a "huge footer." Replace it with a real
slim sticky footer holding two controls:
- Close: <a href="./close"> to the Close page (which shuts down).
Full-page nav is fine here — the process is terminating, so the
session-state-loss concern that retired the previous sticky
footer doesn't apply.
- Help: JS-toggled popover showing version + support@datatools.app.
No navigation, no state loss.
Top-of-page Back-to-Home stays (uses st.switch_page, preserves
state). Add footer.* i18n keys for en + es.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pulled d9e32e5 (async-writer audit log + re-enabled diagnostics
sidebar) and still sees blank pages. The synchronous-write theory
from the previous round was at most a partial explanation; something
ELSE in the audit-log code path is also taking the page render down
on the user's machine.
Restore the kill switch so the user has a working app while we
diagnose:
- ``src/audit.py``: ``_DISABLED = True`` re-introduced at module
top, each of ``log_event`` / ``log_session_start`` /
``log_page_open`` / ``flush_audit_log`` early-returns. The async
writer thread is never started.
- ``hide_streamlit_chrome``: ``_render_diagnostics_sidebar()`` call
re-gated behind ``if False:``.
The async writer code stays in place — easier to flip the flag back
when we identify the real cause than to rewrite a third time. The
shutdown-flush call in ``shutdown_app`` also stays; it early-returns
on the kill switch and is harmless.
Diagnostic plan for the next session: ask the user for the launcher
terminal output (the new stderr "DataTools audit: writes failing..."
message would tell us if the writer thread DID start and DID fail),
and whether ``~/.datatools/logs/`` is being created at all.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported earlier: synchronous file writes in ``log_event`` blocked
the GUI render thread on hostile filesystems (Windows antivirus on
``~/.datatools/logs/`` is the prime suspect). A blocking ``open``
call doesn't raise — try/except can't recover from it — so the
only safe re-enable is to take file I/O off the render path.
Refactor:
- ``log_event`` and friends append events to a ``deque(maxlen=5000)``
without blocking and return in microseconds.
- A single daemon thread (``datatools-audit-writer``) drains the
queue and writes batches. Holds the queue lock only long enough to
snapshot + clear, then does I/O outside the lock so producers can
keep enqueueing.
- ``audit_log_path()`` is now pure path arithmetic: no ``mkdir``,
no ``open``. The writer thread does the directory creation off
the request path, so any hang there only affects the writer.
- The bounded queue means an unwritable disk can't grow memory
without limit; the queue caps at 5000 and overflow drops the OLDEST
events so the most recent (most diagnostic) ones survive.
- First write failure prints once to stderr; subsequent failures
are silent so logs don't drown the launcher terminal.
- ``flush_audit_log(timeout_s=0.5)`` drains the queue and signals
the writer to exit; bounded so a stuck disk can't delay shutdown.
Other changes in this commit:
- ``shutdown_app`` now emits a "Session ending" event and calls
``flush_audit_log`` before kicking the os._exit timer, so the
closing session's events make it to disk.
- The Diagnostics sidebar in ``hide_streamlit_chrome`` is
re-enabled (the ``if False:`` gate is removed). Wrapped in
try/except defensively — render errors print to stderr, never
blank the page.
- ``_DISABLED`` kill-switch is gone. The async design IS the
safety mechanism now.
Tests in ``tests/test_audit.py``:
- log_event burst of 1000 events completes in well under 1s
(proves non-blocking).
- Events queued before flush land on disk with the expected JSON
shape; session_start renders; idempotent.
- Pointing the audit dir at a file (so mkdir fails) doesn't hang
or crash the producer.
- Non-JSON extras are str()-coerced rather than dropped.
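
The producer/writer split described in this commit can be sketched
roughly as follows. This is an illustration under assumptions, not the
code in ``src/audit.py``: the ``AuditWriter`` class, its method names,
and the 0.2s poll interval are hypothetical, and the sketch uses a plain
``deque.append`` under a lock for the enqueue step.

```python
import json
import threading
import time
from collections import deque
from pathlib import Path


class AuditWriter:
    """Bounded in-memory queue drained by a single daemon thread."""

    def __init__(self, path: Path, maxlen: int = 5000) -> None:
        self._path = path
        # A full deque silently discards the OLDEST item on append.
        self._queue: deque = deque(maxlen=maxlen)
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, name="datatools-audit-writer", daemon=True
        )
        self._thread.start()

    def log_event(self, category: str, message: str, **extra) -> None:
        """Producer side: enqueue and return, no file I/O on this path."""
        event = {"ts": time.time(), "category": category,
                 "message": message, **extra}
        with self._lock:
            self._queue.append(event)
        self._wake.set()

    def _drain(self) -> None:
        # Hold the lock only long enough to snapshot and clear.
        with self._lock:
            batch = list(self._queue)
            self._queue.clear()
        if not batch:
            return
        try:
            # Directory creation happens here, off the render path.
            self._path.parent.mkdir(parents=True, exist_ok=True)
            with self._path.open("a", encoding="utf-8") as fh:
                for event in batch:
                    fh.write(json.dumps(event, default=str) + "\n")
        except OSError:
            pass  # best-effort: a broken disk must not reach producers

    def _run(self) -> None:
        while not self._stop.is_set():
            self._wake.wait(timeout=0.2)
            self._wake.clear()
            self._drain()
        self._drain()  # final pass for events queued just before stop

    def flush(self, timeout_s: float = 0.5) -> None:
        """Ask the writer to drain and exit; never wait longer than timeout_s."""
        self._stop.set()
        self._wake.set()
        self._thread.join(timeout=timeout_s)
```

Because the thread is a daemon and ``flush`` joins with a timeout, a
stuck disk can delay shutdown by at most ``timeout_s``.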
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: after the sticky-footer href fix (be7191a) the back-to-home
click worked but the home-page upload list disappeared. Full-page
navigation via ``<a href>`` doesn't preserve ``st.session_state`` on
the user's Streamlit build.
Trade-off forced: pick visible-from-anywhere sticky footer OR state
preservation. Can't have both because ``st.switch_page`` (soft nav,
preserves state) needs a real Streamlit button widget, and Streamlit
widgets can't be reliably CSS-positioned to the viewport bottom —
Streamlit owns the widget DOM and remounts it on every rerun.
State preservation wins. Going back to the pre-sticky design:
- ``render_sticky_footer()`` becomes a no-op shim. Kept as a callable
so the call sites in every tool page don't have to be touched in
this commit; the original implementation is preserved as
``_render_sticky_footer_DISABLED`` if we ever decide to revisit.
- Every Ready/Coming-Soon tool page (1-9) gets ``back_to_home_link()``
reinstated near the top of the page (visible at scroll-top) AND
``back_to_home_link(key="_back_to_home_link_bottom")`` reinstated
near the bottom of the page (visible at scroll-bottom). Both
instances call ``st.switch_page`` via the existing helper — soft
nav, no full reload, ``st.session_state["home_uploads"]`` and
every other session-state key survive.
User trades the "always-visible while scrolling" sticky behavior for
the upload-list-survives-navigation behavior. The two-button pattern
(top + bottom) was what we had before the sticky-footer experiment;
on short pages both are visible at once, on long pages the user has
one in reach at either end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: clicking Back to Home in the sticky footer surfaced
Streamlit's "Page not found — Running the app's main page" message
in the user's build.
Root cause: ``url_path="home"`` on the home page's ``st.Page``
registration is treated as an alias for the default page in some
Streamlit minor versions, but the user's build doesn't honour the
alias for the page that ALSO has ``default=True``. The default page
is served at the root URL ``/``; ``/home`` is treated as a missing
page on that build.
Switch the footer anchor's href from ``"home"`` (which resolved to
``/home`` from any tool-page URL) to ``"./"`` (resolves to the
current document's directory, which on a single-segment URL is the
server root → default page → Home). Robust across Streamlit minor
versions regardless of how the url_path alias is interpreted.
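
The difference between the two hrefs can be checked with standard
relative-URL resolution. The host, port, and ``text_cleaner`` slug below
are placeholders for illustration, not values taken from the app:

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve a relative href against the URL of the current page."""
    return urljoin(page_url, href)


tool_page = "http://localhost:8501/text_cleaner"  # hypothetical tool-page URL

old_target = resolve(tool_page, "home")  # -> http://localhost:8501/home
new_target = resolve(tool_page, "./")    # -> http://localhost:8501/

# Under a base path, "./" still lands on the app root rather than the host root.
base_path_target = resolve("http://example.test/myapp/text_cleaner", "./")
# -> http://example.test/myapp/
```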
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User confirmed: with the audit-log kill switch (1caedbb) in place,
pages render. So the hang was 100% in the audit-log file writes —
``open()`` blocking on Windows somewhere — not in the chrome
additions disabled during bisection.
Two of those three additions are pure UI and have no filesystem
exposure, so they're safe to re-enable now:
- **Sticky footer**: pure CSS + a components-html iframe whose JS
appends a div to ``parent.document.body``. No disk touch. The
user just reported losing the Back-to-Home button to the
bisection commit — restoring this brings it back.
- **Compact-spacing CSS layer**: gap reductions on stVerticalBlock
/ stHorizontalBlock, slim heading margins, slim hr / caption /
expander / button / metric padding. Pure CSS.
What stays disabled:
- **Audit-log writes** (``src/audit.py:_DISABLED = True``). Any
resumption needs an async-write design with a hard timeout so a
stuck filesystem can't hang the GUI render.
- **Diagnostics sidebar**: it calls ``audit_log_path()`` which
itself does a ``mkdir()`` — and a hanging mkdir would re-introduce
the same blank-pages symptom. Will re-enable once the audit log
is rewritten not to block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: uploading multiple files on the home page and clicking Run
analysis blew up with
StreamlitDuplicateElementKey: key='_findings_open_02_text_cleaner'
when two uploaded files both had Clean Text findings.
Root cause: ``render_findings_panel`` is invoked once per uploaded
file from ``_home.py``, but the per-tool jump button used a
filename-agnostic key:
key=f"_findings_open_{tool_id}"
Two files both flagging Clean Text → two buttons with identical keys
→ Streamlit rejects the second one.
Fix:
- Add ``key_namespace: str = ""`` to ``render_findings_panel``. The
helper hashes it (sha1 truncated to 8 chars) and appends to every
button key, so different namespaces produce different keys but the
same namespace stays stable across reruns.
- The home page now passes the filename:
``render_findings_panel(findings, header=f"📄 {name}", key_namespace=name)``.
- The single-call site in ``upload_and_analyze_section`` (the legacy
helper, only used outside the new home-page path) keeps the default
empty namespace, which is fine because that path renders findings
for ONE file at a time.
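
A sketch of the key derivation, assuming a hypothetical helper named
``namespaced_key`` (the actual helper name is not given in this log) and
assuming an empty namespace leaves the key unchanged:

```python
import hashlib


def namespaced_key(base: str, key_namespace: str = "") -> str:
    """Append a short, stable digest of the namespace to a widget key."""
    if not key_namespace:
        return base  # assumption: the default namespace adds no suffix
    digest = hashlib.sha1(key_namespace.encode("utf-8")).hexdigest()[:8]
    return f"{base}_{digest}"


# Two files flagging the same tool now get distinct, rerun-stable keys.
key_a = namespaced_key("_findings_open_02_text_cleaner", "a.csv")
key_b = namespaced_key("_findings_open_02_text_cleaner", "b.csv")
```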
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: bisection commit c0bfd4d that disabled the sticky footer,
diagnostics sidebar, and compact-CSS didn't fix the blank-page
symptom. User adds that Ctrl+C also can't kill the launcher.
Ctrl+C-doesn't-work + every-page-blank together point at a hang in
the Python process, not an exception. The most likely hang point in
the chrome path is the audit log's file I/O — ``open()`` inside the
``with`` block in ``log_event`` blocks on a stuck filesystem (Windows
antivirus quarantining ``~/.datatools/logs/datatools-*.jsonl`` on
every write is a plausible culprit on the user's machine). A blocking
``open`` call does NOT raise — try/except can't recover from it —
which is why our prior defensive wrap didn't help.
Add a module-level ``_DISABLED = True`` kill switch. ``log_event``,
``log_session_start``, and ``log_page_open`` each early-return at
the very top of the function when the flag is set, before any
file-system call. Path resolution (``audit_log_path``) still works
since it's needed for the diagnostics sidebar (still disabled in
c0bfd4d, but kept harmless).
If pages render after this commit, file I/O from the audit log is
confirmed as the culprit; we'll redesign with an async writer
queue and a tighter timeout. If they still don't, the cause is
somewhere we haven't bisected yet and we move to a hard revert.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: every page renders empty in the main body even after the
audit-log defensive-wrap commit (59c6d0f). Close button also doesn't
trigger shutdown — that page is blank too. Sidebar nav still renders,
so the chrome path that runs on every page is the suspect.
Three chrome additions landed all at once; all three are temporarily
turned off so the user can see whether bare chrome restores rendering:
1. **Sticky footer (``render_sticky_footer``)**: short-circuited with
``return`` at the top of the function. The CSS-injection +
components-html iframe mechanic is the highest-suspicion item —
if the iframe script throws or the CSS interacts badly with the
user's Streamlit / Python build, the side effects can be
page-killing on theirs while invisible on ours. The original body
is preserved as ``_render_sticky_footer_DISABLED`` so re-enabling
is a one-line change.
2. **Diagnostics sidebar (``_render_diagnostics_sidebar``)**: call
site in ``hide_streamlit_chrome`` is gated by ``if False:``.
Wrapping in try/except (the previous commit) caught exceptions
but didn't help — silent partial renders inside
``with st.sidebar: with st.expander: ...`` can still leave the
render stack in a bad state on some Streamlit versions.
3. **Compact-spacing CSS layer**: the ``gap: 0.5rem !important;`` on
``stVerticalBlock`` / ``stHorizontalBlock``, the slim heading
margins, the slim hr / caption / expander / button / metric
rules — all stripped back to the pre-compact ``_HIDE_CHROME_CSS``.
The ``gap`` rule in particular is a suspect: if the user's
Streamlit version doesn't render stVerticalBlock as a flex
container, the rule is harmless; if it does and interacts badly
with overflow, content could be clipped.
What's deliberately KEPT enabled:
- The audit-log calls (already wrapped from 59c6d0f).
- ``log_page_open`` calls in tool pages (already wrapped internally).
- All UI changes pre-compact (the unified tool-page layout, the
download-button helper, etc.).
If pages render after this commit, we know it's one of the three
disabled items above and can bisect further. If they still don't
render, the cause is in code that pre-dated the audit-log work and
the bisection has to keep going.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: after pulling commit c73d716 (audit log) the main body of
every page showed empty. Sidebar nav still worked.
Diagnosis: the most likely path is that something inside the audit
calls — ``_render_diagnostics_sidebar()`` calling ``audit_log_path()``,
or ``log_session_start()`` itself — raises during ``hide_streamlit_chrome``
on the user's environment (Python 3.14 on Windows, a less-tested
combo than the test environment). Streamlit's script runner sees the
exception, and on some chrome paths it eats it without surfacing an
error block, leaving the page body empty.
The audit log is best-effort by design. Make that contract real:
1. ``hide_streamlit_chrome`` now wraps both ``log_session_start()``
and ``_render_diagnostics_sidebar()`` in try/except. Errors print
to stderr (so the developer running ``python -m src.gui`` sees
them in the launcher's console) but never bubble up to kill the
page render.
2. ``audit_log_path()`` already had a tempdir fallback for the
primary mkdir failure, but the SECOND mkdir, inside that fallback,
wasn't protected. Restructured to a two-level fallback: configured dir →
tempdir → ``/dev/null`` (or ``NUL`` on Windows). The last fallback
ensures the function never raises; ``log_event``'s own try/except
handles the eventual unwritable-file case.
3. ``log_page_open(slug)`` now has an outer try/except so it cannot
raise either — protecting every tool page's render path.
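
The fallback chain in item 2 can be sketched like this. The function
names, the ``datatools-logs`` tempdir folder, and the default filename
are hypothetical; only the order (configured dir, then tempdir, then the
null device) comes from the description above:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def resolve_audit_dir(configured: Path) -> Optional[Path]:
    """Return the first directory that can be created, or None."""
    fallback = Path(tempfile.gettempdir()) / "datatools-logs"  # hypothetical name
    for candidate in (configured, fallback):
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            return candidate
        except OSError:
            continue  # try the next level of the chain
    return None


def audit_log_path_sketch(configured: Path,
                          filename: str = "datatools-session.jsonl") -> Path:
    """Never raises: falls back to the null device as a last resort."""
    directory = resolve_audit_dir(configured)
    if directory is None:
        # os.devnull is "/dev/null" on POSIX and "nul" on Windows.
        return Path(os.devnull)
    return directory / filename
```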
If a user reports the same symptom again, the launcher terminal will
now show a real traceback explaining what's wrong, and the GUI will
still render normally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a
consistent shape — What / Why / Can we ship now / Approach / GUI
sketch / Effort / Risks / Ship criteria. Resting place for things
the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that
keep coming up.
First entry: **#10 PDF → CSV extractor** (bank statements et al.).
Key facts captured:
- **Current state**: no PDF infrastructure exists. Zero PDF
dependencies in requirements.txt; zero PDF-touching code under
``src/``. The only "PDF" string in the codebase is the planned-
output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3,
no native compiler, gives coordinate-aware text), Tesseract via
pytesseract as the OCR fallback for scanned PDFs,
streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a
rendered page; the tool applies that template across N pages,
saves the template by layout fingerprint for next month's
statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10
weeks for a polished version with multi-page template recall;
+2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the
combination (region selection that persists across pages and
across documents with similar layouts) is where the engineering
goes.
- **Ship criteria**: ≥1 paying customer + ≥3 paid or ≥5 demo
emails asking for PDF extraction + the bookkeeper niche
converting at least one customer first. None have fired.
Cross-references added:
- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for
parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools
in FUTURE-TOOLS.md and explicitly names #10 as the current
highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new
row for the PDF tool tied to the same ship-trigger language.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New ``src/audit.py`` module records GUI actions to a per-session
JSONL file under ``~/.datatools/logs/`` (overrideable via
``DATATOOLS_AUDIT_DIR``). The file is human-readable (one JSON
object per line, each with a ``message`` field) AND trivially
machine-parseable — the support flow is "client mails the file,
we read it and explain what went wrong."
Format example::
{"ts":"2026-05-17T05:30:00.123+00:00","level":"info","category":"session",
"session":"a1b2c3d4","message":"Session started",
"platform":"Windows 11","python":"3.14.0","user":"Michael Dombaugh",
"log_file":"C:\\Users\\Michael Dombaugh\\.datatools\\logs\\datatools-...jsonl"}
{"ts":"...","category":"upload","message":"Uploaded customers.csv",
"filename":"customers.csv","bytes":24813}
{"ts":"...","category":"analyze","message":"Analyzed customers.csv (3 findings)",
"filename":"customers.csv","findings":3,"rows":120,"cols":8}
{"ts":"...","category":"tool_run","message":"Clean Text run",
"page":"2_Text_Cleaner"}
{"ts":"...","category":"error","level":"error",
"message":"analyze(weird.csv): EmptyDataError: No columns to parse",
"filename":"weird.csv","outcome":"empty_after_repair"}
Public API:
- ``log_event(category, message, **extra)``
- ``log_session_start()`` — idempotent banner with platform info
- ``log_page_open(slug)`` — emit a ``nav`` event, deduplicated per
Streamlit session so reruns don't spam the log
- ``log_exception(where, exc, **extra)`` — convenience wrapper
- ``audit_log_path()`` / ``audit_log_dir()`` — for the UI
Wired in at:
- ``hide_streamlit_chrome``: stamps session start, mounts a small
"🩺 Diagnostics" expander in the sidebar with the log path and
an "Open log folder" button so the user can grab the file to
attach to a support email.
- Home page: ``upload`` event on every new file, ``upload`` event
on per-file remove, ``analyze`` event with file count when
Run-analysis fires.
- ``_run_analysis_on_upload``: ``analyze`` event with rows / cols /
findings count per file, plus ``error`` events on every caught
exception (empty upload, empty after repair, pandas EmptyDataError,
generic Exception).
- Every Ready tool page (1, 2, 3, 4, 5, 9): ``tool_run`` event
immediately after the primary action stashes its result.
- Every tool page (1-9): ``log_page_open(slug)`` on render — deduped
via session state so we don't get one event per Streamlit rerun.
Safety:
- ``log_event`` wraps every write in try/except. A broken audit
log must NOT crash the GUI.
- Non-JSON-serializable extras are ``str()``-coerced before writing.
- File CONTENTS are never logged. We capture filename, byte count,
and (in the analyzer) a 12-char sha1 fingerprint of the bytes so
the same file re-uploaded gets the same trace.
- License keys, session cookies, etc. are not logged.
- ``DATATOOLS_AUDIT_DIR`` env var lets tests redirect writes into a
tmp dir.
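
For illustration, one way to produce a line in the format shown above,
with the ``str()`` coercion and the 12-char fingerprint from the Safety
list. ``format_event`` and ``fingerprint`` are hypothetical names, not
the module's public API:

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """12-char sha1 prefix: identifies a file without logging its contents."""
    return hashlib.sha1(data).hexdigest()[:12]


def format_event(category: str, message: str, session: str,
                 level: str = "info", **extra) -> str:
    """Serialize one audit event as a single JSON line (no trailing newline)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "level": level,
        "category": category,
        "session": session,
        "message": message,
        **extra,
    }
    # default=str coerces anything json can't serialize (Path, datetime, ...).
    return json.dumps(record, default=str, ensure_ascii=False)
```

Keeping each event on one line is what makes the file both greppable by
a human and parseable line by line.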
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reported issues addressed together because they're the same UX
flow (home findings panel → jump to relevant tool).
(1) Format-Standardizer recommendations weren't firing.
Reported: uploading a file from the format-cleaner test corpus
(``24_format_dates.csv``, ``25_format_phones.csv``,
``29_format_currencies.csv``, ``30_format_integration.csv``) showed
zero "Standardize Formats" recommendations even though the columns
clearly mixed multiple date / phone / currency formats.
Two underlying causes:
- ``_detect_inconsistent_date_format`` required two MATCHES per
distinct format. A test column with N rows each in a different
format had ≤1 match per format and was silently passed over.
Loosened to "≥1 match per format" — the inconsistency signal is
the presence of ≥2 distinct formats, not their volume.
- Only date inconsistency was detected. Phones, currency, and
booleans (the other format-standardizer fix categories) had no
detector at all.
Added three new detectors:
- ``_detect_inconsistent_phone_format``: nine phone-format regexes
(plain-10, US paren / dash / dot / space, +country, extension,
intl plus). Fires when a column is ≥35% phone-shaped AND mixes
≥2 formats.
- ``_detect_inconsistent_currency_format``: thirteen currency regexes
covering US ($1,234.56 / $1234.56), EU (€1.234,56), India lakh
notation, Swiss apostrophe, trailing-symbol, parens-negative,
prefix-currency-code, suffix-currency-code, and negative variants.
Same fire criteria as phone.
- ``_detect_inconsistent_boolean_format``: column is ≥80% boolean
tokens (yes/no/y/n/true/false/1/0) AND uses ≥3 distinct surface
forms (e.g. yes / Y / true / 1 mixed together).
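
As a rough sketch of the boolean criterion, assuming surface forms are
compared case-sensitively (so ``yes`` and ``Y`` count as different
forms) and that blank cells are ignored. The function name and
signature are illustrative, not the detector's real interface:

```python
BOOLEAN_TOKENS = {"yes", "no", "y", "n", "true", "false", "1", "0"}


def has_inconsistent_boolean_format(values, min_share=0.80, min_forms=3) -> bool:
    """True when a mostly-boolean column mixes several spellings."""
    cells = [str(v).strip() for v in values
             if v is not None and str(v).strip()]
    if not cells:
        return False
    # Membership is case-insensitive; the surface form keeps its original case.
    boolean_cells = [c for c in cells if c.lower() in BOOLEAN_TOKENS]
    if len(boolean_cells) / len(cells) < min_share:
        return False
    return len(set(boolean_cells)) >= min_forms
```

A clean ``yes`` / ``no`` column has only two surface forms, so it stays
below the three-form threshold and is not flagged.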
Verified on every file in ``test-cases/format-cleaner-corpus/``:
24_format_dates, 25_format_phones, 29_format_currencies all now
produce a format-standardizer Finding. The integration test file
flags all three.
The threshold loosening (from 50% to 35% of values format-shaped) is
still strict enough to avoid false-positives on free-text comment
columns where a few cells happen to look phone- or date-shaped.
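
The two-part criterion (enough of the column is format-shaped, and at
least two distinct formats each match once) might look like the sketch
below. The three date patterns are a small illustrative subset and
``mixes_formats`` is a hypothetical name; the real detectors use their
own regex lists:

```python
import re

# Illustrative subset only; not the detector's actual pattern list.
DATE_PATTERNS = {
    "iso": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "us_slash": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "dotted": re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}"),
}


def mixes_formats(values, patterns, min_share=0.35, min_formats=2) -> bool:
    """True when enough cells are format-shaped AND >= min_formats formats occur."""
    cells = [str(v).strip() for v in values if str(v).strip()]
    if not cells:
        return False
    hits = {}
    for cell in cells:
        for name, pattern in patterns.items():
            if pattern.fullmatch(cell):
                hits[name] = hits.get(name, 0) + 1
                break  # count each cell under one format only
    shaped = sum(hits.values())
    # One match per format is enough; the signal is the number of formats.
    return shaped / len(cells) >= min_share and len(hits) >= min_formats
```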
(2) The "Open <Tool>" jump links blended into the page.
Reported: the per-tool jump links inside the home findings panel
were too subtle to notice.
Replaced ``st.page_link`` with ``st.button(type="primary")`` so the
buttons render in Streamlit's primary-action red colour, matching the
"Clean Text" / "Find Duplicates" / etc. run buttons. Click handler
delegates to ``st.switch_page(page_slug)`` so it's still a soft
in-app navigation (no full reload).
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: the sticky footer rendered, but the Back to Home button
inside it wasn't visible.
Likely cause: ``st.markdown`` inserts the footer div inside Streamlit's
content tree, which sits under ``.stApp { zoom: 0.85 }`` (our compact
scaler) and several nested padding/positioning contexts. Streamlit's
own ``<a>`` styling rules can also colour-collide with our anchor.
Switch the mount strategy. Two passes:
1. CSS rules go to the parent document via ``st.markdown`` as before,
but every property carries ``!important`` and the selectors key on
``#datatools-sticky-footer`` (id, not class) plus a dedicated
``.datatools-sticky-footer-link`` class on the anchor — so
Streamlit's default ``<a>`` styles can't override colour or
padding. ``z-index: 2147483646`` keeps the footer above
anything else in the page.
2. The footer DOM node itself is created by a script inside a
zero-height ``streamlit.components.v1.html`` iframe. The script
does ``window.parent.document.body.appendChild(...)`` so the div
lives as a direct child of ``<body>`` — outside ``.stApp``,
outside every Streamlit container, free of every parent's
``zoom`` / ``transform`` / ``overflow`` rules.
If the cross-frame access ever fails (Streamlit sandbox config
change), the script falls through to appending inside the
iframe's own document — degraded but still visible.
Each rerun replaces any prior ``#datatools-sticky-footer`` so we
don't accumulate stacked footers on every script pass.
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two unrelated UX issues addressed in one sweep across all nine tool
pages because they share the same edit surface.
(1) Sticky footer replaces the top + bottom back-link buttons.
Reported: a big white empty footer space at the bottom of every page;
the Back to Home button at the top scrolled out of view on long pages.
New ``render_sticky_footer()`` helper in ``components/_legacy.py``
injects a fixed-position bar at ``bottom: 0`` of the viewport with:
- A border-top so it visually reads as a non-movable bar.
- A semi-transparent background (rgba alpha 0.96 + ``backdrop-filter: blur``)
so content underneath shows through faintly when the user scrolls.
- A styled ``<a href="home">`` anchor (not an ``st.button``) because
Streamlit widgets can't be CSS-positioned reliably — Streamlit owns
the widget's DOM container and re-mounts it on every rerun. A real
anchor sits exactly where the CSS puts it and triggers Streamlit's
URL routing to the home page.
- ``padding-bottom: 3.5rem`` on the main container so the last widget
isn't hidden behind the bar.
Called once per tool page, immediately after ``hide_streamlit_chrome()``
so it renders even on pages that ``st.stop()`` early before any other
content runs. The old top-and-bottom ``back_to_home_link()`` calls are
removed from every tool page; their entry/exit points were dropping
the button when the script short-circuited.
(2) Tool-page headers now localize.
Reported: switching the sidebar language picker to Spanish left the
tool page's title + caption in English. Root cause: every page had
hard-coded ``st.title("✂️ Clean Text")`` / ``st.caption("Trim
whitespace...")`` strings.
Added per-tool ``tools.<id>.page_title`` and
``tools.<id>.page_caption`` keys to ``en.json`` and ``es.json`` for
all nine tools. Routed each page's title/caption call through ``t()``.
Verified: with ``ui_lang=es`` set, the Clean Text page now renders
"✂️ Limpiar texto" + the Spanish caption.
Updated ``tests/gui/test_smoke.py::EXPECTED_SUBSTRINGS`` so the
``es`` column for each tool page asserts the actual Spanish string
(was a duplicate of the English string back when the page bodies
were English-only).
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: the "✕" buttons on the uploaded file list removed files
inconsistently — some clicks took, some didn't.
Two compounding causes:
1. ``key=f"_home_remove_{name}"`` embedded the raw filename in the
Streamlit widget key. Streamlit's widget-identity machinery
normalizes keys differently across reruns when they contain
spaces, dots, brackets, or non-ASCII characters, so a button's
identity could shift between the render where the user clicked
it and the rerun that should have processed the click. The click
was registered, but the post-rerun render produced a new widget
under a new effective key, and the original click was "lost".
2. The handler mutated ``home_uploads`` mid-loop while subsequent
iterations were still creating buttons. ``st.rerun()`` raises
synchronously, but if ANOTHER button's state changed in the same
pass (e.g. a stale click held over from a fast double-tap), the
ordering of state-mutation vs widget-key-update vs rerun could
race.
Fixes:
- Stable widget keys: ``f"_home_remove_{sha1(name)[:10]}"``. The
hash is identifier-safe regardless of spaces / dots / Unicode in
the filename. Verified across "sample with spaces.csv",
"sample.csv", and "日本語.csv" — three sequential Remove clicks
each remove exactly one file with no clicks lost.
- Two-phase capture: the loop collects the target ``to_remove``
filename, finishes rendering every other row at consistent widget
identity, THEN mutates state once and reruns. No more mid-loop
``del`` racing other widgets' click handlers.
- Wider click target: column ratio ``[8, 1]`` (was ``[12, 1]``) and
``use_container_width=True`` on the Remove button so the click
surface fills the entire column. Label changed to "Remove" for
the same reason — "✕" is a thin glyph that compressed the
hit-test region.
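The first two fixes can be sketched without Streamlit, as below. This is an illustration under stated assumptions, not the code in ``_home.py``: ``remove_key``, ``render_rows``, and ``apply_removal`` are hypothetical names, and the ``clicked`` callable stands in for a call such as ``st.button("Remove", key=...)``.

```python
import hashlib
from typing import Callable, Optional


def remove_key(name: str) -> str:
    """Widget key built from a SHA-1 prefix: ASCII letters, digits, '_' only."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"_home_remove_{digest[:10]}"


def render_rows(uploads: dict, clicked: Callable[[str], bool]) -> Optional[str]:
    """Phase 1: render every row, remembering the first clicked one (if any)."""
    to_remove = None
    for name in list(uploads):
        # `clicked` is a stand-in for the real button widget call.
        if clicked(remove_key(name)) and to_remove is None:
            to_remove = name
    return to_remove


def apply_removal(uploads: dict, to_remove: Optional[str]) -> bool:
    """Phase 2: mutate state once, after the loop. True means 'rerun now'."""
    if to_remove is None:
        return False
    uploads.pop(to_remove, None)
    return True
```

In the page itself, a ``True`` result from the second phase would be followed by ``st.rerun()``, so no row is ever rendered against half-mutated state.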
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: too much whitespace between widgets, dividers, and headings.
Compact-spacing CSS layer added to ``_HIDE_CHROME_CSS`` (so it applies
on every page that calls ``hide_streamlit_chrome``):
- ``[data-testid="stVerticalBlock"]`` and ``stHorizontalBlock`` gap
trimmed from Streamlit's default ~1rem to 0.5rem.
- Heading margins (h1-h4) tightened — h1/h2/h3 used to leave 1-1.5rem
above; now 0.25-0.5rem.
- ``hr`` (``st.divider()``) drops from 1rem above+below to 0.4rem.
- Markdown paragraphs and captions: 0.25rem bottom margin instead of
the default 1rem.
- Expander summary padding reduced (0.35rem top/bottom).
- File-uploader, button, and metric tiles: trimmed internal padding.
Also slimmed the main-container padding from 1rem top / Streamlit
default bottom (~6rem) to 0.5rem top / 0.75rem bottom.
The existing ``zoom: 0.85`` on ``.stApp`` is kept unchanged: the user
wanted *less white space*, not *smaller content*, and lowering the
zoom further would shrink type alongside everything else.
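A hedged sketch of what a fragment of the compact-spacing layer might look like as a Python constant. Only the gap, ``hr``, and main-container values come from the notes above; the constant name, the ``as_style_tag`` helper, and the use of both main-container testids are assumptions, and the real rules may need ``!important`` or more specific selectors to win over Streamlit's own styles.

```python
# Hypothetical fragment; the real layer lives inside _HIDE_CHROME_CSS.
COMPACT_SPACING_CSS = """
[data-testid="stVerticalBlock"],
[data-testid="stHorizontalBlock"] { gap: 0.5rem; }
hr { margin: 0.4rem 0; }
[data-testid="stMainBlockContainer"],
[data-testid="stAppViewBlockContainer"] {
    padding-top: 0.5rem;
    padding-bottom: 0.75rem;
}
"""


def as_style_tag(css: str) -> str:
    """Wrap raw CSS so it can be injected as an HTML snippet."""
    return f"<style>{css}</style>"
```

A page would typically inject the result with ``st.markdown(as_style_tag(COMPACT_SPACING_CSS), unsafe_allow_html=True)``.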
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: user asked whether we can send Alt+F4 / Ctrl+W to the
browser from JavaScript to force-close a tab.
The honest answer, now baked into the hint message, is NO. Synthesized
keyboard events from page JS only reach DOM event listeners, not the
browser chrome or the OS. There is no flag, API, or trick that lets
a page close a tab the user opened themselves. The page CAN close a
window it opened (the ``window.opener`` trail) or one whose
display-mode is ``standalone`` (Chrome/Edge ``--app=URL``). The latter
is what ``python -m src.gui`` arranges, and it is the path that
actually closes the window without a manual Ctrl+W.
Improvements landed:
1. ``isStandalone(win)`` detects Chrome --app windows up front
(``matchMedia('(display-mode: standalone)').matches``). In a
regular tab the manual hint surfaces immediately on the
"Close this window" click; in --app mode we only show it if the
close attempt actually fails.
2. ``fallbackToBlank(win)`` navigates the tab to ``about:blank``
via ``location.replace`` (no history pollution) so the user
sees a clean empty tab instead of the farewell overlay frozen
over Streamlit's connection-error banner. They still have to
Ctrl+W the blank tab, but the screen is no longer a misleading
"did it close or not?" mess. Fires 250 ms after a failed close
in --app mode (very rare path), or 1.5 s in a regular tab so
the user has time to read the hint.
3. Hint message rewritten in en + es to explain WHY the close is
blocked (browser security — not something we can override), to
acknowledge the Alt+F4 / Ctrl+W question directly (those don't
work either, for the same reason), and to point at
``python -m src.gui`` as the path that gives a clean auto-close.
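The real logic lives in page JavaScript. The timing rules in items 1 and 2 can still be modelled as a small decision table, sketched here in Python. ``ClosePlan`` and ``plan_close`` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosePlan:
    hint_on_click: bool   # show the manual-close hint right away?
    blank_delay_ms: int   # wait before navigating to about:blank


def plan_close(standalone: bool) -> ClosePlan:
    """Model of the behaviour after a 'Close this window' click."""
    if standalone:
        # --app window: the close normally succeeds, so the hint and the
        # about:blank fallback only matter if the close attempt fails.
        return ClosePlan(hint_on_click=False, blank_delay_ms=250)
    # Regular tab: the close is expected to be blocked, so show the hint
    # immediately and leave time to read it before blanking the tab.
    return ClosePlan(hint_on_click=True, blank_delay_ms=1500)
```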
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>