Reported: clicking "Open Downloads folder" did nothing visible. The
previous implementation called ``os.startfile(folder)`` on Windows,
which is known to silently no-op or open Explorer behind the active
window in some configurations (Streamlit running headless, no
foreground rights inherited by the click handler thread, etc.).
Switch to the more reliable ``explorer /select,<file>`` form:
- Opens Explorer with the just-saved file pre-highlighted instead of
just navigating to the folder — better UX than the old behavior.
- explorer.exe is a real GUI process that's spawned in the user's
session with foreground rights, so it shows up on top.
- Fallback chain on Windows: ``/select`` first, then plain
``explorer <folder>``, then ``os.startfile`` as a last resort.
macOS upgraded the same way: ``open -R <file>`` reveals in Finder
rather than opening the directory.
Linux: no reliable cross-distro reveal, so ``xdg-open <folder>``.
Plus user feedback at the call site:
- On successful dispatch: ``st.toast("Opening <folder>", icon="📂")``
— confirms we tried, in case the window comes up behind the
browser.
- On dispatch failure: ``st.warning`` with the full path the user
can copy/paste into their file manager manually.
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the download mechanic from "browser <a download> with a data:
URL" to "write the bytes directly to the user's Downloads folder and
show them the exact path". DataTools runs as a local Streamlit app,
so the "server" IS the user's machine — there's no reason to go
through the browser save dialog at all.
Flow:
1. Click "Download <something>" button (rendered as a regular
``st.button``, so no widget-collision issues).
2. Bytes are written to ``Path.home() / "Downloads" / file_name``
(overwriting any same-named file).
3. The page reruns and renders a success caption with the absolute
path the file landed at.
4. An "📂 Open Downloads folder" button appears. Clicking it pops the
OS file manager via ``os.startfile`` (Windows), ``open`` (macOS),
or ``xdg-open`` (Linux).
Why this is better than the previous HTML-data-URL helper:
- Unambiguous about where the file went — user sees the full path,
not "wherever your browser was configured to save".
- The data: URL approach base64-inflated the page payload by 33% and
bloated for large outputs; server-side write is byte-for-byte.
- No more browser-side widget collision class of bug.
- The save action is a real Streamlit button, so the existing widget
semantics (disabled, help tooltip, key isolation) work without
workarounds.
API surface unchanged. New canonical name ``local_download_button``;
``html_download_button`` is kept as a back-compat alias that points
at the same implementation — every existing call site continues to
work without edits.
Tests are protected from polluting the developer's home dir via a
``DATATOOLS_DOWNLOADS_DIR`` env var override returned by the new
``_downloads_dir()`` helper. Smoke verified end-to-end via AppTest:
click → file appears in tmp dir → success banner shows path →
open-folder button renders.
2220 tests pass, 91 skipped, 35 s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: uploading 13_non_latin_scripts.csv made the home page bubble
a ``pandas.errors.EmptyDataError`` traceback up through the page
chrome instead of surfacing as a per-file error. In a multi-file
analysis run that kills every other file's results too, which is
worse than the symptom itself.
Wrap ``_run_analysis_on_upload`` in proper error handling:
- Empty bytes ``getvalue() == b""`` short-circuits with a synthetic
error Finding telling the user the upload was zero-byte and to
re-upload.
- Empty ``repair.repaired_bytes`` (file was all NULs / BOM / stripped
to nothing) likewise surfaces as a synthetic Finding rather than
reaching pd.read_csv.
- ``pd.errors.EmptyDataError`` from pandas is caught and rendered as
a Finding that names the file, its byte size, and suggests opening
it in a text editor to verify the header row matches the data row
delimiter.
- Any other exception during read/analyze is caught and surfaces as
a Finding via ``format_for_user`` so the user gets a clean message,
not a Python traceback.
Each file in a multi-file run now stands alone: a bad file produces
one red banner in its own card, every other file analyzes normally.
The 13_non_latin_scripts.csv corpus file is 249 bytes of valid UTF-8
on disk and parses cleanly under the same code path locally — the
user's specific symptom is likely a zero-byte upload (browser /
network / Python 3.14 + Streamlit edge case). The new ``empty_upload``
finding will name the bytes count so they can confirm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There is no JavaScript override for browser tab-close security:
``window.close()`` only succeeds on windows JS opened (Chrome --app
windows qualify; a regular browser tab does not). What we can do is
make the --app path easier to hit and the failure case more
actionable.
Three changes:
1. ``src/gui/__main__.py`` — extend browser detection. PATH lookup
now also looks for ``msedge`` / ``microsoft-edge``; Windows install
candidates include the Edge install path; macOS candidates include
Edge and Chromium. Edge is Chromium-based, supports ``--app``, and
ships on every Windows 10+ machine — so users without Chrome no
longer fall through to the regular browser tab. When the fallback
IS hit, print a warning to stderr explaining why Close-from-page
will require Ctrl+W. Renamed ``_find_chrome`` to
``_find_app_browser`` to reflect the broader scope.
2. ``_FAREWELL_SCRIPT_TEMPLATE`` in ``components/_legacy.py`` —
factor close attempts into a ``tryClose`` helper that runs three
escalating tries: standard ``win.close()``, the
``win.open('', '_self')`` history-rewrite trick (no-op in modern
Chrome but free), and ``win.top.close()``. Auto-close on paint AND
the manual button now both call this helper. Skip the manual hint
if the close eventually succeeded between the click and the 250 ms
timeout.
3. ``quit.close_hint`` in en/es i18n packs — rewrite the message to
tell the user honestly that this is a browser security restriction,
tell them the Ctrl+W keystroke that works, and point them at
``python -m src.gui`` for the auto-closing app-mode experience.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported symptom: only the FIRST download button in a multi-button
row pops the browser save dialog. The second and third do nothing on
click. Affects every tool page that exposes (cleaned + audit + config)
downloads.
Root cause is ``st.download_button`` itself — when several render in
the same script pass, the click-to-bytes wiring on the browser side
mis-routes and only one button's data is actually exposed. Explicit
``key`` arguments don't fix it; ``use_container_width=True`` doesn't
help either; we confirmed this in the Text Cleaner reverts.
Replace the widget with a real ``<a download="file" href="data:...">``
anchor rendered via ``st.markdown(..., unsafe_allow_html=True)``.
Bypasses Streamlit's widget machinery entirely; behaves identically to
a native browser download. Side benefit: clicking it does NOT trigger
a script rerun, so other in-flight UI state survives.
New helper ``html_download_button`` lives in
``src/gui/components/_legacy.py`` (exported from ``components``). API:
html_download_button(
label, data,
*, file_name, mime="application/octet-stream",
disabled=False, help=None, use_container_width=True,
)
Translation pattern applied across every tool page (and shared
``results_summary`` / ``config_panel`` widgets in ``_legacy.py``):
- ``st.download_button(`` -> ``html_download_button(``
- ``data=foo_bytes`` kwarg -> positional second arg
- ``key="..."`` -> dropped (helper has no widget identity)
- ``use_container_width=True`` -> dropped (default)
- ``disabled=`` and ``help=`` pass through unchanged
- Pre-computed byte buffers kept where they were
Total: 17 sites replaced (3 in Text Cleaner, 3 in Format
Standardizer, 3 in Fix Missing Values, 3 in Map Columns, 3 in
Automated Workflows, 2 in Find Duplicates page + 4 in shared
_legacy.py widgets used by Find Duplicates).
Caveat: data: URLs balloon by 33% (base64). Fine for tool output
sizes we ship; if a future result topped a few hundred MB we'd want a
Blob-URL fallback.
The marketing demo at src/gui/app_demo.py keeps its single
st.download_button — single button, no collision, no need to switch.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The farewell overlay already attempted ``window.top.close()`` after a
Close click — but browsers only honour that for tabs that JS opened
(Chrome --app windows qualify; a regular browser tab does not). For
users whose Chrome wasn't auto-detected and who fall back to
``webbrowser.open``, the overlay stays put and they had no in-page
way to close.
Add to the overlay HTML:
- A "Close this window" button (uses the user-gesture path, which has
slightly looser browser rules than auto-close).
- A hidden hint paragraph that reveals itself 250 ms after the
button is clicked IF the window is still here, telling the user to
press Ctrl+W (⌘W on Mac).
Wired through the existing _farewell_script template + ``_js_html_safe``
escaping so neither label can break out of the JS string literal.
New i18n keys (en + es): ``quit.close_window_button`` and
``quit.close_hint``.
The existing auto-close attempt remains — Chrome --app users still get
their window closed without touching the button.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-file workflow: a user uploads several files on Home, clicks
"Open <Tool>" on one file's findings, lands on a tool page. The
sidebar lets them get back to Home, but a top-of-page back affordance
is more discoverable and keeps the hand in the same screen region as
the upload list they're working through.
- New ``back_to_home_link()`` helper in components/_legacy.py renders
a secondary button that calls ``st.switch_page("app.py")`` — under
``st.navigation`` that routes to the default (Home) page.
- Wired into every tool page (1-9) directly after
``hide_streamlit_chrome()`` and BEFORE the license gate so a Lite
user who lands on a locked tool can navigate away without paying.
- New i18n key ``nav.back_to_home`` ("← Back to Home" /
"← Volver al inicio") in en/es packs.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Close is now a direct shutdown trigger: visiting the Close page (the
sidebar entry) fires shutdown_app() immediately — no confirm step, no
intermediate body. The farewell overlay paints and os._exit(0) lands
~1s later from a daemon thread.
Layout: Close moved into its own bottom-of-sidebar section so the
destructive action is visually separated from Account/Activate.
- New shutdown_app() in components/_legacy.py replaces quit_button.
os._exit thread is skipped when "pytest" is in sys.modules so the
test suite doesn't suicide on rendering 99_Close.
- pages/99_Close.py shrinks to set_page_config + chrome + shutdown_app.
- app.py nav grows a new "Close" section header (new
nav.section_close key in en/es packs) pinned at the bottom of the
navigation dict.
Tests updated:
- TestQuitButtonRenders → TestClosePageShutsDownImmediately.
Assert the shutdown caption renders + no confirm button exists.
- test_smoke EXPECTED_SUBSTRINGS["99_Close"] now pins
"Shutting down" / "Cerrando" (the visible page body) instead of
the removed page title.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Home is now the only entry point: the "Run analysis" button on the
upload section IS the review step (findings render inline via
render_findings_panel). Tool pages no longer gate on a passed
normalization — running the analyzer is sufficient context.
Removed:
- src/gui/pages/0_Review.py
- src/gui/components/gate.py (re-export seam)
- require_normalization_gate() in src/gui/components/_legacy.py
- "review" section enum in tools_registry.py
- Data Review entry in app.py navigation
- require_normalization_gate() calls + imports in all nine tool pages
- tests/gui/test_gate.py (whole file)
- TestReviewWorkflow in tests/gui/test_workflows.py
- 0_Review entry in tests/gui/test_smoke.py PAGE_SLUGS
- stash_upload's normalization_result+normalization_for stashing
- stash_upload_without_gate (was the gate's negative-path helper)
2017 tests pass (16 retired with the gate flow).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled hardening upgrades.
1. Asymmetric signatures (HMAC → Ed25519)
The previous HMAC scheme used a symmetric secret that any motivated
reverse engineer could pull out of the shipped binary and use to
mint blobs for any tier / name / email. With Ed25519, the binary
ships only the public verification key; the signing key never
leaves the seller's environment, so binary compromise no longer
yields forgery.
- src/license/crypto.py rewritten around
cryptography.hazmat.primitives.asymmetric.ed25519. Same public
API surface (sign/verify/encode_blob/decode_blob), same canonical
JSON encoding — drop-in for the manager / cli / GUI layers.
- DATATOOLS_LICENSE_PRIVKEY (seller-side) and
DATATOOLS_LICENSE_PUBKEY (build-time) env vars supply the keys;
the in-source dev keypair (src/license/_dev_keypair.py)
deterministically derives from a seed phrase for repro builds and
tests.
- Blob prefix bumped DTLIC1: → DTLIC2:. Decoding a DTLIC1 blob
surfaces a clear "old format" error rather than a confusing
signature mismatch.
- scripts/generate_keypair.py mints fresh production keypairs for
the seller (run once, stash the private key offline). Adds
cryptography>=41,<46 to requirements.txt (was an undeclared
transitive dep).
2. Production-safe tripwire
assert_production_safe() refuses to boot a frozen / shipped build
when either:
- DATATOOLS_DEV_MODE=1 is set (would unconditionally bypass every
license check — fine in source/test but catastrophic in a buyer
install).
- The active verification key is still the embedded dev key (the
build pipeline forgot to set DATATOOLS_LICENSE_PUBKEY).
No-op in source / pytest runs (sys.frozen is unset) so test
fixtures and dev workflows keep working without ceremony. Called
from src/cli_license_guard.guard() and from hide_streamlit_chrome
— so it fires on every CLI invocation and every GUI page load.
Tests: 49 license-layer unit tests (was 40); added Ed25519
wrong-key rejection, dev-keypair seed pin, blob v2 prefix, v1
rejection with clear message, and four production-safe scenarios
(no-op in source, fires on DEV_MODE in frozen, fires on dev key in
frozen, passes in frozen with prod pubkey). Total: 2024 → 2033.
Docs (REQUIREMENTS §17a, DEVELOPER licensing recipe, DECISIONS
§9b + decision log) updated with the new threat-model write-up,
key-storage workflow, and tripwire behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes:
1. Lite tier
- New Tier.LITE in src/license/schema.py.
- FEATURES_BY_TIER[Tier.LITE] = {Deduplicator, Text Cleaner,
Format Standardizer}. The three universally-useful tools that
cover the most common bookkeeping / RevOps / Klaviyo prep
workflows. Other six tools require Core.
- i18n: license.tier_lite, license.feature_locked_title,
license.feature_locked_body, license.upgrade_link,
license.status_locked (en + es).
- Per-tool feature gate at every GUI tool page
(require_feature_or_render_upgrade) and every tool CLI
(guard(feature=...)). A locked tool renders an upgrade
prompt + Manage-license button (GUI) or exits with code 2
(CLI).
- Home grid: tool cards the user's tier doesn't unlock get a
red 🔒 Locked badge in place of green Ready.
2. Trial removed
- Activation form's "Start 1-year trial" button removed.
- license_cli's `trial` subcommand removed.
- activation.trial_button / activation.trial_help i18n keys
dropped (pack parity test stays green).
- Tier.TRIAL stays in the enum (back-compat with any field-
tested trial licenses); LicenseManager._mint stays internal
for tests and the seller's key generator.
- Decision logged in DECISIONS §9b: a 1-year all-features
trial undercuts paid Lite; paid-only keeps tier economics
clean.
Tests (+29 net): +17 Lite-tier unit/guard tests + 13 Lite-tier
GUI tests + 1 trial-absent assertion - 2 trial CLI tests - 1
trial GUI button test. Total: 1995 → 2024.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A complete offline licensing layer (no internet at any step):
Core
- src/license/ — schema (License, Tier, FeatureFlag), HMAC crypto,
JSON storage, LicenseManager singleton with activate/renew/
deactivate/issue_trial. Tier-scaffolded so future SKUs can carve
per-tool feature sets without consumer-code edits.
- scripts/generate_license.py — creator-only key generator. Mints a
DTLIC1: blob the buyer pastes into the activation page.
GUI
- New activation form component (src/gui/components/activation.py).
- hide_streamlit_chrome() now inline-renders the activation form when
no valid license is present (every page short-circuits to the form
until activated).
- Sidebar shows tier + days remaining; renewal warning under 30 days.
- New pages/_Activate.py for revisiting the form after activation.
CLI
- src/license_cli.py — activate / renew / status / trial / deactivate
commands. Exempt from the guard.
- src/cli_license_guard.py — drop-in guard call added to every tool
CLI's main(). Lets --help through; respects DATATOOLS_DEV_MODE.
i18n
- New activation.* and license.* keys in en.json + es.json
(page title, form labels, status badges, renewal warnings, error
messages). Pack parity test stays green.
Test infrastructure
- tests/conftest.py autouse fixture sets DATATOOLS_DEV_MODE=1 so the
existing 1916 tests continue to pass.
- isolated_license_path / activated_license_manager /
unactivated_license_manager fixtures for tests that want to drive
the real check.
Tests (+79)
- tests/test_license.py (40): schema, crypto roundtrip, blob
encode/decode, tier→feature mapping, activation flow, name/email
mismatch rejection, tamper detection, expiration, renewal,
dev-mode bypass.
- tests/test_license_cli.py (26): every license_cli command +
subprocess tests confirming every tool CLI refuses to run without
a license, --help always works, DEV_MODE bypasses.
- tests/gui/test_activation.py (13): gate blocks without license,
passes with trial, activation form submission unlocks the gate,
sidebar status, renewal warning, i18n.
Total: 1916 → 1995 tests. All pass under the strict warning filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces ``src/i18n`` with a tiny JSON-backed t() lookup, an in-session
language preference, and a sidebar selector wired through
``hide_streamlit_chrome`` so every page picks up the same picker. Covers
home, tool cards, findings panel, gate, shutdown, and pickup banner
strings. Tests pin pack parity and the farewell-overlay JS escape so
future packs can't silently regress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Close page intro, the shutdown overlay, and the toast so
they all read "you can close this window" — clearer for users running
the app in a dedicated browser window rather than a tab.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the data:-URL navigation (blocked by Chrome since v60 for
top-frame navigation) with a direct DOM-append of a full-screen
overlay onto the parent document. Uses z-index 2147483647 so it sits
above Streamlit's connection-error banner when the websocket drops.
Note: still doesn't fully suppress the connection-error banner in
testing — the next iteration will render the overlay through
Streamlit's own page rather than via a component iframe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the shutdown control out of the inline sidebar widget and into
its own page (pages/99_Close.py), so it appears in the sidebar nav
alongside the tool pages. An explicit confirm button on the page
prevents accidental nav clicks from killing a live session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signalling the process with SIGTERM/SIGINT didn't reliably shut Streamlit
down — its tornado/asyncio loop swallowed or deferred the signal, so the
browser saw the websocket drop ("Connection error") while the python
process kept running. Replace the signal with a daemon-thread
``os._exit(0)`` after a short delay so the current rerun can paint the
"shutting down" message before the process is hard-killed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The footer placement was easy to miss (below all tool cards) and only
rendered on the home page. Hook the button into hide_streamlit_chrome()
so every page that hides default chrome — home + all 9 tool pages — gets
the Quit button at the bottom of the sidebar without per-page edits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chrome-hiding CSS was removing the Streamlit header wholesale,
which also took the sidebar's expand chevron with it — a collapsed
sidebar became unreopenable. Make the header transparent instead and
explicitly preserve the sidebar collapsed-control.
Also add a Quit button in the app footer that signals the Streamlit
server (SIGTERM, falling back to SIGINT) so closing the GUI returns
the shell prompt cleanly instead of leaving Python hung.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Streamlit's default file_uploader footer reads "Limit 200MB per file —
CSV, TSV, XLSX, XLS" which contradicts the 1 GB efficiency target shipped
in 438bc0f and codified in docs/REQUIREMENTS.md §1.1.
Three changes:
1. .streamlit/config.toml — set [server] maxUploadSize = 1024. Footer
now reads "Limit 1024MB per file".
2. upload_and_analyze_section (home page) — adds an explicit caption
above the uploader stating size limit, supported formats, the four
auto-detected delimiters, and the 13 auto-detected encodings (with
the Review-page override as the safety net).
3. pickup_or_upload (every tool page that falls back to its own
uploader when no home-page upload is present) — same caption,
only rendered when the upload accepts CSV/TSV/XLSX/XLS so JSON
schema / config uploaders aren't decorated.
Test suite: 765 passed, 17 xfailed (no regressions). Home + Review +
Deduplicator pages all serve HTTP 200 under the new config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two low-risk seam moves to enable selling per-tool subsets without
breaking the existing all-in-one bundle. Behaviour identical; every
existing import still resolves; full pytest suite + every page returns
HTTP 200.
1. **Tool registry** (src/gui/tools_registry.py) — replaces the
inline dict-of-dicts in app.py with a Tool dataclass and a TOOLS
list. Adds a tier field ("core" today, "pro" / "enterprise" later)
and tools_for_tier() / tool_by_id() / display_name() helpers. A
per-tool build slices TOOLS at import time without code changes.
2. **components package** (src/gui/components/) — converts the former
single components.py into a package with:
_legacy.py — original file, unchanged.
__init__.py — re-exports the legacy surface; existing
"from src.gui.components import …" calls
continue to work.
shared.py — hide_streamlit_chrome, pickup_or_upload
(every build needs these).
gate.py — require_normalization_gate (Pro / Suite SKUs).
findings.py — analyzer-finding widgets (drops out of a
standalone-Dedup build).
dedup_review.py — match-group cards + apply pipeline (drops out
of a non-dedup build).
The seam modules are narrow re-exports today. As code migrates out
of _legacy.py into the focused modules, the public import path
stays stable via the shim.
E2E: 765 passed, 17 xfailed (unchanged); home page + all 9 tool pages
+ Review page render HTTP 200; full pipeline (analyze → auto_fix →
apply_decisions → output bytes) round-trips on the kitchen-sink
fixture with zero high-confidence findings remaining post-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>