datatools-dev

Author	SHA1	Message	Date
Michael	ae9d4a2db5	fix(home): defensive analysis errors don't crash the whole page Reported: uploading 13_non_latin_scripts.csv made the home page bubble a ``pandas.errors.EmptyDataError`` traceback up through the page chrome instead of surfacing as a per-file error. In a multi-file analysis run that kills every other file's results too, which is worse than the symptom itself. Wrap ``_run_analysis_on_upload`` in proper error handling: - Empty bytes ``getvalue() == b""`` short-circuits with a synthetic error Finding telling the user the upload was zero-byte and to re-upload. - Empty ``repair.repaired_bytes`` (file was all NULs / BOM / stripped to nothing) likewise surfaces as a synthetic Finding rather than reaching pd.read_csv. - ``pd.errors.EmptyDataError`` from pandas is caught and rendered as a Finding that names the file, its byte size, and suggests opening it in a text editor to verify the header row matches the data row delimiter. - Any other exception during read/analyze is caught and surfaces as a Finding via ``format_for_user`` so the user gets a clean message, not a Python traceback. Each file in a multi-file run now stands alone: a bad file produces one red banner in its own card, every other file analyzes normally. The 13_non_latin_scripts.csv corpus file is 249 bytes of valid UTF-8 on disk and parses cleanly under the same code path locally — the user's specific symptom is likely a zero-byte upload (browser / network / Python 3.14 + Streamlit edge case). The new ``empty_upload`` finding will name the bytes count so they can confirm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:22:10 +00:00
Michael	ef9f8b5de4	fix(close): Edge fallback + better tryClose + honest hint There is no JavaScript override for browser tab-close security: ``window.close()`` only succeeds on windows JS opened (Chrome --app windows qualify; a regular browser tab does not). What we can do is make the --app path easier to hit and the failure case more actionable. Three changes: 1. ``src/gui/__main__.py`` — extend browser detection. PATH lookup now also looks for ``msedge`` / ``microsoft-edge``; Windows install candidates include the Edge install path; macOS candidates include Edge and Chromium. Edge is Chromium-based, supports ``--app``, and ships on every Windows 10+ machine — so users without Chrome no longer fall through to the regular browser tab. When the fallback IS hit, print a warning to stderr explaining why Close-from-page will require Ctrl+W. Renamed ``_find_chrome`` to ``_find_app_browser`` to reflect the broader scope. 2. ``_FAREWELL_SCRIPT_TEMPLATE`` in ``components/_legacy.py`` — factor close attempts into a ``tryClose`` helper that runs three escalating tries: standard ``win.close()``, the ``win.open('', '_self')`` history-rewrite trick (no-op in modern Chrome but free), and ``win.top.close()``. Auto-close on paint AND the manual button now both call this helper. Skip the manual hint if the close eventually succeeded between the click and the 250 ms timeout. 3. ``quit.close_hint`` in en/es i18n packs — rewrite the message to tell the user honestly that this is a browser security restriction, tell them the Ctrl+W keystroke that works, and point them at ``python -m src.gui`` for the auto-closing app-mode experience. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:18 +00:00
Michael	aeead05e4c	fix(downloads): swap st.download_button for an HTML <a download> helper Reported symptom: only the FIRST download button in a multi-button row pops the browser save dialog. The second and third do nothing on click. Affects every tool page that exposes (cleaned + audit + config) downloads. Root cause is ``st.download_button`` itself — when several render in the same script pass, the click-to-bytes wiring on the browser side mis-routes and only one button's data is actually exposed. Explicit ``key`` arguments don't fix it; ``use_container_width=True`` doesn't help either; we confirmed this in the Text Cleaner reverts. Replace the widget with a real ``<a download="file" href="data:...">`` anchor rendered via ``st.markdown(..., unsafe_allow_html=True)``. Bypasses Streamlit's widget machinery entirely; behaves identically to a native browser download. Side benefit: clicking it does NOT trigger a script rerun, so other in-flight UI state survives. New helper ``html_download_button`` lives in ``src/gui/components/_legacy.py`` (exported from ``components``). API: html_download_button( label, data, *, file_name, mime="application/octet-stream", disabled=False, help=None, use_container_width=True, ) Translation pattern applied across every tool page (and shared ``results_summary`` / ``config_panel`` widgets in ``_legacy.py``): - ``st.download_button(`` -> ``html_download_button(`` - ``data=foo_bytes`` kwarg -> positional second arg - ``key="..."`` -> dropped (helper has no widget identity) - ``use_container_width=True`` -> dropped (default) - ``disabled=`` and ``help=`` pass through unchanged - Pre-computed byte buffers kept where they were Total: 17 sites replaced (3 in Text Cleaner, 3 in Format Standardizer, 3 in Fix Missing Values, 3 in Map Columns, 3 in Automated Workflows, 2 in Find Duplicates page + 4 in shared _legacy.py widgets used by Find Duplicates). Caveat: data: URLs balloon by 33% (base64). 
Fine for tool output sizes we ship; if a future result topped a few hundred MB we'd want a Blob-URL fallback. The marketing demo at src/gui/app_demo.py keeps its single st.download_button — single button, no collision, no need to switch. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:13:41 +00:00
Michael	6415be8bf4	feat(tools): unified post-run UX across all Ready tool pages Apply the Clean Text page's post-run UX pattern to every other Ready tool page (Find Duplicates, Standardize Formats, Fix Missing Values, Map Columns, Automated Workflows) for consistency and ease of use. Per page: 1. Preview wrapped in ``st.expander(f"Preview: {filename}", expanded=not _has_result)``. Open before a result exists, folded afterwards. 2. Options / configuration controls wrapped in ``st.expander("Options", expanded=not _has_result)``. Inner sub-expanders preserved (Streamlit 1.36+ supports nesting). 3. After the primary action stashes the result, set a one-shot ``_<tool>_scroll_to_results`` flag in session state and call ``st.rerun()`` so the preview + options expanders see the new state on the next pass and collapse themselves. 4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed immediately before the Results subheader. 5. End-of-page: pop the scroll flag and inject a tiny ``streamlit.components.v1.html`` iframe whose ``<script>`` calls ``scrollIntoView`` on the parent document's anchor. One-shot, so unrelated reruns (toggling Show-hidden, etc.) don't yank the viewport. 6. Download buttons hardened against the multi-button Streamlit footgun: byte buffers pre-computed outside the column scopes, explicit unique ``key="<tool>_dl_<purpose>"`` per button, ``use_container_width=True``, and previously-conditional buttons now render unconditionally with ``disabled=True`` + a help tooltip when the underlying data is empty so layout stays steady. Per-page judgment calls (already noted in agent reports): - Find Duplicates: sheet picker and delimiter selector kept OUTSIDE expanders (the user still needs to see them when a file fails to parse). - Fix Missing Values: missingness profile wrapped INSIDE the Options expander together with Strategy — the Results section already shows a before/after missingness comparison that supersedes the static input profile. 
- Map Columns: all three subsections (Target schema, Strategy, Mapping) wrapped under one outer Options expander, matching the Text Cleaner pattern. - Automated Workflows: inner "Recommended tool order" expander stays nested inside the outer Options wrap; Run button stays outside Options so the user can re-run after tweaking the (collapsed) editor. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:37 +00:00
Michael	d1aaf3c2b9	feat(quit): close-window button + manual hint on the farewell overlay The farewell overlay already attempted ``window.top.close()`` after a Close click — but browsers only honour that for tabs that JS opened (Chrome --app windows qualify; a regular browser tab does not). For users whose Chrome wasn't auto-detected and who fall back to ``webbrowser.open``, the overlay stays put and they had no in-page way to close. Add to the overlay HTML: - A "Close this window" button (uses the user-gesture path, which has slightly looser browser rules than auto-close). - A hidden hint paragraph that reveals itself 250 ms after the button is clicked IF the window is still here, telling the user to press Ctrl+W (⌘W on Mac). Wired through the existing _farewell_script template + ``_js_html_safe`` escaping so neither label can break out of the JS string literal. New i18n keys (en + es): ``quit.close_window_button`` and ``quit.close_hint``. The existing auto-close attempt remains — Chrome --app users still get their window closed without touching the button. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:59:17 +00:00
Michael	27f0648093	fix(text-cleaner): make all three download buttons actually fire Only "Download cleaned CSV" was working; "Download changes audit" and "Download config JSON" did nothing on click. The symptom is the classic Streamlit footgun for multiple ``st.download_button`` widgets in adjacent columns: without an explicit ``key`` argument the auto-derived widget IDs can collide, especially when one button is conditionally rendered, and only the first button in source order actually fires on click. Same goes for unstable ``data`` bytes recomputed inside the ``with col:`` block — the widget identity can drift between renders. Robustness pattern applied: - Compute all three byte buffers up front, outside the columns, so the ``data`` parameter is the same object across reruns. - Pass an explicit unique ``key`` ("textclean_dl_cleaned" / "textclean_dl_changes" / "textclean_dl_config") to each button. - Render the changes button unconditionally with ``disabled=True`` and a help tooltip when ``result.changes.empty`` — instead of hiding it. Layout stays steady and the empty case is self-explanatory. - ``use_container_width=True`` so the three buttons size identically inside their columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:56:52 +00:00
Michael	0a61d52200	feat(text-cleaner): collapse options + auto-scroll to Results on run After clicking Clean Text the user was left at the bottom of the script with the Options block still expanded and no viewport movement — they had to scroll to find the Results. - Wrap the whole Options block in an outer ``st.expander("Options", expanded=not _has_result)``. After the Clean Text rerun, both Preview AND Options collapse, leaving the primary action button + Results as the only prominent elements above the fold. The inner Advanced-options expander is preserved as a nested expander (supported in Streamlit 1.36+; this repo pins 1.35+). - Add a 1px anchor div ``#textclean-results-anchor`` immediately before the Results subheader. - On Clean Text click, set a one-shot ``_textclean_scroll_to_results`` flag in session state; on the next render, pop the flag and inject a tiny ``st.components.v1.html`` iframe whose ``<script>`` calls ``scrollIntoView`` on the parent document's anchor. One-shot so re-renders triggered by other widgets (Show-hidden toggle, etc.) don't jerk the viewport back to the top of Results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:50:43 +00:00
Michael	ca14ce2952	feat(text-cleaner): collapse preview on run + full hidden-char audit Two small UX fixes on the Clean Text page: 1. The input preview is now wrapped in an ``st.expander`` whose default-expanded state is ``not has_result``. Clicking the "Clean Text" primary button stashes the result and calls ``st.rerun()`` so the next pass sees the result in session state and the expander folds — the Results section becomes the primary visual focus. User can re-expand manually to re-inspect the source. 2. The Examples (changes audit) table's Before/After columns were calling ``visualize_hidden_html`` WITHOUT ``mark_outer_whitespace``, so leading/trailing whitespace — which is exactly what the cleaner most often removes — was invisible. Pass ``mark_outer_whitespace=True`` to match the input-preview rendering. Column-name cell now mirrors that flag too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:43:52 +00:00
Michael	502a72cd46	feat(nav): ← Back to Home link on every tool page Multi-file workflow: a user uploads several files on Home, clicks "Open <Tool>" on one file's findings, lands on a tool page. The sidebar lets them get back to Home, but a top-of-page back affordance is more discoverable and keeps the hand in the same screen region as the upload list they're working through. - New ``back_to_home_link()`` helper in components/_legacy.py renders a secondary button that calls ``st.switch_page("app.py")`` — under ``st.navigation`` that routes to the default (Home) page. - Wired into every tool page (1-9) directly after ``hide_streamlit_chrome()`` and BEFORE the license gate so a Lite user who lands on a locked tool can navigate away without paying. - New i18n key ``nav.back_to_home`` ("← Back to Home" / "← Volver al inicio") in en/es packs. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:38:01 +00:00
Michael	604debb9a9	revert(home): keep per-tool grouping for per-file findings Restoring ``render_findings_panel`` on the home page. Previous commit (`c575efd`) inlined a flat renderer that dropped the per-tool grouping and the "Open <Tool>" jump links — that was an over-correction. The user only wanted the bottom tool-card grid gone (already removed in `ff2eaeb`). The grouping inside the findings panel is what lets a user land on a specific finding and one-click into the cleaner that fixes it; without it they'd have to guess which sidebar entry to open. Tool-card grid stays removed. Sidebar nav is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:31:36 +00:00
Michael	c575efd26e	fix(home): render findings flat — drop per-tool grouping The home page was calling ``render_findings_panel``, which groups findings by tool into expanders and renders an "Open <Tool>" page link under each. After uploading a file, the user still saw a tool list (just under a different shape) — defeating the earlier cleanup that removed the tool-cards grid. Inline a flat renderer in ``_home_page``: per uploaded file, render the filename header + severity summary + a flat list of findings via ``_render_one_finding`` directly. No expanders, no tool names as section headers, no per-tool page-link buttons. Tool discovery happens in the sidebar. ``render_findings_panel`` itself is unchanged — it still groups by tool and remains tested via the findings-panel harness, but is no longer used on the home page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:22:20 +00:00
Michael	175389219f	fix(gui): translate sidebar tool names when language changes The sidebar nav was passing ``tool.name`` (the registry's English field) to ``st.Page``, so the tool entries stayed in English even after the user picked Spanish from the language selector. Section headers were already i18n-driven; tool entries were not. Switch to ``tool_name(tool_id)`` which routes through ``t(...)`` and picks up the active language from session state. Verified: with ``ui_lang=es`` the sidebar renders Buscar duplicados / Limpiar texto / Mapear columnas / etc. instead of the English fallbacks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:19:15 +00:00
Michael	c568aec8a7	feat(gui): one-click Close in its own bottom sidebar section Close is now a direct shutdown trigger: visiting the Close page (the sidebar entry) fires shutdown_app() immediately, with no confirm step and no intermediate body. The farewell overlay paints and os._exit(0) lands ~1s later from a daemon thread. Layout: Close moved into its own bottom-of-sidebar section so the destructive action is visually separated from Account/Activate. - New shutdown_app() in components/_legacy.py replaces quit_button. The os._exit thread is skipped when "pytest" is in sys.modules so the test suite doesn't kill its own process when rendering 99_Close. - pages/99_Close.py shrinks to set_page_config + chrome + shutdown_app. - app.py nav grows a new "Close" section header (new nav.section_close key in en/es packs) pinned at the bottom of the navigation dict. Tests updated: - TestQuitButtonRenders → TestClosePageShutsDownImmediately. Assert the shutdown caption renders + no confirm button exists. - test_smoke EXPECTED_SUBSTRINGS["99_Close"] now pins "Shutting down" / "Cerrando" (the visible page body) instead of the removed page title. 2008 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:17:14 +00:00
Michael	ff2eaeb6c4	feat(home): multi-file upload + per-file analysis, drop tool grid Home is now upload + analysis only. The page accepts multiple files in one go, analyzes each independently, and renders findings grouped by filename in bordered containers. The 3-section tool-card grid is gone — discovery happens via the sidebar now. Mechanics: - file_uploader uses accept_multiple_files=True. Each file's findings cache in session_state["home_findings_by_file"] keyed by filename so removing a file via Streamlit's "x" button drops its findings too, and re-clicking Run only re-analyzes pending files. - The first uploaded file is mirrored into the singular home_uploaded_{name,bytes,size} keys so tool pages continue to pick up an "active" upload through pickup_or_upload — no tool-page changes. - New i18n keys: upload.intro_multi, upload.uploader_label_multi, upload.clear_results, upload.empty_state. upload.heading text is updated to "Upload one or more files to start" (EN + ES). Dropped tests pinning the tool grid: - TestHomeToolGridLocalization (test_chrome.py) - test_home_tool_card_uses_es_name (test_smoke.py) - TestLiteHomeGridBadges (test_lite_tier.py — locked-card lock-badge assertions; locking is still enforced per-tool-page via require_feature_or_render_upgrade) 2009 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:12:48 +00:00
Michael	dad744f17f	refactor(gui): drop Review page + normalization gate Home is now the only entry point: the "Run analysis" button on the upload section IS the review step (findings render inline via render_findings_panel). Tool pages no longer gate on a passed normalization — running the analyzer is sufficient context. Removed: - src/gui/pages/0_Review.py - src/gui/components/gate.py (re-export seam) - require_normalization_gate() in src/gui/components/_legacy.py - "review" section enum in tools_registry.py - Data Review entry in app.py navigation - require_normalization_gate() calls + imports in all nine tool pages - tests/gui/test_gate.py (whole file) - TestReviewWorkflow in tests/gui/test_workflows.py - 0_Review entry in tests/gui/test_smoke.py PAGE_SLUGS - stash_upload's normalization_result+normalization_for stashing - stash_upload_without_gate (was the gate's negative-path helper) 2017 tests pass (16 retired with the gate flow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:04:33 +00:00
Michael	fc6c22c6a7	feat(review): inline file uploader instead of redirect home When a user lands on Review without an upload, show a file uploader on the page itself and auto-run the analyzer once a file is picked, rather than bouncing them to the home page with a "Back to home" button. Auto-analyze is the right default here: the user is already on the Review page, so they've implicitly committed to a scan. Stashing the bytes in the same session-state keys the home page uses keeps the rest of the flow (encoding picker, gate, tool pages) unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:57:01 +00:00
Michael	db5ec084da	docs+code: rename tool labels everywhere Sweep follow-up to `93e43fc`. Display labels now consistent across docs, landing pages, CLI output, code comments, docstrings, and test prose. Five parallel surfaces touched: - docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal design/planning docs - landing pages: index + bookkeeper/revops/shopify-pet - src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py and gui/components/_legacy.py, core module headers, every tool page's module docstring - tests: class/method/module docstrings and section-header comments - test-cases READMEs Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*), URL paths, anchor IDs, CSS classes, and asset filenames were left intact since they're code identifiers / structural references. All 2033 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:50:09 +00:00
Michael	93e43fc0d9	feat(gui): sidebar sections + non-technical tool labels Sidebar nav now groups tools under Data Review / Data Cleaners / Transformations / Automations via st.navigation, replacing the flat auto-discovered list. Tool display names switch to action-first phrasing (Find Duplicates, Fix Missing Values, Find Unusual Values, Standardize Formats, Clean Text, Quality Check, Map Columns, Combine Files, Automated Workflows) in EN + ES packs and on each page's H1. The Data Cleaners section follows the requested order: Missing Values → Outliers → Text Cleaner → Format Standardizer → Deduplicator → Quality Check. (Text Cleaner kept inside cleaners since the request didn't list it but the tool still ships.) Registry now carries a section field; helpers added: tools_in_section(), section_label(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:36:01 +00:00
Michael	e534fb4989	sec(license): Ed25519 sigs + production-safe tripwire Two coupled hardening upgrades. 1. Asymmetric signatures (HMAC → Ed25519) The previous HMAC scheme used a symmetric secret that any motivated reverse engineer could pull out of the shipped binary and use to mint blobs for any tier / name / email. With Ed25519, the binary ships only the public verification key; the signing key never leaves the seller's environment, so binary compromise no longer yields forgery. - src/license/crypto.py rewritten around cryptography.hazmat.primitives.asymmetric.ed25519. Same public API surface (sign/verify/encode_blob/decode_blob), same canonical JSON encoding — drop-in for the manager / cli / GUI layers. - DATATOOLS_LICENSE_PRIVKEY (seller-side) and DATATOOLS_LICENSE_PUBKEY (build-time) env vars supply the keys; the in-source dev keypair (src/license/_dev_keypair.py) deterministically derives from a seed phrase for repro builds and tests. - Blob prefix bumped DTLIC1: → DTLIC2:. Decoding a DTLIC1 blob surfaces a clear "old format" error rather than a confusing signature mismatch. - scripts/generate_keypair.py mints fresh production keypairs for the seller (run once, stash the private key offline). Adds cryptography>=41,<46 to requirements.txt (was an undeclared transitive dep). 2. Production-safe tripwire assert_production_safe() refuses to boot a frozen / shipped build when either: - DATATOOLS_DEV_MODE=1 is set (would unconditionally bypass every license check — fine in source/test but catastrophic in a buyer install). - The active verification key is still the embedded dev key (the build pipeline forgot to set DATATOOLS_LICENSE_PUBKEY). No-op in source / pytest runs (sys.frozen is unset) so test fixtures and dev workflows keep working without ceremony. Called from src/cli_license_guard.guard() and from hide_streamlit_chrome — so it fires on every CLI invocation and every GUI page load. 
Tests: 49 license-layer unit tests (was 40); added Ed25519 wrong-key rejection, dev-keypair seed pin, blob v2 prefix, v1 rejection with clear message, and four production-safe scenarios (no-op in source, fires on DEV_MODE in frozen, fires on dev key in frozen, passes in frozen with prod pubkey). Total: 2024 → 2033. Docs (REQUIREMENTS §17a, DEVELOPER licensing recipe, DECISIONS §9b + decision log) updated with the new threat-model write-up, key-storage workflow, and tripwire behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:34:48 +00:00
Michael	d32b58e61a	feat(license): add Lite SKU; remove user-facing free trial Two coupled changes: 1. Lite tier - New Tier.LITE in src/license/schema.py. - FEATURES_BY_TIER[Tier.LITE] = {Deduplicator, Text Cleaner, Format Standardizer}. The three universally-useful tools that cover the most common bookkeeping / RevOps / Klaviyo prep workflows. Other six tools require Core. - i18n: license.tier_lite, license.feature_locked_title, license.feature_locked_body, license.upgrade_link, license.status_locked (en + es). - Per-tool feature gate at every GUI tool page (require_feature_or_render_upgrade) and every tool CLI (guard(feature=...)). A locked tool renders an upgrade prompt + Manage-license button (GUI) or exits with code 2 (CLI). - Home grid: tool cards the user's tier doesn't unlock get a red 🔒 Locked badge in place of green Ready. 2. Trial removed - Activation form's "Start 1-year trial" button removed. - license_cli's `trial` subcommand removed. - activation.trial_button / activation.trial_help i18n keys dropped (pack parity test stays green). - Tier.TRIAL stays in the enum (back-compat with any field- tested trial licenses); LicenseManager._mint stays internal for tests and the seller's key generator. - Decision logged in DECISIONS §9b: a 1-year all-features trial undercuts paid Lite; paid-only keeps tier economics clean. Tests (+29 net): +17 Lite-tier unit/guard tests + 13 Lite-tier GUI tests + 1 trial-absent assertion - 2 trial CLI tests - 1 trial GUI button test. Total: 1995 → 2024. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:19:30 +00:00
Michael	e435103113	feat(license): registration + 1-year licenses + tier scaffolding A complete offline licensing layer (no internet at any step): Core - src/license/ — schema (License, Tier, FeatureFlag), HMAC crypto, JSON storage, LicenseManager singleton with activate/renew/ deactivate/issue_trial. Tier-scaffolded so future SKUs can carve per-tool feature sets without consumer-code edits. - scripts/generate_license.py — creator-only key generator. Mints a DTLIC1: blob the buyer pastes into the activation page. GUI - New activation form component (src/gui/components/activation.py). - hide_streamlit_chrome() now inline-renders the activation form when no valid license is present (every page short-circuits to the form until activated). - Sidebar shows tier + days remaining; renewal warning under 30 days. - New pages/_Activate.py for revisiting the form after activation. CLI - src/license_cli.py — activate / renew / status / trial / deactivate commands. Exempt from the guard. - src/cli_license_guard.py — drop-in guard call added to every tool CLI's main(). Lets --help through; respects DATATOOLS_DEV_MODE. i18n - New activation.* and license.* keys in en.json + es.json (page title, form labels, status badges, renewal warnings, error messages). Pack parity test stays green. Test infrastructure - tests/conftest.py autouse fixture sets DATATOOLS_DEV_MODE=1 so the existing 1916 tests continue to pass. - isolated_license_path / activated_license_manager / unactivated_license_manager fixtures for tests that want to drive the real check. Tests (+79) - tests/test_license.py (40): schema, crypto roundtrip, blob encode/decode, tier→feature mapping, activation flow, name/email mismatch rejection, tamper detection, expiration, renewal, dev-mode bypass. - tests/test_license_cli.py (26): every license_cli command + subprocess tests confirming every tool CLI refuses to run without a license, --help always works, DEV_MODE bypasses. 
- tests/gui/test_activation.py (13): gate blocks without license, passes with trial, activation form submission unlocks the gate, sidebar status, renewal warning, i18n. Total: 1916 → 1995 tests. All pass under the strict warning filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 16:54:23 +00:00
Michael	c4ce86bd64	feat(i18n): add language-pack scaffold with English and Spanish Introduces ``src/i18n`` with a tiny JSON-backed t() lookup, an in-session language preference, and a sidebar selector wired through ``hide_streamlit_chrome`` so every page picks up the same picker. Covers home, tool cards, findings panel, gate, shutdown, and pickup banner strings. Tests pin pack parity and the farewell-overlay JS escape so future packs can't silently regress. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 15:11:30 +00:00
Michael	ea89c4d399	ui(gui): say 'window' instead of 'browser tab' in shutdown copy Update the Close page intro, the shutdown overlay, and the toast so they all read "you can close this window" — clearer for users running the app in a dedicated browser window rather than a tab. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:51:32 +00:00
Michael	701108c9d5	fix(gui): inject farewell overlay into parent DOM on shutdown Replaces the data:-URL navigation (blocked by Chrome since v60 for top-frame navigation) with a direct DOM-append of a full-screen overlay onto the parent document. Uses z-index 2147483647 so it sits above Streamlit's connection-error banner when the websocket drops. Note: still doesn't fully suppress the connection-error banner in testing — the next iteration will render the overlay through Streamlit's own page rather than via a component iframe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:49:48 +00:00
Michael	340614e642	feat(gui): promote Quit to a 'Close' menu item in the sidebar nav Move the shutdown control out of the inline sidebar widget and into its own page (pages/99_Close.py), so it appears in the sidebar nav alongside the tool pages. An explicit confirm button on the page prevents accidental nav clicks from killing a live session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:38:02 +00:00
Michael	58c0195def	fix(gui): make Quit button actually terminate the server Signalling the process with SIGTERM/SIGINT didn't reliably shut Streamlit down: its Tornado/asyncio loop swallowed or deferred the signal, so the browser saw the websocket drop ("Connection error") while the Python process kept running. Replace the signal with a daemon-thread ``os._exit(0)`` after a short delay so the current rerun can paint the "shutting down" message before the process is hard-killed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:36:36 +00:00
Michael	30e257cc44	fix(gui): move Quit button to sidebar so it shows on every page The footer placement was easy to miss (below all tool cards) and only rendered on the home page. Hook the button into hide_streamlit_chrome() so every page that hides default chrome — home + all 9 tool pages — gets the Quit button at the bottom of the sidebar without per-page edits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:33:32 +00:00
Michael	0c25d80146	fix(gui): keep sidebar reopenable + add clean Quit button The chrome-hiding CSS was removing the Streamlit header wholesale, which also took the sidebar's expand chevron with it — a collapsed sidebar became unreopenable. Make the header transparent instead and explicitly preserve the sidebar collapsed-control. Also add a Quit button in the app footer that signals the Streamlit server (SIGTERM, falling back to SIGINT) so closing the GUI returns the shell prompt cleanly instead of leaving Python hung. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:30:10 +00:00
Michael	966af8ef94	feat: 3 new tools, format streaming, distribution-ready demo + landing pages Tools shipped this batch (4 → 6 of 9 Ready): 04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI 05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI 09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI with soft tool-dependency graph (recommended, not enforced) and JSON save/load for repeatable weekly cleanups. Format Standardizer reworked for 1 GB international files: • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email • Per-row country / address columns drive parsing • Audit cap (default 10 k rows, ~50 MB RAM) • standardize_file(): chunked streaming entry point (~165 k rows/sec) • currency_decimal="auto" for EU comma-decimal locales • R$ / kr / zł multi-char currency prefixes • cli_format.py with auto-stream above 100 MB inputs Encoding detection arbiter + language-aware probe: Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM) via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes. 
Distribution-readiness assets: • streamlit_app.py — Streamlit Community Cloud entry shim • src/gui/app_demo.py — single-page demo, ?p=<persona> routing, 100-row cap + watermark, free-vs-paid boundary enforced at surface • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs • landing/ — 4 static HTML pages (apex chooser + 3 niche), shared CSS, deploy.py URL-substitution script, auto-generated robots.txt + sitemap.xml + 404.html + favicon • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md — full strategy + measurement + deployment + master checklist Test counts: before: 1,520 passed · 4 skipped · 17 xfailed after: 1,729 passed · 0 skipped · 0 xfailed Tier-1 corpora added: • missing-corpus 3 use cases + 16 edge cases • column-mapper-corpus 3 use cases + 5 edge cases • format-cleaner intl 20-row 13-country stress fixture Engine hardening flushed out by the corpora: • interpolate guards against object-dtype columns • mean/median skip all-NaN columns (silences numpy warning) • fillna runs under future.no_silent_downcasting (silences pandas warning) • mojibake test no longer skips when ftfy installed (monkeypatch path) • drop-row threshold semantics: strict-greater (consistent across rows / cols) • currency_decimal validator allow-set updated for "auto" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 22:31:26 +00:00
Michael	26b9771625	feat(errors): structured error hierarchy + helpful messages everywhere Introduces src/core/errors.py with a small structured error hierarchy that every public entry point now uses. Each error carries the context a user needs to fix it and the context a maintainer needs to trace it. The hierarchy: DataToolsError (base — formats path, column, operation, suggestion) InputValidationError (extends ValueError — bad arg / wrong type) ConfigError (extends ValueError — bad config / options) FileFormatError (extends ValueError — file is not what we expected) FileAccessError (extends OSError — file I/O failure) Subclassing the stdlib bases means existing `except OSError` / `except ValueError` handlers still catch them — no breaking change. Helpers: - ensure_dataframe(value, function=...) — uniform DataFrame guard - ensure_choice(value, name=, choices=) — uniform enum/literal guard - wrap_file_read(path, op, exc) — tag OSError with hint + path - wrap_file_write(path, op, exc) — same, with Windows-aware tip - format_for_user(exc, context=) — user-facing string for st.error / stderr Library hardening: - io.read_file: missing files surface FileAccessError listing whether the parent directory exists, and the suggestion to check the path. - io.read_file: chunk_size <= 0 now raises InputValidationError with a positive-integer suggestion. - io._read_excel: openpyxl BadZipFile / InvalidFileException / pandas ValueError ("sheet not found") wrapped as FileFormatError listing the path and a "list sheets with list_sheets()" hint. - io._detect_excel_header_row: bare except narrowed to specific openpyxl exceptions; falls back gracefully and logs at debug so the real error surfaces from pd.read_excel. - io.write_file: OSError / PermissionError on to_csv/to_excel wrapped with file path and Windows-aware "file may be open in another program" hint. 
- dedup._parse_date: bare `except Exception` narrowed to (TypeError, ValueError, OutOfBoundsDatetime); failed values logged at debug for survivor-selection forensics. - dedup._select_survivor: KEEP_MOST_RECENT now raises InputValidationError instead of silently falling back to keep_first. - dedup.deduplicate: input validation errors are InputValidationError with operation/column/suggestion fields. - format_standardize.from_dict: invalid FieldType for a column raises ConfigError naming the column AND the bad value AND listing valid values; same for date_order / phone_format / etc. - format_standardize.from_file: OSError / JSON decode wrapped with path AND line/column where parsing failed. - format_standardize.to_file: TypeError on json.dumps wrapped as ConfigError with the suspected source (extra_abbreviations). - format_standardize._apply_field_type: dispatcher's "unknown field type" branch now raises AssertionError (it's an internal invariant, not user error — a new enum value was added without a branch). - format_standardize._resolve_column_types: missing-column error now InputValidationError with a "check for typos / unparsed header" suggestion. - format_standardize.standardize_dataframe: ensure_dataframe at entry. - text_clean.clean_dataframe: ensure_dataframe at entry. - config.to_strategies: invalid Algorithm/NormalizerType wrapped as ConfigError naming the strategy index AND the column. - config.to_survivor_rule: invalid SurvivorRule wrapped as ConfigError listing valid values. - config.from_file: OSError / JSON decode wrapped (mirror of StandardizeOptions.from_file). - fixes.repair_mojibake: ImportError on ftfy now logged at info level with the underlying ImportError so a corrupt-package vs not-installed distinction is visible in the logs. - normalizers.normalize_phone: phonenumbers.NumberParseException now logged at debug when the digits-only fallback drops extension / country-code information — gives a trail when matching results look wrong. 
GUI / CLI surfaces: - All 9 page handlers (`except Exception as e: st.error(...)`) now use format_for_user(), which renders DataToolsError fields nicely and falls back to "ClassName: message" for unrecognized errors. - 2_Text_Cleaner and 3_Format_Standardizer additionally distinguish UnicodeDecodeError with an "re-save as UTF-8" suggestion before the generic handler. - cli.py's "Error reading file" handler now uses format_for_user() and includes the input path in the prefix. Tests: - tests/test_errors.py — 22 new tests covering: base class formatting, stdlib inheritance, ensure_dataframe / ensure_choice helpers, wrap_file_read / wrap_file_write, format_for_user behavior, and end-to-end integration (missing file, missing dir, bad JSON, bad algorithm, bad enum, missing column). - tests/test_audit_fixes.py + tests/test_io.py — updated 4 tests for the new exception types (InputValidationError replaces TypeError, FileAccessError extends OSError). Full project suite: 1230 passed, 4 skipped, 17 xfailed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:35:42 +00:00
Michael	4adeb5c7f3	feat(format): per-cell standardizers + 199-row buyer corpus Adds src/core/format_standardize.py — a per-cell standardizer for dates, phones, emails, addresses, names, currencies, booleans — wired through StandardizeOptions / standardize_dataframe with FieldType registry. Includes: - Date parser handles ISO/US/EU/longform/excel-serial/unix-timestamp/ partial-precision/quarter notation; opt-in French/German/Spanish month dictionaries via month_locales. - Phone via libphonenumber with extension preservation (;ext=N), 001 international prefix handling, error sentinels for placeholders / multi-number cells. - Email lowercase/trim/mailto/angle-bracket strip with optional --gmail-canonical mode. - Address USPS abbreviation expansion or compression (expand=False per corpus § 6.3), state-name → 2-letter conversion, multi-line collapse, PO Box normalization, state-code preservation regardless of input case. - Name handler: Mc/Mac/O'/D' inner caps, hyphen segments, particle lowercasing (von/van/de/da), comma-format reversal, period stripping for titles/suffixes/initials, PhD/MD acronym preservation, conservative mode for mixed-case input. - Currency: auto-detect EU vs US separators, space-thousands, Swiss apostrophe, accounting parens, optional ISO code preservation, error sentinels for percentages/ranges/word-values/ambiguous separators. - Per-domain error_policy ("passthrough" | "sentinel") for surfacing malformed values as <error: reason> per corpus § 0.3. Test corpus from Business/DataTools/test-cases-format-cleaner copied to test-cases/format-cleaner-corpus/ — 7 fixtures plus FORMATS-CASES.md. tests/test_format_standardize_corpus.py drives all 199 rows through the per-cell standardizers; 0 xfailed. Wires the GUI page (3_Format_Standardizer.py) to "Ready" status. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:11:24 +00:00
Michael	3f007ef3d6	feat(gui): 1 GB upload cap + delimiter / encoding diversity caption Streamlit's default file_uploader footer reads "Limit 200MB per file — CSV, TSV, XLSX, XLS" which contradicts the 1 GB efficiency target shipped in `438bc0f` and codified in docs/REQUIREMENTS.md §1.1. Three changes: 1. .streamlit/config.toml — set [server] maxUploadSize = 1024. Footer now reads "Limit 1024MB per file". 2. upload_and_analyze_section (home page) — adds an explicit caption above the uploader stating size limit, supported formats, the four auto-detected delimiters, and the 13 auto-detected encodings (with the Review-page override as the safety net). 3. pickup_or_upload (every tool page that falls back to its own uploader when no home-page upload is present) — same caption, only rendered when the upload accepts CSV/TSV/XLSX/XLS so JSON schema / config uploaders aren't decorated. Test suite: 765 passed, 17 xfailed (no regressions). Home + Review + Deduplicator pages all serve HTTP 200 under the new config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 21:23:21 +00:00
Michael	f891c6116d	refactor(gui): tool registry + components package for per-tool builds Two low-risk seam moves to enable selling per-tool subsets without breaking the existing all-in-one bundle. Behaviour identical; every existing import still resolves; full pytest suite + every page returns HTTP 200. 1. Tool registry (src/gui/tools_registry.py) — replaces the inline dict-of-dicts in app.py with a Tool dataclass and a TOOLS list. Adds a tier field ("core" today, "pro" / "enterprise" later) and tools_for_tier() / tool_by_id() / display_name() helpers. A per-tool build slices TOOLS at import time without code changes. 2. components package (src/gui/components/) — converts the former single components.py into a package with: _legacy.py — original file, unchanged. __init__.py — re-exports the legacy surface; existing "from src.gui.components import …" calls continue to work. shared.py — hide_streamlit_chrome, pickup_or_upload (every build needs these). gate.py — require_normalization_gate (Pro / Suite SKUs). findings.py — analyzer-finding widgets (drops out of a standalone-Dedup build). dedup_review.py — match-group cards + apply pipeline (drops out of a non-dedup build). The seam modules are narrow re-exports today. As code migrates out of _legacy.py into the focused modules, the public import path stays stable via the shim. E2E: 765 passed, 17 xfailed (unchanged); home page + all 9 tool pages + Review page render HTTP 200; full pipeline (analyze → auto_fix → apply_decisions → output bytes) round-trips on the kitchen-sink fixture with zero high-confidence findings remaining post-fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:56:21 +00:00
Michael	82d7fef21e	feat(gate): CSV-normalization gate with confidence-tiered findings Adds a Review & Normalize page that sits between upload and every tool page. The analyzer now tags each finding with confidence (high/medium/low) and a fix_action; the gate auto-applies high-confidence fixes, surfaces medium/low ones for user review, and blocks tool pages on error-level findings until resolved or waived. Core (src/core/): - analyze.py: Finding gains confidence, fix_action, pre_applied; new detectors for encoding_uncertain, encoding_decode_failed; new top- level encoding_override parameter. - fixes.py: registry of fix algorithms keyed by fix_action id. - normalize.py: auto_fix(), apply_decisions(), is_normalized(), and the NormalizationResult / Decision dataclasses the gate consumes. - io.py: detect_encoding tries strict UTF-8 first; repair_bytes now transcodes UTF-16/32 to UTF-8 before NUL-strip (fixes UTF-16 corruption) and normalizes line endings (fixes bare-CR parser crash); empty file handled gracefully instead of EmptyDataError traceback. GUI (src/gui/): - pages/0_Review.py: gate page with per-finding decision controls, encoding override picker (16 codepages + custom), and Advanced output options (encoding, delimiter, line terminator) on the download. - components.py: require_normalization_gate() helper. - pages/1-9: gate guard wired on every tool page. Test corpora: - test-cases/encodings-corpus/: 31 encoded CSV fixtures + 9 reference UTF-8 files + manifest, synced from Business/DataTools. - test-cases/text-cleaner-corpus/test_data/17: synced malformed input (unquoted $1,500.00) for the unquoted-delimiter detector. Tests (94 new): - test_normalize.py (48): finding fields, fix registry, auto_fix scope, decision paths, gate idempotency, output-options helper. - test_encodings_corpus.py (90, 16 xfailed): parametric detection + decode + analyzer-no-crash sweep against the manifest. - test_analyze.py: encoding override + encoding_uncertain detectors. 
- test_corpus.py: pre-parse repair in the strict reader. run_tests.py: new aliases --tool normalize, --tool encodings, --tool gate; encodings corpus added to --fixtures category. Docs: USER-GUIDE §3.3 covers the gate workflow, encoding override, and output options; TECHNICAL §10.2.1-10.2.4 documents the analyzer schema, gate API, Review page, and pre-parse repair pipeline; CLI-REFERENCE adds the analyzer JSON schema with the new fields; README links to all of it. Suite: 765 passed, 17 xfailed (was 458 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:35:27 +00:00
Michael	e9c490ae1b	feat(gui): hidden-char-aware preview tables in Text Cleaner The Text Cleaner had two st.dataframe previews — the initial upload preview ("Preview: filename") and the post-clean "Cleaned preview" table — that both rendered cells with the same browser-collapses- whitespace, hides-invisibles problem the analyzer findings panel had before commit `1049c03`. components.render_hidden_aware_preview(df, n_rows, caption) renders a DataFrame as an HTML table where: - every cell uses visualize_hidden_html(mark_outer_whitespace=True), so leading/trailing ASCII spaces appear as per-character "·" badges - white-space: pre-wrap on every cell preserves internal multi-space runs and embedded newlines visually - headers route through the same visualizer so dirty column names (NBSP padding, ZWSP, smart quotes) show their badges too - NaN cells render as a faint "NaN" placeholder - rows are sticky-headed and scrollable inside a 26rem capped container so a 10-row preview doesn't push the rest of the UI off screen 2_Text_Cleaner.py wires it into both previews: - The upload preview gains its own "Show hidden characters in preview" toggle (default on). - The cleaned preview reuses the existing show_hidden toggle that already governs the Examples changes table, so one switch controls the whole results section. Either toggle off falls back to the original st.dataframe view. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:26:30 +00:00
Michael	1049c033cb	feat(gui): visualize leading/trailing whitespace in analyzer findings The analyzer's "Run Analysis" panel rendered sample cells via st.dataframe, which (a) silently collapses leading/trailing ASCII whitespace and (b) displays NBSP/ZWSP/control chars as nothing. The user couldn't see the exact pollution they were being told about. visualize_hidden_html gains a mark_outer_whitespace=True option that wraps each leading and trailing ASCII space/tab in its own badge with a "SP LEAD" / "SP TRAIL" tooltip. The badges are per-character so the user can count exactly how much padding the cleaner will strip. components.render_findings_panel now: - injects hidden_char_css() once at the top of the panel - replaces st.dataframe(samples) with a custom HTML table - renders the value column with mark_outer_whitespace=True - applies white-space: pre-wrap on value cells so any internal ASCII whitespace also stays visible (browsers collapse runs by default) Four new tests cover: leading+trailing badge counts, default-off behaviour, leading tab badge, all-whitespace string treated entirely as leading. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:21:39 +00:00
Michael	e12615357d	fix(gui): use page paths relative to streamlit entrypoint st.page_link resolves paths from the directory of the entrypoint file (src/gui/app.py), so the existing "src/gui/{page_slug}" prefix doubled up and produced StreamlitPageNotFoundError on first upload + analysis (reproducible on Windows; the stack trace from a Windows install surfaced the bug). The _TOOL_PAGE_PATHS map already stores the correct relative form ("pages/2_Text_Cleaner.py"); just pass the slug straight to st.page_link. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:17:50 +00:00
Michael	90ceada2d1	feat(text_clean): visualize hidden characters in the cleaner GUI The whole point of the cleaner is to remove characters the user can't see — which makes the "before / after" preview nearly useless by default. A cell with NBSP padding looks identical to a cell with regular spaces. Two new helpers in src.core.text_clean: visualize_hidden_text(s) Plain-text rendering: each invisible/control/smart character is replaced by a glyph + [LABEL] (e.g. "·[NBSP]", "→[TAB]", "∅[ZWSP]", """[L DQUOTE]"). Suitable for terminal output, CSV exports, anywhere HTML is wrong. Unmapped C0 controls render as [U+XXXX]. visualize_hidden_html(s) + hidden_char_css() HTML rendering: every flagged character is wrapped in a <span> with a CSS class and a tooltip showing the codepoint and label. Pair with hidden_char_css() to inject the matching styles. Three colour bands (whitespace, special, control) so the user can scan an audit table and spot what's being changed at a glance. Mapping covers: ASCII tab/LF/CR, every NBSP variant (U+00A0, U+202F, U+2009, …), zero-width family (ZWSP/ZWNJ/ZWJ/WJ/BOM/SHY), bidi marks (LRM/RLM), all smart quotes, en/em dashes, ellipsis, prime/double-prime, and guillemets. ASCII printable text passes through; HTML output also escapes &/</> . GUI wiring (src/gui/pages/2_Text_Cleaner.py) The "Examples" changes table now defaults to a hidden-char-rendered HTML view: every NBSP/ZWSP/smart-quote/control char is shown with its badge and codepoint tooltip. A "Show hidden characters" toggle lets the user fall back to the raw st.dataframe view if they prefer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:14:14 +00:00
Michael	794d4cda94	feat(gui): tool pages pick up the home-page upload via session_state Closes the last UX gap from the analyzer review: each tool page had its own st.file_uploader, so users had to upload the same file twice (once on the home page for analysis, once on each tool page). components.pickup_or_upload(label, key, types) returns either: - a _StashedUpload shim wrapping the home-page bytes (when present and the user hasn't asked for a different file on this page), or - the standard st.file_uploader (when nothing is stashed or the user clicked "Use a different file"). _StashedUpload duck-types Streamlit's UploadedFile (.name, .size, .getvalue(), .read()) so existing tool-page code consumes it without changes. A "Use a different file" button per page sets a session-state override flag; a "Switch back to upload-screen file" button clears it. Wired into 2_Text_Cleaner.py and 1_Deduplicator.py — the two pages with working uploaders today. The remaining stub pages adopt it when they're implemented; the helper is the public surface they'll use. Verified by smoke-launching streamlit headless and curling the home, text-cleaner, and deduplicator routes — all return 200 with no errors in the server log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:09:51 +00:00
Michael	a8943f29eb	feat(gui): wire analyzer into home page with findings panel and tool badges Home page (src/gui/app.py) gains an upload + analyze section above the tool grid: file uploader, "Run analysis" / "Skip" buttons, and a findings panel grouped by destination tool. Tool cards now carry a "N findings" badge when the active session's findings reference that tool, so the user sees at a glance which tools their just-uploaded file would benefit from. src/gui/components.py adds the shared GUI surface: - TOOL_DISPLAY_NAMES + tool_display_name() — single source of truth for GUI labels, keeping detector tool ids decoupled from the UI. - render_findings_panel(findings) — severity icons, expander per tool, open-tool page link, sample-cells dataframe. - upload_and_analyze_section() — the home-page widget; stashes file bytes and findings in session_state so future tool pages can pick up the existing upload instead of re-prompting. - findings_count_for_tool(tool_id) — used by app.py to badge cards. CSV/TSV uploads run through repair_bytes() before analysis, so the user also sees csv_bom_stripped / csv_smart_quotes_folded findings synthesized from the pre-parse repair pass. Excel uploads skip that step. The Text Cleaner tool card flips from "Coming Soon" to "Ready" — that has been true since the v3.0 implementation and the home page just hadn't been updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:53:22 +00:00
Michael	54f92ae47e	feat: implement text cleaner (script 02) with CLI, GUI, and tests Builds 02_text_cleaner.py from stub to working: character-level hygiene for CSV/Excel inputs covering trim, whitespace collapse, smart-character folding, Unicode NFC/NFKC, BOM strip, zero-width strip, control-char strip, line-ending normalization, and per-column case conversion. Three presets (minimal/excel-hygiene/paranoid) keep the buyer surface small. - src/core/text_clean.py: pure helpers + CleanOptions/CleanResult + clean_dataframe with dtype-safe column selection - src/cli_text_clean.py: Typer CLI mirroring the dedup CLI shape (dry-run by default, --apply writes cleaned + changes audit, JSON config save/load) - src/gui/pages/2_Text_Cleaner.py: real Streamlit page with preset picker, advanced toggles, preview, before/after metrics, and three download buttons - tests/test_text_clean.py + test_cli_text_clean.py: 92 new tests covering edge cases E1-E50 from the spec - samples/messy_text.csv: demo dataset surfacing UC1, UC3, UC6, UC10 in 10 rows - test-cases/uc16-uc26 + ec05-ec09: per-use-case and per-edge-case fixtures Docs: TECHNICAL.md §10.2 (full Tier 1/2/3 spec), DECISIONS.md v1.7 entry locking the spec, CLI-REFERENCE.md gains the text cleaner section, README.md gains a top-level Text Cleaner block, USER-GUIDE.md status row 02 promoted Skeleton -> Working. 200/200 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:14:15 +00:00
Michael	b2ca04e6f4	fix: scale app content to 85% zoom Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:30:58 +00:00
Michael	223148283d	revert: remove 75% zoom, 100% fits correctly with chrome hidden Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:29:42 +00:00
Michael	1c609214b0	fix: scale app content to 75% to fit window Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:28:12 +00:00
Michael	dc48578c7e	feat: launch Chrome in app mode for chromeless window python -m src.gui now opens Chrome with --app flag, hiding the address bar, tabs, and bookmarks bar. Falls back to default browser if Chrome is not found. Headless flag passed via CLI so streamlit run directly still auto-opens normally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:24:54 +00:00
Michael	35ea21ad33	feat: hide Streamlit chrome for app-like appearance Add shared hide_streamlit_chrome() helper that removes header bar, hamburger menu, footer, and deploy button via CSS injection. Called on every page. Add .streamlit/config.toml with minimal toolbar mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:20:54 +00:00
Michael	f2fdc10af7	feat: refactor GUI to multi-page Streamlit app with 9 tool pages Convert single-page deduplicator into a multi-page suite. Home page shows tool card grid. Deduplicator extracted to its own page (fully working). 8 stub pages added for Text Cleaner, Format Standardizer, Missing Values, Column Mapper, Outlier Detector, Multi-File Merger, Validator & Reporter, and Pipeline Runner — each with functional file upload and coming-soon UI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 01:16:12 +00:00
Michael	27fe87c4fe	fix: simplify upload placeholder text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 00:56:32 +00:00
Michael	8f1fb690ae	chore: bump version to v3.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 00:54:37 +00:00
Michael	ec9f100e67	feat: add custom delimiter input and update subtitle text Delimiter dropdown now includes "Other" option with a text input for custom delimiter characters. Subtitle updated to mention delimited text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 00:46:12 +00:00
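
For readers skimming the log, commit 26b9771625 above describes an error hierarchy whose classes also subclass the stdlib bases, so existing `except ValueError` / `except OSError` handlers keep working. The following is a minimal sketch of that idea, not the contents of src/core/errors.py: the class names, the four context fields, and the `format_for_user(exc, context=)` signature come from the commit message, while the constructor shape and the rendered text layout are assumptions made for illustration.

```python
class DataToolsError(Exception):
    """Base error carrying optional context (field names follow the commit message)."""

    def __init__(self, message, *, path=None, column=None,
                 operation=None, suggestion=None):
        super().__init__(message)
        self.message = message
        self.path = path
        self.column = column
        self.operation = operation
        self.suggestion = suggestion

    def __str__(self):
        # The layout of the rendered text is an assumption for this sketch.
        lines = [self.message]
        for label, value in (("path", self.path), ("column", self.column),
                             ("operation", self.operation),
                             ("suggestion", self.suggestion)):
            if value is not None:
                lines.append(f"  {label}: {value}")
        return "\n".join(lines)


class InputValidationError(DataToolsError, ValueError):
    """Bad argument or wrong type; still caught by `except ValueError`."""


class FileAccessError(DataToolsError, OSError):
    """File I/O failure; still caught by `except OSError`."""


def format_for_user(exc, context=None):
    """Return a user-facing string; unknown errors fall back to 'ClassName: message'."""
    if isinstance(exc, DataToolsError):
        text = str(exc)
    else:
        text = f"{type(exc).__name__}: {exc}"
    return f"{context}: {text}" if context else text


# Usage: a pre-existing stdlib handler still catches the structured error.
try:
    raise FileAccessError("cannot read file", path="data.csv",
                          operation="read_file", suggestion="check the path")
except OSError as exc:
    print(format_for_user(exc, context="Error reading file"))
```

Subclassing both the project base and a stdlib base is what makes the change non-breaking: callers can migrate to catching `DataToolsError` at their own pace.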

109 Commits