Two unrelated UX issues addressed in one sweep across all nine tool
pages because they share the same edit surface.
(1) Sticky footer replaces the top + bottom back-link buttons.
Reported: a big white empty footer space at the bottom of every page;
the Back to Home button at the top scrolled out of view on long pages.
New ``render_sticky_footer()`` helper in ``components/_legacy.py``
injects a fixed-position bar at ``bottom: 0`` of the viewport with:
- A border-top so it visually reads as a non-movable bar.
- A semi-transparent background (rgba 0.96 + ``backdrop-filter: blur``)
so content underneath shows through faintly when the user scrolls.
- A styled ``<a href="home">`` anchor (not an ``st.button``) because
Streamlit widgets can't be CSS-positioned reliably — Streamlit owns
the widget's DOM container and re-mounts it on every rerun. A real
anchor sits exactly where the CSS puts it and triggers Streamlit's
URL routing to the home page.
- ``padding-bottom: 3.5rem`` on the main container so the last widget
isn't hidden behind the bar.
Called once per tool page, immediately after ``hide_streamlit_chrome()``
so it renders even on pages that ``st.stop()`` early before any other
content runs. The old top-and-bottom ``back_to_home_link()`` calls are
removed from every tool page; their entry/exit points were dropping
the button when the script short-circuited.
(2) Tool-page headers now localize.
Reported: switching the sidebar language picker to Spanish left the
tool page's title + caption in English. Root cause: every page had
hard-coded ``st.title("✂️ Clean Text")`` / ``st.caption("Trim
whitespace...")`` strings.
Added per-tool ``tools.<id>.page_title`` and
``tools.<id>.page_caption`` keys to ``en.json`` and ``es.json`` for
all nine tools. Routed each page's title/caption call through ``t()``.
Verified: with ``ui_lang=es`` set, the Clean Text page now renders
"✂️ Limpiar texto" + the Spanish caption.
Updated ``tests/gui/test_smoke.py::EXPECTED_SUBSTRINGS`` so the
``es`` column for each tool page asserts the actual Spanish string
(was a duplicate of the English string back when the page bodies
were English-only).
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues, same fix surface.
(1) Reported crash on Back-to-Home:
StreamlitAPIException: Could not find page: app.py.
``st.switch_page("app.py")`` doesn't work under ``st.navigation`` —
the entry script is the nav manager itself and is not a registered
page. The fix needs to pass an ``st.Page`` object whose script
identity matches one registered in the nav.
First-pass attempt (``from src.gui.app import _home_page``) hit a
worse failure: importing ``app.py`` from inside a tool-page render
re-executes the nav setup with the WRONG "main script" context, so
every ``st.Page("pages/N_foo.py", ...)`` call in ``_build_navigation``
fails with "file could not be found".
Extract the home renderer into its own module ``src/gui/_home.py``
which has no top-level Streamlit side effects. Both the nav manager
and the back-link helper import ``_home_page`` from there. The Page
object built at click time has the same callable identity as the one
registered, so ``st.switch_page`` resolves it.
(2) Reported UX: the back button scrolled out of view on long pages.
Add a second ``back_to_home_link(key="_back_to_home_link_bottom")``
call near the footer of every tool page (1-9). The unique key avoids
widget-id collision with the top instance. Coming-Soon stubs get it
unconditionally; Ready tools render it only after a result exists
because the page short-circuits with ``st.stop()`` before then —
when no result is on screen the page is short enough that the top
link is sufficient.
2220 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported symptom: only the FIRST download button in a multi-button
row pops the browser save dialog. The second and third do nothing on
click. Affects every tool page that exposes (cleaned + audit + config)
downloads.
Root cause is ``st.download_button`` itself — when several render in
the same script pass, the click-to-bytes wiring on the browser side
mis-routes and only one button's data is actually exposed. Explicit
``key`` arguments don't fix it; ``use_container_width=True`` doesn't
help either; we confirmed this in the Text Cleaner reverts.
Replace the widget with a real ``<a download="file" href="data:...">``
anchor rendered via ``st.markdown(..., unsafe_allow_html=True)``.
Bypasses Streamlit's widget machinery entirely; behaves identically to
a native browser download. Side benefit: clicking it does NOT trigger
a script rerun, so other in-flight UI state survives.
New helper ``html_download_button`` lives in
``src/gui/components/_legacy.py`` (exported from ``components``). API:
html_download_button(
label, data,
*, file_name, mime="application/octet-stream",
disabled=False, help=None, use_container_width=True,
)
Translation pattern applied across every tool page (and shared
``results_summary`` / ``config_panel`` widgets in ``_legacy.py``):
- ``st.download_button(`` -> ``html_download_button(``
- ``data=foo_bytes`` kwarg -> positional second arg
- ``key="..."`` -> dropped (helper has no widget identity)
- ``use_container_width=True`` -> dropped (default)
- ``disabled=`` and ``help=`` pass through unchanged
- Pre-computed byte buffers kept where they were
Total: 17 sites replaced (3 in Text Cleaner, 3 in Format
Standardizer, 3 in Fix Missing Values, 3 in Map Columns, 3 in
Automated Workflows, 2 in Find Duplicates page + 4 in shared
_legacy.py widgets used by Find Duplicates).
Caveat: data: URLs balloon by 33% (base64). Fine for tool output
sizes we ship; if a future result topped a few hundred MB we'd want a
Blob-URL fallback.
The marketing demo at src/gui/app_demo.py keeps its single
st.download_button — single button, no collision, no need to switch.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the Clean Text page's post-run UX pattern to every other Ready
tool page (Find Duplicates, Standardize Formats, Fix Missing Values,
Map Columns, Automated Workflows) for consistency and ease of use.
Per page:
1. Preview wrapped in ``st.expander(f"Preview: {filename}",
expanded=not _has_result)``. Open before a result exists, folded
afterwards.
2. Options / configuration controls wrapped in
``st.expander("Options", expanded=not _has_result)``. Inner
sub-expanders preserved (Streamlit 1.36+ supports nesting).
3. After the primary action stashes the result, set a one-shot
``_<tool>_scroll_to_results`` flag in session state and call
``st.rerun()`` so the preview + options expanders see the new
state on the next pass and collapse themselves.
4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed
immediately before the Results subheader.
5. End-of-page: pop the scroll flag and inject a tiny
``streamlit.components.v1.html`` iframe whose ``<script>`` calls
``scrollIntoView`` on the parent document's anchor. One-shot, so
unrelated reruns (toggling Show-hidden, etc.) don't yank the
viewport.
6. Download buttons hardened against the multi-button Streamlit
footgun: byte buffers pre-computed outside the column scopes,
explicit unique ``key="<tool>_dl_<purpose>"`` per button,
``use_container_width=True``, and previously-conditional buttons
now render unconditionally with ``disabled=True`` + a help
tooltip when the underlying data is empty so layout stays steady.
Per-page judgment calls (already noted in agent reports):
- Find Duplicates: sheet picker and delimiter selector kept OUTSIDE
expanders (the user still needs to see them when a file fails to
parse).
- Fix Missing Values: missingness profile wrapped INSIDE the Options
expander together with Strategy — the Results section already
shows a before/after missingness comparison that supersedes the
static input profile.
- Map Columns: all three subsections (Target schema, Strategy,
Mapping) wrapped under one outer Options expander, matching the
Text Cleaner pattern.
- Automated Workflows: inner "Recommended tool order" expander stays
nested inside the outer Options wrap; Run button stays outside
Options so the user can re-run after tweaking the (collapsed)
editor.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-file workflow: a user uploads several files on Home, clicks
"Open <Tool>" on one file's findings, lands on a tool page. The
sidebar lets them get back to Home, but a top-of-page back affordance
is more discoverable and keeps the hand in the same screen region as
the upload list they're working through.
- New ``back_to_home_link()`` helper in components/_legacy.py renders
a secondary button that calls ``st.switch_page("app.py")`` — under
``st.navigation`` that routes to the default (Home) page.
- Wired into every tool page (1-9) directly after
``hide_streamlit_chrome()`` and BEFORE the license gate so a Lite
user who lands on a locked tool can navigate away without paying.
- New i18n key ``nav.back_to_home`` ("← Back to Home" /
"← Volver al inicio") in en/es packs.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Home is now the only entry point: the "Run analysis" button on the
upload section IS the review step (findings render inline via
render_findings_panel). Tool pages no longer gate on a passed
normalization — running the analyzer is sufficient context.
Removed:
- src/gui/pages/0_Review.py
- src/gui/components/gate.py (re-export seam)
- require_normalization_gate() in src/gui/components/_legacy.py
- "review" section enum in tools_registry.py
- Data Review entry in app.py navigation
- require_normalization_gate() calls + imports in all nine tool pages
- tests/gui/test_gate.py (whole file)
- TestReviewWorkflow in tests/gui/test_workflows.py
- 0_Review entry in tests/gui/test_smoke.py PAGE_SLUGS
- stash_upload's normalization_result+normalization_for stashing
- stash_upload_without_gate (was the gate's negative-path helper)
2017 tests pass (16 retired with the gate flow).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar nav now groups tools under Data Review / Data Cleaners /
Transformations / Automations via st.navigation, replacing the flat
auto-discovered list. Tool display names switch to action-first
phrasing (Find Duplicates, Fix Missing Values, Find Unusual Values,
Standardize Formats, Clean Text, Quality Check, Map Columns, Combine
Files, Automated Workflows) in EN + ES packs and on each page's H1.
The Data Cleaners section follows the requested order: Missing
Values → Outliers → Text Cleaner → Format Standardizer → Deduplicator
→ Quality Check. (Text Cleaner kept inside cleaners since the request
didn't list it but the tool still ships.) Registry now carries a
section field; helpers added: tools_in_section(), section_label().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes:
1. Lite tier
- New Tier.LITE in src/license/schema.py.
- FEATURES_BY_TIER[Tier.LITE] = {Deduplicator, Text Cleaner,
Format Standardizer}. The three universally-useful tools that
cover the most common bookkeeping / RevOps / Klaviyo prep
workflows. Other six tools require Core.
- i18n: license.tier_lite, license.feature_locked_title,
license.feature_locked_body, license.upgrade_link,
license.status_locked (en + es).
- Per-tool feature gate at every GUI tool page
(require_feature_or_render_upgrade) and every tool CLI
(guard(feature=...)). A locked tool renders an upgrade
prompt + Manage-license button (GUI) or exits with code 2
(CLI).
- Home grid: tool cards the user's tier doesn't unlock get a
red 🔒 Locked badge in place of green Ready.
2. Trial removed
- Activation form's "Start 1-year trial" button removed.
- license_cli's `trial` subcommand removed.
- activation.trial_button / activation.trial_help i18n keys
dropped (pack parity test stays green).
- Tier.TRIAL stays in the enum (back-compat with any field-
tested trial licenses); LicenseManager._mint stays internal
for tests and the seller's key generator.
- Decision logged in DECISIONS §9b: a 1-year all-features
trial undercuts paid Lite; paid-only keeps tier economics
clean.
Tests (+29 net): +17 Lite-tier unit/guard tests + 13 Lite-tier
GUI tests + 1 trial-absent assertion - 2 trial CLI tests - 1
trial GUI button test. Total: 1995 → 2024.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces src/core/errors.py with a small structured error hierarchy
that every public entry point now uses. Each error carries the
context a user needs to fix it and the context a maintainer needs to
trace it.
The hierarchy:
DataToolsError (base — formats path, column, operation, suggestion)
InputValidationError (extends ValueError — bad arg / wrong type)
ConfigError (extends ValueError — bad config / options)
FileFormatError (extends ValueError — file is not what we expected)
FileAccessError (extends OSError — file I/O failure)
Subclassing the stdlib bases means existing `except OSError` /
`except ValueError` handlers still catch them — no breaking change.
Helpers:
- ensure_dataframe(value, function=...) — uniform DataFrame guard
- ensure_choice(value, name=, choices=) — uniform enum/literal guard
- wrap_file_read(path, op, exc) — tag OSError with hint + path
- wrap_file_write(path, op, exc) — same, with Windows-aware tip
- format_for_user(exc, context=) — user-facing string for st.error / stderr
Library hardening:
- io.read_file: missing files surface FileAccessError listing whether
the parent directory exists, and the suggestion to check the path.
- io.read_file: chunk_size <= 0 now raises InputValidationError with
a positive-integer suggestion.
- io._read_excel: openpyxl BadZipFile / InvalidFileException / pandas
ValueError ("sheet not found") wrapped as FileFormatError listing
the path and a "list sheets with list_sheets()" hint.
- io._detect_excel_header_row: bare except narrowed to specific
openpyxl exceptions; falls back gracefully and logs at debug so
the real error surfaces from pd.read_excel.
- io.write_file: OSError / PermissionError on to_csv/to_excel wrapped
with file path and Windows-aware "file may be open in another
program" hint.
- dedup._parse_date: bare `except Exception` narrowed to
(TypeError, ValueError, OutOfBoundsDatetime); failed values
logged at debug for survivor-selection forensics.
- dedup._select_survivor: KEEP_MOST_RECENT now raises
InputValidationError instead of silently falling back to keep_first.
- dedup.deduplicate: input validation errors are InputValidationError
with operation/column/suggestion fields.
- format_standardize.from_dict: invalid FieldType for a column raises
ConfigError naming the column AND the bad value AND listing valid
values; same for date_order / phone_format / etc.
- format_standardize.from_file: OSError / JSON decode wrapped with
path AND line/column where parsing failed.
- format_standardize.to_file: TypeError on json.dumps wrapped as
ConfigError with the suspected source (extra_abbreviations).
- format_standardize._apply_field_type: dispatcher's "unknown field
type" branch now raises AssertionError (it's an internal invariant,
not user error — a new enum value was added without a branch).
- format_standardize._resolve_column_types: missing-column error now
InputValidationError with a "check for typos / unparsed header"
suggestion.
- format_standardize.standardize_dataframe: ensure_dataframe at entry.
- text_clean.clean_dataframe: ensure_dataframe at entry.
- config.to_strategies: invalid Algorithm/NormalizerType wrapped as
ConfigError naming the strategy index AND the column.
- config.to_survivor_rule: invalid SurvivorRule wrapped as ConfigError
listing valid values.
- config.from_file: OSError / JSON decode wrapped (mirror of
StandardizeOptions.from_file).
- fixes.repair_mojibake: ImportError on ftfy now logged at info level
with the underlying ImportError so a corrupt-package vs not-installed
distinction is visible in the logs.
- normalizers.normalize_phone: phonenumbers.NumberParseException now
logged at debug when the digits-only fallback drops extension /
country-code information — gives a trail when matching results
look wrong.
GUI / CLI surfaces:
- All 9 page handlers (`except Exception as e: st.error(...)`) now
use format_for_user(), which renders DataToolsError fields nicely
and falls back to "ClassName: message" for unrecognized errors.
- 2_Text_Cleaner and 3_Format_Standardizer additionally distinguish
UnicodeDecodeError with an "re-save as UTF-8" suggestion before
the generic handler.
- cli.py's "Error reading file" handler now uses format_for_user()
and includes the input path in the prefix.
Tests:
- tests/test_errors.py — 22 new tests covering: base class formatting,
stdlib inheritance, ensure_dataframe / ensure_choice helpers,
wrap_file_read / wrap_file_write, format_for_user behavior, and
end-to-end integration (missing file, missing dir, bad JSON, bad
algorithm, bad enum, missing column).
- tests/test_audit_fixes.py + tests/test_io.py — updated 4 tests for
the new exception types (InputValidationError replaces TypeError,
FileAccessError extends OSError).
Full project suite: 1230 passed, 4 skipped, 17 xfailed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a Review & Normalize page that sits between upload and every tool
page. The analyzer now tags each finding with confidence (high/medium/low)
and a fix_action; the gate auto-applies high-confidence fixes, surfaces
medium/low ones for user review, and blocks tool pages on error-level
findings until resolved or waived.
Core (src/core/):
- analyze.py: Finding gains confidence, fix_action, pre_applied; new
detectors for encoding_uncertain, encoding_decode_failed; new top-
level encoding_override parameter.
- fixes.py: registry of fix algorithms keyed by fix_action id.
- normalize.py: auto_fix(), apply_decisions(), is_normalized(), and
the NormalizationResult / Decision dataclasses the gate consumes.
- io.py: detect_encoding tries strict UTF-8 first; repair_bytes now
transcodes UTF-16/32 to UTF-8 before NUL-strip (fixes UTF-16 corruption)
and normalizes line endings (fixes bare-CR parser crash); empty file
handled gracefully instead of EmptyDataError traceback.
GUI (src/gui/):
- pages/0_Review.py: gate page with per-finding decision controls,
encoding override picker (16 codepages + custom), and Advanced output
options (encoding, delimiter, line terminator) on the download.
- components.py: require_normalization_gate() helper.
- pages/1-9: gate guard wired on every tool page.
Test corpora:
- test-cases/encodings-corpus/: 31 encoded CSV fixtures + 9 reference
UTF-8 files + manifest, synced from Business/DataTools.
- test-cases/text-cleaner-corpus/test_data/17: synced malformed input
(unquoted $1,500.00) for the unquoted-delimiter detector.
Tests (94 new):
- test_normalize.py (48): finding fields, fix registry, auto_fix scope,
decision paths, gate idempotency, output-options helper.
- test_encodings_corpus.py (90, 16 xfailed): parametric detection +
decode + analyzer-no-crash sweep against the manifest.
- test_analyze.py: encoding override + encoding_uncertain detectors.
- test_corpus.py: pre-parse repair in the strict reader.
run_tests.py: new aliases --tool normalize, --tool encodings, --tool gate;
encodings corpus added to --fixtures category.
Docs: USER-GUIDE §3.3 covers the gate workflow, encoding override, and
output options; TECHNICAL §10.2.1-10.2.4 documents the analyzer schema,
gate API, Review page, and pre-parse repair pipeline; CLI-REFERENCE adds
the analyzer JSON schema with the new fields; README links to all of it.
Suite: 765 passed, 17 xfailed (was 458 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last UX gap from the analyzer review: each tool page had its
own st.file_uploader, so users had to upload the same file twice (once
on the home page for analysis, once on each tool page).
components.pickup_or_upload(label, key, types) returns either:
- a _StashedUpload shim wrapping the home-page bytes (when present and
the user hasn't asked for a different file on this page), or
- the standard st.file_uploader (when nothing is stashed or the user
clicked "Use a different file").
_StashedUpload duck-types Streamlit's UploadedFile (.name, .size,
.getvalue(), .read()) so existing tool-page code consumes it without
changes. A "Use a different file" button per page sets a session-state
override flag; a "Switch back to upload-screen file" button clears it.
Wired into 2_Text_Cleaner.py and 1_Deduplicator.py — the two pages with
working uploaders today. The remaining stub pages adopt it when they're
implemented; the helper is the public surface they'll use.
Verified by smoke-launching streamlit headless and curling the home,
text-cleaner, and deduplicator routes — all return 200 with no errors
in the server log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add shared hide_streamlit_chrome() helper that removes header bar,
hamburger menu, footer, and deploy button via CSS injection. Called
on every page. Add .streamlit/config.toml with minimal toolbar mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert single-page deduplicator into a multi-page suite. Home page shows
tool card grid. Deduplicator extracted to its own page (fully working).
8 stub pages added for Text Cleaner, Format Standardizer, Missing Values,
Column Mapper, Outlier Detector, Multi-File Merger, Validator & Reporter,
and Pipeline Runner — each with functional file upload and coming-soon UI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>