Introduces ``src/i18n`` with a tiny JSON-backed t() lookup, an in-session
language preference, and a sidebar selector wired through
``hide_streamlit_chrome`` so every page picks up the same picker. Covers
home, tool cards, findings panel, gate, shutdown, and pickup banner
strings. Tests pin pack parity and the farewell-overlay JS escape so
future packs can't silently regress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds src/core/format_standardize.py — a per-cell standardizer for dates,
phones, emails, addresses, names, currencies, booleans — wired through
StandardizeOptions / standardize_dataframe with FieldType registry.
Includes:
- Date parser handles ISO/US/EU/longform/excel-serial/unix-timestamp/
partial-precision/quarter notation; opt-in French/German/Spanish month
dictionaries via month_locales.
- Phone via libphonenumber with extension preservation (;ext=N), 001
international prefix handling, error sentinels for placeholders /
multi-number cells.
- Email lowercase/trim/mailto/angle-bracket strip with optional
--gmail-canonical mode.
- Address USPS abbreviation expansion or compression (expand=False per
corpus § 6.3), state-name → 2-letter conversion, multi-line collapse,
PO Box normalization, state-code preservation regardless of input case.
- Name handler: Mc/Mac/O'/D' inner caps, hyphen segments, particle
lowercasing (von/van/de/da), comma-format reversal, period stripping
for titles/suffixes/initials, PhD/MD acronym preservation, conservative
mode for mixed-case input.
- Currency: auto-detect EU vs US separators, space-thousands, Swiss
apostrophe, accounting parens, optional ISO code preservation, error
sentinels for percentages/ranges/word-values/ambiguous separators.
- Per-domain error_policy ("passthrough" | "sentinel") for surfacing
malformed values as <error: reason> per corpus § 0.3.
Test corpus from Business/DataTools/test-cases-format-cleaner copied to
test-cases/format-cleaner-corpus/ — 7 fixtures plus FORMATS-CASES.md.
tests/test_format_standardize_corpus.py drives all 199 rows through the
per-cell standardizers; 0 xfailed.
Wires the GUI page (3_Format_Standardizer.py) to "Ready" status.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two low-risk seam moves to enable selling per-tool subsets without
breaking the existing all-in-one bundle. Behaviour identical; every
existing import still resolves; full pytest suite + every page returns
HTTP 200.
1. **Tool registry** (src/gui/tools_registry.py) — replaces the
inline dict-of-dicts in app.py with a Tool dataclass and a TOOLS
list. Adds a tier field ("core" today, "pro" / "enterprise" later)
and tools_for_tier() / tool_by_id() / display_name() helpers. A
per-tool build slices TOOLS at import time without code changes.
2. **components package** (src/gui/components/) — converts the former
single components.py into a package with:
_legacy.py — original file, unchanged.
__init__.py — re-exports the legacy surface; existing
"from src.gui.components import …" calls
continue to work.
shared.py — hide_streamlit_chrome, pickup_or_upload
(every build needs these).
gate.py — require_normalization_gate (Pro / Suite SKUs).
findings.py — analyzer-finding widgets (drops out of a
standalone-Dedup build).
dedup_review.py — match-group cards + apply pipeline (drops out
of a non-dedup build).
The seam modules are narrow re-exports today. As code migrates out
of _legacy.py into the focused modules, the public import path
stays stable via the shim.
E2E: 765 passed, 17 xfailed (unchanged); home page + all 9 tool pages
+ Review page render HTTP 200; full pipeline (analyze → auto_fix →
apply_decisions → output bytes) round-trips on the kitchen-sink
fixture with zero high-confidence findings remaining post-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>