Tools shipped this batch (4 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI
with soft tool-dependency graph (recommended,
not enforced) and JSON save/load for repeatable
weekly cleanups.
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
52 lines
2.0 KiB
Python
52 lines
2.0 KiB
Python
"""Streamlit Community Cloud entry point — public demo app.
|
|
|
|
This is the file Streamlit Community Cloud auto-discovers when you
|
|
deploy from this repository: leave the "Main file path" field at its
|
|
default (``streamlit_app.py``) and it just works.
|
|
|
|
Why this lives at the repo root, not in ``src/gui/``:
|
|
Streamlit auto-detects sibling files inside a ``pages/`` directory
|
|
next to the entry script and renders them as additional pages in
|
|
the sidebar. The full product GUI's pages live in
|
|
``src/gui/pages/`` — pointing the Cloud at ``src/gui/app_demo.py``
|
|
would inadvertently expose every paid-product page in the demo's
|
|
sidebar (or require URL-routing tricks to suppress them).
|
|
Anchoring the entry script at the repo root means there is no
|
|
``pages/`` neighbour and the demo stays single-page by
|
|
construction.
|
|
|
|
The actual demo UI is defined once in ``src/gui/app_demo.py`` so
|
|
local development still works the way it always did:
|
|
|
|
streamlit run src/gui/app_demo.py # local dev, identical UX
|
|
|
|
Cloud deploy uses this shim:
|
|
|
|
streamlit run streamlit_app.py # what Cloud invokes
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import sys
|
|
from pathlib import Path
|
|
|
|
# Put the repo root on sys.path so ``src.core`` and ``src.gui`` imports
|
|
# resolve cleanly. The demo module does this itself for the local-dev
|
|
# case, but the import order matters when this shim runs first on Cloud.
|
|
_HERE = Path(__file__).resolve().parent
|
|
if str(_HERE) not in sys.path:
|
|
sys.path.insert(0, str(_HERE))
|
|
|
|
# Executing the demo module top-to-bottom is the simplest way to share
|
|
# the UI between the two entry points without duplicating code or
|
|
# refactoring the demo into a function (Streamlit's idiom is
|
|
# script-as-page; converting it to a callable would fight the
|
|
# framework). ``runpy`` runs the file in this script's namespace so
|
|
# Streamlit's ``st.set_page_config`` / element registration sees the
|
|
# correct module.
|
|
import runpy
|
|
runpy.run_path(
|
|
str(_HERE / "src" / "gui" / "app_demo.py"),
|
|
run_name="__main__",
|
|
)
|