Tools shipped this batch (4 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI,
   with a soft tool-dependency graph (recommended, not enforced)
   and JSON save/load for repeatable weekly cleanups.
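A minimal sketch of how a soft dependency graph and a JSON round-trip could fit together. Every name here (RECOMMENDED_BEFORE, Step, Pipeline, warnings) is hypothetical and not taken from src/core/pipeline.py. The only point illustrated is that ordering advice yields warnings, never errors.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical advice table: tool -> tools that ideally run before it.
RECOMMENDED_BEFORE = {
    "missing": ["column_map"],
    "format": ["column_map", "missing"],
}


@dataclass
class Step:
    tool: str
    options: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    steps: list = field(default_factory=list)

    def warnings(self):
        """Return advisory messages for out-of-order steps; never raises."""
        position = {}
        for index, step in enumerate(self.steps):
            position.setdefault(step.tool, index)
        messages = []
        for index, step in enumerate(self.steps):
            for dep in RECOMMENDED_BEFORE.get(step.tool, []):
                # Only warn when the recommended tool is present but runs later.
                if dep in position and position[dep] > index:
                    messages.append(f"{step.tool}: recommended to run after {dep}")
        return messages

    def to_json(self):
        return json.dumps({"steps": [asdict(s) for s in self.steps]}, indent=2)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(steps=[Step(**s) for s in data["steps"]])
```

Saving the output of to_json() to a file and reloading it with from_json() is what would make a weekly cleanup repeatable. The pipeline still runs in whatever order the user chose.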
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
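One way the currency_decimal="auto" idea could work, sketched with the standard library only. The function name parse_amount, the SYMBOLS tuple, and the separator heuristic are assumptions for illustration and do not describe the code in the Format Standardizer. The lru_cache decorator stands in for the caching mentioned above: repeated cell values are parsed once.

```python
from decimal import Decimal
from functools import lru_cache

# Assumed symbol list; multi-character symbols come first so "R$" wins over "$".
SYMBOLS = ("R$", "kr", "zł", "$", "€", "£")


@lru_cache(maxsize=65536)
def parse_amount(text: str) -> Decimal:
    """Parse a currency string, guessing the decimal separator."""
    s = text.strip()
    for sym in SYMBOLS:
        if s.startswith(sym):
            s = s[len(sym):]
            break
        if s.endswith(sym):
            s = s[: -len(sym)]
            break
    s = s.replace(" ", "").replace("\u00a0", "")

    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        # Both present: whichever comes last is the decimal separator.
        decimal_sep = "." if last_dot > last_comma else ","
    elif last_comma >= 0:
        # Lone comma: treat as decimal unless it looks like a thousands group.
        digits_after = len(s) - last_comma - 1
        decimal_sep = "," if s.count(",") == 1 and digits_after != 3 else None
    elif last_dot >= 0:
        # Lone dot: treat as decimal unless it repeats (1.234.567).
        decimal_sep = "." if s.count(".") == 1 else None
    else:
        decimal_sep = None

    for sep in ".,":
        if sep != decimal_sep:
            s = s.replace(sep, "")
    if decimal_sep == ",":
        s = s.replace(",", ".")
    return Decimal(s)
```

Under this heuristic "R$ 1.234,56" and "1,234.56 kr" both parse to 1234.56, and "12,50 zł" parses to 12.50. A lone separator followed by exactly three digits is inherently ambiguous, which is where a per-row country column would have to take over.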
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
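A rough sketch of what a tied-confidence arbiter with a script-coverage probe might look like. The function names, the 0.05 tie margin, and the (encoding, confidence) candidate format are all assumptions; the real detector and its probes are not shown in this commit message. Only the Cyrillic probe is sketched here.

```python
def cyrillic_coverage(text):
    """Fraction of alphabetic characters that fall in the Cyrillic block."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0400" <= c <= "\u04ff" for c in letters) / len(letters)


def arbitrate(raw, candidates, tie_margin=0.05):
    """Pick an encoding from (name, confidence) pairs.

    When the top candidates are within tie_margin of each other, decode
    with each and prefer the one whose text has the best script coverage.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_conf = ranked[0][1]
    tied = [enc for enc, conf in ranked if best_conf - conf <= tie_margin]
    if len(tied) == 1:
        return tied[0]

    def score(enc):
        try:
            return cyrillic_coverage(raw.decode(enc))
        except (UnicodeDecodeError, LookupError):
            return -1.0  # undecodable candidates lose the tie

    return max(tied, key=score)
```

For example, Russian text encoded as cp1251 also decodes without error as cp1252, so a statistical detector can report near-equal confidence for both. The coverage probe then breaks the tie in favour of cp1251.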
Distribution-readiness assets:
• streamlit_app.py: Streamlit Community Cloud entry shim
• src/gui/app_demo.py: single-page demo with ?p=<persona> routing,
  100-row cap + watermark; free-vs-paid boundary enforced at the surface
• samples/demo/: 3 niche datasets + pre-tuned pipeline JSONs
• landing/: 4 static HTML pages (apex chooser + 3 niche), shared CSS,
  deploy.py URL-substitution script, and auto-generated
  robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md:
  full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
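To make two of the hardening points concrete, here is a plain-Python stand-in. The engine itself works on pandas objects, which this sketch deliberately avoids. The helper names are hypothetical, and treating the threshold as a missing-value fraction is an assumption. It shows the strict-greater comparison shared by rows and columns, and a mean fill that leaves all-missing columns untouched.

```python
def exceeds_threshold(values, threshold):
    """True only when the missing fraction is strictly greater than threshold."""
    if not values:
        return False
    missing = sum(v is None for v in values)
    return missing / len(values) > threshold


def drop_sparse_rows(rows, threshold=0.5):
    return [row for row in rows if not exceeds_threshold(row, threshold)]


def drop_sparse_columns(rows, threshold=0.5):
    # Same predicate as for rows, applied to each column.
    keep = [
        i for i, col in enumerate(zip(*rows))
        if not exceeds_threshold(col, threshold)
    ]
    return [[row[i] for i in keep] for row in rows]


def fill_mean(column):
    """Fill None with the column mean; skip columns with no values at all."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to average, so leave it unchanged
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]
```

With a threshold of 0.5, a row that is exactly half missing is kept; only rows or columns above the threshold are dropped.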
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| 1 | Lead ID | First Name | Last Name | Company | Title | Email | Phone | Country | Source | Score | Last Activity | Tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | HUB-001 | Alice | Johnson | Acme Corp | VP Marketing | alice@acme.com | (415) 555-1234 | USA | HubSpot | 87 | 2025-12-04 | Enterprise |
| 3 | HUB-002 | bob | smith | Beta LLC | Director Growth | bob@beta.com | N/A | United States | HubSpot | N/A | 2025-11-22 | SMB |
| 4 | HUB-003 | Carlos | Garcia | Gamma Inc | CEO | carlos@gamma.io | +34 91 411 1111 | Spain | HubSpot | 82 | 2025-10-30 | Enterprise |
| 5 | HUB-004 | DIANA | LEE | Delta Co | Marketing Manager | diana@delta.com | 020 7946 0958 | United Kingdom | HubSpot | 74 | 2025-12-15 | Mid-Market |
| 6 | HUB-005 | Eve | Martinez | Epsilon Group | VP Ops | eve@epsilon.com | (none) | Mexico | HubSpot | (blank) | 2025-09-15 | SMB |
| 7 | LIN-006 | Alice | Johnson | Acme Corporation | VP of Marketing | Alice.Johnson@acme.com | 4155551234 | US | — | 2025-12-04 | Enterprise | |
| 8 | LIN-007 | Frank | Brown | Foxtrot Ltd | Head Sales | frank@foxtrot.de | +49 30 12345678 | Germany | 68 | 2025-12-01 | Mid-Market | |
| 9 | LIN-008 | Grace | Davis | Golf Industries | Marketing Lead | grace@golfind.com | +44 20 7946 0958 | UK | 79 | 2025-11-08 | Mid-Market | |
| 10 | LIN-009 | henry | wilson | Hotel Logistics | COO | henry@hotellog.com | +86 10 1234 5678 | China | 91 | 2025-12-12 | Enterprise | |
| 11 | LIN-010 | IVY CHEN | India Tech | CTO | ivy@indiatech.in | +91 11 2345 6789 | IN | 88 | 2025-11-30 | Enterprise | ||
| 12 | LIN-011 | Jack | Taylor | Juliet & Co | Founder | jack@juliet.co | unknown | United States | ? | (unknown) | SMB | |
| 13 | SCR-012 | Diana | Lee | Delta Company | Marketing Manager | diana@delta.com | 020-7946-0958 | UK | Manual Scrape | 74 | 12/15/2025 | Mid-Market |
| 14 | SCR-013 | kate | o'neil | Kilo Ventures | Partner | kate@kilo.vc | +1 415 555 2222 | USA | Manual Scrape | N/A | ? | Investor |
| 15 | SCR-014 | Carlos | García | Gamma Incorporated | CEO | Carlos@gamma.io | +34-91-411-1111 | Spain | Manual Scrape | 82 | Oct 30 2025 | Enterprise |
| 16 | SCR-015 | Liam | Park | Lima Solutions | Director Marketing | liam@limasol.kr | +82 2 2287 0114 | South Korea | Manual Scrape | 77 | 2025-11-20 | Enterprise |
| 17 | SCR-016 | Mia | nguyen | Mike Corp | VP Marketing | mia@mikecorp.com.au | 02 9374 4000 | Australia | Manual Scrape | 72 | 2025-10-05 | Mid-Market |
| 18 | SCR-017 | Noah | Brown | November Inc | Head of Growth | noah@november.com | (555) 444-5555 | US | Manual Scrape | — | #N/A | SMB |
| 19 | HUB-018 | Frank | Brown | Foxtrot | Head of Sales | Frank@Foxtrot.de | +49-30-12345678 | Germany | HubSpot | 68 | 2025-12-01 | Mid-Market |
| 20 | HUB-019 | Olivia | Rossi | Oscar Italia | CMO | olivia@oscar.it | +39 06 6982 | Italy | HubSpot | 85 | 2025-12-08 | Enterprise |
| 21 | HUB-020 | papa | wong | Papa Trading | Founder | papa@papatrading.hk | +852 2123 4567 | Hong Kong | HubSpot | 69 | 2025-11-15 | SMB |
| 22 | LIN-021 | Quinn | Reyes | Quebec Group | VP Sales | quinn@quebec.mx | +52 55 5555 0000 | Mexico | 80 | 2025-12-05 | Mid-Market | |
| 23 | LIN-022 | Robert | Tan | Romeo Logistics | Director | r.tan@romeo.sg | +65 6123 4567 | Singapore | 76 | 2025-11-28 | Mid-Market | |
| 24 | SCR-023 | Sara | Khan | Sierra Foods | Head Marketing | sara@sierra.in | +91-22-1234-5678 | India | Manual Scrape | 73 | 2025-12-02 | SMB |
| 25 | SCR-024 | bob | Smith | Beta | Director Growth | Bob@Beta.com | (none) | United States | Manual Scrape | (unknown) | (unknown) | SMB |
| 26 | HUB-025 | Tara | Levi | Tango Tech | VP Product | tara@tango.il | +972 3 6957 0000 | Israel | HubSpot | 82 | 2025-12-10 | Enterprise |
| 27 | HUB-026 | Uma | Patel | Uniform Health | CMO | uma at uniform dot com | +44 20 7946 8888 | United Kingdom | HubSpot | 71 | 2025-12-12 | Enterprise |
| 28 | LIN-027 | Victor | Lee | Victor Co | Director | victor@@victorco.com | +1 415 555 8888 | USA | 69 | 2025-11-30 | SMB | |
| 29 | SCR-028 | Wendy | Akin | Whiskey Inc | CMO | wendy@whiskey.tr | +90 212 252 1111 | Turkey | Manual Scrape | 77 | 2025-12-04 | Mid-Market |
| 30 | SCR-029 | Xander | Ng | Xray Group | Founder | xander@xray.sg | +65 6234 5678 | Singapore | Manual Scrape | 65 | 2025-11-15 | Suppressed |
| 31 | HUB-030 | Yara | Costa | Yankee Foods | Marketing Lead | yara@yankee.br | +55 11 3071 2222 | Brazil | HubSpot | — | 2025-12-15 | Opted Out |