datatools-dev/samples/demo/agency_combined_leads.csv
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.
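
A minimal sketch of what a soft (advisory, not enforced) dependency check plus JSON save/load could look like. It is not the code in src/core/pipeline.py: the tool names, the RECOMMENDED_BEFORE table, and the helper names are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical recommendation table: tool -> tools that ideally run before it.
RECOMMENDED_BEFORE = {
    "column_mapper": [],
    "format_standardizer": ["column_mapper"],
    "missing_value_handler": ["column_mapper", "format_standardizer"],
}


def order_warnings(steps):
    """Return advisory warnings; never raises, so any order is still allowed."""
    position = {name: i for i, name in enumerate(steps)}
    warnings = []
    for name, i in position.items():
        for dep in RECOMMENDED_BEFORE.get(name, []):
            # Only warn when the recommended predecessor is present but later.
            if dep in position and position[dep] > i:
                warnings.append(
                    f"{name} runs before {dep} (recommended: {dep} first)"
                )
    return warnings


def to_json(steps):
    """Serialise a pipeline so the same cleanup can be replayed later."""
    return json.dumps({"version": 1, "steps": steps}, indent=2)


def from_json(text):
    """Load the step list back from its JSON form."""
    return json.loads(text)["steps"]
```

Writing the to_json() string to a file and reading it back with from_json() would give the repeatable weekly run; the warnings are meant to be shown to the user, not to block execution.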

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs
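
One way the "auto" decimal detection, multi-char prefixes, and LRU cache could fit together, sketched with the standard library only. The function names, the prefix list, the cache size, and the heuristic itself (a lone separator followed by exactly three digits is treated as thousands grouping) are assumptions, not the behaviour of the shipped Format Standardizer.

```python
import re
from decimal import Decimal
from functools import lru_cache

# Longest prefixes first so "R$" is matched before "$". Illustrative list only.
PREFIXES = sorted(["R$", "kr", "zł", "$", "€", "£"], key=len, reverse=True)


def _decimal_separator(s):
    """Guess which of '.' or ',' is the decimal separator, or None."""
    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        # Both present: the rightmost one is the decimal separator.
        return "." if last_dot > last_comma else ","
    sep = "." if last_dot >= 0 else "," if last_comma >= 0 else None
    if sep is None or s.count(sep) > 1:
        return None  # no separator, or repeated grouping like 1.234.567
    digits_after = len(s) - s.rfind(sep) - 1
    # Assumed heuristic: exactly three trailing digits means grouping.
    return None if digits_after == 3 else sep


@lru_cache(maxsize=65536)
def parse_amount(text):
    """Parse a currency string into a Decimal; repeated inputs hit the cache."""
    s = text.strip()
    for prefix in PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    s = re.sub(r"\s+", "", s)  # drop spaces, including non-breaking ones
    dec = _decimal_separator(s)
    if dec is None:
        digits = s.replace(".", "").replace(",", "")
    else:
        whole, frac = s.rsplit(dec, 1)
        digits = whole.replace(".", "").replace(",", "") + "." + frac
    return Decimal(digits)
```

Unparseable input raises decimal.InvalidOperation in this sketch; a real implementation would more likely route such rows to the audit log.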

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
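
The arbiter idea can be illustrated roughly as follows: when several candidate encodings decode the bytes without error (a tie), prefer the one whose non-ASCII characters best fit a known alphabet. This is a sketch under assumptions, not the project's detector; the alphabets, the scoring rule, and the names coverage() and arbitrate() are hypothetical.

```python
# Hypothetical probe alphabets (lowercase only).
PROBES = {
    "cyrillic": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "ee_latin": set("ąćęłńóśźżáčďéěíňřšťúůýž"),  # Polish + Czech letters
}


def coverage(text, alphabet):
    """Share of non-ASCII characters that belong to the given alphabet."""
    non_ascii = [ch for ch in text.lower() if not ch.isascii()]
    if not non_ascii:
        return 0.0
    return sum(ch in alphabet for ch in non_ascii) / len(non_ascii)


def arbitrate(data, candidates):
    """Pick the candidate encoding with the best probe coverage.

    Candidates that fail to decode are skipped; on equal scores the
    earlier candidate wins. Returns None if nothing decodes.
    """
    best, best_score = None, -1.0
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        score = max(coverage(text, alphabet) for alphabet in PROBES.values())
        if score > best_score:
            best, best_score = encoding, score
    return best
```

For example, Polish text encoded as cp1250 also decodes cleanly as cp1252, but the cp1252 reading yields characters such as æ and ê that fall outside the probe alphabets, so cp1250 scores higher.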

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"
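
To make the strict-greater threshold and the all-NaN skip concrete, here is a small pure-Python sketch. It is not the pandas-based code in src/core/missing.py; the function names and the list-of-lists data shape are assumptions made for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    values = list(values)
    if not values:
        return 0.0
    return sum(is_missing(v) for v in values) / len(values)


def drop_sparse_rows(rows, threshold=0.5):
    """Drop a row only when its missing share is strictly greater than threshold."""
    return [row for row in rows if not missing_fraction(row) > threshold]


def drop_sparse_columns(rows, threshold=0.5):
    """Same strict-greater rule, applied column-wise."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns)
            if not missing_fraction(col) > threshold]
    return [[row[i] for i in keep] for row in rows]


def column_mean(values):
    """Mean of the present values; None for an all-missing column."""
    present = [v for v in values if not is_missing(v)]
    return sum(present) / len(present) if present else None
```

With a threshold of 0.5, a row that is exactly half empty is kept, which is the point of the strict comparison; returning None for an all-missing column avoids computing a mean over nothing.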

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

3.5 KiB

Lead ID,First Name,Last Name,Company,Title,Email,Phone,Country,Source,Score,Last Activity,Tags
HUB-001,Alice,Johnson,Acme Corp,VP Marketing,alice@acme.com,(415) 555-1234,USA,HubSpot,87,2025-12-04,Enterprise
HUB-002,bob,smith,Beta LLC,Director Growth,bob@beta.com,N/A,United States,HubSpot,N/A,2025-11-22,SMB
HUB-003,Carlos,Garcia,Gamma Inc,CEO,carlos@gamma.io,+34 91 411 1111,Spain,HubSpot,82,2025-10-30,Enterprise
HUB-004,DIANA,LEE,Delta Co,Marketing Manager,diana@delta.com,020 7946 0958,United Kingdom,HubSpot,74,2025-12-15,Mid-Market
HUB-005,Eve,Martinez,Epsilon Group,VP Ops,eve@epsilon.com,(none),Mexico,HubSpot,(blank),2025-09-15,SMB
LIN-006,Alice,Johnson,Acme Corporation,VP of Marketing,Alice.Johnson@acme.com,4155551234,US,LinkedIn,,2025-12-04,Enterprise
LIN-007,Frank,Brown,Foxtrot Ltd,Head Sales,frank@foxtrot.de,+49 30 12345678,Germany,LinkedIn,68,2025-12-01,Mid-Market
LIN-008,Grace,Davis,Golf Industries,Marketing Lead,grace@golfind.com,+44 20 7946 0958,UK,LinkedIn,79,2025-11-08,Mid-Market
LIN-009,henry,wilson,Hotel Logistics,COO,henry@hotellog.com,+86 10 1234 5678,China,LinkedIn,91,2025-12-12,Enterprise
LIN-010,IVY ,CHEN,India Tech,CTO,ivy@indiatech.in,+91 11 2345 6789,IN,LinkedIn,88,2025-11-30,Enterprise
LIN-011,Jack,Taylor,Juliet & Co,Founder,jack@juliet.co,unknown,United States,LinkedIn,?,(unknown),SMB
SCR-012,Diana,Lee,Delta Company,Marketing Manager,diana@delta.com,020-7946-0958,UK,Manual Scrape,74,12/15/2025,Mid-Market
SCR-013,kate,o'neil,Kilo Ventures,Partner,kate@kilo.vc,+1 415 555 2222,USA,Manual Scrape,N/A,?,Investor
SCR-014,Carlos,García,Gamma Incorporated,CEO,Carlos@gamma.io,+34-91-411-1111,Spain,Manual Scrape,82,Oct 30 2025,Enterprise
SCR-015,Liam,Park,Lima Solutions,Director Marketing,liam@limasol.kr,+82 2 2287 0114,South Korea,Manual Scrape,77,2025-11-20,Enterprise
SCR-016,Mia,nguyen,Mike Corp,VP Marketing,mia@mikecorp.com.au,02 9374 4000,Australia,Manual Scrape,72,2025-10-05,Mid-Market
SCR-017,Noah,Brown,November Inc,Head of Growth,noah@november.com,(555) 444-5555,US,Manual Scrape,#N/A,,SMB
HUB-018,Frank,Brown,Foxtrot,Head of Sales,Frank@Foxtrot.de,+49-30-12345678,Germany,HubSpot,68,2025-12-01,Mid-Market
HUB-019,Olivia,Rossi,Oscar Italia,CMO,olivia@oscar.it,+39 06 6982,Italy,HubSpot,85,2025-12-08,Enterprise
HUB-020,papa,wong,Papa Trading,Founder,papa@papatrading.hk,+852 2123 4567,Hong Kong,HubSpot,69,2025-11-15,SMB
LIN-021,Quinn,Reyes,Quebec Group,VP Sales,quinn@quebec.mx,+52 55 5555 0000,Mexico,LinkedIn,80,2025-12-05,Mid-Market
LIN-022,Robert,Tan,Romeo Logistics,Director,r.tan@romeo.sg,+65 6123 4567,Singapore,LinkedIn,76,2025-11-28,Mid-Market
SCR-023,Sara,Khan,Sierra Foods,Head Marketing,sara@sierra.in,+91-22-1234-5678,India,Manual Scrape,73,2025-12-02,SMB
SCR-024,bob,Smith,Beta,Director Growth,Bob@Beta.com,(none),United States,Manual Scrape,(unknown),(unknown),SMB
HUB-025,Tara,Levi,Tango Tech,VP Product,tara@tango.il,+972 3 6957 0000,Israel,HubSpot,82,2025-12-10,Enterprise
HUB-026,Uma,Patel,Uniform Health,CMO,uma at uniform dot com,+44 20 7946 8888,United Kingdom,HubSpot,71,2025-12-12,Enterprise
LIN-027,Victor,Lee,Victor Co,Director,victor@@victorco.com,+1 415 555 8888,USA,LinkedIn,69,2025-11-30,SMB
SCR-028,Wendy,Akin,Whiskey Inc,CMO,wendy@whiskey.tr,+90 212 252 1111,Turkey,Manual Scrape,77,2025-12-04,Mid-Market
SCR-029,Xander,Ng,Xray Group,Founder,xander@xray.sg,+65 6234 5678,Singapore,Manual Scrape,65,2025-11-15,Suppressed
HUB-030,Yara,Costa,Yankee Foods,Marketing Lead,yara@yankee.br,+55 11 3071 2222,Brazil,HubSpot,,2025-12-15,Opted Out