datatools-dev/samples/demo/shopify_pet_customers.csv
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.
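The Pipeline Runner's dependency graph is described as soft (recommended, not enforced), with JSON save/load. Below is a minimal sketch of that idea. It assumes a hypothetical `RECOMMENDED_BEFORE` map, hypothetical tool names, and a hypothetical `{"steps": [...]}` JSON schema. None of these are taken from `src/core/pipeline.py`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Set

# Hypothetical soft-dependency graph: tool -> tools recommended to run earlier.
# Tool names are illustrative, not the project's identifiers.
RECOMMENDED_BEFORE: Dict[str, Set[str]] = {
    "format": {"encoding"},
    "missing": {"format"},
    "column_map": set(),
}


@dataclass
class Step:
    tool: str
    params: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    steps: List[Step]

    def check_order(self) -> List[str]:
        """Return advisory messages only; never raises, so out-of-order pipelines still run."""
        messages, seen = [], set()
        for step in self.steps:
            for dep in sorted(RECOMMENDED_BEFORE.get(step.tool, set())):
                if dep not in seen:
                    messages.append(f"{step.tool}: recommended to run after {dep}")
            seen.add(step.tool)
        return messages

    def to_json(self) -> str:
        # Hypothetical schema: {"steps": [{"tool": ..., "params": {...}}, ...]}
        return json.dumps({"steps": [asdict(s) for s in self.steps]}, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Pipeline":
        return cls([Step(**s) for s in json.loads(text)["steps"]])
```

`check_order()` only returns messages. A user can therefore still save, reload, and run an out-of-order pipeline, and the GUI or CLI decides how prominently to show the advice.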

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs
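The sketch below shows one way to combine `currency_decimal="auto"`, multi-character prefixes, and per-value LRU caching. Several pieces are assumptions for illustration and do not come from the project's `standardize_file()` internals: the prefix list, the rule that 1 or 2 digits after the last separator marks a decimal, and the name `parse_currency`.

```python
import re
from decimal import Decimal, InvalidOperation
from functools import lru_cache

# Hypothetical prefix table. Longest prefixes are tried first so "R$" and "A$"
# match before the bare "$".
CURRENCY_PREFIXES = sorted(["R$", "A$", "kr", "zł", "$", "€", "£", "¥"], key=len, reverse=True)


@lru_cache(maxsize=65536)  # repeated raw strings are parsed only once
def parse_currency(raw: str, currency_decimal: str = "auto"):
    """Return (symbol, Decimal amount), or None if the value is not a number."""
    text = raw.strip()
    symbol = ""
    for prefix in CURRENCY_PREFIXES:
        if text.startswith(prefix):
            symbol, text = prefix, text[len(prefix):].strip()
            break
    if not re.fullmatch(r"[0-9.,]+", text):
        return None

    if currency_decimal == "auto":
        # Assumed heuristic: the last separator is the decimal mark only when it
        # is followed by 1 or 2 digits ("1,240.50", "1.890,40"); otherwise
        # ("2.410", "75,000") every separator is treated as a thousands mark.
        last = max(text.rfind("."), text.rfind(","))
        decimal_mark = text[last] if last != -1 and len(text) - last - 1 in (1, 2) else None
    elif currency_decimal in (".", ","):
        decimal_mark = currency_decimal
    else:
        raise ValueError(f"currency_decimal must be 'auto', '.' or ',', got {currency_decimal!r}")

    for sep in {".", ","} - {decimal_mark}:
        text = text.replace(sep, "")  # strip thousands separators
    if decimal_mark:
        text = text.replace(decimal_mark, ".")
    try:
        return symbol, Decimal(text)
    except InvalidOperation:  # e.g. two decimal marks such as "1.2.3"
        return None
```

Caching on the raw string pays off because columns such as Lifetime Value repeat values heavily. A vectorised caller would hit the parser once per unique string, not once per row.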

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
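A tied-confidence arbiter works like this: when a detector reports several encodings with (near-)equal confidence, decode the bytes with each one and let a probe judge which result reads like text. This sketch assumes the detector's output is a list of `(encoding, confidence)` pairs. Its `plausibility` function is a deliberately crude, hypothetical stand-in for the Cyrillic / EE-Latin coverage probes mentioned above.

```python
import unicodedata
from typing import List, Tuple


def plausibility(text: str) -> float:
    """Crude probe: share of letters/digits/punctuation/whitespace, with a heavy
    penalty for U+FFFD replacement characters and stray control characters."""
    if not text:
        return 0.0
    good = bad = 0
    for ch in text:
        cat = unicodedata.category(ch)
        if ch == "\ufffd" or (cat == "Cc" and ch not in "\t\r\n"):
            bad += 1
        elif cat[0] in "LNPZ" or ch in "\t\r\n":
            good += 1
    return (good - 10 * bad) / len(text)


def arbitrate(data: bytes, candidates: List[Tuple[str, float]], epsilon: float = 0.01) -> str:
    """Pick an encoding. Candidates within `epsilon` of the best confidence count
    as tied; ties are broken by decoding and scoring with the probe."""
    if not candidates:
        raise ValueError("no candidate encodings")
    best = max(conf for _, conf in candidates)
    tied = [name for name, conf in candidates if best - conf <= epsilon]
    if len(tied) == 1:
        return tied[0]
    return max(tied, key=lambda name: plausibility(data.decode(name, errors="replace")))
```

Take a Czech sample encoded in cp1250. Byte `0x9D` (ť) is undefined in Python's `cp1252` codec, so the cp1252 decode contains a replacement character and loses the tie. A real probe would also need script-coverage or letter-frequency scoring, because two single-byte Cyrillic code pages can both decode cleanly.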

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist
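The `?p=<persona>` routing and the 100-row cap both reduce to small pure functions that a Streamlit page could call, for example by passing `st.query_params` as the mapping. The persona names, the fallback value, and the function names are hypothetical. Only the 100-row limit comes from the commit.

```python
from typing import List, Mapping, Sequence, Tuple

DEMO_ROW_CAP = 100  # limit stated in the commit; everything else here is illustrative


def resolve_persona(query: Mapping[str, str], known: Sequence[str], default: str) -> str:
    """Map ?p=<persona> to a known persona (case-insensitive); anything else falls back."""
    value = str(query.get("p", "")).strip().lower()
    return value if value in known else default


def cap_rows(rows: Sequence, cap: int = DEMO_ROW_CAP) -> Tuple[List, bool]:
    """Return at most `cap` rows plus a flag the UI can use to show the
    watermark / upgrade notice when the input was truncated."""
    return list(rows[:cap]), len(rows) > cap
```

Keeping the cap in a function of its own, rather than scattering it through widgets, is one way to make the free-vs-paid boundary enforceable at a single point on the demo surface.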

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy is installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"
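The strict-greater drop threshold and the all-NaN skip can be shown without pandas. This pure-Python stand-in assumes rows are dicts that use `None` as the missing marker. The function names are hypothetical, and the real handler presumably operates on DataFrames.

```python
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # None marks a missing value


def drop_sparse_rows(rows: List[Row], threshold: float) -> List[Row]:
    """Drop a row only if its missing fraction is strictly greater than threshold."""
    kept = []
    for row in rows:
        missing = sum(v is None for v in row.values())
        fraction = missing / len(row) if row else 0.0
        if fraction <= threshold:  # a row exactly at the threshold is kept
            kept.append(row)
    return kept


def fill_median(rows: List[Row]) -> List[Row]:
    """Median-fill missing values per column; all-missing columns are skipped."""
    columns = {key for row in rows for key in row}
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        if observed:  # skip all-missing columns instead of computing a NaN median
            medians[col] = median(observed)
    return [
        {k: medians[k] if v is None and k in medians else v for k, v in row.items()}
        for row in rows
    ]
```

With `threshold=2/3`, a row missing exactly two of three fields is kept, which is what "strict-greater" means. A column with no observed values is left untouched instead of being filled with NaN.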

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

2.5 KiB

Customer ID,First Name,Last Name,Email,Phone,Address,City,State,ZIP,Country,Total Orders,Lifetime Value,Last Order Date,Tags
SHOP-1001,Alice, Johnson,alice@petshop.com,(415) 555-1234,"123 Main St., Apt 4B",San Francisco,CA,94102,US,12,$1240.50,2025-12-04,VIP
SHOP-1002,Bob,SMITH,Bob@PetShop.com,415.555.1234,"123 Main St, Apt 4B",San Francisco,CA,94102,US,12,"$1,240.50",N/A,VIP
SHOP-1003,carlos,garcia,carlos@petshop.com,5559876543,742 Evergreen Terrace,Springfield,IL,62704,US,5,420.00,12/15/2025,Wholesale
SHOP-1004,Diana,Lee,diana@petshop.com,(555) 222-3344,"PO Box 12, Sherwood Forest",Nottingham,,NG1 5BA,GB,8,£890.25,2025-10-30,VIP|Wholesale
SHOP-1005,EVE, MARTINEZ,eve.martinez@petshop.com,555-9988,Calle Mayor 45,Madrid,,28013,ES,3,€180,2025-09-15,
SHOP-1006,Frank,Brown,frank@petshop.com,,,Berlin,BE,10115,DE,15,€2.41075,(blank),Wholesale
SHOP-1007,Grace,Davis,grace@petshop.com,+1 555-111-1111,888 Maple Ave,Toronto,ON,M5V 3A8,CA,1,$49.99,#N/A,New
SHOP-1008,henry,wilson,Henry@PetShop.com,5551111111,888 Maple Avenue,Toronto,ON,M5V 3A8,CA,1,$49.99,2025-12-01,New
SHOP-1009,Ivy,Chen,IVY@petshop.com,+1 (555) 777-7777,"550 Elm Street, Suite 200",Brooklyn,NY,11201,US,4,$320.50, 10/12/2025,
SHOP-1010,Jack,Taylor,jack@petshop.com,(none),"550 elm street, suite 200",brooklyn,NY,11201,US,4,$320.50,2025-10-12,
SHOP-1011,kate,o'neil,kate.oneil@petshop.com,415-555-2222,99 King's Rd,London,,SW3 4LX,GB,7,£675.00,?,VIP
SHOP-1012,luis,rodriguez,LUIS@petshop.com,+34 91 411 1111,"Avenida de la Paz 12, 3°D",Madrid,,28013,ES,2,"€89,99",unknown,
SHOP-1013,Mia,Park,mia@petshop.com,02-9374-4000,Sydney Opera House Drive,Sydney,NSW,2000,AU,9,"A$ 1,299.00",2025-11-20,Wholesale
SHOP-1014,Noah,nguyen,noah@petshop.com,+81 3 3210 7000,丸の内 2-7-3,Tokyo,,100-0005,JP,6,¥75000,2025-12-10,VIP
SHOP-1015,Olivia,Brown,OLIVIA@PETSHOP.COM,(555) 333-4444,742 evergreen terrace,springfield,IL,62704,US,3,$180.00,(none),
SHOP-1016,Pavel,Novak,pavel@petshop.com,+44 20 7946 1234,22 Baker Street,London,,W1U 6AB,United Kingdom,4,£412.00,2025-11-18,VIP
SHOP-1017,Quinn,Murphy,quinn@petshop.com,+44 20 7946 5678,5 Princes Street,Edinburgh,,EH2 2DA,U.K.,2,£189.50,2025-12-09,
SHOP-1018,Rachel,O'Brien,rachel@petshop.com,02-9374-9999,100 George Street,Sydney,NSW,2000,UK,1,£75.00,?,New
SHOP-1019,Sam,Klein,sam@petshop.com,+49 30 99887766,Friedrichstraße 100,Berlin,,10117,Germany,11,"€1.890,40",2025-12-11,VIP|Wholesale
SHOP-1020,Tara,Gianni,tara@petshop.com,+39 06 6982 4567,Via del Corso 250,Roma,,00186,Italia,5,"€649,99",2025-12-03,