Tools shipped this batch (4 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI
with soft tool-dependency graph (recommended,
not enforced) and JSON save/load for repeatable
weekly cleanups.
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
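The caching idea in the first bullet can be illustrated with a minimal sketch. It assumes the cache is keyed by the raw cell value, so a value that repeats down a column is parsed only once. `normalise_boolean` and `standardise_column` are hypothetical names, not the functions in src/core, and the token sets are illustrative.

```python
from functools import lru_cache

# Hypothetical token sets; the real standardizer may accept more spellings.
TRUE_TOKENS = frozenset({"y", "yes", "true", "1"})
FALSE_TOKENS = frozenset({"n", "no", "false", "0"})


@lru_cache(maxsize=65536)
def normalise_boolean(raw):
    """Map one raw cell to 'true' / 'false'; leave unknown values untouched."""
    token = raw.strip().lower()
    if token in TRUE_TOKENS:
        return "true"
    if token in FALSE_TOKENS:
        return "false"
    return raw


def standardise_column(values):
    """Apply the cached normaliser to every cell of a column."""
    return [normalise_boolean(value) for value in values]
```

On a column with few distinct values, most calls become cache hits; `normalise_boolean.cache_info()` reports the hit and miss counts.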
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
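As an illustration only, a coverage probe used as a tie-breaker might look like the sketch below. It assumes the detector has already produced several candidate encodings with equal confidence, and it scores each by the share of decoded letters that fall in the basic Cyrillic block. The function names are hypothetical and the real arbiter may weigh other signals.

```python
def cyrillic_coverage(text):
    """Fraction of alphabetic characters that lie in U+0400..U+04FF."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def break_tie(raw, tied_encodings):
    """Return the tied candidate whose decoding has the highest coverage.

    If the scores are also tied, the first candidate in the list wins.
    """
    def score(encoding):
        return cyrillic_coverage(raw.decode(encoding, errors="replace"))

    return max(tied_encodings, key=score)
```

A probe like this cannot separate two encodings that both decode the bytes to Cyrillic letters (cp1251 and koi8_r, for instance), so it would only be one input to an arbiter.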
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
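The strict-greater rule for the drop threshold can be sketched as follows. This assumes rows are plain lists with `None` marking a missing cell; `drop_sparse_rows` is a hypothetical helper, not the function in src/core/missing.py.

```python
def drop_sparse_rows(rows, threshold):
    """Drop a row only when its missing fraction is strictly greater
    than the threshold; a row sitting exactly on the threshold is kept."""
    kept = []
    for row in rows:
        missing = sum(1 for value in row if value is None)
        fraction = missing / len(row) if row else 0.0
        if fraction > threshold:
            continue
        kept.append(row)
    return kept
```

With `threshold=0.5`, a two-column row with one missing cell survives and a fully empty row is dropped. Applying the same comparison to columns keeps the two axes consistent.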
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Landing pages
Three persona-tagged landing pages per docs/PLAN.md §2.3 and
docs/DEMO-PLAN.md §3 / §7. Static HTML, zero build step, ship to
Cloudflare Pages.
Structure
landing/
├── _shared/styles.css shared CSS (system fonts, no externals)
├── shopify-pet/index.html Shopify operator (priority: pet supplies)
├── bookkeeper/index.html bookkeeper / freelance accountant
├── revops/index.html marketing / RevOps agency
└── README.md this file
Each page:
- Inherits landing/_shared/styles.css
- Overrides the --accent colour variable in an inline <style> block so each persona has its own visual identity (Shopify = mint green, Bookkeeper = steel blue, RevOps = vivid violet)
- Has a sticky buy bar with the Gumroad CTA tagged with ?from=<persona>
- Embeds the live demo (Streamlit) via <iframe> with a sandbox attribute
- Carries persona-specific H1, sub-copy, use cases, FAQ, and a ready-to-paste terminal block showing the CLI in action
- Includes Open Graph + Schema.org SoftwareApplication JSON-LD for link-share previews and SEO
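For reference, a Schema.org SoftwareApplication block of the kind mentioned in the last bullet could be generated as in this sketch. The helper name and the product name and description passed to it are placeholders, not values taken from the pages.

```python
import json


def software_application_jsonld(name, description, price, currency="USD"):
    """Build a <script type="application/ld+json"> tag for a product page."""
    data = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag.
    # "\/" is a valid JSON escape, so parsers still read it as "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Generating the block from one price value is one way to keep the JSON-LD in step with the visible price when the pricing changes.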
Pre-deploy URL substitutions — automated
The HTML carries placeholder URLs (the literal strings
https://demo.datatools.app, https://datatools.app,
https://gumroad.com/l/datatools, mailto:hello@datatools.app)
that must be replaced before deployment. A small Python script
does this for you — no global search-and-replace needed.
# 1) Copy the template and fill in your real URLs:
cp landing/deploy.config.example.json landing/deploy.config.json
edit landing/deploy.config.json
# 2) Build the deploy-ready bundle:
python3 landing/deploy.py
# → produces landing/dist/ with substitutions applied,
# plus robots.txt, sitemap.xml, 404.html, favicon.svg
landing/deploy.config.json is gitignored so your real URLs never
hit the repo. Re-run landing/deploy.py whenever you change a URL or
edit any HTML source.
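The substitution step can be pictured with the sketch below. It is not the contents of landing/deploy.py: it assumes the config is a flat JSON object mapping each placeholder string to its real URL, and `load_mapping`, `substitute`, and `build` are hypothetical names.

```python
import json
import re
import shutil
from pathlib import Path


def load_mapping(config_path):
    """Read a {placeholder: real_url} object from a JSON config file."""
    return json.loads(Path(config_path).read_text(encoding="utf-8"))


def substitute(text, mapping):
    """Replace every placeholder in one pass, longest placeholder first,
    so a replacement value is never rewritten by a later rule."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


def build(src, dist, mapping):
    """Copy src to dist, then rewrite placeholders in every HTML file."""
    src, dist = Path(src), Path(dist)
    if dist.exists():
        shutil.rmtree(dist)
    # Skip a nested output directory so the bundle is never copied into itself.
    shutil.copytree(src, dist, ignore=shutil.ignore_patterns(dist.name))
    for page in dist.rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        page.write_text(substitute(html, mapping), encoding="utf-8")
```

The sources stay untouched and only the copy under the output directory is rewritten, which is why re-running the build after any edit is safe.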
Cloudflare Pages deployment
The simplest path — one Pages project pointed at landing/dist/:
# Option A: drag-and-drop the directory in the Cloudflare dashboard
# Pages → Create project → Direct Upload → drag landing/dist/
# Option B: Wrangler CLI (one command, scriptable)
wrangler pages deploy landing/dist
Configure the custom apex domain (datatools.app) in the Cloudflare
Pages project settings; sub-paths /shopify-pet/, /bookkeeper/,
/revops/ are served automatically because the directory layout
mirrors them. Cache rule defaults are fine (HTML 1 day, CSS 7 days).
If you want separate Pages projects per persona for independent
A/B testing, point three projects at the same landing/dist/ and
configure each with its own sub-domain (shopify.datatools.app, etc.)
and a Pages rule that rewrites the root to that persona's
sub-directory.
Telemetry wiring (per DEMO-PLAN §8)
The plan calls for event-only counters, no PII, no Google Analytics.
For each page, on Cloudflare Pages, attach a Worker (or use Cloudflare Web Analytics — it's privacy-friendly out of the box and zero config). Track:
- page_view per persona (auto from CF Web Analytics)
- cta_clicked: add a small inline <script> that fires a fetch to /api/event?event=cta_clicked&persona=<persona> when the buy button is clicked, then continues the navigation to Gumroad.
- demo.run_completed and demo.cta_clicked are owned by the demo app, not the landing page.
Conversion (per DEMO-PLAN §8):
demo_engagement = demo.run_completed / page_view (target ≥ 30%)
purchase_intent = demo.cta_clicked / demo.run_completed (target ≥ 5%)
purchase_rate = gumroad.purchase / demo.cta_clicked (target ≥ 30%)
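The three ratios above can be computed from raw event counts as in this sketch. The function names and the zero-denominator behaviour are assumptions; the targets are the ones listed above.

```python
# Targets from the conversion table above.
TARGETS = {
    "demo_engagement": 0.30,
    "purchase_intent": 0.05,
    "purchase_rate": 0.30,
}


def funnel(page_views, runs_completed, demo_cta_clicks, purchases):
    """Return the three funnel ratios; an empty denominator yields 0.0."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "demo_engagement": ratio(runs_completed, page_views),
        "purchase_intent": ratio(demo_cta_clicks, runs_completed),
        "purchase_rate": ratio(purchases, demo_cta_clicks),
    }


def below_target(metrics):
    """List the metric names that miss their target."""
    return [name for name, target in TARGETS.items() if metrics[name] < target]
```

For example, 1,000 page views, 320 completed runs, 12 demo CTA clicks and 4 purchases give an engagement of 32% and a purchase rate of about 33%, with purchase intent at 3.75% as the only metric under target.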
The Gumroad webhook captures ?from=<persona> so we can attribute
purchases back to the landing page that produced them.
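A minimal sketch of the attribution round trip, assuming the persona travels as a plain `from` query parameter on the CTA URL. How Gumroad relays that parameter in its webhook payload is not shown here, and both helper names are hypothetical.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def tag_cta(url, persona):
    """Append from=<persona> to a CTA URL, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("from", persona)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def persona_from(url):
    """Read the persona back out of a tagged URL, or None if untagged."""
    values = parse_qs(urlsplit(url).query).get("from")
    return values[0] if values else None
```

`tag_cta("https://gumroad.com/l/datatools", "bookkeeper")` returns the URL with `?from=bookkeeper` appended, and `persona_from` recovers `"bookkeeper"` from it.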
Maintenance triggers (per DEMO-PLAN §9)
Refresh the page when:
| Trigger | Action |
|---|---|
| cta_clicked / run_completed < 5% for 4 weeks | The demo is working but the buyer isn't trusting the CTA. Add a screenshot of the network tab showing zero outbound calls. Soften the price callout. |
| page_view → run_completed < 30% for 4 weeks | The demo iframe isn't loading or visitors aren't engaging. Check the iframe URL. Move the demo above the fold if it's currently below. |
| New tool ships (06–09) | Add it to the persona's saved pipeline only if it fits — don't bloat the demo with every tool. |
| Pricing change | Update <meta> schema, the buybar .price-tag, the pricing card, and the FAQ. Search-and-replace $49 across the file. |
| New persona added (4th, 5th) | Copy shopify-pet/index.html, replace persona-specific copy, add to the footer cross-link block on the existing pages. |
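The two metric-based triggers in the table could be checked with something like the sketch below. It assumes one ratio is recorded per week, oldest first, and reads "for 4 weeks" as four consecutive most-recent weeks below the threshold; `trigger_fired` is a hypothetical helper.

```python
def trigger_fired(weekly_ratios, threshold, weeks=4):
    """True when each of the last `weeks` ratios is below the threshold."""
    recent = weekly_ratios[-weeks:]
    return len(recent) == weeks and all(value < threshold for value in recent)
```

A single good week inside the window resets the trigger, and fewer than four weeks of data never fires it.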
Why static HTML
Per DECISIONS.md §5 and BUSINESS.md §7, the landing-page channel
must be:
- Async-friendly: Cloudflare Pages serves these with no operator involvement
- Cheap: Cloudflare Pages free tier is sufficient until well past the $5k/mo MRR re-lock trigger (DECISIONS.md §8)
- Privacy-respecting: no third-party tracker means no cookie banner, which means no friction added to the conversion funnel
- Zero ongoing maintenance: no framework, no build, no upgrades. The CSS uses system fonts; no Google Fonts; no CDN dependency that could break the page when their TLS certificate rolls.
Anti-temptations (per DEMO-PLAN §11 + plan §5)
Deliberate constraints on these pages:
- No live chat widget. Locked by no-touch.
- No "schedule a demo with us" CTA. Same.
- No email capture before the demo. Friction kills conversion.
- No Google Analytics / Meta Pixel. The privacy story is a moat, not a checkbox.
- No SaaS-style "free trial / no credit card." This is a one-time download, not a subscription.
- No A/B-testing framework yet. Pre-PMF traffic doesn't reach statistical significance — ship the single-arm copy, iterate monthly.