datatools-dev/landing/README.md
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00


Landing pages

Three persona-tagged landing pages per docs/PLAN.md §2.3 and docs/DEMO-PLAN.md §3 / §7. Static HTML with no framework or bundler, shipped to Cloudflare Pages. The only pre-deploy step is the URL substitution run by landing/deploy.py.

Structure

landing/
├── _shared/styles.css      shared CSS (system fonts, no externals)
├── shopify-pet/index.html  Shopify operator (priority: pet supplies)
├── bookkeeper/index.html   bookkeeper / freelance accountant
├── revops/index.html       marketing / RevOps agency
└── README.md               this file

Each page:

  • Inherits landing/_shared/styles.css
  • Overrides the --accent colour variable in an inline <style> block so each persona has its own visual identity (Shopify = mint green, Bookkeeper = steel blue, RevOps = vivid violet)
  • Has a sticky buy bar with the Gumroad CTA tagged with ?from=<persona>
  • Embeds the live demo (Streamlit) via <iframe> with a sandbox attribute
  • Carries persona-specific H1, sub-copy, use cases, FAQ, and a ready-to-paste terminal block showing the CLI in action
  • Includes Open Graph + Schema.org SoftwareApplication JSON-LD for link-share previews and SEO

Pre-deploy URL substitutions — automated

The HTML carries placeholder URLs (the literal strings https://demo.datatools.app, https://datatools.app, https://gumroad.com/l/datatools, mailto:hello@datatools.app) that must be replaced before deployment. A small Python script does this for you — no global search-and-replace needed.

# 1) Copy the template and fill in your real URLs:
cp landing/deploy.config.example.json landing/deploy.config.json
"${EDITOR:-nano}" landing/deploy.config.json   # or open it in any editor

# 2) Build the deploy-ready bundle:
python3 landing/deploy.py
# → produces landing/dist/ with substitutions applied,
#   plus robots.txt, sitemap.xml, 404.html, favicon.svg

landing/deploy.config.json is gitignored so your real URLs never hit the repo. Re-run landing/deploy.py whenever you change a URL or edit any HTML source.
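
The substitution step can be pictured with the minimal sketch below. It is not the contents of landing/deploy.py; the function name `substitute_placeholders` and the flat placeholder-to-URL mapping are assumptions made for illustration. It replaces all placeholders in a single pass, so a real URL that happens to contain another placeholder is never rewritten a second time.

```python
import re


def substitute_placeholders(html: str, mapping: dict) -> str:
    """Swap each placeholder URL for its real value in one pass.

    `mapping` is a hypothetical {placeholder: real_url} dict; the actual
    schema of deploy.config.json is not shown in this README.
    """
    if not mapping:
        return html
    # Try longer placeholders first so a longer one always wins over a
    # shorter one that shares its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single regex pass never re-scans text it has already replaced.
    return pattern.sub(lambda match: mapping[match.group(0)], html)


if __name__ == "__main__":
    example_mapping = {
        "https://demo.datatools.app": "https://demo.example.com",
        "https://datatools.app": "https://example.com",
    }
    page = '<iframe src="https://demo.datatools.app/?p=revops"></iframe>'
    print(substitute_placeholders(page, example_mapping))
```

Chained str.replace calls would also work for the four placeholders listed above, but the order of the calls would start to matter as soon as one real URL contained another placeholder.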

Cloudflare Pages deployment

The simplest path — one Pages project pointed at landing/dist/:

# Option A: drag-and-drop the directory in the Cloudflare dashboard
#   Pages → Create project → Direct Upload → drag landing/dist/

# Option B: Wrangler CLI (one command, scriptable)
wrangler pages deploy landing/dist

Configure the custom apex domain (datatools.app) in the Cloudflare Pages project settings; sub-paths /shopify-pet/, /bookkeeper/, /revops/ are served automatically because the directory layout mirrors them. Cache rule defaults are fine (HTML 1 day, CSS 7 days).

If you want separate Pages projects per persona for independent A/B testing, point three projects at the same landing/dist/ and configure each with its own sub-domain (shopify.datatools.app, etc.) and a Pages rule that rewrites the root to that persona's sub-directory.

Telemetry wiring (per DEMO-PLAN §8)

The plan calls for event-only counters, no PII, no Google Analytics.

For each page, on Cloudflare Pages, attach a Worker (or use Cloudflare Web Analytics — it's privacy-friendly out of the box and zero config). Track:

  • page_view per persona (auto from CF Web Analytics)
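  • cta_clicked: add a small inline <script> that reports /api/event?event=cta_clicked&persona=<persona> when the buy button is clicked, then lets the navigation to Gumroad continue. Send it with navigator.sendBeacon or fetch with keepalive: true so the request is not cancelled when the page unloads.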
  • demo.run_completed and demo.cta_clicked are owned by the demo app, not the landing page.

Conversion (per DEMO-PLAN §8):

demo_engagement = demo.run_completed / page_view       (target ≥ 30%)
purchase_intent = demo.cta_clicked / demo.run_completed (target ≥  5%)
purchase_rate   = gumroad.purchase / demo.cta_clicked   (target ≥ 30%)
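
As a worked illustration of the three ratios above, here is a small sketch that computes them from raw event counts and lists the stages that miss their target. The function names and the idea of feeding it four plain integers are assumptions; how the counters are actually exported is not specified here.

```python
from typing import Dict, List, Optional

# Targets copied from the formulas above.
TARGETS = {
    "demo_engagement": 0.30,
    "purchase_intent": 0.05,
    "purchase_rate": 0.30,
}


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when there is no traffic yet."""
    return numerator / denominator if denominator else None


def funnel_ratios(page_view: int, run_completed: int,
                  cta_clicked: int, purchases: int) -> Dict[str, Optional[float]]:
    """Compute the three funnel ratios from event counts for one period."""
    return {
        "demo_engagement": _ratio(run_completed, page_view),
        "purchase_intent": _ratio(cta_clicked, run_completed),
        "purchase_rate": _ratio(purchases, cta_clicked),
    }


def below_target(ratios: Dict[str, Optional[float]]) -> List[str]:
    """Names of the stages whose ratio is known and under its target."""
    return [name for name, value in ratios.items()
            if value is not None and value < TARGETS[name]]


if __name__ == "__main__":
    # Made-up counts, purely to show the arithmetic.
    ratios = funnel_ratios(page_view=1000, run_completed=320,
                           cta_clicked=12, purchases=4)
    print(ratios, below_target(ratios))
```

With those made-up counts, demo_engagement is 320 / 1000 = 32% and purchase_rate is 4 / 12 ≈ 33%, both on target, while purchase_intent is 12 / 320 = 3.75%, under the 5% target.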

The Gumroad webhook captures ?from=<persona> so we can attribute purchases back to the landing page that produced them.
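
The ?from=<persona> round trip can be sketched with the standard library alone. The helper names below are hypothetical, and using the landing directory names as the tag values is an assumption; the shape of the Gumroad webhook payload is not covered, only the URL handling.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed tag values: the three persona directory names under landing/.
PERSONAS = ("shopify-pet", "bookkeeper", "revops")


def tag_cta_url(base_url: str, persona: str) -> str:
    """Return base_url with from=<persona> set, keeping other query params."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    parts = urlsplit(base_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["from"] = [persona]  # overwrite any existing tag
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def persona_from_url(url: str) -> Optional[str]:
    """Read the persona tag back out, ignoring values we do not recognise."""
    values = parse_qs(urlsplit(url).query).get("from", [])
    return values[-1] if values and values[-1] in PERSONAS else None


if __name__ == "__main__":
    tagged = tag_cta_url("https://gumroad.com/l/datatools", "bookkeeper")
    print(tagged, persona_from_url(tagged))
```

Rejecting unknown values on the way back in keeps stray or hand-edited query strings from creating phantom personas in the attribution counts.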

Maintenance triggers (per DEMO-PLAN §9)

Refresh the page when:

| Trigger | Action |
|---|---|
| cta_clicked / run_completed < 5% for 4 weeks | The demo is working but the buyer isn't trusting the CTA. Add a screenshot of the network tab showing zero outbound calls. Soften the price callout. |
| page_view → run_completed < 30% for 4 weeks | The demo iframe isn't loading or visitors aren't engaging. Check the iframe URL. Move the demo above the fold if it's currently below. |
| New tool ships (06–09) | Add it to the persona's saved pipeline only if it fits. Don't bloat the demo with every tool. |
| Pricing change | Update <meta> schema, the buybar .price-tag, the pricing card, and the FAQ. Search-and-replace $49 across the file. |
| New persona added (4th, 5th) | Copy shopify-pet/index.html, replace persona-specific copy, add to the footer cross-link block on the existing pages. |
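
The two ratio-based triggers can be checked mechanically. The sketch below assumes that "for 4 weeks" means the four most recent weekly values are all under the threshold; that reading, and the function name, are assumptions rather than something DEMO-PLAN §9 is quoted as saying.

```python
from typing import Sequence


def sustained_below(weekly_ratios: Sequence[float], threshold: float,
                    weeks: int = 4) -> bool:
    """True when the most recent `weeks` weekly values are all below threshold.

    `weekly_ratios` is ordered oldest to newest. With fewer than `weeks`
    data points there is not enough history, so the trigger stays off.
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    if len(weekly_ratios) < weeks:
        return False
    return all(ratio < threshold for ratio in weekly_ratios[-weeks:])


if __name__ == "__main__":
    # Made-up weekly cta_clicked / run_completed values.
    intent_by_week = [0.06, 0.04, 0.03, 0.045, 0.02]
    if sustained_below(intent_by_week, threshold=0.05):
        print("Refresh the CTA: purchase intent under 5% for 4 weeks")
```

A single good week resets the trigger under this reading, which keeps one noisy low-traffic week from forcing a page rewrite.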

Why static HTML

Per DECISIONS.md §5 and BUSINESS.md §7, the landing-page channel must be:

  • Async-friendly — Cloudflare Pages serves these with no operator involvement
  • Cheap — Cloudflare Pages free tier is sufficient until well past the $5k/mo MRR re-lock trigger (DECISIONS.md §8)
  • Privacy-respecting — no third-party tracker means no cookie banner, which means no friction added to the conversion funnel
  • Zero ongoing maintenance — no framework, no build, no upgrades. The CSS uses system fonts; no Google Fonts; no CDN dependency that could break the page when their TLS certificate rolls.

Anti-temptations (per DEMO-PLAN §11 + PLAN §5)

Each of these is a deliberate rule for the pages:

  • No live chat widget. Locked by no-touch.
  • No "schedule a demo with us" CTA. Same.
  • No email capture before the demo. Friction kills conversion.
  • No Google Analytics / Meta Pixel. The privacy story is a moat, not a checkbox.
  • No SaaS-style "free trial / no credit card." This is a one-time download, not a subscription.
  • No A/B-testing framework yet. Pre-PMF traffic doesn't reach statistical significance — ship the single-arm copy, iterate monthly.