Tools shipped this batch (4 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI
with soft tool-dependency graph (recommended,
not enforced) and JSON save/load for repeatable
weekly cleanups.
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
143 lines
5.9 KiB
Markdown
143 lines
5.9 KiB
Markdown
# Landing pages
|
||
|
||
Three persona-tagged landing pages per `docs/PLAN.md` §2.3 and
|
||
`docs/DEMO-PLAN.md` §3 / §7. Static HTML, zero build step, ship to
|
||
Cloudflare Pages.
|
||
|
||
## Structure
|
||
|
||
```
|
||
landing/
|
||
├── _shared/styles.css shared CSS (system fonts, no externals)
|
||
├── shopify-pet/index.html Shopify operator (priority: pet supplies)
|
||
├── bookkeeper/index.html bookkeeper / freelance accountant
|
||
├── revops/index.html marketing / RevOps agency
|
||
└── README.md this file
|
||
```
|
||
|
||
Each page:
|
||
|
||
- Inherits `landing/_shared/styles.css`
|
||
- Overrides the `--accent` colour variable in an inline `<style>` block
|
||
so each persona has its own visual identity (Shopify = mint green,
|
||
Bookkeeper = steel blue, RevOps = vivid violet)
|
||
- Has a sticky buy bar with the Gumroad CTA tagged with `?from=<persona>`
|
||
- Embeds the live demo (Streamlit) via `<iframe>` with a sandbox attribute
|
||
- Carries persona-specific H1, sub-copy, use cases, FAQ, and a
|
||
ready-to-paste `terminal` block showing the CLI in action
|
||
- Includes Open Graph + Schema.org `SoftwareApplication` JSON-LD for
|
||
link-share previews and SEO
|
||
|
||
## Pre-deploy URL substitutions — automated
|
||
|
||
The HTML carries placeholder URLs (the literal strings
|
||
`https://demo.datatools.app`, `https://datatools.app`,
|
||
`https://gumroad.com/l/datatools`, `mailto:hello@datatools.app`)
|
||
that **must** be replaced before deployment. A small Python script
|
||
does this for you — no global search-and-replace needed.
|
||
|
||
```bash
|
||
# 1) Copy the template and fill in your real URLs:
|
||
cp landing/deploy.config.example.json landing/deploy.config.json
|
||
edit landing/deploy.config.json
|
||
|
||
# 2) Build the deploy-ready bundle:
|
||
python3 landing/deploy.py
|
||
# → produces landing/dist/ with substitutions applied,
|
||
# plus robots.txt, sitemap.xml, 404.html, favicon.svg
|
||
```
|
||
|
||
`landing/deploy.config.json` is gitignored so your real URLs never
|
||
hit the repo. Re-run `landing/deploy.py` whenever you change a URL or
|
||
edit any HTML source.
|
||
|
||
## Cloudflare Pages deployment
|
||
|
||
The simplest path — one Pages project pointed at `landing/dist/`:
|
||
|
||
```bash
|
||
# Option A: drag-and-drop the directory in the Cloudflare dashboard
|
||
# Pages → Create project → Direct Upload → drag landing/dist/
|
||
|
||
# Option B: Wrangler CLI (one command, scriptable)
|
||
wrangler pages deploy landing/dist
|
||
```
|
||
|
||
Configure the custom apex domain (`datatools.app`) in the Cloudflare
|
||
Pages project settings; sub-paths `/shopify-pet/`, `/bookkeeper/`,
|
||
`/revops/` are served automatically because the directory layout
|
||
mirrors them. Cache rule defaults are fine (HTML 1 day, CSS 7 days).
|
||
|
||
If you want **separate Pages projects** per persona for independent
|
||
A/B testing, point three projects at the same `landing/dist/` and
|
||
configure each with its own sub-domain (`shopify.datatools.app`, etc.)
|
||
and a Pages rule that rewrites the root to that persona's
|
||
sub-directory.
|
||
|
||
## Telemetry wiring (per DEMO-PLAN §8)
|
||
|
||
The plan calls for event-only counters, no PII, no Google Analytics.
|
||
|
||
For each page, on Cloudflare Pages, attach a Worker (or use Cloudflare
|
||
Web Analytics — it's privacy-friendly out of the box and zero config).
|
||
Track:
|
||
|
||
- `page_view` per persona (auto from CF Web Analytics)
|
||
- `cta_clicked` — add a small inline `<script>` that fires a fetch to
|
||
`/api/event?event=cta_clicked&persona=<persona>` when the buy button
|
||
is clicked, then continues the navigation to Gumroad.
|
||
- `demo.run_completed` and `demo.cta_clicked` are owned by the demo
|
||
app, not the landing page.
|
||
|
||
Conversion (per DEMO-PLAN §8):
|
||
|
||
```
|
||
demo_engagement = demo.run_completed / page_view (target ≥ 30%)
|
||
purchase_intent = demo.cta_clicked / demo.run_completed (target ≥ 5%)
|
||
purchase_rate = gumroad.purchase / demo.cta_clicked (target ≥ 30%)
|
||
```
|
||
|
||
The Gumroad webhook captures `?from=<persona>` so we can attribute
|
||
purchases back to the landing page that produced them.
|
||
|
||
## Maintenance triggers (per DEMO-PLAN §9)
|
||
|
||
Refresh the page when:
|
||
|
||
| Trigger | Action |
|
||
|---|---|
|
||
| `cta_clicked / run_completed < 5%` for 4 weeks | The demo is working but the buyer isn't trusting the CTA. Add a screenshot of the network tab showing zero outbound calls. Soften the price callout. |
|
||
| `page_view → run_completed < 30%` for 4 weeks | The demo iframe isn't loading or visitors aren't engaging. Check the iframe URL. Move the demo above the fold if it's currently below. |
|
||
| New tool ships (06–09) | Add it to the persona's saved pipeline only if it fits — don't bloat the demo with every tool. |
|
||
| Pricing change | Update `<meta>` schema, the buybar `.price-tag`, the pricing card, and the FAQ. Search-and-replace `$49` across the file. |
|
||
| New persona added (4th, 5th) | Copy `shopify-pet/index.html`, replace persona-specific copy, add to the `footer` cross-link block on the existing pages. |
|
||
|
||
## Why static HTML
|
||
|
||
Per `DECISIONS.md §5` and `BUSINESS.md §7`, the landing-page channel
|
||
must be:
|
||
|
||
- **Async-friendly** — Cloudflare Pages serves these with no operator
|
||
involvement
|
||
- **Cheap** — Cloudflare Pages free tier is sufficient until well past
|
||
the $5k/mo MRR re-lock trigger (`DECISIONS.md §8`)
|
||
- **Privacy-respecting** — no third-party tracker means no cookie
|
||
banner, which means no friction added to the conversion funnel
|
||
- **Zero ongoing maintenance** — no framework, no build, no upgrades.
|
||
The CSS uses system fonts; no Google Fonts; no CDN dependency that
|
||
could break the page when their TLS certificate rolls.
|
||
|
||
## Anti-temptations (per DEMO-PLAN §11 + plan §5)
|
||
|
||
These pages deliberately exclude:
|
||
|
||
- **No live chat widget.** Locked by no-touch.
|
||
- **No "schedule a demo with us" CTA.** Same.
|
||
- **No email capture before the demo.** Friction kills conversion.
|
||
- **No Google Analytics / Meta Pixel.** Privacy story is a moat, not
|
||
a checkbox to ignore.
|
||
- **No SaaS-style "free trial / no credit card."** This is a one-time
|
||
download, not a subscription.
|
||
- **No A/B-testing framework yet.** Pre-PMF traffic doesn't reach
|
||
statistical significance — ship the single-arm copy, iterate monthly.
|