datatools-dev/landing/README.md
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

# Landing pages
Three persona-tagged landing pages per `docs/PLAN.md` §2.3 and
`docs/DEMO-PLAN.md` §3 / §7. Static HTML with no framework or bundler;
the only pre-deploy step is URL substitution. Ships to Cloudflare Pages.
## Structure
```
landing/
├── _shared/styles.css        shared CSS (system fonts, no externals)
├── shopify-pet/index.html    Shopify operator (priority: pet supplies)
├── bookkeeper/index.html     bookkeeper / freelance accountant
├── revops/index.html         marketing / RevOps agency
└── README.md                 this file
```
Each page:
- Inherits `landing/_shared/styles.css`
- Overrides the `--accent` colour variable in an inline `<style>` block
so each persona has its own visual identity (Shopify = mint green,
Bookkeeper = steel blue, RevOps = vivid violet)
- Has a sticky buy bar with the Gumroad CTA tagged with `?from=<persona>`
- Embeds the live demo (Streamlit) via `<iframe>` with a sandbox attribute
- Carries persona-specific H1, sub-copy, use cases, FAQ, and a
ready-to-paste `terminal` block showing the CLI in action
- Includes Open Graph + Schema.org `SoftwareApplication` JSON-LD for
link-share previews and SEO
## Pre-deploy URL substitutions — automated
The HTML carries placeholder URLs (the literal strings
`https://demo.datatools.app`, `https://datatools.app`,
`https://gumroad.com/l/datatools`, `mailto:hello@datatools.app`)
that **must** be replaced before deployment. A small Python script
does this for you — no global search-and-replace needed.
```bash
# 1) Copy the template and fill in your real URLs:
cp landing/deploy.config.example.json landing/deploy.config.json
"${EDITOR:-vi}" landing/deploy.config.json
# 2) Build the deploy-ready bundle:
python3 landing/deploy.py
# → produces landing/dist/ with substitutions applied,
# plus robots.txt, sitemap.xml, 404.html, favicon.svg
```
`landing/deploy.config.json` is gitignored so your real URLs never
hit the repo. Re-run `landing/deploy.py` whenever you change a URL or
edit any HTML source.
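To picture what the substitution step involves, here is a minimal Python sketch. It is not the actual `landing/deploy.py`, whose internals are not shown in this README; the function names `substitute` and `leftover_placeholders` are hypothetical, and only the four placeholder strings come from the text above. All keys are replaced in a single regex pass, so a replacement value is never re-scanned for further placeholders.

```python
import re

# Placeholder URLs listed in this README.
PLACEHOLDERS = (
    "https://demo.datatools.app",
    "https://datatools.app",
    "https://gumroad.com/l/datatools",
    "mailto:hello@datatools.app",
)


def substitute(html: str, mapping: dict) -> str:
    """Swap every placeholder key in `mapping` for its value in one pass."""
    if not mapping:
        return html
    # Longest keys first so a shorter key can never shadow a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], html)


def leftover_placeholders(html: str) -> list:
    """Return the placeholders that are still present in `html`."""
    return [p for p in PLACEHOLDERS if p in html]
```

A pre-deploy check could call `leftover_placeholders()` on each file in `landing/dist/` and refuse to ship if the list is non-empty. That check would be a false alarm if a real URL happens to equal a placeholder.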
## Cloudflare Pages deployment
The simplest path — one Pages project pointed at `landing/dist/`:
```bash
# Option A: drag-and-drop the directory in the Cloudflare dashboard
# Pages → Create project → Direct Upload → drag landing/dist/
# Option B: Wrangler CLI (one command, scriptable)
wrangler pages deploy landing/dist
```
Configure the custom apex domain (`datatools.app`) in the Cloudflare
Pages project settings; sub-paths `/shopify-pet/`, `/bookkeeper/`,
`/revops/` are served automatically because the directory layout
mirrors them. Cache rule defaults are fine (HTML 1 day, CSS 7 days).
If you want **separate Pages projects** per persona for independent
A/B testing, point three projects at the same `landing/dist/` and
configure each with its own sub-domain (`shopify.datatools.app`, etc.)
and a Pages rule that rewrites the root to that persona's
sub-directory.
## Telemetry wiring (per DEMO-PLAN §8)
The plan calls for event-only counters, no PII, no Google Analytics.
For each page, on Cloudflare Pages, attach a Worker (or use Cloudflare
Web Analytics — it's privacy-friendly out of the box and zero config).
Track:
- `page_view` per persona (auto from CF Web Analytics)
- `cta_clicked` — add a small inline `<script>` that fires a fetch to
`/api/event?event=cta_clicked&persona=<persona>` when the buy button
is clicked, then continues the navigation to Gumroad.
- `demo.run_completed` and `demo.cta_clicked` are owned by the demo
app, not the landing page.
Conversion (per DEMO-PLAN §8):
```
demo_engagement = demo.run_completed / page_view (target ≥ 30%)
purchase_intent = demo.cta_clicked / demo.run_completed (target ≥ 5%)
purchase_rate = gumroad.purchase / demo.cta_clicked (target ≥ 30%)
```
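The three ratios above are simple enough to compute from weekly event counts. The sketch below assumes the counts have already been exported from wherever the events land; the `FunnelCounts` class and the helper names are hypothetical, while the targets are the ones listed above. A zero denominator is treated as a rate of 0.0, which is an assumption made for the sketch, not something the plan specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelCounts:
    page_view: int          # landing-page views
    run_completed: int      # demo.run_completed events
    demo_cta_clicked: int   # demo.cta_clicked events
    purchases: int          # gumroad.purchase events


# Targets as stated in the conversion block above.
TARGETS = {
    "demo_engagement": 0.30,
    "purchase_intent": 0.05,
    "purchase_rate": 0.30,
}


def _ratio(numerator: int, denominator: int) -> float:
    # Assumption: report 0.0 instead of dividing by zero.
    return numerator / denominator if denominator else 0.0


def funnel_rates(c: FunnelCounts) -> dict:
    """Compute the three funnel ratios from raw event counts."""
    return {
        "demo_engagement": _ratio(c.run_completed, c.page_view),
        "purchase_intent": _ratio(c.demo_cta_clicked, c.run_completed),
        "purchase_rate": _ratio(c.purchases, c.demo_cta_clicked),
    }


def below_target(rates: dict) -> list:
    """Names of the ratios that miss their target, sorted alphabetically."""
    return sorted(name for name, value in rates.items() if value < TARGETS[name])
```

For example, 1,000 page views, 320 completed runs, 12 demo CTA clicks, and 4 purchases would meet the engagement and purchase-rate targets but miss purchase intent (12 / 320 is 3.75%).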
The Gumroad webhook captures `?from=<persona>` so we can attribute
purchases back to the landing page that produced them.
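As an illustration of the `?from=<persona>` tagging, the sketch below adds the parameter to a CTA URL and reads it back using only the standard library. The helper names are hypothetical, and how the Gumroad webhook payload actually exposes the parameter is not covered in this README, so the sketch works on plain URL strings only.

```python
from typing import Optional
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def tag_cta(url: str, persona: str) -> str:
    """Return `url` with exactly one `from=<persona>` query parameter."""
    parts = urlsplit(url)
    # Drop any existing `from` value so the tag is never duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "from"]
    query.append(("from", persona))
    return urlunsplit(parts._replace(query=urlencode(query)))


def persona_from_url(url: str) -> Optional[str]:
    """Read the persona tag back out of a URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("from")
    return values[0] if values else None
```

Stripping an existing `from` value before appending keeps the attribution unambiguous if a URL is tagged twice.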
## Maintenance triggers (per DEMO-PLAN §9)
Refresh the page when:
| Trigger | Action |
|---|---|
| `demo.cta_clicked / demo.run_completed < 5%` for 4 weeks | The demo is working but the buyer isn't trusting the CTA. Add a screenshot of the network tab showing zero outbound calls. Soften the price callout. |
| `demo.run_completed / page_view < 30%` for 4 weeks | The demo iframe isn't loading or visitors aren't engaging. Check the iframe URL. Move the demo above the fold if it's currently below. |
| New tool ships (06-09) | Add it to the persona's saved pipeline only if it fits; don't bloat the demo with every tool. |
| Pricing change | Update `<meta>` schema, the buybar `.price-tag`, the pricing card, and the FAQ. Search-and-replace `$49` across the file. |
| New persona added (4th, 5th) | Copy `shopify-pet/index.html`, replace persona-specific copy, add to the `footer` cross-link block on the existing pages. |
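The two metric-based triggers fire only when a ratio stays under its threshold for four weeks. A minimal sketch of that check follows, assuming one rate per week ordered oldest to newest; the function name and the "four most recent consecutive weeks" reading are assumptions, since the plan's exact windowing is not spelled out here.

```python
def sustained_below(weekly_rates: list, threshold: float, weeks: int = 4) -> bool:
    """True when each of the most recent `weeks` rates is below `threshold`."""
    if len(weekly_rates) < weeks:
        # Not enough history yet to call it a sustained miss.
        return False
    return all(rate < threshold for rate in weekly_rates[-weeks:])
```

With weekly purchase-intent rates of 6%, 4%, 3%, 4%, and 2%, the last four weeks all sit under 5%, so the first trigger in the table would fire.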
## Why static HTML
Per `DECISIONS.md §5` and `BUSINESS.md §7`, the landing-page channel
must be:
- **Async-friendly** — Cloudflare Pages serves these with no operator
involvement
- **Cheap** — Cloudflare Pages free tier is sufficient until well past
the $5k/mo MRR re-lock trigger (`DECISIONS.md §8`)
- **Privacy-respecting** — no third-party tracker means no cookie
banner, which means no friction added to the conversion funnel
- **Zero ongoing maintenance** — no framework, no build, no upgrades.
The CSS uses system fonts; no Google Fonts; no CDN dependency that
could break the page when their TLS certificate rolls.
## Anti-temptations (per DEMO-PLAN §11 + PLAN §5)
These pages deliberately hold to the following rules:
- **No live chat widget.** Ruled out by the locked no-touch decision.
- **No "schedule a demo with us" CTA.** Same reason.
- **No email capture before the demo.** Friction kills conversion.
- **No Google Analytics / Meta Pixel.** Privacy story is a moat, not
a checkbox to ignore.
- **No SaaS-style "free trial / no credit card."** This is a one-time
download, not a subscription.
- **No A/B-testing framework yet.** Pre-PMF traffic doesn't reach
statistical significance — ship the single-arm copy, iterate monthly.