Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages

Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 22:31:26 +00:00


Post-launch — 90-day measurement plan

Creator-only. The other half of PLAN.md: PLAN tells you what to build; this doc tells you what to measure once it's live and which numbers trigger which actions.

Version: 1.0 · Adopted: 2026-05-01 · Owner: Michael

This is a runnable monthly checklist, not analytics theatre. Every metric below has a threshold and an action. If you're not willing to execute the action when the threshold trips, drop the metric — measuring without responding is busywork.

1. The five numbers that matter

Every other dashboard, chart, or vanity stat is downstream of these five. The funnel is short on purpose; pre-PMF traffic doesn't have the resolution to support more.

| # | Metric | How to compute | Threshold | When tripped |
|---|--------|----------------|-----------|--------------|
| 1 | Persona engagement | `demo.run_completed / demo.page_view` per persona | < 30 % for 4 consecutive weeks | Demo isn't running or BEFORE preview isn't compelling. Action: check iframe loads; widen BEFORE preview to show pollution clearly; move demo above the fold. |
| 2 | Demo→CTA intent | `demo.cta_clicked / demo.run_completed` per persona | < 5 % for 4 consecutive weeks | Demo is impressive but the CTA isn't earning trust. Action: add network-tab privacy screenshot; soften the price callout; A/B test eyebrow copy on the CTA card. |
| 3 | Purchase rate | `gumroad.purchase / demo.cta_clicked` per persona | < 30 % for 4 consecutive weeks | Visitors click through but don't pull the card out. Action: check Gumroad listing renders cleanly; verify refund-policy copy; check that the screenshot on the listing matches the demo they just ran. |
| 4 | Refund rate | `gumroad.refunds / gumroad.purchase` rolling 30 days | > 5 % | Buyer expectation mismatch. Action: read every refund email; determine if it's a feature gap (build it), a positioning lie (rewrite), or a personal-fit miss (fine, ignore). |
| 5 | Support load | email tickets / 100 sales rolling 30 days | > 10 | The product isn't self-serve enough at this price. Action: find the top 3 questions; add to in-app onboarding + landing-page FAQ + the persona's saved pipeline. |

These five also map to BUSINESS.md §12 — that doc names the metrics; this doc operationalises them.
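The "4 consecutive weeks" rule on metrics 1 to 3 can be sketched in a few lines. This is a minimal illustration, not the project's tooling: `WeekCounts`, `funnel_rates`, and `tripped` are hypothetical names, the weekly counts are assumed to be tallied by hand from the analytics and Gumroad exports, and a week with a zero denominator is assumed to count as "no data" rather than as a miss.

```python
from dataclasses import dataclass
from typing import List, Optional

# Floors for metrics 1-3, taken from the table above.
FLOORS = {"engagement": 0.30, "intent": 0.05, "purchase": 0.30}


@dataclass
class WeekCounts:
    """One persona's raw funnel counts for one week (hypothetical shape)."""
    page_views: int
    runs_completed: int
    cta_clicked: int
    purchases: int


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when there is no data."""
    return numerator / denominator if denominator else None


def funnel_rates(week: WeekCounts) -> dict:
    """Metrics 1-3 for a single week."""
    return {
        "engagement": ratio(week.runs_completed, week.page_views),
        "intent": ratio(week.cta_clicked, week.runs_completed),
        "purchase": ratio(week.purchases, week.cta_clicked),
    }


def tripped(weeks: List[WeekCounts], consecutive: int = 4) -> List[str]:
    """Metrics below their floor in every one of the last `consecutive` weeks."""
    if len(weeks) < consecutive:
        return []  # not enough history to trip anything yet
    recent = [funnel_rates(w) for w in weeks[-consecutive:]]
    return [
        name
        for name, floor in FLOORS.items()
        # Assumption: a week with no data (None) does not count as a miss.
        if all(r[name] is not None and r[name] < floor for r in recent)
    ]


# Four identical weeks at 25 % engagement, 12 % intent, 33 % purchase.
weeks = [WeekCounts(page_views=100, runs_completed=25, cta_clicked=3, purchases=1)] * 4
print(tripped(weeks))  # ['engagement']
```

Requiring every one of the last four weeks to miss keeps a single noisy week from triggering an action.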

2. Monthly review — 30-minute checklist

Block 30 minutes on the first Monday of every month for the first six months. After month 6, if the numbers are stable, drop to 15 minutes quarterly.

[ ] Pull last 30 days of demo events from Cloudflare Web Analytics
[ ] Pull last 30 days of Gumroad sales + refunds export
[ ] Compute the five numbers in §1 per persona
[ ] Note which thresholds are tripped (if any)
[ ] Read every refund email since last review
[ ] Read every support email since last review
[ ] Decide ONE thing to change this month (only one)
[ ] Update CHANGELOG with what was changed and why
[ ] Schedule next review

The "decide ONE thing" rule is load-bearing. Pre-PMF traffic doesn't have the volume to A/B-test multiple changes in parallel — you'll just confuse yourself about what moved the number.
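To see how little resolution low-volume traffic has, it helps to put an interval around one of the rates. The sketch below uses a 95 % Wilson score interval, which is a choice made for this illustration and not something the plan prescribes; `wilson_interval` is a hypothetical helper, and the 9 clicks out of 137 runs are the Shopify figures from the sample scoreboard in §3.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a proportion; z = 1.96 gives about 95 %."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return centre - half_width, centre + half_width


# Demo→CTA intent for one persona in one month: 9 clicks out of 137 runs.
low, high = wilson_interval(9, 137)
print(f"observed {9 / 137:.1%}, plausible range {low:.1%} to {high:.1%}")
```

The observed rate is about 6.6 %, but the interval runs from roughly 3.5 % to 12 %, which straddles the 5 % floor. A one- or two-point move after a change is inside that noise, so splitting the same traffic across several simultaneous changes leaves nothing readable.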

3. Per-persona scoreboard (template)

Maintain in a single text file or spreadsheet. The shape that fits in a notebook page is the shape you'll actually update.

Month: 2026-06
─────────────────────────────────────────────────────────────────
                       Shopify  Bookkeeper  RevOps  Total
Page views                420         180     290    890
Demo runs                 137          59      82    278
CTA clicks                  9           7       6     22
Purchases                   3           2       2      7

Metric 1 (engage)        33%         33%     28%   31%
Metric 2 (intent)         7%         12%      7%    8%
Metric 3 (purchase)      33%         29%     33%   32%
Metric 4 (refund)         0%          0%      0%    0%
Metric 5 (support)         3 tickets  /  100 sales

Tripped thresholds:      RevOps engagement (28% < 30%)
                         Bookkeeper purchase (29% < 30%)

This-month change:       Move demo embed above the fold on revops
                         page; reduce hero text by 40%.

Last-month change:       Added network-tab screenshot to all 3
                         pages.  Result: intent +1.5 percentage
                         points on Shopify, flat elsewhere.
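The percentages in the template follow directly from the four count rows. The sketch below redoes that arithmetic with the sample numbers; the dictionary layout and the `rates` helper are illustrative assumptions, not part of the project.

```python
# Raw counts copied from the sample scoreboard above.
counts = {
    "Shopify":    {"views": 420, "runs": 137, "cta": 9, "purchases": 3},
    "Bookkeeper": {"views": 180, "runs": 59,  "cta": 7, "purchases": 2},
    "RevOps":     {"views": 290, "runs": 82,  "cta": 6, "purchases": 2},
}

# Floors for metrics 1-3 from section 1.
floors = {"engagement": 0.30, "intent": 0.05, "purchase": 0.30}


def rates(c: dict) -> dict:
    """Metrics 1-3 as fractions for one column of the scoreboard."""
    return {
        "engagement": c["runs"] / c["views"],
        "intent": c["cta"] / c["runs"],
        "purchase": c["purchases"] / c["cta"],
    }


# The Total column is computed from summed counts, not by averaging percentages.
total = {key: sum(c[key] for c in counts.values()) for key in ("views", "runs", "cta", "purchases")}

for name, c in {**counts, "Total": total}.items():
    row = "  ".join(f"{metric} {value:.0%}" for metric, value in rates(c).items())
    print(f"{name:<11}{row}")

# Per-persona rates that sit below their floor this month.
below = [
    (persona, metric)
    for persona, c in counts.items()
    for metric, value in rates(c).items()
    if value < floors[metric]
]
print(below)  # [('Bookkeeper', 'purchase'), ('RevOps', 'engagement')]
```

Computing the Total column from summed counts rather than averaging the three percentages keeps a low-traffic persona from skewing the blended rate.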

4. Stage-gate triggers from PLAN.md

Reproduced here so the gate criteria sit beside the metrics that fire them:

| Trigger | From | Action |
|---------|------|--------|
| First paying customer | PLAN §4 | Continue. Plan is working. |
| Zero paid in 90 days | PLAN §4 | Audit the funnel. Don't add features. Run a small (1-week) outbound experiment to 30 niche-community contacts as a control, even though it stretches the no-touch constraint, to determine whether the bottleneck is reach or conversion. |
| $5 k/mo MRR | DECISIONS §8 | Re-evaluate async constraint. Add priority-support tier (PLAN §2.7). |
| $10 k/mo MRR | DECISIONS §8 | Revisit time-budget allocation. Decide on tools 06–08 vs. additional bundles. |
| Marketplace shutdown | PLAN §4 / DECISIONS §8 | Switch landing-page CTA to own-domain Stripe Checkout. Pre-built; one-line edit. |
| Streamlit hard direction change | DECISIONS §8 | Low-probability re-lock. Tk fallback documented. |
| Burnout signal | DECISIONS §8 | Stop. Triage. The constraint matters more than the revenue ramp. |

5. What we deliberately do NOT measure

These look productive but predict nothing pre-PMF. Don't add them.

Bounce rate — single-page sites have artificially high bounce. Useless signal.
Time on page — landing pages are supposed to be quick reads. Long time on page often means confusion, not engagement.
Heatmaps / scroll-depth — no statistical resolution at <500 monthly visitors. Add when you cross 5 k/month.
Email open rates — under §2.7 priority support is the only email channel; opens aren't a buying signal.
Social mentions — vanity. The signal that matters is "did they buy" or "did they come back."

6. What we measure once, then trust

Do these once, then let them run for 6+ months without re-measuring:

Demo correctness — once per pipeline release, run all 3 demos end-to-end via tests/test_pipeline.py and check the output looks reasonable. The CI pipeline already does this; nothing to add.
Cross-platform install — once per release, verify the PyInstaller bundle launches on Mac / Windows / Linux. After three green releases, trust the build pipeline; spot-check on major OS updates only.
Privacy claim integrity — once at launch, capture the network tab while running the cleaner and host that screenshot at a stable URL. Re-capture only when a new tool or dependency is added.

7. Per-persona attribution

The buy buttons on every landing page carry ?from=<persona> query parameters. Gumroad propagates that into the order metadata. Use it to attribute purchases:

| Persona key | Landing page URL | Gumroad query | Source |
|-------------|------------------|---------------|--------|
| `shopify-pet` | `/shopify-pet/` | `?from=shopify-pet` | Shopify operator |
| `bookkeeper` | `/bookkeeper/` | `?from=bookkeeper` | Bookkeeper / freelance accountant |
| `revops` | `/revops/` | `?from=revops` | Marketing / RevOps agency |
| `apex` | `/` | (no query; use `unknown` bucket) | Generic discovery |

When unknown exceeds 30 % of total, add UTM tagging to community posts and SEO blog backlinks so you can break the bucket apart.
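The bucketing and the 30 % check can be sketched as follows. The source does not specify how the `from` value appears in the Gumroad export, so this assumes you can get one referring URL per purchase; the `example.com` URLs and the helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Persona keys from the table above; anything else lands in "unknown".
KNOWN_PERSONAS = {"shopify-pet", "bookkeeper", "revops"}


def persona_from_url(url: str) -> str:
    """Read the `from` query parameter and map it to a persona bucket."""
    values = parse_qs(urlparse(url).query).get("from", [])
    persona = values[0] if values else ""
    return persona if persona in KNOWN_PERSONAS else "unknown"


def attribution(urls) -> Counter:
    """Count purchases per persona bucket."""
    return Counter(persona_from_url(u) for u in urls)


def unknown_share(tally: Counter) -> float:
    """Fraction of purchases that could not be attributed to a persona."""
    total = sum(tally.values())
    return tally["unknown"] / total if total else 0.0


# Hypothetical purchase URLs, for illustration only.
purchases = [
    "https://example.com/l/cleaner?from=shopify-pet",
    "https://example.com/l/cleaner?from=bookkeeper",
    "https://example.com/l/cleaner",
    "https://example.com/l/cleaner?from=revops",
]
tally = attribution(purchases)
if unknown_share(tally) > 0.30:
    print("unknown bucket above 30 %: add UTM tagging")
else:
    print("unknown bucket within limits")
```

Treating unrecognised `from` values as `unknown` keeps typos and stale links from silently creating new buckets.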

8. The four months that decide whether the plan works

Reading PLAN.md §3 + this doc together, the rough script:

| Month | What's running | What we expect to learn |
|-------|----------------|-------------------------|
| M1 (June) | Installers · demo · 3 landing pages · Gumroad live | Whether the funnel mechanically works. Numbers will be noisy; just look for one purchase. |
| M2 (July) | M1 + community posts in 3 niches + 1 SEO post | Which persona converts. Re-allocate effort to the highest-converting niche. |
| M3 (August) | M2 + landing-page changes from M2 review | Whether intent-rate moved on the change. Decide tools 06–08 go/no-go. |
| M4 (September) | M3 + first repeat-buyer signals | Whether the Pipeline Runner is producing retention as designed. |

By the end of M4, the data tells you whether the plan is on track for $1k–3k/mo (the 6-month target in BUSINESS.md §6), judged from the trajectory rather than the absolute number.

9. The hardest part of the plan to execute

Not the metrics. Not the build. The "decide ONE thing per month" rule — operators with engineering backgrounds chronically pick three changes per month and conclude nothing because their signal is muddled. This doc says one. It means one.
