Files
datatools-dev/docs/NEXT-STEPS.md
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

16 KiB
Raw Blame History

Next Steps — from "code complete" to first paying customer

Creator-only. The runnable checklist that takes the operator from the current state (1,729 tests passing, 6 tools shipped, 0 paying customers) through launch and into the first 90 days. Version: 1.0 · Adopted: 2026-05-01

This document is the single answer to "what now?". Every line item has an owner, a time estimate, a blocker, a cost, and the external dependency that makes it un-shippable today. Items are ordered by must-finish-before-the-next-item — work top-down.

Cross-references:

  • Strategy: PLAN.md (the 8 strategic moves + the 90-day sequence)
  • Demo specs: DEMO-PLAN.md
  • Deployment mechanics: DEPLOYMENT.md
  • Post-launch measurement: POST-LAUNCH.md
  • Locked criteria: DECISIONS.md §1

Status legend:

  • 🟢 Done — the asset exists in this repo
  • 🟡 Buildable now — no external dependency needed
  • 🟠 External dependency — needs an account / signup / payment
  • 🔴 Manual / requires user input that can't be automated

Phase 0 · What's already done (skip ahead)

Item Where it lives
🟢 6 of 9 tools shipped (Dedup, Text, Format, Missing, Column-Map, Pipeline) src/core/, src/cli_*.py, src/gui/pages/
🟢 Pipeline Runner (the retention multiplier per PLAN.md §2.6) src/core/pipeline.py, src/cli_pipeline.py, src/gui/pages/9_Pipeline_Runner.py
🟢 1,729 passing tests · 0 skipped · 0 xfailed tests/
🟢 3 niche demo datasets + pre-tuned pipeline JSONs samples/demo/
🟢 Streamlit demo app + Cloud entry shim streamlit_app.py, src/gui/app_demo.py
🟢 3 niche landing pages + apex chooser + shared CSS landing/
🟢 Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) landing/deploy.py
🟢 Strategic plan + demo plan + post-launch measurement plan + deployment doc docs/PLAN.md, DEMO-PLAN.md, POST-LAUNCH.md, DEPLOYMENT.md

Phase 1 · Stand the funnel up (target: end of week 1, ~6 hours total work)

The bottleneck right now is distribution, not feature count. Everything in this phase is about turning code into a URL the user can hit.

1.1 — 🟠 Push to GitHub (5 min)

What git init (if not already), commit, push to a private or public GitHub repo.
Why Cloud deploy services need a Git source. Streamlit Community Cloud auto-deploys on push to main.
External dependency A GitHub account (free).
Cost $0.
Blocked by Nothing.

1.2 — 🟠 Deploy the demo to Streamlit Community Cloud (15 min)

What Follow DEPLOYMENT.md Part 1. Result: a public URL like https://datatools-demo.streamlit.app.
Why The landing pages embed this in their iframe. Without it, every "Run pipeline" button on the landing pages 404s.
External dependency Free Streamlit Community Cloud account, signed in via GitHub.
Cost $0.
Blocked by 1.1 (the repo must be on GitHub).
Watch out for First build takes 23 min while Cloud installs deps. Subsequent deploys < 30 s.

1.3 — 🟠 Buy the apex domain (5 min, ~$15/year)

What Register datatools.app (or whichever) at any registrar. Point the nameservers at Cloudflare.
Why The landing-page canonical URLs and CTA buttons refer to this domain. Pages can deploy to a free *.pages.dev URL first if you want to defer this.
External dependency A registrar account; payment method.
Cost ~$15/year. Within BUSINESS.md §9 cost cap.
Blocked by Nothing — can run in parallel with 1.1 / 1.2.

1.4 — 🟠 Deploy the landing pages to Cloudflare Pages (15 min)

What Follow DEPLOYMENT.md Part 2. Run python3 landing/deploy.py with the operator's URLs in deploy.config.json, then wrangler pages deploy landing/dist (or drag-drop).
Why This is the marketing surface. Three persona URLs go live as soon as it deploys.
External dependency Free Cloudflare account; Wrangler CLI (optional — drag-drop works too).
Cost $0.
Blocked by 1.2 (the demo URL goes into deploy.config.json); ideally 1.3 for the custom domain.
Watch out for The deploy.config.json file is gitignored — your real URLs never get committed.

1.5 — 🟠 Open a Gumroad listing (15 min) — stub for now

What Create a Gumroad account, draft a listing with a single screenshot + the landing-page copy, set price to $49. Don't enable purchases yet — leave it as a draft.
Why The CTA buttons on the landing pages link to gumroad.com/l/datatools?from=<persona>. Until the listing exists, those buttons 404.
External dependency Free Gumroad account; Stripe-connected payout method (defer to Phase 2).
Cost $0 to draft, ~10% per sale once live.
Blocked by Nothing — can run in parallel with 1.11.4.
Watch out for The listing URL must be gumroad.com/l/datatools to match the landing-page hard-coded CTAs. If you pick a different slug, update landing/deploy.config.jsongumroad_listing and re-run deploy.py.

1.6 — 🟡 End-to-end smoke verification (10 min)

What Run the four curl commands from DEPLOYMENT.md Part 4. All four landing pages, all three demo personas, sitemap.xml.
Why First time something can break is the moment a real user hits it. Ten minutes of curl saves a week of "why is conversion zero."
External dependency None.
Cost $0.
Blocked by 1.4 + 1.2.

Phase 2 · Make it sellable (target: end of week 2)

2.1 — 🟠 Apple Developer Program enrollment (5 min to start, 12 weeks lead)

What Per BUSINESS.md §10. Required for code-signing the macOS installer.
External dependency Apple ID + government-issued ID (individual) or D-U-N-S number (org).
Cost $99/year.
Blocked by Nothing — start ASAP because of the 12 week approval window. The pipeline waits on this; nothing else does.

2.2 — 🟡 PyInstaller spec + cross-platform build (13 days first time)

What A build/datatools.spec that bundles the Streamlit GUI + all 6 tools + samples into one app. Mac .dmg, Windows .exe installer, Linux AppImage.
Why The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing.
External dependency None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner.
Cost $0 (GitHub Actions matrix runners are free for public repos).
Blocked by Nothing for the spec; 2.1 for the signed Mac build.
Watch out for Streamlit's bundle size lands around 300500 MB per DECISIONS.md §4c — accepted tradeoff.

2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)

What Sign the .dmg, submit to Apple's notarization service, staple the ticket.
Why Without it, Gatekeeper hard-blocks the install with no obvious way out (per BUSINESS.md §10). The buyer gives up.
External dependency Apple Developer Program (2.1).
Cost $0 incremental over 2.1.
Blocked by 2.1 + 2.2.

2.4 — 🔴 Refund policy + license + Gumroad listing copy (1 hour)

What A clear refund policy (14-day no-questions per the FAQ already on the landing pages) + a software licence text + the Gumroad listing description.
Why Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes.
External dependency None — operator authoring.
Cost $0.
Blocked by Nothing.
Hint Most of the copy is already in the landing pages' FAQ section — paste it into Gumroad.

2.5 — 🟠 Activate the Gumroad listing (15 min)

What Upload the cross-platform installers from 2.2/2.3, paste the copy from 2.4, set $49 price, enable purchases, configure Stripe payout.
Why This is the "buy" button finally working.
External dependency Gumroad + Stripe account; the installers from 2.2/2.3.
Cost ~10 % per sale.
Blocked by 2.2, 2.3, 2.4.

Phase 3 · First-traffic ignition (target: end of week 4)

Per PLAN.md §3 and BUSINESS.md §7 channel priorities. The strict no-touch constraint of DECISIONS.md §1 #8 makes channel choice matter — these are the only ones that fit.

3.1 — 🔴 First niche-community post (30 min)

What One value-first post in one niche-relevant community (e.g. r/shopify, IndieHackers Shopify chat, a Slack/Discord that allows it). Lead with the demo URL, not the buy URL.
Why Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch.
External dependency Account in the chosen community; understand its self-promotion rules.
Cost $0.
Blocked by 1.4 (demo URL must work).
Hint Pick the persona with the most familiar community to the operator. Don't try all three at once — see POST-LAUNCH.md §2 "decide ONE thing" rule.

3.2 — 🟡 First long-tail SEO blog post (46 hours)

What One 8001,500-word post on datatools.app/blog/ (sub-route of Cloudflare Pages or Substack) targeting one niche keyword from BUSINESS.md §7. Topic: a real problem you've encountered, the cleanup steps, the demo URL at the end.
Why Compounding asset — BUSINESS.md §2 says SEO pays in 618 months, not week 1. Don't mistake it for an early-stage channel.
External dependency None.
Cost $0.
Blocked by Nothing.

3.3 — 🟡 Cloudflare Web Analytics + event counters (45 min)

What Enable Cloudflare Web Analytics on the Pages project (one click). Add a tiny inline <script> to each landing page that fires cta_clicked when the buy button is hit, before redirecting. Per POST-LAUNCH.md §1.
Why Without this, the post-launch checklist is unrunnable.
External dependency Cloudflare account (already from 1.4).
Cost $0.
Blocked by 1.4.
Hint The Gumroad webhook captures ?from=<persona> automatically — no extra wiring.

3.4 — 🟡 Email autoresponder (post-purchase delivery + 3-touch onboarding) (23 hours)

What Gumroad's built-in delivery email plus three follow-up emails (day 1, day 7, day 14): "are you running into X?", "here's an advanced trick", "save your pipeline as JSON for next week".
Why Increases activation, reduces refund risk, surfaces support questions while volume is small.
External dependency Gumroad delivery is built-in. The 3-touch sequence needs a free email service (Resend's free tier or Mailchimp's free tier).
Cost $0$30/month per BUSINESS.md §9.
Blocked by 2.5.

Phase 4 · First-buyer trigger and review

Per PLAN.md §4 decision triggers and POST-LAUNCH.md §4.

4.1 — 🟢 Run the monthly review (30 min, first Monday after launch)

What Follow POST-LAUNCH.md §2 — pull last-30-days demo events + Gumroad sales + refunds, compute the five numbers, decide ONE change.
Why Without this discipline, the funnel drifts and the operator changes 5 things at once and learns nothing.
External dependency None — analytics from 3.3, sales from 2.5.
Cost $0.
Blocked by 3.3 + 2.5.

4.2 — 🟢 First paying customer (target: 90 days)

What The actual first sale.
Why Per BUSINESS.md §6: validates the funnel; not the business.
Trigger action Continue, no plan change. Make the first $1k/month within month 6.

4.3 — 🔴 Zero-paid-in-90-days fallback (only fires if 4.2 doesn't)

What Per POST-LAUNCH.md §4 — audit the funnel, not the features. Run a 1-week outbound experiment to 30 niche contacts as a control (per BUSINESS.md §8 the no-touch revisit is allowed below $5k MRR if it produces signal).
Why Distinguishes "no reach" from "no conversion" — they need different fixes.
External dependency Operator's time.
Cost The 10 hr/wk allocation already exists; this displaces other work.
Blocked by The 90-day calendar trigger from 4.2.

Phase 5 · Steady state — what NOT to build

Per PLAN.md §5 (anti-temptations) and DECISIONS.md §8 (re-lock triggers). The trap is treating "more code" as the answer when the data says "more reach" or "more conversion." The five forbidden moves until $5k/mo MRR:

Why locked
More tools (0608) PLAN.md §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review.
SaaS pivot DECISIONS.md §4 — recurring infra conflicts with the lifestyle constraint.
Live chat / sales calls DECISIONS.md §1 #8 — no-touch is locked until $5k/mo.
Custom integrations / one-off consulting Breaks "build once, sell many."
Going broad on personas PLAN.md §5 — "all small businesses" converts at 1 %; vertical converts at 515 %.

Triage table — what blocks what

Phase 1 (week 1)              Phase 2 (week 2)         Phase 3 (week 4)
┌──────────────┐              ┌──────────────┐         ┌──────────────┐
│ 1.1 Push GH  │──────────┐   │ 2.1 Apple    │ ───┐    │ 3.1 Community│
│ 1.2 Demo     │──┐       ├──▶│   Dev (1-2w) │    │    │ 3.2 SEO post │
│ 1.3 Domain   │  │       │   │ 2.2 Build    │ ───┤    │ 3.3 Analytics│
│ 1.4 Pages    │◀─┘       │   │ 2.3 Sign     │ ───┤    │ 3.4 Emails   │
│ 1.5 Gumroad  │──────────┘   │ 2.4 Copy     │    │    └──────────────┘
│ 1.6 Verify   │              │ 2.5 Activate │ ◀──┘
└──────────────┘              └──────────────┘            ↓
                                                  ┌──────────────┐
                                                  │ 4.1 Monthly  │
                                                  │ 4.2 First $  │
                                                  │ 4.3 Fallback │
                                                  └──────────────┘

The longest blocking path is 2.1 Apple Developer Program (12 weeks). Start it on day 1 of week 1 — it unblocks everything in Phase 2 and you can do all of Phase 1 while waiting.


Time estimate — total operator time

Phase Hours Wall-clock
Phase 1 ~1 hour end of week 1 (mostly waiting for builds)
Phase 2 ~1 day end of week 2 (gated by Apple Dev approval)
Phase 3 ~6 hours week 34
Phase 4 30 min/month ongoing
Total to launch ~12 hours of operator time ~14 days wall-clock

Well inside the 10 hr/wk constraint of DECISIONS.md §1 #2.


The thing that decides whether the plan works

Not the build. Not the deploy. Not even the first sale.

The discipline of running the monthly review in Phase 4 — and the "decide ONE thing per month" rule from POST-LAUNCH.md §2 — is what separates "this product exists" from "this product compounds." Every feature added before the funnel is measured is a guess; every change made after the monthly review is informed.

Don't skip 4.1.