Tools shipped this batch (4 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI
with soft tool-dependency graph (recommended,
not enforced) and JSON save/load for repeatable
weekly cleanups.
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
320 lines
16 KiB
Markdown
320 lines
16 KiB
Markdown
# Next Steps — from "code complete" to first paying customer
|
||
|
||
> Creator-only. The runnable checklist that takes the operator from
|
||
> the current state (1,729 tests passing, 6 tools shipped, 0 paying
|
||
> customers) through launch and into the first 90 days.
|
||
> **Version**: 1.0 · **Adopted**: 2026-05-01
|
||
|
||
This document is the **single answer** to "what now?". Every line
|
||
item has an owner, a time estimate, a blocker, a cost, and the
|
||
external dependency that makes it un-shippable today. Items are
|
||
ordered by **must-finish-before-the-next-item** — work top-down.
|
||
|
||
Cross-references:
|
||
- Strategy: `PLAN.md` (the 8 strategic moves + the 90-day sequence)
|
||
- Demo specs: `DEMO-PLAN.md`
|
||
- Deployment mechanics: `DEPLOYMENT.md`
|
||
- Post-launch measurement: `POST-LAUNCH.md`
|
||
- Locked criteria: `DECISIONS.md` §1
|
||
|
||
Status legend:
|
||
- **🟢** Done — the asset exists in this repo
|
||
- **🟡** Buildable now — no external dependency needed
|
||
- **🟠** External dependency — needs an account / signup / payment
|
||
- **🔴** Manual / requires user input that can't be automated
|
||
|
||
---
|
||
|
||
## Phase 0 · What's already done (skip ahead)
|
||
|
||
| ✓ | Item | Where it lives |
|
||
|---|------|----------------|
|
||
| 🟢 | 6 of 9 tools shipped (Dedup, Text, Format, Missing, Column-Map, Pipeline) | `src/core/`, `src/cli_*.py`, `src/gui/pages/` |
|
||
| 🟢 | Pipeline Runner (the retention multiplier per `PLAN.md` §2.6) | `src/core/pipeline.py`, `src/cli_pipeline.py`, `src/gui/pages/9_Pipeline_Runner.py` |
|
||
| 🟢 | 1,729 passing tests · 0 skipped · 0 xfailed | `tests/` |
|
||
| 🟢 | 3 niche demo datasets + pre-tuned pipeline JSONs | `samples/demo/` |
|
||
| 🟢 | Streamlit demo app + Cloud entry shim | `streamlit_app.py`, `src/gui/app_demo.py` |
|
||
| 🟢 | 3 niche landing pages + apex chooser + shared CSS | `landing/` |
|
||
| 🟢 | Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) | `landing/deploy.py` |
|
||
| 🟢 | Strategic plan + demo plan + post-launch measurement plan + deployment doc | `docs/PLAN.md`, `DEMO-PLAN.md`, `POST-LAUNCH.md`, `DEPLOYMENT.md` |
|
||
|
||
---
|
||
|
||
## Phase 1 · Stand the funnel up (target: end of week 1, ~6 hours total work)
|
||
|
||
The bottleneck right now is **distribution, not feature count**.
|
||
Everything in this phase is about turning code into a URL the user
|
||
can hit.
|
||
|
||
### 1.1 — 🟠 Push to GitHub (5 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | `git init` (if not already), commit, push to a private or public GitHub repo. |
|
||
| **Why** | Cloud deploy services need a Git source. Streamlit Community Cloud auto-deploys on push to `main`. |
|
||
| **External dependency** | A GitHub account (free). |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | Nothing. |
|
||
|
||
### 1.2 — 🟠 Deploy the demo to Streamlit Community Cloud (15 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Follow `DEPLOYMENT.md` Part 1. Result: a public URL like `https://datatools-demo.streamlit.app`. |
|
||
| **Why** | The landing pages embed this in their iframe. Without it, every "Run pipeline" button on the landing pages 404s. |
|
||
| **External dependency** | Free Streamlit Community Cloud account, signed in via GitHub. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 1.1 (the repo must be on GitHub). |
|
||
| **Watch out for** | First build takes 2–3 min while Cloud installs deps. Subsequent deploys < 30 s. |
|
||
|
||
### 1.3 — 🟠 Buy the apex domain (5 min, ~$15/year)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Register `datatools.app` (or whichever) at any registrar. Point the nameservers at Cloudflare. |
|
||
| **Why** | The landing-page canonical URLs and CTA buttons refer to this domain. Pages can deploy to a free `*.pages.dev` URL first if you want to defer this. |
|
||
| **External dependency** | A registrar account; payment method. |
|
||
| **Cost** | ~$15/year. Within `BUSINESS.md` §9 cost cap. |
|
||
| **Blocked by** | Nothing — can run in parallel with 1.1 / 1.2. |
|
||
|
||
### 1.4 — 🟠 Deploy the landing pages to Cloudflare Pages (15 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Follow `DEPLOYMENT.md` Part 2. Run `python3 landing/deploy.py` with the operator's URLs in `deploy.config.json`, then `wrangler pages deploy landing/dist` (or drag-drop). |
|
||
| **Why** | This is the marketing surface. Three persona URLs go live as soon as it deploys. |
|
||
| **External dependency** | Free Cloudflare account; Wrangler CLI (optional — drag-drop works too). |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 1.2 (the demo URL goes into `deploy.config.json`); ideally 1.3 for the custom domain. |
|
||
| **Watch out for** | The `deploy.config.json` file is gitignored — your real URLs never get committed. |
|
||
|
||
### 1.5 — 🟠 Open a Gumroad listing (15 min) **— stub for now**
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Create a Gumroad account, draft a listing with a single screenshot + the landing-page copy, set price to $49. Don't enable purchases yet — leave it as a draft. |
|
||
| **Why** | The CTA buttons on the landing pages link to `gumroad.com/l/datatools?from=<persona>`. Until the listing exists, those buttons 404. |
|
||
| **External dependency** | Free Gumroad account; Stripe-connected payout method (defer to Phase 2). |
|
||
| **Cost** | $0 to draft, ~10% per sale once live. |
|
||
| **Blocked by** | Nothing — can run in parallel with 1.1–1.4. |
|
||
| **Watch out for** | The listing URL must be `gumroad.com/l/datatools` to match the landing-page hard-coded CTAs. If you pick a different slug, update `landing/deploy.config.json` → `gumroad_listing` and re-run `deploy.py`. |
|
||
|
||
### 1.6 — 🟡 End-to-end smoke verification (10 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Run the four `curl` commands from `DEPLOYMENT.md` Part 4. All four landing pages, all three demo personas, sitemap.xml. |
|
||
| **Why** | First time something can break is the moment a real user hits it. Ten minutes of `curl` saves a week of "why is conversion zero." |
|
||
| **External dependency** | None. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 1.4 + 1.2. |
|
||
|
||
---
|
||
|
||
## Phase 2 · Make it sellable (target: end of week 2)
|
||
|
||
### 2.1 — 🟠 Apple Developer Program enrollment (5 min to start, 1–2 weeks lead)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Per `BUSINESS.md` §10. Required for code-signing the macOS installer. |
|
||
| **External dependency** | Apple ID + government-issued ID (individual) or D-U-N-S number (org). |
|
||
| **Cost** | $99/year. |
|
||
| **Blocked by** | Nothing — start ASAP because of the 1–2 week approval window. The pipeline waits on this; nothing else does. |
|
||
|
||
### 2.2 — 🟡 PyInstaller spec + cross-platform build (1–3 days first time)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | A `build/datatools.spec` that bundles the Streamlit GUI + all 6 tools + samples into one app. Mac `.dmg`, Windows `.exe` installer, Linux AppImage. |
|
||
| **Why** | The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing. |
|
||
| **External dependency** | None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner. |
|
||
| **Cost** | $0 (GitHub Actions matrix runners are free for public repos). |
|
||
| **Blocked by** | Nothing for the spec; 2.1 for the signed Mac build. |
|
||
| **Watch out for** | Streamlit's bundle size lands around 300–500 MB per `DECISIONS.md` §4c — accepted tradeoff. |
|
||
|
||
### 2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Sign the `.dmg`, submit to Apple's notarization service, staple the ticket. |
|
||
| **Why** | Without it, Gatekeeper hard-blocks the install with no obvious way out (per `BUSINESS.md` §10). The buyer gives up. |
|
||
| **External dependency** | Apple Developer Program (2.1). |
|
||
| **Cost** | $0 incremental over 2.1. |
|
||
| **Blocked by** | 2.1 + 2.2. |
|
||
|
||
### 2.4 — 🔴 Refund policy + license + Gumroad listing copy (1 hour)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | A clear refund policy (14-day no-questions per the FAQ already on the landing pages) + a software licence text + the Gumroad listing description. |
|
||
| **Why** | Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes. |
|
||
| **External dependency** | None — operator authoring. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | Nothing. |
|
||
| **Hint** | Most of the copy is already in the landing pages' FAQ section — paste it into Gumroad. |
|
||
|
||
### 2.5 — 🟠 Activate the Gumroad listing (15 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Upload the cross-platform installers from 2.2/2.3, paste the copy from 2.4, set $49 price, enable purchases, configure Stripe payout. |
|
||
| **Why** | This is the "buy" button finally working. |
|
||
| **External dependency** | Gumroad + Stripe account; the installers from 2.2/2.3. |
|
||
| **Cost** | ~10 % per sale. |
|
||
| **Blocked by** | 2.2, 2.3, 2.4. |
|
||
|
||
---
|
||
|
||
## Phase 3 · First-traffic ignition (target: end of week 4)
|
||
|
||
Per `PLAN.md` §3 and `BUSINESS.md` §7 channel priorities. The strict
|
||
no-touch constraint of `DECISIONS.md` §1 #8 makes channel choice
|
||
matter — these are the only ones that fit.
|
||
|
||
### 3.1 — 🔴 First niche-community post (30 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | One value-first post in one niche-relevant community (e.g. r/shopify, IndieHackers Shopify chat, a Slack/Discord that allows it). Lead with the demo URL, not the buy URL. |
|
||
| **Why** | Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch. |
|
||
| **External dependency** | Account in the chosen community; understand its self-promotion rules. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 1.4 (demo URL must work). |
|
||
| **Hint** | Pick the persona with the most familiar community to the operator. Don't try all three at once — see `POST-LAUNCH.md` §2 "decide ONE thing" rule. |
|
||
|
||
### 3.2 — 🟡 First long-tail SEO blog post (4–6 hours)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | One 800–1,500-word post on `datatools.app/blog/` (sub-route of Cloudflare Pages or Substack) targeting one niche keyword from `BUSINESS.md` §7. Topic: a real problem you've encountered, the cleanup steps, the demo URL at the end. |
|
||
| **Why** | Compounding asset — `BUSINESS.md` §2 says SEO pays in 6–18 months, not week 1. Don't mistake it for an early-stage channel. |
|
||
| **External dependency** | None. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | Nothing. |
|
||
|
||
### 3.3 — 🟡 Cloudflare Web Analytics + event counters (45 min)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Enable Cloudflare Web Analytics on the Pages project (one click). Add a tiny inline `<script>` to each landing page that fires `cta_clicked` when the buy button is hit, before redirecting. Per `POST-LAUNCH.md` §1. |
|
||
| **Why** | Without this, the post-launch checklist is unrunnable. |
|
||
| **External dependency** | Cloudflare account (already from 1.4). |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 1.4. |
|
||
| **Hint** | The Gumroad webhook captures `?from=<persona>` automatically — no extra wiring. |
|
||
|
||
### 3.4 — 🟡 Email autoresponder (post-purchase delivery + 3-touch onboarding) (2–3 hours)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Gumroad's built-in delivery email plus three follow-up emails (day 1, day 7, day 14): "are you running into X?", "here's an advanced trick", "save your pipeline as JSON for next week". |
|
||
| **Why** | Increases activation, reduces refund risk, surfaces support questions while volume is small. |
|
||
| **External dependency** | Gumroad delivery is built-in. The 3-touch sequence needs a free email service (Resend's free tier or Mailchimp's free tier). |
|
||
| **Cost** | $0–$30/month per `BUSINESS.md` §9. |
|
||
| **Blocked by** | 2.5. |
|
||
|
||
---
|
||
|
||
## Phase 4 · First-buyer trigger and review
|
||
|
||
Per `PLAN.md` §4 decision triggers and `POST-LAUNCH.md` §4.
|
||
|
||
### 4.1 — 🟢 Run the monthly review (30 min, first Monday after launch)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Follow `POST-LAUNCH.md` §2 — pull last-30-days demo events + Gumroad sales + refunds, compute the five numbers, decide ONE change. |
|
||
| **Why** | Without this discipline, the funnel drifts and the operator changes 5 things at once and learns nothing. |
|
||
| **External dependency** | None — analytics from 3.3, sales from 2.5. |
|
||
| **Cost** | $0. |
|
||
| **Blocked by** | 3.3 + 2.5. |
|
||
|
||
### 4.2 — 🟢 First paying customer (target: 90 days)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | The actual first sale. |
|
||
| **Why** | Per `BUSINESS.md` §6: validates the funnel; not the business. |
|
||
| **Trigger action** | Continue, no plan change. Make the first $1k/month within month 6. |
|
||
|
||
### 4.3 — 🔴 Zero-paid-in-90-days fallback (only fires if 4.2 doesn't)
|
||
|
||
| | |
|
||
|---|---|
|
||
| **What** | Per `POST-LAUNCH.md` §4 — audit the funnel, not the features. Run a 1-week outbound experiment to 30 niche contacts as a control (per `BUSINESS.md` §8 the no-touch revisit is allowed below $5k MRR if it produces signal). |
|
||
| **Why** | Distinguishes "no reach" from "no conversion" — they need different fixes. |
|
||
| **External dependency** | Operator's time. |
|
||
| **Cost** | The 10 hr/wk allocation already exists; this displaces other work. |
|
||
| **Blocked by** | The 90-day calendar trigger from 4.2. |
|
||
|
||
---
|
||
|
||
## Phase 5 · Steady state — what NOT to build
|
||
|
||
Per `PLAN.md` §5 (anti-temptations) and `DECISIONS.md` §8 (re-lock
|
||
triggers). The trap is treating "more code" as the answer when the
|
||
data says "more reach" or "more conversion." The five forbidden
|
||
moves until $5k/mo MRR:
|
||
|
||
| | Why locked |
|
||
|---|---|
|
||
| ❌ More tools (06–08) | `PLAN.md` §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review. |
|
||
| ❌ SaaS pivot | `DECISIONS.md` §4 — recurring infra conflicts with the lifestyle constraint. |
|
||
| ❌ Live chat / sales calls | `DECISIONS.md` §1 #8 — no-touch is locked until $5k/mo. |
|
||
| ❌ Custom integrations / one-off consulting | Breaks "build once, sell many." |
|
||
| ❌ Going broad on personas | `PLAN.md` §5 — "all small businesses" converts at 1 %; vertical converts at 5–15 %. |
|
||
|
||
---
|
||
|
||
## Triage table — what blocks what
|
||
|
||
```
|
||
Phase 1 (week 1) Phase 2 (week 2) Phase 3 (week 4)
|
||
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
|
||
│ 1.1 Push GH │──────────┐ │ 2.1 Apple │ ───┐ │ 3.1 Community│
|
||
│ 1.2 Demo │──┐ ├──▶│ Dev (1-2w) │ │ │ 3.2 SEO post │
|
||
│ 1.3 Domain │ │ │ │ 2.2 Build │ ───┤ │ 3.3 Analytics│
|
||
│ 1.4 Pages │◀─┘ │ │ 2.3 Sign │ ───┤ │ 3.4 Emails │
|
||
│ 1.5 Gumroad │──────────┘ │ 2.4 Copy │ │ └──────────────┘
|
||
│ 1.6 Verify │ │ 2.5 Activate │ ◀──┘
|
||
└──────────────┘ └──────────────┘ ↓
|
||
┌──────────────┐
|
||
│ 4.1 Monthly │
|
||
│ 4.2 First $ │
|
||
│ 4.3 Fallback │
|
||
└──────────────┘
|
||
```
|
||
|
||
The longest blocking path is **2.1 Apple Developer Program**
|
||
(1–2 weeks). Start it on day 1 of week 1 — it unblocks everything in
|
||
Phase 2 and you can do all of Phase 1 while waiting.
|
||
|
||
---
|
||
|
||
## Time estimate — total operator time
|
||
|
||
| Phase | Hours | Wall-clock |
|
||
|---|---|---|
|
||
| Phase 1 | ~1 hour | end of week 1 (mostly waiting for builds) |
|
||
| Phase 2 | ~1 day | end of week 2 (gated by Apple Dev approval) |
|
||
| Phase 3 | ~6 hours | week 3–4 |
|
||
| Phase 4 | 30 min/month | ongoing |
|
||
| **Total to launch** | **~12 hours of operator time** | **~14 days wall-clock** |
|
||
|
||
Well inside the 10 hr/wk constraint of `DECISIONS.md` §1 #2.
|
||
|
||
---
|
||
|
||
## The thing that decides whether the plan works
|
||
|
||
Not the build. Not the deploy. Not even the first sale.
|
||
|
||
**The discipline of running the monthly review** in Phase 4 — and the
|
||
"decide ONE thing per month" rule from `POST-LAUNCH.md` §2 — is what
|
||
separates "this product exists" from "this product compounds." Every
|
||
feature added before the funnel is measured is a guess; every change
|
||
made after the monthly review is informed.
|
||
|
||
Don't skip 4.1.
|