datatools-dev/docs/NEXT-STEPS.md
Michael ee0b1f6f6b docs: design notes for future PDF→CSV tool
New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a
consistent shape — What / Why / Can we ship now / Approach / GUI
sketch / Effort / Risks / Ship criteria. Resting place for things
the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that
keep coming up.

First entry: **#10 PDF → CSV extractor** (bank statements et al.).

Key facts captured:

- **Current state**: no PDF infrastructure exists. Zero PDF
  dependencies in requirements.txt; zero PDF-touching code under
  ``src/``. The only "PDF" string in the codebase is the planned-
  output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3,
  no native compiler, gives coordinate-aware text), Tesseract via
  pytesseract as the OCR fallback for scanned PDFs,
  streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a
  rendered page; the tool applies that template across N pages,
  saves the template by layout fingerprint for next month's
  statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10
  weeks for a polished version with multi-page template recall;
  +2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the
  combination (region selection that persists across pages and
  across documents with similar layouts) is where the engineering
  goes.
- **Ship criteria**: ≥1 paying customer + ≥3 paid or ≥5 demo
  emails asking for PDF extraction + the bookkeeper niche
  converting at least one customer first. None have fired.

Cross-references added:

- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for
  parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools
  in FUTURE-TOOLS.md and explicitly names #10 as the current
  highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new
  row for the PDF tool tied to the same ship-trigger language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:52:42 +00:00

# Next Steps — from "code complete" to first paying customer
> Creator-only. The runnable checklist that takes the operator from
> the current state (1,729 tests passing, 6 tools shipped, 0 paying
> customers) through launch and into the first 90 days.
> **Version**: 1.0 · **Adopted**: 2026-05-01

This document is the **single answer** to "what now?". Every line
item has an owner, a time estimate, a blocker, a cost, and the
external dependency that makes it un-shippable today. Items are
ordered by **must-finish-before-the-next-item** — work top-down.

Cross-references:

- Strategy: `PLAN.md` (the 8 strategic moves + the 90-day sequence)
- Demo specs: `DEMO-PLAN.md`
- Deployment mechanics: `DEPLOYMENT.md`
- Post-launch measurement: `POST-LAUNCH.md`
- Locked criteria: `DECISIONS.md` §1

Status legend:
- **🟢** Done — the asset exists in this repo
- **🟡** Buildable now — no external dependency needed
- **🟠** External dependency — needs an account / signup / payment
- **🔴** Manual / requires user input that can't be automated
---
## Phase 0 · What's already done (skip ahead)
| ✓ | Item | Where it lives |
|---|------|----------------|
| 🟢 | 6 of 9 tools shipped (Dedup, Text, Format, Missing, Column-Map, Pipeline) | `src/core/`, `src/cli_*.py`, `src/gui/pages/` |
| 🟢 | Automated Workflows (the retention multiplier per `PLAN.md` §2.6) | `src/core/pipeline.py`, `src/cli_pipeline.py`, `src/gui/pages/9_Pipeline_Runner.py` |
| 🟢 | 1,729 passing tests · 0 skipped · 0 xfailed | `tests/` |
| 🟢 | 3 niche demo datasets + pre-tuned pipeline JSONs | `samples/demo/` |
| 🟢 | Streamlit demo app + Cloud entry shim | `streamlit_app.py`, `src/gui/app_demo.py` |
| 🟢 | 3 niche landing pages + apex chooser + shared CSS | `landing/` |
| 🟢 | Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) | `landing/deploy.py` |
| 🟢 | Strategic plan + demo plan + post-launch measurement plan + deployment doc | `docs/PLAN.md`, `DEMO-PLAN.md`, `POST-LAUNCH.md`, `DEPLOYMENT.md` |
| 🟢 | PyInstaller bundle scaffold (spec + launcher + Streamlit hook + README) | `build/` |
| 🟢 | Customer-facing copy single-source-of-truth (landing + demo + email subjects + Gumroad listing) | `marketing/COPY.md` |
| 🟢 | 9 niche-community post drafts (3 posts × 3 niches: bookkeeper, revops, shopify-pet) | `marketing/community-posts/` |
| 🟢 | 18 email drafts (Gumroad delivery + 5-touch onboarding × 3 niches) | `marketing/emails/` |
---
## Phase 1 · Stand the funnel up (target: end of week 1, ~1 hour total work)
The bottleneck right now is **distribution, not feature count**.
Everything in this phase is about turning code into a URL the user
can hit.
### 1.1 — 🟠 Push to GitHub (5 min)
| | |
|---|---|
| **What** | `git init` (if not already), commit, push to a private or public GitHub repo. |
| **Why** | Cloud deploy services need a Git source. Streamlit Community Cloud auto-deploys on push to `main`. |
| **External dependency** | A GitHub account (free). |
| **Cost** | $0. |
| **Blocked by** | Nothing. |
### 1.2 — 🟠 Deploy the demo to Streamlit Community Cloud (15 min)
| | |
|---|---|
| **What** | Follow `DEPLOYMENT.md` Part 1. Result: a public URL like `https://datatools-demo.streamlit.app`. |
| **Why** | The landing pages embed this in their iframe. Without it, every "Run pipeline" button on the landing pages 404s. |
| **External dependency** | Free Streamlit Community Cloud account, signed in via GitHub. |
| **Cost** | $0. |
| **Blocked by** | 1.1 (the repo must be on GitHub). |
| **Watch out for** | First build takes 2–3 min while Cloud installs deps. Subsequent deploys < 30 s. |
### 1.3 — 🟠 Buy the apex domain (5 min, ~$15/year)
| | |
|---|---|
| **What** | Register `datatools.app` (or whichever) at any registrar. Point the nameservers at Cloudflare. |
| **Why** | The landing-page canonical URLs and CTA buttons refer to this domain. Pages can deploy to a free `*.pages.dev` URL first if you want to defer this. |
| **External dependency** | A registrar account; payment method. |
| **Cost** | ~$15/year. Within `BUSINESS.md` §9 cost cap. |
| **Blocked by** | Nothing — can run in parallel with 1.1 / 1.2. |
### 1.4 — 🟠 Deploy the landing pages to Cloudflare Pages (15 min)
| | |
|---|---|
| **What** | Follow `DEPLOYMENT.md` Part 2. Run `python3 landing/deploy.py` with the operator's URLs in `deploy.config.json`, then `wrangler pages deploy landing/dist` (or drag-drop). |
| **Why** | This is the marketing surface. Three persona URLs go live as soon as it deploys. |
| **External dependency** | Free Cloudflare account; Wrangler CLI (optional — drag-drop works too). |
| **Cost** | $0. |
| **Blocked by** | 1.2 (the demo URL goes into `deploy.config.json`); ideally 1.3 for the custom domain. |
| **Watch out for** | The `deploy.config.json` file is gitignored — your real URLs never get committed. |
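
The URL-substitution step can be pictured with a minimal sketch. This is not the actual `landing/deploy.py`: the `{{key}}` placeholder syntax and the `demo_url` key are assumptions for illustration, and only `gumroad_listing` is a key named in this document.

```python
import json
import re

# Hypothetical placeholder syntax: {{key}} inside the landing-page HTML.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute_urls(html: str, config: dict) -> str:
    """Replace every {{key}} with config[key]; fail loudly on a missing key."""
    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key not in config:
            raise KeyError(f"deploy.config.json is missing {key!r}")
        return config[key]

    return PLACEHOLDER.sub(repl, html)


# Example values only; a real run would json.load() the gitignored config file.
example_config = json.loads(
    '{"demo_url": "https://datatools-demo.streamlit.app",'
    ' "gumroad_listing": "https://gumroad.com/l/datatools"}'
)
page = '<iframe src="{{demo_url}}"></iframe><a href="{{gumroad_listing}}">Buy</a>'
rendered = substitute_urls(page, example_config)
```

Raising on a missing key, rather than leaving the placeholder in place, is a design choice in this sketch: a half-substituted page would otherwise deploy silently.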
### 1.5 — 🟠 Open a Gumroad listing (15 min) **— stub for now**
| | |
|---|---|
| **What** | Create a Gumroad account, draft a listing with a single screenshot + the landing-page copy, set price to $49. Don't enable purchases yet — leave it as a draft. |
| **Why** | The CTA buttons on the landing pages link to `gumroad.com/l/datatools?from=<persona>`. Until the listing exists, those buttons 404. |
| **External dependency** | Free Gumroad account; Stripe-connected payout method (defer to Phase 2). |
| **Cost** | $0 to draft, ~10% per sale once live. |
| **Blocked by** | Nothing — can run in parallel with 1.1–1.4. |
| **Watch out for** | The listing URL must be `gumroad.com/l/datatools` to match the landing-page hard-coded CTAs. If you pick a different slug, update `landing/deploy.config.json` → `gumroad_listing` and re-run `deploy.py`. |
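
As an illustration of the CTA link shape described above, here is a small sketch that builds the `?from=<persona>` URL for each niche. The function name is hypothetical, and using the niche directory names as the `from` values is an assumption.

```python
from urllib.parse import urlencode

# Assumed persona values, taken from the niche names used elsewhere in this repo.
PERSONAS = ("bookkeeper", "revops", "shopify-pet")


def cta_url(persona: str, listing: str = "https://gumroad.com/l/datatools") -> str:
    """Build the buy-button URL for one persona landing page."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    return f"{listing}?{urlencode({'from': persona})}"


cta_links = {persona: cta_url(persona) for persona in PERSONAS}
```

If the listing slug changes, passing the new URL as `listing` keeps all three links consistent.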
### 1.6 — 🟡 End-to-end smoke verification (10 min)
| | |
|---|---|
| **What** | Run the four `curl` commands from `DEPLOYMENT.md` Part 4. All four landing pages, all three demo personas, sitemap.xml. |
| **Why** | First time something can break is the moment a real user hits it. Ten minutes of `curl` saves a week of "why is conversion zero." |
| **External dependency** | None. |
| **Cost** | $0. |
| **Blocked by** | 1.4 + 1.2. |
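
The same check can be scripted instead of typed by hand. The sketch below is a rough Python equivalent of a `curl` status check, not the commands from `DEPLOYMENT.md` Part 4; the function names are hypothetical, and the URL list has to come from your own deployment.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx / 5xx responses
        return err.code
    except (urllib.error.URLError, TimeoutError):  # DNS failure, refused, timeout
        return 0


def smoke_check(
    urls: Iterable[str], fetch: Callable[[str], int] = http_status
) -> dict:
    """Map each failing URL to its status; an empty dict means everything passed."""
    results = {url: fetch(url) for url in urls}
    return {url: code for url, code in results.items() if code != 200}
```

Calling `smoke_check([...])` with the landing, demo, and sitemap URLs returns only the failures, so an empty result is the pass condition. The `fetch` parameter exists so the logic can be exercised without network access.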
---
## Phase 2 · Make it sellable (target: end of week 2)
### 2.1 — 🟠 Apple Developer Program enrollment (5 min to start, 1–2 weeks lead)
| | |
|---|---|
| **What** | Per `BUSINESS.md` §10. Required for code-signing the macOS installer. |
| **External dependency** | Apple ID + government-issued ID (individual) or D-U-N-S number (org). |
| **Cost** | $99/year. |
| **Blocked by** | Nothing — start ASAP because of the 1–2 week approval window. The pipeline waits on this; nothing else does. |
### 2.2 — 🟢 PyInstaller spec + cross-platform build *(scaffold shipped — runs need per-OS hosts)*
| | |
|---|---|
| **What** | `build/datatools.spec` + `build/launcher.py` + `build/hooks/hook-streamlit.py` bundle the Streamlit GUI + all 6 tools + samples into one app. Folder-mode (one-dir) by default; Mac `.dmg`, Windows `.exe`, Linux `.tar.gz`. Per-platform recipe in `build/README.md`. |
| **Why** | The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing. |
| **External dependency** | `pip install pyinstaller`. None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner. |
| **Cost** | $0 (GitHub Actions matrix runners are free for public repos). |
| **Blocked by** | Nothing for the spec; 2.1 for the signed Mac build. |
| **Watch out for** | Streamlit's bundle size lands around 300–500 MB per `DECISIONS.md` §4c — accepted tradeoff. PyInstaller cross-compilation isn't supported — Mac builds need a Mac, Windows builds need a Windows host. |
| **Where it lives** | `build/datatools.spec`, `build/launcher.py`, `build/hooks/hook-streamlit.py`, `build/README.md` |
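
Because each artifact has to be built on its own OS, a build script can decide what to emit from the host it runs on. This is a hedged sketch of that lookup, not part of `build/`; the names are hypothetical and the mapping simply mirrors the artifact list above.

```python
import platform
from typing import Optional

# Artifact per host OS, keyed by the values platform.system() returns.
ARTIFACT_BY_OS = {"Darwin": ".dmg", "Windows": ".exe", "Linux": ".tar.gz"}


def artifact_suffix(system: Optional[str] = None) -> str:
    """Return the installer suffix this host can produce."""
    system = system or platform.system()
    if system not in ARTIFACT_BY_OS:
        raise RuntimeError(f"no build recipe for host OS {system!r}")
    return ARTIFACT_BY_OS[system]
```

In a CI matrix, each runner would call `artifact_suffix()` with no arguments and produce only its own platform's file.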
### 2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)
| | |
|---|---|
| **What** | Sign the `.dmg`, submit to Apple's notarization service, staple the ticket. |
| **Why** | Without it, Gatekeeper hard-blocks the install with no obvious way out (per `BUSINESS.md` §10). The buyer gives up. |
| **External dependency** | Apple Developer Program (2.1). |
| **Cost** | $0 incremental over 2.1. |
| **Blocked by** | 2.1 + 2.2. |
### 2.4 — 🟢 Refund policy + license + Gumroad listing copy *(drafted in COPY.md)*
| | |
|---|---|
| **What** | A clear refund policy (30-day no-questions) + a software licence text + the Gumroad listing description. |
| **Why** | Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes. |
| **External dependency** | None — operator authoring. |
| **Cost** | $0. |
| **Blocked by** | Nothing. |
| **Where it lives** | `marketing/COPY.md` § 5 (Gumroad listing — full title / tagline / description / bullets / refund text / tags). Refund window is also referenced in COPY.md § 0 so it stays consistent across surfaces. |
| **Still to author** | A short licence text (one-time perpetual use, no redistribution) — not in COPY.md yet. Recommend Polyform Strict 1.0.0 or a 10-line bespoke text. |
### 2.5 — 🟠 Activate the Gumroad listing (15 min)
| | |
|---|---|
| **What** | Upload the cross-platform installers from 2.2/2.3, paste the copy from 2.4, set $49 price, enable purchases, configure Stripe payout. |
| **Why** | This is the "buy" button finally working. |
| **External dependency** | Gumroad + Stripe account; the installers from 2.2/2.3. |
| **Cost** | ~10 % per sale. |
| **Blocked by** | 2.2, 2.3, 2.4. |
---
## Phase 3 · First-traffic ignition (target: end of week 4)
Per `PLAN.md` §3 and `BUSINESS.md` §7 channel priorities. The strict
no-touch constraint of `DECISIONS.md` §1 #8 makes channel choice
matter — these are the only ones that fit.
### 3.1 — 🟢 First niche-community post *(9 drafts ready — pick one and personalize)*
| | |
|---|---|
| **What** | One value-first post in one niche-relevant community (e.g. r/Bookkeeping, r/revops, r/shopify; IndieHackers; niche Slacks/Discords). Lead with the demo URL, not the buy URL. |
| **Why** | Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch. |
| **External dependency** | Account in the chosen community; understand its self-promotion rules. |
| **Cost** | $0. |
| **Blocked by** | 1.4 (demo URL must work). |
| **Hint** | Pick the niche the operator knows best. Don't post all three drafts in the same community in the same week — see `marketing/community-posts/README.md` for cadence guidance. |
| **Where it lives** | `marketing/community-posts/{bookkeeper,revops,shopify-pet}/0{1-story,2-tip,3-soft-offer}.md` — 3 posts × 3 niches = 9 drafts. |
### 3.2 — 🟡 First long-tail SEO blog post (4–6 hours)
| | |
|---|---|
| **What** | One 800–1,500-word post on `datatools.app/blog/` (sub-route of Cloudflare Pages or Substack) targeting one niche keyword from `BUSINESS.md` §7. Topic: a real problem you've encountered, the cleanup steps, the demo URL at the end. |
| **Why** | Compounding asset — `BUSINESS.md` §2 says SEO pays in 6–18 months, not week 1. Don't mistake it for an early-stage channel. |
| **External dependency** | None. |
| **Cost** | $0. |
| **Blocked by** | Nothing. |
### 3.3 — 🟡 Cloudflare Web Analytics + event counters (45 min)
| | |
|---|---|
| **What** | Enable Cloudflare Web Analytics on the Pages project (one click). Add a tiny inline `<script>` to each landing page that fires `cta_clicked` when the buy button is hit, before redirecting. Per `POST-LAUNCH.md` §1. |
| **Why** | Without this, the post-launch checklist is unrunnable. |
| **External dependency** | Cloudflare account (already from 1.4). |
| **Cost** | $0. |
| **Blocked by** | 1.4. |
| **Hint** | The Gumroad webhook captures `?from=<persona>` automatically — no extra wiring. |
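
For the monthly review, per-persona attribution comes down to reading the `from` query parameter back out of each recorded URL. The sketch below works on plain URL strings; how Gumroad's webhook payload exposes the value is not specified here, so the input format is an assumption.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def persona_of(url: str) -> str:
    """Extract the ?from=<persona> value, or 'unknown' when it is absent."""
    values = parse_qs(urlparse(url).query).get("from", [])
    return values[0] if values else "unknown"


def tally_by_persona(urls) -> Counter:
    """Count recorded clicks or sales per persona."""
    return Counter(persona_of(url) for url in urls)


sample = [
    "https://gumroad.com/l/datatools?from=bookkeeper",
    "https://gumroad.com/l/datatools?from=bookkeeper",
    "https://gumroad.com/l/datatools",
]
counts = tally_by_persona(sample)
```

Keeping an explicit `unknown` bucket makes direct or untagged traffic visible instead of silently dropping it.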
### 3.4 — 🟢 Email autoresponder *(18 drafts ready — paste into provider)*
| | |
|---|---|
| **What** | Gumroad's built-in delivery email plus a **5-touch** onboarding sequence (Day 1, 3, 7, 14, 30) per niche. Per-niche segmentation via Gumroad's "What do you do?" custom field at checkout. |
| **Why** | Increases activation, reduces refund risk, surfaces support questions while volume is small. The Day-1 email in particular drives buyers from "I bought it" to "I ran it" — buyers who don't open within 72h refund at ~3× the rate of buyers who do. |
| **External dependency** | Gumroad delivery is built-in. The 5-touch sequence needs an email service that supports tag-based drips (Buttondown is the cheapest fit; ConvertKit if you want HTML editor; Resend if you'll script it). |
| **Cost** | $0–$30/month per `BUSINESS.md` §9. |
| **Blocked by** | 2.5. |
| **Where it lives** | `marketing/emails/{bookkeeper,revops,shopify-pet}/{00-delivery,01-day1,02-day3,03-day7,04-day14,05-day30}.md` — 6 emails × 3 niches = 18 drafts. Variables (`{{first_name}}`, `{{download_url}}`, `{{sample_file_url}}`, `{{landing_page}}`) are listed in `marketing/emails/README.md`. |
| **Sequence policy** | Pause if buyer replies (until you reply); kill on refund request; skip Day 14 + 30 if buyer has already engaged via support. See `marketing/emails/README.md` for full quiet rules. |
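
The sequence policy can be expressed as a small decision function. This is a sketch under stated assumptions: "Day N" is read as N days after purchase, the function and flag names are hypothetical, and the full quiet rules in `marketing/emails/README.md` are not reproduced.

```python
from datetime import date

TOUCH_DAYS = (1, 3, 7, 14, 30)  # the 5-touch onboarding schedule


def touches_due(
    purchased: date,
    today: date,
    *,
    sent: frozenset = frozenset(),
    awaiting_operator_reply: bool = False,
    refund_requested: bool = False,
    engaged_via_support: bool = False,
) -> list:
    """Return the onboarding touches that are due and not yet sent."""
    if refund_requested:  # kill the sequence on a refund request
        return []
    if awaiting_operator_reply:  # buyer replied; pause until the operator answers
        return []
    elapsed = (today - purchased).days
    due = []
    for day in TOUCH_DAYS:
        if day in sent or day > elapsed:
            continue
        if engaged_via_support and day in (14, 30):  # skip the late touches
            continue
        due.append(day)
    return due
```

For example, a buyer who purchased seven days ago and has already received the Day 1 and Day 3 emails is due only the Day 7 touch.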
---
## Phase 4 · First-buyer trigger and review
Per `PLAN.md` §4 decision triggers and `POST-LAUNCH.md` §4.
### 4.1 — 🟢 Run the monthly review (30 min, first Monday after launch)
| | |
|---|---|
| **What** | Follow `POST-LAUNCH.md` §2 — pull last-30-days demo events + Gumroad sales + refunds, compute the five numbers, decide ONE change. |
| **Why** | Without this discipline, the funnel drifts and the operator changes 5 things at once and learns nothing. |
| **External dependency** | None — analytics from 3.3, sales from 2.5. |
| **Cost** | $0. |
| **Blocked by** | 3.3 + 2.5. |
### 4.2 — 🟢 First paying customer (target: 90 days)
| | |
|---|---|
| **What** | The actual first sale. |
| **Why** | Per `BUSINESS.md` §6: validates the funnel; not the business. |
| **Trigger action** | Continue, no plan change. Make the first $1k/month within month 6. |
### 4.3 — 🔴 Zero-paid-in-90-days fallback (only fires if 4.2 doesn't)
| | |
|---|---|
| **What** | Per `POST-LAUNCH.md` §4 — audit the funnel, not the features. Run a 1-week outbound experiment to 30 niche contacts as a control (per `BUSINESS.md` §8 the no-touch revisit is allowed below $5k MRR if it produces signal). |
| **Why** | Distinguishes "no reach" from "no conversion" — they need different fixes. |
| **External dependency** | Operator's time. |
| **Cost** | The 10 hr/wk allocation already exists; this displaces other work. |
| **Blocked by** | The 90-day calendar trigger from 4.2. |
---
## Phase 5 · Steady state — what NOT to build
Per `PLAN.md` §5 (anti-temptations) and `DECISIONS.md` §8 (re-lock
triggers). The trap is treating "more code" as the answer when the
data says "more reach" or "more conversion." The six forbidden
moves until $5k/mo MRR:
| | Why locked |
|---|---|
| ❌ More tools (06–08) | `PLAN.md` §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review. |
| ❌ Tool #10 PDF → CSV (the most-asked-for adjacency) | Parked in `docs/FUTURE-TOOLS.md` with full design + 3–4 wk MVP / 6–10 wk polished estimate. Ship trigger: paying customer + ≥3 paid or ≥5 demo emails asking for PDF + the bookkeeper niche converting first. None have fired yet. |
| ❌ SaaS pivot | `DECISIONS.md` §4 — recurring infra conflicts with the lifestyle constraint. |
| ❌ Live chat / sales calls | `DECISIONS.md` §1 #8 — no-touch is locked until $5k/mo. |
| ❌ Custom integrations / one-off consulting | Breaks "build once, sell many." |
| ❌ Going broad on personas | `PLAN.md` §5 — "all small businesses" converts at 1 %; vertical converts at 5–15 %. |
---
## Triage table — what blocks what
```
Phase 1 (week 1)              Phase 2 (week 2)         Phase 3 (week 4)
┌──────────────┐              ┌──────────────┐         ┌──────────────┐
│ 1.1 Push GH  │──────────┐   │ 2.1 Apple    │ ───┐    │ 3.1 Community│
│ 1.2 Demo     │──┐       ├──▶│  Dev (1-2w)  │    │    │ 3.2 SEO post │
│ 1.3 Domain   │  │       │   │ 2.2 Build    │ ───┤    │ 3.3 Analytics│
│ 1.4 Pages    │◀─┘       │   │ 2.3 Sign     │ ───┤    │ 3.4 Emails   │
│ 1.5 Gumroad  │──────────┘   │ 2.4 Copy     │    │    └──────────────┘
│ 1.6 Verify   │              │ 2.5 Activate │ ◀──┘
└──────────────┘              └──────────────┘                ↓
                                                       ┌──────────────┐
                                                       │ 4.1 Monthly  │
                                                       │ 4.2 First $  │
                                                       │ 4.3 Fallback │
                                                       └──────────────┘
```
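
The "Blocked by" rows in Phases 1 to 4 form a dependency graph, so a valid working order can be derived mechanically. The sketch below transcribes the hard blockers stated in this document (the optional "ideally 1.3" for 1.4 is left out, and 4.2 and 4.3 are omitted because they are calendar triggers rather than tasks) and sorts them with the standard-library `graphlib`.

```python
from graphlib import TopologicalSorter

# item -> set of items that must finish first, as listed in the "Blocked by" rows
BLOCKED_BY = {
    "1.1": set(),
    "1.2": {"1.1"},
    "1.3": set(),
    "1.4": {"1.2"},
    "1.5": set(),
    "1.6": {"1.4", "1.2"},
    "2.1": set(),
    "2.2": set(),
    "2.3": {"2.1", "2.2"},
    "2.4": set(),
    "2.5": {"2.2", "2.3", "2.4"},
    "3.1": {"1.4"},
    "3.2": set(),
    "3.3": {"1.4"},
    "3.4": {"2.5"},
    "4.1": {"3.3", "2.5"},
}

# One valid order in which every item comes after all of its blockers.
work_order = list(TopologicalSorter(BLOCKED_BY).static_order())

# Items with no blockers can all be started on day 1.
startable_now = sorted(item for item, deps in BLOCKED_BY.items() if not deps)
```

`TopologicalSorter` raises `CycleError` if two items ever block each other, which makes it a cheap consistency check when the tables are edited.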
The longest blocking path is **2.1 Apple Developer Program**
(1–2 weeks). Start it on day 1 of week 1: it gates the signed Mac
build (2.3) and therefore activation (2.5), and all of Phase 1 can
proceed while waiting.

---
## Time estimate — total operator time
| Phase | Hours | Wall-clock |
|---|---|---|
| Phase 1 | ~1 hour | end of week 1 (mostly waiting for builds) |
| Phase 2 | ~1 day | end of week 2 (gated by Apple Dev approval) |
| Phase 3 | ~6 hours | week 3–4 |
| Phase 4 | 30 min/month | ongoing |
| **Total to launch** | **~12 hours of operator time** | **~14 days wall-clock** |
Well inside the 10 hr/wk constraint of `DECISIONS.md` §1 #2.
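
A quick arithmetic check of these figures, using only numbers stated in this document: the Phase 1 per-item estimates and the launch totals from the table above.

```python
# Per-item estimates from the Phase 1 headings, in minutes.
PHASE_1_MINUTES = {"1.1": 5, "1.2": 15, "1.3": 5, "1.4": 15, "1.5": 15, "1.6": 10}

phase_1_total = sum(PHASE_1_MINUTES.values())  # 65 minutes, i.e. about 1 hour

# Launch totals from the table: ~12 operator hours over ~14 days wall-clock.
hours_to_launch = 12
weeks_to_launch = 14 / 7
weekly_load = hours_to_launch / weeks_to_launch  # 6.0 hours per week

within_budget = weekly_load <= 10  # the 10 hr/wk constraint
```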
---
## The thing that decides whether the plan works
Not the build. Not the deploy. Not even the first sale.

**The discipline of running the monthly review** in Phase 4 — and the
"decide ONE thing per month" rule from `POST-LAUNCH.md` §2 — is what
separates "this product exists" from "this product compounds." Every
feature added before the funnel is measured is a guess; every change
made after the monthly review is informed.

Don't skip 4.1.