Files
datatools-dev/docs/NEXT-STEPS.md
Michael ee0b1f6f6b docs: design notes for future PDF→CSV tool
New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a
consistent shape — What / Why / Can we ship now / Approach / GUI
sketch / Effort / Risks / Ship criteria. Resting place for things
the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that
keep coming up.

First entry: **#10 PDF → CSV extractor** (bank statements et al.).

Key facts captured:

- **Current state**: no PDF infrastructure exists. Zero PDF
  dependencies in requirements.txt; zero PDF-touching code under
  ``src/``. The only "PDF" string in the codebase is the planned-
  output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3,
  no native compiler, gives coordinate-aware text), Tesseract via
  pytesseract as the OCR fallback for scanned PDFs,
  streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a
  rendered page; the tool applies that template across N pages,
  saves the template by layout fingerprint for next month's
  statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10
  weeks for a polished version with multi-page template recall;
  +2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the
  combination (region selection that persists across pages and
  across documents with similar layouts) is where the engineering
  goes.
- **Ship criteria**: ≥1 paying customer + ≥3 paid or ≥5 demo
  emails asking for PDF extraction + the bookkeeper niche
  converting at least one customer first. None have fired.

Cross-references added:

- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for
  parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools
  in FUTURE-TOOLS.md and explicitly names #10 as the current
  highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new
  row for the PDF tool tied to the same ship-trigger language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:52:42 +00:00

18 KiB
Raw Permalink Blame History

Next Steps — from "code complete" to first paying customer

Creator-only. The runnable checklist that takes the operator from the current state (1,729 tests passing, 6 tools shipped, 0 paying customers) through launch and into the first 90 days. Version: 1.0 · Adopted: 2026-05-01

This document is the single answer to "what now?". Every line item has an owner, a time estimate, a blocker, a cost, and the external dependency that makes it un-shippable today. Items are ordered by must-finish-before-the-next-item — work top-down.

Cross-references:

  • Strategy: PLAN.md (the 8 strategic moves + the 90-day sequence)
  • Demo specs: DEMO-PLAN.md
  • Deployment mechanics: DEPLOYMENT.md
  • Post-launch measurement: POST-LAUNCH.md
  • Locked criteria: DECISIONS.md §1

Status legend:

  • 🟢 Done — the asset exists in this repo
  • 🟡 Buildable now — no external dependency needed
  • 🟠 External dependency — needs an account / signup / payment
  • 🔴 Manual / requires user input that can't be automated

Phase 0 · What's already done (skip ahead)

Item Where it lives
🟢 6 of 9 tools shipped (Dedup, Text, Format, Missing, Column-Map, Pipeline) src/core/, src/cli_*.py, src/gui/pages/
🟢 Automated Workflows (the retention multiplier per PLAN.md §2.6) src/core/pipeline.py, src/cli_pipeline.py, src/gui/pages/9_Pipeline_Runner.py
🟢 1,729 passing tests · 0 skipped · 0 xfailed tests/
🟢 3 niche demo datasets + pre-tuned pipeline JSONs samples/demo/
🟢 Streamlit demo app + Cloud entry shim streamlit_app.py, src/gui/app_demo.py
🟢 3 niche landing pages + apex chooser + shared CSS landing/
🟢 Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) landing/deploy.py
🟢 Strategic plan + demo plan + post-launch measurement plan + deployment doc docs/PLAN.md, DEMO-PLAN.md, POST-LAUNCH.md, DEPLOYMENT.md
🟢 PyInstaller bundle scaffold (spec + launcher + Streamlit hook + README) build/
🟢 Customer-facing copy single-source-of-truth (landing + demo + email subjects + Gumroad listing) marketing/COPY.md
🟢 9 niche-community post drafts (3 posts × 3 niches: bookkeeper, revops, shopify-pet) marketing/community-posts/
🟢 18 email drafts (Gumroad delivery + 5-touch onboarding × 3 niches) marketing/emails/

Phase 1 · Stand the funnel up (target: end of week 1, ~6 hours total work)

The bottleneck right now is distribution, not feature count. Everything in this phase is about turning code into a URL the user can hit.

1.1 — 🟠 Push to GitHub (5 min)

What git init (if not already), commit, push to a private or public GitHub repo.
Why Cloud deploy services need a Git source. Streamlit Community Cloud auto-deploys on push to main.
External dependency A GitHub account (free).
Cost $0.
Blocked by Nothing.

1.2 — 🟠 Deploy the demo to Streamlit Community Cloud (15 min)

What Follow DEPLOYMENT.md Part 1. Result: a public URL like https://datatools-demo.streamlit.app.
Why The landing pages embed this in their iframe. Without it, every "Run pipeline" button on the landing pages 404s.
External dependency Free Streamlit Community Cloud account, signed in via GitHub.
Cost $0.
Blocked by 1.1 (the repo must be on GitHub).
Watch out for First build takes 23 min while Cloud installs deps. Subsequent deploys < 30 s.

1.3 — 🟠 Buy the apex domain (5 min, ~$15/year)

What Register datatools.app (or whichever) at any registrar. Point the nameservers at Cloudflare.
Why The landing-page canonical URLs and CTA buttons refer to this domain. Pages can deploy to a free *.pages.dev URL first if you want to defer this.
External dependency A registrar account; payment method.
Cost ~$15/year. Within BUSINESS.md §9 cost cap.
Blocked by Nothing — can run in parallel with 1.1 / 1.2.

1.4 — 🟠 Deploy the landing pages to Cloudflare Pages (15 min)

What Follow DEPLOYMENT.md Part 2. Run python3 landing/deploy.py with the operator's URLs in deploy.config.json, then wrangler pages deploy landing/dist (or drag-drop).
Why This is the marketing surface. Three persona URLs go live as soon as it deploys.
External dependency Free Cloudflare account; Wrangler CLI (optional — drag-drop works too).
Cost $0.
Blocked by 1.2 (the demo URL goes into deploy.config.json); ideally 1.3 for the custom domain.
Watch out for The deploy.config.json file is gitignored — your real URLs never get committed.

1.5 — 🟠 Open a Gumroad listing (15 min) — stub for now

What Create a Gumroad account, draft a listing with a single screenshot + the landing-page copy, set price to $49. Don't enable purchases yet — leave it as a draft.
Why The CTA buttons on the landing pages link to gumroad.com/l/datatools?from=<persona>. Until the listing exists, those buttons 404.
External dependency Free Gumroad account; Stripe-connected payout method (defer to Phase 2).
Cost $0 to draft, ~10% per sale once live.
Blocked by Nothing — can run in parallel with 1.11.4.
Watch out for The listing URL must be gumroad.com/l/datatools to match the landing-page hard-coded CTAs. If you pick a different slug, update landing/deploy.config.jsongumroad_listing and re-run deploy.py.

1.6 — 🟡 End-to-end smoke verification (10 min)

What Run the four curl commands from DEPLOYMENT.md Part 4. All four landing pages, all three demo personas, sitemap.xml.
Why First time something can break is the moment a real user hits it. Ten minutes of curl saves a week of "why is conversion zero."
External dependency None.
Cost $0.
Blocked by 1.4 + 1.2.

Phase 2 · Make it sellable (target: end of week 2)

2.1 — 🟠 Apple Developer Program enrollment (5 min to start, 12 weeks lead)

What Per BUSINESS.md §10. Required for code-signing the macOS installer.
External dependency Apple ID + government-issued ID (individual) or D-U-N-S number (org).
Cost $99/year.
Blocked by Nothing — start ASAP because of the 12 week approval window. The pipeline waits on this; nothing else does.

2.2 — 🟢 PyInstaller spec + cross-platform build (scaffold shipped — runs need per-OS hosts)

What build/datatools.spec + build/launcher.py + build/hooks/hook-streamlit.py bundle the Streamlit GUI + all 6 tools + samples into one app. Folder-mode (one-dir) by default; Mac .dmg, Windows .exe, Linux .tar.gz. Per-platform recipe in build/README.md.
Why The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing.
External dependency pip install pyinstaller. None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner.
Cost $0 (GitHub Actions matrix runners are free for public repos).
Blocked by Nothing for the spec; 2.1 for the signed Mac build.
Watch out for Streamlit's bundle size lands around 300500 MB per DECISIONS.md §4c — accepted tradeoff. PyInstaller cross-compilation isn't supported — Mac builds need a Mac, Windows builds need a Windows host.
Where it lives build/datatools.spec, build/launcher.py, build/hooks/hook-streamlit.py, build/README.md

2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)

What Sign the .dmg, submit to Apple's notarization service, staple the ticket.
Why Without it, Gatekeeper hard-blocks the install with no obvious way out (per BUSINESS.md §10). The buyer gives up.
External dependency Apple Developer Program (2.1).
Cost $0 incremental over 2.1.
Blocked by 2.1 + 2.2.

2.4 — 🟢 Refund policy + license + Gumroad listing copy (drafted in COPY.md)

What A clear refund policy (30-day no-questions) + a software licence text + the Gumroad listing description.
Why Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes.
External dependency None — operator authoring.
Cost $0.
Blocked by Nothing.
Where it lives marketing/COPY.md § 5 (Gumroad listing — full title / tagline / description / bullets / refund text / tags). Refund window is also referenced in COPY.md § 0 so it stays consistent across surfaces.
Still to author A short licence text (one-time perpetual use, no redistribution) — not in COPY.md yet. Recommend Polyform Strict 1.0.0 or a 10-line bespoke text.

2.5 — 🟠 Activate the Gumroad listing (15 min)

What Upload the cross-platform installers from 2.2/2.3, paste the copy from 2.4, set $49 price, enable purchases, configure Stripe payout.
Why This is the "buy" button finally working.
External dependency Gumroad + Stripe account; the installers from 2.2/2.3.
Cost ~10 % per sale.
Blocked by 2.2, 2.3, 2.4.

Phase 3 · First-traffic ignition (target: end of week 4)

Per PLAN.md §3 and BUSINESS.md §7 channel priorities. The strict no-touch constraint of DECISIONS.md §1 #8 makes channel choice matter — these are the only ones that fit.

3.1 — 🟢 First niche-community post (9 drafts ready — pick one and personalize)

What One value-first post in one niche-relevant community (e.g. r/Bookkeeping, r/revops, r/shopify; IndieHackers; niche Slacks/Discords). Lead with the demo URL, not the buy URL.
Why Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch.
External dependency Account in the chosen community; understand its self-promotion rules.
Cost $0.
Blocked by 1.4 (demo URL must work).
Hint Pick the niche the operator knows best. Don't post all three drafts in the same community in the same week — see marketing/community-posts/README.md for cadence guidance.
Where it lives marketing/community-posts/{bookkeeper,revops,shopify-pet}/0{1-story,2-tip,3-soft-offer}.md — 3 posts × 3 niches = 9 drafts.

3.2 — 🟡 First long-tail SEO blog post (46 hours)

What One 8001,500-word post on datatools.app/blog/ (sub-route of Cloudflare Pages or Substack) targeting one niche keyword from BUSINESS.md §7. Topic: a real problem you've encountered, the cleanup steps, the demo URL at the end.
Why Compounding asset — BUSINESS.md §2 says SEO pays in 618 months, not week 1. Don't mistake it for an early-stage channel.
External dependency None.
Cost $0.
Blocked by Nothing.

3.3 — 🟡 Cloudflare Web Analytics + event counters (45 min)

What Enable Cloudflare Web Analytics on the Pages project (one click). Add a tiny inline <script> to each landing page that fires cta_clicked when the buy button is hit, before redirecting. Per POST-LAUNCH.md §1.
Why Without this, the post-launch checklist is unrunnable.
External dependency Cloudflare account (already from 1.4).
Cost $0.
Blocked by 1.4.
Hint The Gumroad webhook captures ?from=<persona> automatically — no extra wiring.

3.4 — 🟢 Email autoresponder (18 drafts ready — paste into provider)

What Gumroad's built-in delivery email plus a 5-touch onboarding sequence (Day 1, 3, 7, 14, 30) per niche. Per-niche segmentation via Gumroad's "What do you do?" custom field at checkout.
Why Increases activation, reduces refund risk, surfaces support questions while volume is small. The Day-1 email in particular drives buyers from "I bought it" to "I ran it" — buyers who don't open within 72h refund at ~3× the rate of buyers who do.
External dependency Gumroad delivery is built-in. The 5-touch sequence needs an email service that supports tag-based drips (Buttondown is the cheapest fit; ConvertKit if you want HTML editor; Resend if you'll script it).
Cost $0$30/month per BUSINESS.md §9.
Blocked by 2.5.
Where it lives marketing/emails/{bookkeeper,revops,shopify-pet}/{00-delivery,01-day1,02-day3,03-day7,04-day14,05-day30}.md — 6 emails × 3 niches = 18 drafts. Variables ({{first_name}}, {{download_url}}, {{sample_file_url}}, {{landing_page}}) are listed in marketing/emails/README.md.
Sequence policy Pause if buyer replies (until you reply); kill on refund request; skip Day 14 + 30 if buyer has already engaged via support. See marketing/emails/README.md for full quiet rules.

Phase 4 · First-buyer trigger and review

Per PLAN.md §4 decision triggers and POST-LAUNCH.md §4.

4.1 — 🟢 Run the monthly review (30 min, first Monday after launch)

What Follow POST-LAUNCH.md §2 — pull last-30-days demo events + Gumroad sales + refunds, compute the five numbers, decide ONE change.
Why Without this discipline, the funnel drifts and the operator changes 5 things at once and learns nothing.
External dependency None — analytics from 3.3, sales from 2.5.
Cost $0.
Blocked by 3.3 + 2.5.

4.2 — 🟢 First paying customer (target: 90 days)

What The actual first sale.
Why Per BUSINESS.md §6: validates the funnel; not the business.
Trigger action Continue, no plan change. Make the first $1k/month within month 6.

4.3 — 🔴 Zero-paid-in-90-days fallback (only fires if 4.2 doesn't)

What Per POST-LAUNCH.md §4 — audit the funnel, not the features. Run a 1-week outbound experiment to 30 niche contacts as a control (per BUSINESS.md §8 the no-touch revisit is allowed below $5k MRR if it produces signal).
Why Distinguishes "no reach" from "no conversion" — they need different fixes.
External dependency Operator's time.
Cost The 10 hr/wk allocation already exists; this displaces other work.
Blocked by The 90-day calendar trigger from 4.2.

Phase 5 · Steady state — what NOT to build

Per PLAN.md §5 (anti-temptations) and DECISIONS.md §8 (re-lock triggers). The trap is treating "more code" as the answer when the data says "more reach" or "more conversion." The five forbidden moves until $5k/mo MRR:

Why locked
More tools (0608) PLAN.md §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review.
Tool #10 PDF → CSV (the most-asked-for adjacency) Parked in docs/FUTURE-TOOLS.md with full design + 34 wk MVP / 610 wk polished estimate. Ship trigger: paying customer + ≥3 paid or ≥5 demo emails asking for PDF + the bookkeeper niche converting first. None have fired yet.
SaaS pivot DECISIONS.md §4 — recurring infra conflicts with the lifestyle constraint.
Live chat / sales calls DECISIONS.md §1 #8 — no-touch is locked until $5k/mo.
Custom integrations / one-off consulting Breaks "build once, sell many."
Going broad on personas PLAN.md §5 — "all small businesses" converts at 1 %; vertical converts at 515 %.

Triage table — what blocks what

Phase 1 (week 1)              Phase 2 (week 2)         Phase 3 (week 4)
┌──────────────┐              ┌──────────────┐         ┌──────────────┐
│ 1.1 Push GH  │──────────┐   │ 2.1 Apple    │ ───┐    │ 3.1 Community│
│ 1.2 Demo     │──┐       ├──▶│   Dev (1-2w) │    │    │ 3.2 SEO post │
│ 1.3 Domain   │  │       │   │ 2.2 Build    │ ───┤    │ 3.3 Analytics│
│ 1.4 Pages    │◀─┘       │   │ 2.3 Sign     │ ───┤    │ 3.4 Emails   │
│ 1.5 Gumroad  │──────────┘   │ 2.4 Copy     │    │    └──────────────┘
│ 1.6 Verify   │              │ 2.5 Activate │ ◀──┘
└──────────────┘              └──────────────┘            ↓
                                                  ┌──────────────┐
                                                  │ 4.1 Monthly  │
                                                  │ 4.2 First $  │
                                                  │ 4.3 Fallback │
                                                  └──────────────┘

The longest blocking path is 2.1 Apple Developer Program (12 weeks). Start it on day 1 of week 1 — it unblocks everything in Phase 2 and you can do all of Phase 1 while waiting.


Time estimate — total operator time

Phase Hours Wall-clock
Phase 1 ~1 hour end of week 1 (mostly waiting for builds)
Phase 2 ~1 day end of week 2 (gated by Apple Dev approval)
Phase 3 ~6 hours week 34
Phase 4 30 min/month ongoing
Total to launch ~12 hours of operator time ~14 days wall-clock

Well inside the 10 hr/wk constraint of DECISIONS.md §1 #2.


The thing that decides whether the plan works

Not the build. Not the deploy. Not even the first sale.

The discipline of running the monthly review in Phase 4 — and the "decide ONE thing per month" rule from POST-LAUNCH.md §2 — is what separates "this product exists" from "this product compounds." Every feature added before the funnel is measured is a guess; every change made after the monthly review is informed.

Don't skip 4.1.