feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (3 → 6 of 9 Ready):
04 Missing Value Handler src/core/missing.py + cli_missing.py + GUI
05 Column Mapper src/core/column_mapper.py + cli_column_map.py + GUI
09 Pipeline Runner src/core/pipeline.py + cli_pipeline.py + GUI
with soft tool-dependency graph (recommended,
not enforced) and JSON save/load for repeatable
weekly cleanups.
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
before: 1,520 passed · 4 skipped · 17 xfailed
after: 1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
• missing-corpus 3 use cases + 16 edge cases
• column-mapper-corpus 3 use cases + 5 edge cases
• format-cleaner intl 20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
332
docs/DEMO-PLAN.md
Normal file
@@ -0,0 +1,332 @@
# Demo Plan — DataTools

> Creator-only. Implements PLAN.md §2.2 (the demo IS the product) and
> §2.3 (niche down — three landing pages, one engine).
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

The hosted demo is the single highest-leverage marketing asset in the
plan. This document defines exactly what loads, in what order, with
what data, for which buyer — so the operator builds it once and never
rebuilds it from a stale headline.

## 1. Goals

- Convert a cold visitor to a paid buyer in **under three minutes** of
  active interaction.
- Demonstrate the *full pipeline* (not one tool) on a dataset that
  *looks like the visitor's own work* — not a toy CSV.
- Survive zero attention to maintenance — once running, the demo
  should keep working as the engine evolves (the pre-saved pipeline
  JSONs use the same code path the paid product uses).
- Provide a shareable artifact for niche-community posts (a public URL
  the operator can drop into a subreddit reply with one sentence).

## 2. Constraints (non-negotiable)

| Constraint | Source | Implication |
|---|---|---|
| Free hosting at launch | BUSINESS.md §9 | Streamlit Community Cloud (1 GB RAM, sleeps after 7 days idle) |
| No login | BUSINESS.md §7 | No email gate, no signup wall, no "create account to continue" |
| Async / no-touch | DECISIONS.md §1 #8 | Cannot offer "schedule a demo with us" CTA |
| Paid product runs locally | BUSINESS.md §11 | The hosted demo must not expose the full engine to abuse, so it needs row caps |
| Friction kills conversion | BUSINESS.md §7 | Demo dataset preloaded; no "select a file" first step |
| < $1,200/mo recurring | BUSINESS.md §9 | Migrate to a $5–10/mo VPS only after a rate-limit signal |

## 3. The three personas (per PLAN.md §2.3)

| Tag | Persona | Top-of-funnel keyword | Demo dataset | Pre-saved pipeline |
|---|---|---|---|---|
| `shopify-pet` | Shopify operator (priority: pet supplies) | "shopify customer cleanup" | `samples/demo/shopify_pet_customers.csv` | `shopify_pet_pipeline.json` |
| `bookkeeper` | Bookkeeper / freelance accountant | "reconcile bank export csv" | `samples/demo/bookkeeper_bank_reconcile.csv` | `bookkeeper_bank_pipeline.json` |
| `revops` | Marketing / RevOps agency | "dedupe lead list across vendors" | `samples/demo/agency_combined_leads.csv` | `agency_leads_pipeline.json` |

Each persona gets its **own landing page URL**, its **own demo dataset
loaded by default**, and its **own H1 + below-the-fold copy.** The
engine is identical; only positioning differs.
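
The persona table can be read as a lookup keyed on the `?p=` tag. The sketch below is one possible shape for that lookup, not the code in `app_demo.py`: the `Persona` class, the `resolve_persona` helper, the fallback to `shopify-pet`, and the assumption that the pipeline JSONs sit next to the CSVs in `samples/demo/` are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Persona:
    tag: str
    dataset: str
    pipeline: str


# Tags and dataset paths come from the table above; the pipeline
# directory is an assumption (the table lists file names only).
PERSONAS = {
    p.tag: p
    for p in (
        Persona("shopify-pet",
                "samples/demo/shopify_pet_customers.csv",
                "samples/demo/shopify_pet_pipeline.json"),
        Persona("bookkeeper",
                "samples/demo/bookkeeper_bank_reconcile.csv",
                "samples/demo/bookkeeper_bank_pipeline.json"),
        Persona("revops",
                "samples/demo/agency_combined_leads.csv",
                "samples/demo/agency_leads_pipeline.json"),
    )
}

DEFAULT_TAG = "shopify-pet"  # hypothetical fallback for unknown or missing tags


def resolve_persona(query_value: Optional[str]) -> Persona:
    """Map the raw ?p= value to a persona, falling back to the default."""
    tag = (query_value or "").strip().lower()
    return PERSONAS.get(tag, PERSONAS[DEFAULT_TAG])
```

Falling back to a default keeps the "visitor never sees an empty state" rule intact even when the query parameter is mistyped.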

## 4. Demo dataset specifications

Each dataset is intentionally small (20–30 rows) so the full pipeline
runs in well under one second on Streamlit Community Cloud's free
hardware. Each row is a *plausible-looking* export from that
persona's tooling. Each contains every kind of pollution the bundle's
five tools fix, so a single demo run shows every tool earning its
keep.

### 4.0 Pain-point coverage map

Each demo dataset is engineered so the buyer sees their **own top
pain** demonstrated in the AFTER preview. The mapping below pairs
each pain from PLAN.md §2.3a with the rows / columns that exercise
it. Refresh the dataset only when this coverage drops.

| Persona | Pain (from PLAN §2.3a) | Demo coverage |
|---|---|---|
| Shopify pet | S1 — Klaviyo per-contact dupes | 5 dup pairs across rows 1–15 (case + format + address-twin variants) |
| Shopify pet | S2 — feed-rejection chars | smart-quote / NBSP / BOM in rows 1–6, 9, 11 |
| Shopify pet | S3 — multi-channel | partner-style customer IDs (`SHOP-`); demonstration of column-level mapping covered in RevOps demo |
| Shopify pet | S4 — subscription identity | rows 1+2, 7+8, 9+10 — same person, different format |
| Shopify pet | S5 — VAT-MOSS country drift | rows 16–18 (`United Kingdom` / `U.K.` / `UK`) + rows 19–20 (`Germany`/`Italia`) |
| Bookkeeper | B1 — month-overlap re-import | 7 dup pairs spanning Jan↔Feb and Mar boundaries |
| Bookkeeper | B2 — 1099 vendor consolidation | Amazon × 3 spellings, Verizon × 2, Acme Realty × 2, Adobe × 2, Costco × 2, Zoom × 2, Stripe × 4 |
| Bookkeeper | B3 — audit trail | every cell change in the run logged with old/new/rule — surface in the demo's audit tab |
| Bookkeeper | B4 — per-license economics | demonstrated by pricing copy, not data |
| Bookkeeper | B5 — multi-currency | rows 26 (EUR), 27 (GBP), 28 (BRL with comma decimal), 29 (parens-negative) |
| RevOps | R1 — per-contact tier | 6 cross-source dup pairs (HubSpot × LinkedIn × Manual Scrape) |
| RevOps | R2 — deliverability | rows 26–27 (`uma at uniform dot com`, `victor@@victorco.com` invalid emails) |
| RevOps | R3 — GDPR / privacy | demonstrated by the network-tab moat panel + zero-upload claim |
| RevOps | R4 — vendor unification | 3 source values (HubSpot / LinkedIn / Manual Scrape), 13 country codes, mixed-shape headers |
| RevOps | R5 — suppression list | rows 29–30 (`Suppressed`, `Opted Out` tags) |

### 4.1 `shopify_pet_customers.csv` (20 rows)

**Looks like**: a Shopify customer export filtered for "Pet Supplies"
sales channel, 12 months activity.

**Pollution included**:
- Whitespace padding (" Alice ", "Sydney Opera House Drive ")
- Mixed phone formats: `(415) 555-1234`, `415.555.1234`, `5559876543`,
  `+1 555-111-1111`
- International phones: GB, ES, DE, AU, JP (15 demo rows span 6
  countries)
- Currency variants: `$1,240.50`, `£890.25`, `€2.410,75` (EU comma
  decimal), `A$ 1,299.00`, `¥75000`
- Date formats: `2025-12-04`, `12/15/2025`, `?`, `(blank)`, `(none)`,
  `#N/A`
- Disguised nulls: `N/A`, blank, `(blank)`, `?`, `#N/A`, `(none)`,
  `unknown`
- Name casing: `EVE MARTINEZ`, `henry`, `O'NEIL`, `noah`, mixed Title /
  ALL CAPS / lower
- Email case variants that *should* dedup: `Bob@PetShop.com` vs
  `alice@petshop.com`
- 4 fuzzy duplicates (Alice/Bob same address, Grace/Henry same phone,
  Carlos/Olivia same address, Ivy/Jack same address)

**After running the pipeline**: 20 rows → 15, ~29 cells canonicalized,
~45 sentinels standardised, 5 cross-row duplicates merged. The
customer table is now Klaviyo-import-ready and the country column
(previously `UK` / `U.K.` / `United Kingdom` / `Germany` / `Italia`)
is GB / DE / IT — VAT MOSS report won't break.

### 4.2 `bookkeeper_bank_reconcile.csv` (30 rows)

**Looks like**: two months of business checking + credit-card activity
exported from a bank portal, with the Feb export accidentally
overlapping the Jan export at the month boundary.

**Pollution included**:
- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`,
  `1/27/25`, `Feb 5 2025`
- Currency formats: `-$129.99`, `($89.50)` parens-negative,
  `+$3,450.00`, `- $599.88` space, bare `-129.99`, `(50.00)`
- Header trailing whitespace: `"Date "`
- Smart quotes around descriptions: `"autopay"`
- Em-dash sentinels in Vendor: `—`
- Smart-em-dash inside descriptions: `STAPLES #4422 — paper, toner`
- Vendor casing inconsistency: `Amazon` / `amazon.com` / `AMAZON.COM`,
  `Verizon` / `verizon`
- 7 duplicate transactions (same date+amount+vendor recorded twice
  with different formats)

**After running the pipeline**: 30 rows → 23, ~84 cells normalized, 7
duplicates removed (month-overlap + VAT-MOSS dups). All dates
ISO-formatted, all amounts numeric (including EUR/GBP/BRL with comma
decimal), vendor casing canonical, parens-negative resolved.
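
To make the amount normalisation concrete, here is a minimal sketch of how the currency variants listed above could be reduced to signed decimals. It is an illustrative heuristic, not the Format Standardizer's actual logic: `parse_amount` is a hypothetical name, and the separator rule (the last of `.` / `,` is the decimal mark when both appear; a lone comma counts as decimal only when one or two digits follow it) is an assumption.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse strings like '($89.50)', '- $599.88' or '€2.410,75' (heuristic)."""
    s = text.strip()
    # Negative if wrapped in parentheses or if a minus precedes the first digit.
    negative = (s.startswith("(") and s.endswith(")")) or bool(re.match(r"\D*-", s))
    digits = re.sub(r"[^\d.,]", "", s)  # drop symbols, spaces, signs
    if not re.search(r"\d", digits):
        raise ValueError(f"no digits in {text!r}")

    last_dot, last_comma = digits.rfind("."), digits.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both present: whichever comes last is the decimal separator.
        decimal_sep = "." if last_dot > last_comma else ","
    elif last_comma != -1:
        tail = len(digits) - last_comma - 1
        decimal_sep = "," if digits.count(",") == 1 and tail in (1, 2) else None
    elif last_dot != -1:
        decimal_sep = "." if digits.count(".") == 1 else None
    else:
        decimal_sep = None

    if decimal_sep:
        whole, frac = digits.rsplit(decimal_sep, 1)
    else:
        whole, frac = digits, ""
    whole = re.sub(r"[.,]", "", whole)  # remaining separators are thousands marks
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if negative else value
```

A lone dot followed by three digits (`1.240`) stays ambiguous under this rule and is read as a decimal; a real implementation would likely lean on the per-row country column described in the commit message.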

### 4.3 `agency_combined_leads.csv` (30 rows)

**Looks like**: a marketing-ops worksheet combining lead exports from
HubSpot + LinkedIn Sales Navigator + manual scraping, ready for
campaign targeting.

**Pollution included**:
- Phone formats per region: US, UK, Spain, Germany, China, India,
  Australia, Mexico, Israel, Singapore, Hong Kong, Italy, South
  Korea — 13 country codes
- Country column inconsistent: `USA` / `US` / `United States`
- Disguised nulls: `N/A`, `unknown`, `(unknown)`, `(blank)`, `(none)`,
  `?`, `—`, `#N/A`, `TBD`
- Source column tags origin (`HubSpot` / `LinkedIn` / `Manual Scrape`)
- Email duplicates across sources with case variants: `alice@acme.com`
  + `Alice.Johnson@acme.com`, `bob@beta.com` + `Bob@Beta.com`,
  `diana@delta.com` from two sources, `carlos@gamma.io` from two
  sources, `Frank@Foxtrot.de` + `frank@foxtrot.de`
- Name casing: `DIANA LEE`, `henry`, `IVY CHEN`, mixed
- 6 fuzzy / cross-source duplicates designed to exercise the dedup
- Score column with sentinel pollution that needs coercion to integer

**After running the pipeline**: 30 rows → 24, ~43 cells canonicalized,
14 sentinels resolved, 6 cross-source duplicates merged with `merge=true`
so each survivor inherits the most-complete picture. Invalid-email
rows (deliverability stress) and `Suppressed`/`Opted Out` tags
(suppression-list use case) survive as flagged rows the operator
manually reviews.

## 5. UX flow (per persona)

The demo is a single Streamlit page (likely
`src/gui/pages/0_Review.py` repurposed for demo mode, or a
dedicated `app_demo.py` for the cloud build).

```
┌──────────────────────────────────────────────────────────┐
│ DataTools — for {Persona} │
│ "{Persona-specific H1}" │
├──────────────────────────────────────────────────────────┤
│ │
│ Sample dataset preloaded: shopify_pet_customers.csv │
│ [Replace with your own file (capped 100 rows)] │
│ │
│ ┌─ BEFORE preview (20 rows) ─────────────────────────┐ │
│ │ Alice | (415) 555-1234 | $1,240.50 | … │ │
│ │ Bob | 415.555.1234 | $1,240.50 | … │ │
│ │ ... │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Pipeline (saved): │
│ 1. Text Clean → 2. Format Standardize → │
│ 3. Missing → 4. Deduplicate │
│ │
│ [▶ Run pipeline] │
│ │
│ ┌─ AFTER preview ───────────────────────────────────┐ │
│ │ 20 rows → 15 (5 duplicates merged) │ │
│ │ 29 cells canonicalized · 45 sentinels resolved │ │
│ │ │ │
│ │ Alice Johnson | +14155551234 | 1240.50 | … │ │
│ │ ... │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ [Download cleaned CSV (sample, watermarked)] │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Like what you see? │ │
│ │ Run this on YOUR 50,000-row export — locally. │ │
│ │ No upload. Your data never leaves your machine. │ │
│ │ [Get DataTools — $49 →] │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
```

**Critical UX points**:
- Sample dataset is *already loaded* on page paint. Visitor never
  sees an empty state.
- BEFORE table is shown side-by-side with AFTER once the run
  completes. Hidden-character toggle on by default so the visitor
  *sees* what was hidden in their data.
- "Replace with your own file" is a secondary action below the BEFORE
  table — not the headline.
- Per-step metrics are shown in the AFTER block: "29 cells
  canonicalized, 45 sentinels resolved, 5 duplicates merged." Numbers
  sell more than narrative.
- Buy button is **inside** the AFTER block and **above the fold** when
  the run completes. Friction kills.

## 6. Free vs paid boundary

The demo runs the **same code** as the paid product. Caps are surface,
not engine.

| Limit | Free demo | Paid (downloaded) |
|---|---|---|
| Input rows | 100 | unlimited (1 GB+ via streaming) |
| File size | 5 MB | unlimited |
| Output | watermarked CSV ("DataTools demo — buy at <url>" appended as last row) | clean CSV |
| Pipeline editor | locked to the persona-saved pipeline | full edit / save / load JSON |
| Save pipeline JSON | disabled | enabled |
| International | enabled | enabled |
| Audit log download | disabled | enabled |
| Tool 06–08 | as they ship | as they ship |

The watermark is a **single trailing row**, not an in-cell tag — so
the demo's AFTER preview *visibly* reads as production-quality data,
not "demo crippled" data.
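
The "caps are surface, not engine" idea can be sketched as a thin wrapper applied only when the demo writes its download. This is a rough illustration rather than the shipped code: `demo_csv` and `DEMO_ROW_CAP` are hypothetical names, and padding the watermark row with empty cells to match the header width is an assumption about how the trailing row is shaped.

```python
import csv
import io

DEMO_ROW_CAP = 100  # free-demo input cap from the table above


def demo_csv(header, rows, buy_url, cap=DEMO_ROW_CAP):
    """Return CSV text capped at `cap` data rows plus one trailing watermark row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[:cap])  # cap applied at the surface, engine untouched
    # Watermark lives in its own row; data cells are never modified.
    watermark = [f"DataTools demo — buy at {buy_url}"] + [""] * (len(header) - 1)
    writer.writerow(watermark)
    return out.getvalue()
```

Because the data rows pass through unchanged, the AFTER preview can render `rows[:cap]` directly and only the downloaded file carries the signature line.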

## 7. CTA copy (per persona)

### 7.1 Shopify pet operator

- **H1**: *Clean your customer / vendor / subscriber exports — locally.*
- **Sub**: *Klaviyo-import-ready in 30 seconds. Catches duplicates Excel
  misses. Your data never leaves your computer.*
- **CTA**: *Get DataTools for Shopify — $49 →*

### 7.2 Bookkeeper / freelance accountant

- **H1**: *Reconcile messy bank exports. Hand your client an audit
  trail.*
- **Sub**: *Catches the duplicate transaction QuickBooks imported twice.
  Standardizes dates, amounts, vendor casing. Every change auditable.*
- **CTA**: *Get DataTools for Bookkeepers — $49 →*

### 7.3 Marketing / RevOps agency

- **H1**: *Dedupe leads across HubSpot, LinkedIn, and manual scrapes.*
- **Sub**: *International phones, country normalization, fuzzy dedup
  with merge — one tool, one schema, no upload.*
- **CTA**: *Get DataTools for RevOps — $49 →*

## 8. Telemetry / conversion tracking

Async + no-touch + free hosting limits what we can instrument. Use
event-only counters, no PII:

| Event | Source | Aggregate-only field |
|---|---|---|
| `demo.page_view` | landing page | persona tag |
| `demo.run_clicked` | demo page | persona tag |
| `demo.run_completed` | demo page | persona tag, rows_processed |
| `demo.cta_clicked` | demo page | persona tag |
| `gumroad.purchase` | Gumroad webhook | landing-page-source query param (`?from=shopify-pet`) |

Conversion = `cta_clicked / run_completed`. Demo-quality issue surfaces
when `run_completed / page_view` < 30 % (visitors not engaging).
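
The two ratios above are simple enough to compute from the aggregate counters alone. A minimal sketch, assuming the counters arrive as a plain dict keyed by event name (the `funnel_ratios` helper and the dict shape are illustrative, not an existing API):

```python
ENGAGEMENT_FLOOR = 0.30  # run_completed / page_view threshold from the text


def funnel_ratios(counts):
    """Return (engagement, conversion); None where the denominator is zero."""
    views = counts.get("demo.page_view", 0)
    completed = counts.get("demo.run_completed", 0)
    clicked = counts.get("demo.cta_clicked", 0)
    engagement = completed / views if views else None
    conversion = clicked / completed if completed else None
    return engagement, conversion


# Made-up numbers for one persona, for illustration only.
week = {"demo.page_view": 200, "demo.run_completed": 50, "demo.cta_clicked": 5}
engagement, conversion = funnel_ratios(week)
needs_demo_audit = engagement is not None and engagement < ENGAGEMENT_FLOOR
```

Returning `None` instead of zero for an empty denominator keeps a week with no traffic from being mistaken for a week with no engagement.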

Self-host counters on Cloudflare Pages (free, GDPR-friendly). No
Google Analytics — adds privacy banner, conflicts with the "your data
never leaves your computer" message.

## 9. Maintenance plan

**Recurring**: zero. The demo runs on the same engine the paid
product ships, so any improvement to the engine improves the demo
automatically. The pre-saved pipeline JSONs reference column names
and tool names, both stable APIs.

**Triggers for revisit**:

| Trigger | Action |
|---|---|
| Streamlit Community Cloud rate-limits / sleeps too aggressively | Migrate to a $5–10/mo VPS (BUSINESS.md §9 contingency) |
| Demo dataset becomes stale (e.g. all phones standardize to no-op) | Refresh with a new pollution batch — *don't change the persona* |
| `run_completed / page_view < 30 %` for 4 consecutive weeks | Audit the demo: is the BEFORE preview showing the mess clearly? Is the AFTER too small to notice? |
| `cta_clicked / run_completed < 5 %` for 4 consecutive weeks | The demo is impressive but the CTA isn't earning trust — revise copy + add a screenshot of the network tab showing zero outbound calls (PLAN.md §2.4) |
| New tool ships (06–08) | Decide *per persona* whether to add it to that persona's saved pipeline. Not all tools belong on all personas |
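
The two ratio triggers both depend on a streak of four bad weeks rather than a single bad reading. One way to express that check, as a sketch with a hypothetical helper name and an assumed input of one ratio per week in chronological order:

```python
def below_for_consecutive_weeks(weekly_ratios, threshold, weeks=4):
    """True when each of the most recent `weeks` ratios is under `threshold`."""
    if len(weekly_ratios) < weeks:
        return False  # not enough history to call it a trend
    return all(ratio < threshold for ratio in weekly_ratios[-weeks:])


# Made-up weekly run_completed / page_view values, oldest first.
engagement_history = [0.41, 0.28, 0.27, 0.25, 0.22]
audit_demo = below_for_consecutive_weeks(engagement_history, 0.30)
```

Only the trailing window is inspected, so one good week resets the streak and the trigger stays quiet.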

## 10. Build sequence (drops into PLAN.md week 2)

| Day | Action |
|---|---|
| 1 | Demo build of Streamlit app: 3 personas, switch via query param `?p=shopify-pet` |
| 2 | Pipeline JSONs wired in; row cap + watermark applied; download button |
| 3 | Deploy to Streamlit Community Cloud · 3 sub-paths or 3 separate apps |
| 4 | Persona landing pages: 3 static HTML pages on Cloudflare Pages, each with iframe embed of its persona demo + CTA |
| 5 | Telemetry counters wired (Cloudflare event API) · Gumroad webhook captures `?from=` |

End of day 5: three URLs the operator can drop into three different
niche-community threads, each performing its own conversion math.

## 11. Anti-temptations (things the demo deliberately refuses)

- **No "try it on your data first" gate that requires email.** The
  whole point is friction-free.
- **No "schedule a demo" CTA.** Locked by no-touch.
- **No live chat widget.** Same.
- **No A/B-test framework yet.** Single-arm copy, ship it, iterate
  monthly. A/B requires statistical traffic the funnel doesn't have
  pre-PMF.
- **No watermark inside cells.** The AFTER preview must look
  production-quality. Watermark goes on a single trailing row that's
  obviously the demo signature.
- **No animation / loader theatrics.** Pipeline runs in <1 s; a
  fake-progress bar lies about speed.

236
docs/DEPLOYMENT.md
Normal file
@@ -0,0 +1,236 @@
# Deployment — demo + landing pages

> One page. Two services. ~30 minutes from "code complete" to
> "URL the user can hit." Every step here is from-scratch reproducible
> on a clean laptop.
> **Version**: 1.0 · **Adopted**: 2026-05-01

This doc covers the **two distribution surfaces** that ship to public
URLs: the Streamlit demo (the iframe target) and the Cloudflare Pages
landing pages (the marketing surface that embeds it).

The *paid* product — PyInstaller installers, code-signing, Gumroad
listing — is covered in `docs/NEXT-STEPS.md`.

---

## Part 1 · Deploy the demo (Streamlit Community Cloud — free)

### A. Pre-flight (one-time, ~2 min)

You need a free [Streamlit Community Cloud](https://streamlit.io/cloud)
account. Sign in with the GitHub account that hosts this repo.

### B. Deploy (~5 min, mostly waiting for the Cloud build)

1. **Push the repo to GitHub** (private or public — both work). The
   important files are at the **repo root**:

   - `streamlit_app.py` — Cloud auto-detects this; nothing to configure
   - `requirements.txt` — Cloud installs from this
   - `.streamlit/config.toml` — Cloud honours this
   - `samples/demo/*.csv` + `*_pipeline.json` — the demo's data
   - `src/` — the engine

2. In Streamlit Community Cloud → **New app**:
   - Repository: your fork
   - Branch: `main`
   - Main file path: `streamlit_app.py` (the default — leave it)
   - App URL: `datatools-demo` (or any free subdomain)
   - **Deploy**

3. First build is 2–3 min while Cloud installs `pandas`, `phonenumbers`,
   `rapidfuzz`, etc. Subsequent deploys are < 30 s.

### C. Verify

Open the deployed URL. Append `?p=shopify-pet` to the URL bar —
the persona-specific demo loads. Try `?p=bookkeeper` and
`?p=revops` to confirm all three personas route correctly. Click
**Run pipeline**; the AFTER preview should appear within ~1 second.

### D. The output URL

The deployed URL is what feeds into `landing/deploy.config.json` →
`demo_base_url`. Without trailing slash. For example:

    https://datatools-demo.streamlit.app
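
The "without trailing slash" rule matters because the persona URL is built by appending to `demo_base_url`. A small sketch of a tolerant builder follows; `persona_demo_url` is a hypothetical helper for illustration, and `landing/deploy.py` may substitute the URL differently.

```python
from urllib.parse import urlencode


def persona_demo_url(demo_base_url: str, persona: str) -> str:
    """Join the base URL and the ?p= tag without doubling the slash."""
    base = demo_base_url.rstrip("/")  # tolerate a stray trailing slash in config
    return f"{base}/?{urlencode({'p': persona})}"
```

With this shape, both `https://datatools-demo.streamlit.app` and the same value with a trailing slash yield `https://datatools-demo.streamlit.app/?p=shopify-pet`.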

### E. Migration trigger

Per `BUSINESS.md` §9 / `DEMO-PLAN.md` §9, migrate to a $5–10/mo VPS
when:

- Streamlit Community Cloud rate-limits / sleeps too aggressively, OR
- the demo crosses ~5 k page-views/month (free-tier capacity)

The migration is one command if you containerise:
`docker run -p 8501:8501 -v $(pwd):/app python:3.12-slim …`

---

## Part 2 · Deploy the landing pages (Cloudflare Pages — free)

### A. Pre-flight (one-time, ~5 min)

You need:

- A Cloudflare account (free) and a domain (any registrar) with
  nameservers pointed at Cloudflare. **OR** skip the custom domain
  step and use the auto-generated `*.pages.dev` URL.
- A Gumroad listing URL (placeholder until your account is set up —
  use `https://gumroad.com/l/datatools` and update it later).

### B. Build the deploy-ready bundle (~30 sec)

```bash
# One-time: copy the template
cp landing/deploy.config.example.json landing/deploy.config.json
# Edit it with your real URLs (any editor; falls back to vi)
"${EDITOR:-vi}" landing/deploy.config.json
# Build
python3 landing/deploy.py
# → produces landing/dist/
```

`landing/deploy.config.json` is **gitignored**; your real URLs never
hit the repo.

### C. Deploy (~3 min)

Two paths — pick one:

**Drag-and-drop (zero CLI):**

1. Cloudflare Pages dashboard → **Create project** → **Direct Upload**
2. Drag `landing/dist/` into the upload zone
3. Project name: `datatools` (becomes `datatools.pages.dev`)
4. Click **Deploy**

**Wrangler CLI (one command, scriptable):**

```bash
npm install -g wrangler # one-time
wrangler login # one-time
wrangler pages deploy landing/dist
```

### D. Custom domain (~5 min, optional)

Pages dashboard → your project → **Custom domains** → add
`datatools.app` (or whichever apex domain you registered). Cloudflare
auto-issues TLS. Once propagated:

- `https://datatools.app/` → apex chooser
- `https://datatools.app/shopify-pet/` → Shopify landing
- `https://datatools.app/bookkeeper/` → Bookkeeper landing
- `https://datatools.app/revops/` → RevOps landing

### E. Verify

For each persona:

1. Open the persona URL.
2. Confirm the demo iframe loads (the URL inside it points at the
   Streamlit demo from Part 1).
3. Click "Run pipeline" inside the iframe → AFTER preview appears.
4. Click the "Get DataTools" button → opens Gumroad with the
   correct `?from=<persona>` query (verify in the URL bar).

If the iframe shows "Refused to connect", check Cloudflare Pages →
**Settings** → **Functions** for any CSP that disallows Streamlit's
domain. (Default Pages config does not set CSP, so this is rarely an
issue.)

---

## Part 3 · Updates

The cycle is:

```bash
# 1) Edit code or copy (set PERSONA to shopify-pet, bookkeeper, or revops)
"${EDITOR:-vi}" "landing/${PERSONA}/index.html"
"${EDITOR:-vi}" src/gui/app_demo.py

# 2) Rebuild landing
python3 landing/deploy.py

# 3) Re-deploy landing
wrangler pages deploy landing/dist

# 4) Re-deploy demo
git push origin main
# (Streamlit Cloud auto-deploys on push)
```

Both surfaces deploy in under 5 minutes end-to-end.

---

## Part 4 · Sanity checks (post-deploy, ~3 min)

Run these once, then trust the build (per `POST-LAUNCH.md` §6):

```bash
# Landing pages serve and reference the right demo URL
curl -s https://datatools.app/ | grep -c persona-card
# → 3 (one per persona card)

curl -s https://datatools.app/shopify-pet/ | grep -c "datatools-demo"
# → ≥1 (iframe src points at your demo)

# Demo responds and routes the persona param
# (URL quoted so the shell does not treat "?" as a glob character)
curl -s "https://datatools-demo.streamlit.app/?p=shopify-pet" | grep -c "Shopify"
# → ≥1 expected; Streamlit renders most content in the browser, so if
#   curl reports 0, confirm the persona routing in a browser instead

# Sitemap is valid XML and lists all 4 pages
curl -s https://datatools.app/sitemap.xml | grep -c "<url>"
# → 4
```
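
`grep -c "<url>"` counts matching lines, so it neither proves the sitemap parses as XML nor counts correctly if several `<url>` entries share a line. A stricter check could parse the document. This is a sketch under the assumption that `deploy.py` emits the standard sitemaps.org namespace; `sitemap_locations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace (assumed to be what deploy.py writes).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locations(xml_text: str) -> list:
    """Parse sitemap XML and return every <loc> value.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

Feeding it the body fetched from `/sitemap.xml` and asserting that four locations come back covers both halves of the "valid XML and lists all 4 pages" claim.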

---

## Part 5 · Cost ceiling check

| Service | Tier | Cost | Cap |
|---|---|---|---|
| Cloudflare Pages | Free | $0 | 500 builds/month, unlimited bandwidth |
| Streamlit Community Cloud | Free | $0 | 1 GB RAM, sleeps after 7 days idle |
| Custom domain | Cloudflare or registrar | ~$15/year | n/a |
| GitHub | Free for private repos with limited collaborators | $0 | n/a |
| **Total ongoing** | | **~$1.25/mo** (domain only) | |

Well inside the `BUSINESS.md` §9 cap of $1,200/mo recurring. The
$5–10/mo VPS migration is a contingency only — don't pre-build it.

---

## Troubleshooting

**Streamlit Cloud build fails with "ModuleNotFoundError: src.core"**

`streamlit_app.py` puts the repo root on `sys.path` before invoking
the demo module — but only if the file is at the repo root. Confirm
`streamlit_app.py` lives at `/streamlit_app.py`, not nested in a
folder.
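
For orientation, the path fix-up described here usually amounts to a few lines at the top of the entry file. The sketch below shows the general pattern only; `ensure_repo_root_on_path` is a hypothetical name and the real `streamlit_app.py` may be written differently.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(entry_file: str) -> str:
    """Put the directory containing `entry_file` at the front of sys.path."""
    root = str(Path(entry_file).resolve().parent)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# In an entry shim this would be called as ensure_repo_root_on_path(__file__)
# before importing anything from the src package.
```

This also shows why nesting the shim breaks imports: the parent directory of a nested file is no longer the folder that contains `src/`.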

**Cloudflare Pages deploy succeeds but persona pages 404**

The directory layout is preserved by `deploy.py`. Confirm your
`landing/dist/` has `shopify-pet/index.html`, etc. — not just three
flat files. If you used drag-and-drop, drag the **directory**, not
its contents.

**The iframe shows "X-Frame-Options denied"**

Streamlit Community Cloud allows iframe embedding by default. If
you've migrated to a self-hosted demo with a reverse proxy, remove the
`X-Frame-Options` header for the demo's domain. The header has no
standard allow-all value (only `DENY` and `SAMEORIGIN`), so to limit
embedding to the landing site, send
`Content-Security-Policy: frame-ancestors https://datatools.app`
instead.

**Gumroad URL has no `?from=` parameter when clicked**

The `?from=` query param is added by the landing-page CTA, not by
Gumroad. If it's missing, the landing-page HTML wasn't substituted —
re-run `python3 landing/deploy.py` and re-deploy.
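
Whether the parameter ends up as `?from=` or `&from=` depends on whether the Gumroad URL already carries a query string. A sketch of a substitution step that handles both cases is shown here; `with_source` is a hypothetical helper and is not taken from `landing/deploy.py`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_source(url: str, persona: str) -> str:
    """Return `url` with exactly one from=<persona> query parameter."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any stale "from" value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "from"]
    query.append(("from", persona))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Parsing and re-encoding the query avoids the classic bug of gluing `?from=` onto a URL that already contains a `?`.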

319
docs/NEXT-STEPS.md
Normal file
@@ -0,0 +1,319 @@
# Next Steps — from "code complete" to first paying customer

> Creator-only. The runnable checklist that takes the operator from
> the current state (1,729 tests passing, 6 tools shipped, 0 paying
> customers) through launch and into the first 90 days.
> **Version**: 1.0 · **Adopted**: 2026-05-01

This document is the **single answer** to "what now?". Every line
item has an owner, a time estimate, a blocker, a cost, and the
external dependency that makes it un-shippable today. Items are
ordered by **must-finish-before-the-next-item** — work top-down.

Cross-references:
- Strategy: `PLAN.md` (the 8 strategic moves + the 90-day sequence)
- Demo specs: `DEMO-PLAN.md`
- Deployment mechanics: `DEPLOYMENT.md`
- Post-launch measurement: `POST-LAUNCH.md`
- Locked criteria: `DECISIONS.md` §1

Status legend:
- **🟢** Done — the asset exists in this repo
- **🟡** Buildable now — no external dependency needed
- **🟠** External dependency — needs an account / signup / payment
- **🔴** Manual / requires user input that can't be automated

---

## Phase 0 · What's already done (skip ahead)

| ✓ | Item | Where it lives |
|---|------|----------------|
| 🟢 | 6 of 9 tools shipped (Dedup, Text, Format, Missing, Column-Map, Pipeline) | `src/core/`, `src/cli_*.py`, `src/gui/pages/` |
| 🟢 | Pipeline Runner (the retention multiplier per `PLAN.md` §2.6) | `src/core/pipeline.py`, `src/cli_pipeline.py`, `src/gui/pages/9_Pipeline_Runner.py` |
| 🟢 | 1,729 passing tests · 0 skipped · 0 xfailed | `tests/` |
| 🟢 | 3 niche demo datasets + pre-tuned pipeline JSONs | `samples/demo/` |
| 🟢 | Streamlit demo app + Cloud entry shim | `streamlit_app.py`, `src/gui/app_demo.py` |
| 🟢 | 3 niche landing pages + apex chooser + shared CSS | `landing/` |
| 🟢 | Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) | `landing/deploy.py` |
| 🟢 | Strategic plan + demo plan + post-launch measurement plan + deployment doc | `docs/PLAN.md`, `DEMO-PLAN.md`, `POST-LAUNCH.md`, `DEPLOYMENT.md` |
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 · Stand the funnel up (target: end of week 1, ~1 hour total work)

The bottleneck right now is **distribution, not feature count**.
Everything in this phase is about turning code into a URL the user
can hit.

### 1.1 — 🟠 Push to GitHub (5 min)

| | |
|---|---|
| **What** | `git init` (if not already done), commit, push to a private or public GitHub repo. |
| **Why** | Cloud deploy services need a Git source. Streamlit Community Cloud auto-deploys on push to `main`. |
| **External dependency** | A GitHub account (free). |
| **Cost** | $0. |
| **Blocked by** | Nothing. |

### 1.2 — 🟠 Deploy the demo to Streamlit Community Cloud (15 min)

| | |
|---|---|
| **What** | Follow `DEPLOYMENT.md` Part 1. Result: a public URL like `https://datatools-demo.streamlit.app`. |
| **Why** | The landing pages embed this in their iframe. Without it, every "Run pipeline" button on the landing pages 404s. |
| **External dependency** | Free Streamlit Community Cloud account, signed in via GitHub. |
| **Cost** | $0. |
| **Blocked by** | 1.1 (the repo must be on GitHub). |
| **Watch out for** | First build takes 2–3 min while Cloud installs deps. Subsequent deploys < 30 s. |

### 1.3 — 🟠 Buy the apex domain (5 min, ~$15/year)

| | |
|---|---|
| **What** | Register `datatools.app` (or whichever) at any registrar. Point the nameservers at Cloudflare. |
| **Why** | The landing-page canonical URLs and CTA buttons refer to this domain. Pages can deploy to a free `*.pages.dev` URL first if you want to defer this. |
| **External dependency** | A registrar account; payment method. |
| **Cost** | ~$15/year. Within `BUSINESS.md` §9 cost cap. |
| **Blocked by** | Nothing — can run in parallel with 1.1 / 1.2. |

### 1.4 — 🟠 Deploy the landing pages to Cloudflare Pages (15 min)

| | |
|---|---|
| **What** | Follow `DEPLOYMENT.md` Part 2. Run `python3 landing/deploy.py` with the operator's URLs in `deploy.config.json`, then `wrangler pages deploy landing/dist` (or drag-drop). |
| **Why** | This is the marketing surface. Three persona URLs go live as soon as it deploys. |
| **External dependency** | Free Cloudflare account; Wrangler CLI (optional — drag-drop works too). |
| **Cost** | $0. |
| **Blocked by** | 1.2 (the demo URL goes into `deploy.config.json`); ideally 1.3 for the custom domain. |
| **Watch out for** | The `deploy.config.json` file is gitignored — your real URLs never get committed. |

### 1.5 — 🟠 Open a Gumroad listing (15 min) **— stub for now**

| | |
|---|---|
| **What** | Create a Gumroad account, draft a listing with a single screenshot + the landing-page copy, set price to $49. Don't enable purchases yet — leave it as a draft. |
| **Why** | The CTA buttons on the landing pages link to `gumroad.com/l/datatools?from=<persona>`. Until the listing exists, those buttons 404. |
| **External dependency** | Free Gumroad account; Stripe-connected payout method (defer to Phase 2). |
| **Cost** | $0 to draft, ~10% per sale once live. |
| **Blocked by** | Nothing — can run in parallel with 1.1–1.4. |
| **Watch out for** | The listing URL must be `gumroad.com/l/datatools` to match the hard-coded landing-page CTAs. If you pick a different slug, update `landing/deploy.config.json` → `gumroad_listing` and re-run `deploy.py`. |

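The CTA link format above (`gumroad.com/l/datatools?from=<persona>`) can be sketched in a few lines. This is a minimal illustration, not the logic in `deploy.py`: the persona keys and the helper names `cta_url` / `persona_from_url` are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Listing URL taken from 1.5; the persona keys below are hypothetical.
GUMROAD_LISTING = "https://gumroad.com/l/datatools"
PERSONAS = ("shopify", "bookkeeper", "agency")


def cta_url(persona: str, listing: str = GUMROAD_LISTING) -> str:
    """Build the buy-button URL that tags a sale with its landing page."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    return f"{listing}?{urlencode({'from': persona})}"


def persona_from_url(url: str) -> Optional[str]:
    """Read the persona tag back out of a CTA URL (None if absent)."""
    values = parse_qs(urlsplit(url).query).get("from")
    return values[0] if values else None


if __name__ == "__main__":
    for name in PERSONAS:
        print(cta_url(name))
```

If the slug changes, only `GUMROAD_LISTING` needs to change in this sketch, which mirrors the single `gumroad_listing` setting the table describes.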
### 1.6 — 🟡 End-to-end smoke verification (10 min)

| | |
|---|---|
| **What** | Run the four `curl` commands from `DEPLOYMENT.md` Part 4. All four landing pages, all three demo personas, sitemap.xml. |
| **Why** | The first time something breaks should not be the moment a real user hits it. Ten minutes of `curl` saves a week of "why is conversion zero." |
| **External dependency** | None. |
| **Cost** | $0. |
| **Blocked by** | 1.4 + 1.2. |

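For operators who prefer a script to hand-typed `curl`, the same check can be sketched in Python. This is an illustration under stated assumptions, not a replacement for `DEPLOYMENT.md` Part 4: the landing-page paths (`/<persona>/`) are hypothetical, while `?p=<persona>` is the demo routing the commit describes. The `fetch` parameter is injectable so the logic can be exercised without network access; an empty result means every URL answered 200.

```python
from typing import Callable, Dict, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET, or 0 if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # 4xx / 5xx responses still carry a status code
        return err.code
    except URLError:
        return 0


def build_urls(site: str, demo: str, personas: List[str]) -> List[str]:
    """Apex page + one page per persona + one demo per persona + sitemap."""
    urls = [f"{site}/"]
    urls += [f"{site}/{p}/" for p in personas]      # hypothetical page paths
    urls += [f"{demo}/?p={p}" for p in personas]    # ?p=<persona> routing
    urls.append(f"{site}/sitemap.xml")
    return urls


def smoke_check(urls: List[str],
                fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Return {url: status} for every URL that did not answer 200."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures
```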
---

## Phase 2 · Make it sellable (target: end of week 2)

### 2.1 — 🟠 Apple Developer Program enrollment (5 min to start, 1–2 weeks lead)

| | |
|---|---|
| **What** | Per `BUSINESS.md` §10. Required for code-signing the macOS installer. |
| **External dependency** | Apple ID + government-issued ID (individual) or D-U-N-S number (org). |
| **Cost** | $99/year. |
| **Blocked by** | Nothing — start ASAP because of the 1–2 week approval window. Only the signed macOS build (2.3) waits on this; nothing else does. |

### 2.2 — 🟡 PyInstaller spec + cross-platform build (1–3 days first time)

| | |
|---|---|
| **What** | A `build/datatools.spec` that bundles the Streamlit GUI + all 6 tools + samples into one app. Mac `.dmg`, Windows `.exe` installer, Linux AppImage. |
| **Why** | The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing. |
| **External dependency** | None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner. |
| **Cost** | $0 (GitHub Actions matrix runners are free for public repos). |
| **Blocked by** | Nothing for the spec; 2.1 for the signed Mac build. |
| **Watch out for** | Streamlit's bundle size lands around 300–500 MB per `DECISIONS.md` §4c — accepted tradeoff. |

### 2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)

| | |
|---|---|
| **What** | Sign the `.dmg`, submit to Apple's notarization service, staple the ticket. |
| **Why** | Without it, Gatekeeper hard-blocks the install with no obvious way out (per `BUSINESS.md` §10). The buyer gives up. |
| **External dependency** | Apple Developer Program (2.1). |
| **Cost** | $0 incremental over 2.1. |
| **Blocked by** | 2.1 + 2.2. |

### 2.4 — 🔴 Refund policy + license + Gumroad listing copy (1 hour)

| | |
|---|---|
| **What** | A clear refund policy (14-day no-questions per the FAQ already on the landing pages) + a software licence text + the Gumroad listing description. |
| **Why** | Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes. |
| **External dependency** | None — operator authoring. |
| **Cost** | $0. |
| **Blocked by** | Nothing. |
| **Hint** | Most of the copy is already in the landing pages' FAQ section — paste it into Gumroad. |

### 2.5 — 🟠 Activate the Gumroad listing (15 min)

| | |
|---|---|
| **What** | Upload the cross-platform installers from 2.2/2.3, paste the copy from 2.4, set $49 price, enable purchases, configure Stripe payout. |
| **Why** | This is the "buy" button finally working. |
| **External dependency** | Gumroad + Stripe account; the installers from 2.2/2.3. |
| **Cost** | ~10 % per sale. |
| **Blocked by** | 2.2, 2.3, 2.4. |

---

## Phase 3 · First-traffic ignition (target: end of week 4)

Per `PLAN.md` §3 and `BUSINESS.md` §7 channel priorities. The strict
no-touch constraint of `DECISIONS.md` §1 #8 makes channel choice
matter — these are the only ones that fit.

### 3.1 — 🔴 First niche-community post (30 min)

| | |
|---|---|
| **What** | One value-first post in one niche-relevant community (e.g. r/shopify, IndieHackers Shopify chat, a Slack/Discord that allows it). Lead with the demo URL, not the buy URL. |
| **Why** | Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch. |
| **External dependency** | Account in the chosen community; understand its self-promotion rules. |
| **Cost** | $0. |
| **Blocked by** | 1.4 (demo URL must work). |
| **Hint** | Pick the persona whose community the operator knows best. Don't try all three at once — see `POST-LAUNCH.md` §2 "decide ONE thing" rule. |

### 3.2 — 🟡 First long-tail SEO blog post (4–6 hours)

| | |
|---|---|
| **What** | One 800–1,500-word post on `datatools.app/blog/` (sub-route of Cloudflare Pages or Substack) targeting one niche keyword from `BUSINESS.md` §7. Topic: a real problem you've encountered, the cleanup steps, the demo URL at the end. |
| **Why** | Compounding asset — `BUSINESS.md` §2 says SEO pays in 6–18 months, not week 1. Don't mistake it for an early-stage channel. |
| **External dependency** | None. |
| **Cost** | $0. |
| **Blocked by** | Nothing. |

### 3.3 — 🟡 Cloudflare Web Analytics + event counters (45 min)

| | |
|---|---|
| **What** | Enable Cloudflare Web Analytics on the Pages project (one click). Add a tiny inline `<script>` to each landing page that fires `cta_clicked` when the buy button is hit, before redirecting. Per `POST-LAUNCH.md` §1. |
| **Why** | Without this, the post-launch checklist is unrunnable. |
| **External dependency** | Cloudflare account (already from 1.4). |
| **Cost** | $0. |
| **Blocked by** | 1.4. |
| **Hint** | The Gumroad webhook captures `?from=<persona>` automatically — no extra wiring. |

### 3.4 — 🟡 Email autoresponder (post-purchase delivery + 3-touch onboarding) (2–3 hours)

| | |
|---|---|
| **What** | Gumroad's built-in delivery email plus three follow-up emails (day 1, day 7, day 14): "are you running into X?", "here's an advanced trick", "save your pipeline as JSON for next week". |
| **Why** | Increases activation, reduces refund risk, surfaces support questions while volume is small. |
| **External dependency** | Gumroad delivery is built-in. The 3-touch sequence needs a free email service (Resend's free tier or Mailchimp's free tier). |
| **Cost** | $0–$30/month per `BUSINESS.md` §9. |
| **Blocked by** | 2.5. |

---

## Phase 4 · First-buyer trigger and review

Per `PLAN.md` §4 decision triggers and `POST-LAUNCH.md` §4.

### 4.1 — 🟢 Run the monthly review (30 min, first Monday after launch)

| | |
|---|---|
| **What** | Follow `POST-LAUNCH.md` §2 — pull last-30-days demo events + Gumroad sales + refunds, compute the five numbers, decide ONE change. |
| **Why** | Without this discipline, the funnel drifts and the operator changes 5 things at once and learns nothing. |
| **External dependency** | None — analytics from 3.3, sales from 2.5. |
| **Cost** | $0. |
| **Blocked by** | 3.3 + 2.5. |

### 4.2 — 🟢 First paying customer (target: 90 days)

| | |
|---|---|
| **What** | The actual first sale. |
| **Why** | Per `BUSINESS.md` §6: validates the funnel; not the business. |
| **Trigger action** | Continue with no plan change. Next target: the first $1k/month by month 6. |

### 4.3 — 🔴 Zero-paid-in-90-days fallback (only fires if 4.2 doesn't)

| | |
|---|---|
| **What** | Per `POST-LAUNCH.md` §4 — audit the funnel, not the features. Run a 1-week outbound experiment to 30 niche contacts as a control (per `BUSINESS.md` §8 the no-touch revisit is allowed below $5k MRR if it produces signal). |
| **Why** | Distinguishes "no reach" from "no conversion" — they need different fixes. |
| **External dependency** | Operator's time. |
| **Cost** | The 10 hr/wk allocation already exists; this displaces other work. |
| **Blocked by** | The 90-day calendar trigger from 4.2. |

---

## Phase 5 · Steady state — what NOT to build

Per `PLAN.md` §5 (anti-temptations) and `DECISIONS.md` §8 (re-lock
triggers). The trap is treating "more code" as the answer when the
data says "more reach" or "more conversion." The five forbidden
moves until $5k/mo MRR:

| | Why locked |
|---|---|
| ❌ More tools (06–08) | `PLAN.md` §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review. |
| ❌ SaaS pivot | `DECISIONS.md` §4 — recurring infra conflicts with the lifestyle constraint. |
| ❌ Live chat / sales calls | `DECISIONS.md` §1 #8 — no-touch is locked until $5k/mo. |
| ❌ Custom integrations / one-off consulting | Breaks "build once, sell many." |
| ❌ Going broad on personas | `PLAN.md` §5 — "all small businesses" converts at 1 %; vertical converts at 5–15 %. |

---

## Triage table — what blocks what

```
Phase 1 (week 1)              Phase 2 (week 2)        Phase 3 (week 4)
┌──────────────┐              ┌──────────────┐        ┌──────────────┐
│ 1.1 Push GH  │──────────┐   │ 2.1 Apple    │ ───┐   │ 3.1 Community│
│ 1.2 Demo     │──┐       ├──▶│   Dev (1-2w) │    │   │ 3.2 SEO post │
│ 1.3 Domain   │  │       │   │ 2.2 Build    │ ───┤   │ 3.3 Analytics│
│ 1.4 Pages    │◀─┘       │   │ 2.3 Sign     │ ───┤   │ 3.4 Emails   │
│ 1.5 Gumroad  │──────────┘   │ 2.4 Copy     │    │   └──────────────┘
│ 1.6 Verify   │              │ 2.5 Activate │ ◀──┘
└──────────────┘              └──────────────┘               ↓
                                                      ┌──────────────┐
                                                      │ 4.1 Monthly  │
                                                      │ 4.2 First $  │
                                                      │ 4.3 Fallback │
                                                      └──────────────┘
```

The longest blocking path is **2.1 Apple Developer Program**
(1–2 weeks). Start it on day 1 of week 1: it gates the signed macOS
build (2.3) and therefore listing activation (2.5), and you can do all
of Phase 1 while waiting.

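The "Blocked by" rows above form a small dependency graph, and the claim about the longest blocking path can be checked mechanically. The sketch below is illustrative only: the edges are transcribed from the item tables, while the lead times (14 days for 2.1, 3 days for 2.2, everything else rounded to zero) are assumptions taken from the upper ends of the stated ranges. Items 4.2 and 4.3 are omitted because they are calendar triggers rather than tasks. Under those assumptions, `earliest_finish` reports day 14 for 2.3, 2.5, 3.4, and 4.1, and day 0 for all of Phase 1.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# item -> items it is blocked by (from the "Blocked by" rows)
BLOCKED_BY: Dict[str, List[str]] = {
    "1.1": [], "1.2": ["1.1"], "1.3": [], "1.4": ["1.2"],
    "1.5": [], "1.6": ["1.2", "1.4"],
    "2.1": [], "2.2": [], "2.3": ["2.1", "2.2"], "2.4": [],
    "2.5": ["2.2", "2.3", "2.4"],
    "3.1": ["1.4"], "3.2": [], "3.3": ["1.4"], "3.4": ["2.5"],
    "4.1": ["3.3", "2.5"],
}

# Assumed wall-clock lead times in days; anything under a day counts as 0.
LEAD_DAYS: Dict[str, int] = {"2.1": 14, "2.2": 3}


def earliest_finish(blocked_by: Dict[str, List[str]],
                    lead_days: Dict[str, int]) -> Dict[str, int]:
    """Earliest day each item can finish if every item starts as soon as
    all of its blockers are done."""
    finish: Dict[str, int] = {}
    for item in TopologicalSorter(blocked_by).static_order():
        start = max((finish[dep] for dep in blocked_by[item]), default=0)
        finish[item] = start + lead_days.get(item, 0)
    return finish


if __name__ == "__main__":
    for item, day in sorted(earliest_finish(BLOCKED_BY, LEAD_DAYS).items()):
        print(f"{item}: day {day}")
```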
---

## Time estimate — total operator time

| Phase | Hours | Wall-clock |
|---|---|---|
| Phase 1 | ~1 hour | end of week 1 (mostly waiting for builds) |
| Phase 2 | ~1 day | end of week 2 (gated by Apple Dev approval) |
| Phase 3 | ~6 hours | week 3–4 |
| Phase 4 | 30 min/month | ongoing |
| **Total to launch** | **~12 hours of operator time** | **~14 days wall-clock** |

Well inside the 10 hr/wk constraint of `DECISIONS.md` §1 #2.

---

## The thing that decides whether the plan works

Not the build. Not the deploy. Not even the first sale.

**The discipline of running the monthly review** in Phase 4 — and the
"decide ONE thing per month" rule from `POST-LAUNCH.md` §2 — is what
separates "this product exists" from "this product compounds." Every
feature added before the funnel is measured is a guess; every change
made after the monthly review is informed.

Don't skip 4.1.

220
docs/PLAN.md
Normal file
@@ -0,0 +1,220 @@
# Strategic Plan — DataTools

> Creator-only. Locks the "what next" in light of the locked criteria
> (DECISIONS.md §1) and the v1.6 honest status (BUSINESS.md §13).
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

This document is the active plan, derived from the strategic review of
2026-05-01. It compresses the eight strategic moves and a 90-day
execution sequence onto one page so the next decision (build vs.
ship vs. market) has a single reference.

It is **not** a re-lock of operating criteria — those still live in
DECISIONS.md and have not changed. This plan is downstream of those
criteria; if a move below conflicts with §1 of Decisions, the criteria
win.

## 1. Frame

**Locked context** (BUSINESS.md, DECISIONS.md):

- Niche Python automation tools, $49–79 single / $149 suite.
- Cash budget ≤ $1,200/mo recurring · Time ≤ 10 hr/wk · No external funding.
- Async + no-touch sales (revisit at $5k/mo MRR).
- Marketplace-first distribution (Gumroad / Lemon Squeezy).
- Streamlit GUI + CLI dual interface, runs locally.
- Lifestyle cashflow goal (no exit needed).

**Honest current state** (2026-05-01):

| Asset | State |
|---|---|
| Tools 1–5 (Dedup, Text Clean, Format Standardize, Missing, Column Mapper) | Ready · 1,691 tests passing · 0 xfailed |
| Tools 6–9 (Outlier, Multi-File Merge, Validator, Pipeline) | Coming Soon |
| PyInstaller installer pipeline | Not started |
| macOS code signing (Apple Dev Program) | Not started |
| Hosted browser demo (Streamlit Cloud) | Not deployed |
| Landing page | Not live |
| Marketplace listing (Gumroad) | Not listed |
| Paying customers | 0 |

**Diagnosis**: the bottleneck is not feature count — it's distribution.
The next $1 of value comes from closing the gap between "code-complete"
and "buyer-pulls-out-card", not from tool 6.

## 2. The eight strategic moves

Numbered moves. Each is consistent with locked criteria.

### 2.1 Freeze new-tool development (one exception). Ship what exists.

Tools 6–8 are blocked behind a **distribution gate**: no work on them
until the existing 5 tools have a paying customer + one external review
(BUSINESS.md §4 sequence rule, applied recursively inside the bundle).

**Exception granted 2026-05-01**: Tool 09 Pipeline Runner is built
*now*. Rationale: the pipeline transforms the bundle from "5 tools you
buy" into "an automatable workflow you depend on." That conversion is
what produces retention and word-of-mouth — the only marketing channel
that scales under the no-network/no-touch constraint.

### 2.2 The demo *is* the product. Make it embarrassingly good.

- Three persona-tagged sample datasets, not one generic CSV: Shopify
  customers / bookkeeper bank export / agency lead list.
- Run the *full pipeline* on the sample (Review → Dedup → Text Clean →
  Format → Missing → Column Map). Free version caps **output rows**,
  not the experience.
- Embed the demo as an **iframe on the landing page** (not "click to
  open"). Friction kills conversion.
- Persistent CTA after demo: *"Run this on your own 50 k-row file →
  buy for $49 →"* directly above the Gumroad button.

### 2.3 Niche down. Stop selling "data cleaning."

One engine, three landing pages:

| Persona | Landing-page lead | Demo dataset |
|---|---|---|
| Shopify operator (priority: pet supplies) | "Clean your customer / vendor / subscriber exports" | uc01_shopify_customer_list |
| Bookkeeper / freelance accountant | "Reconcile bank exports + vendor lists. Auditable changes." | uc06_bank_export_overlap |
| Marketing / RevOps agency | "Dedupe lead lists. Standardize phones across vendors." | uc13_combined_lead_sources |

Generic copy competes with `pip install pandas`. Vertical copy
competes with nothing.

### 2.3a Top pain points per niche

The "what does this actually fix?" question. Each pain point below is
sourced from operator-domain knowledge of these markets and the
buyer-use-case research already captured in `BUSINESS.md §4a`. Pain
points are ranked by **frequency × dollar impact** for that persona —
high-frequency / high-cost pains lead the landing-page copy and the
demo dataset.

> **Validation gap (honest disclaimer)**: these pains are derived from
> operator knowledge of the categories, not from a sample of buyer
> interviews. Per `BUSINESS.md §8` (no-touch constraint review at $5k/mo
> MRR), validate the top-3 per persona via 5 buyer interviews before the
> first $200 of paid acquisition spend. If any pain ranks below the
> assumed level, swap it for the next-highest in this list.

#### Shopify operator (priority: pet supplies)

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| S1 | **Klaviyo / Mailchimp / Omnisend per-contact billing.** Subscriber list with 10–18 % duplicate rate (case drift, plus signs in Gmail addresses, multiple devices) → recurring overpay forever. | $30–300/mo per percent of dupes on a 50 k list — recurring | Dedup + Format Standardize (email canonicalization) + Pipeline (re-run weekly) |
| S2 | **Product feed rejected by Google Merchant Center / Meta Catalog.** Smart quotes in titles, NBSP in SKU, inconsistent attributes; campaign launch delayed 24–72 h while feed gets fixed. | 1–3 days delayed launch × campaign value | Text Cleaner + Format Standardize |
| S3 | **Multi-channel order consolidation.** Shopify + Etsy + Amazon + Faire + wholesale spreadsheet, each with a different column for "customer email" / "order total" / "ship country". | 4–8 hr / month manually merging | Column Mapper + Dedup + Pipeline |
| S4 | **Subscription identity fragmentation.** Pet-box subscribers cancel and re-sub under a different email; cohort analysis says churn is 20 % when it's actually 12 % — pricing decisions wrong. | Mis-priced LTV → over- or under-paid acquisition | Dedup with `merge=true` survivor |
| S5 | **International tax / VAT MOSS compliance.** Country column is `UK` / `U.K.` / `United Kingdom` / `GB` in the same export; VAT report breaks. Phone formats per region break call-center routing. | Compliance penalty risk + ops friction | Format Standardize (per-row country) + Column Mapper |

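To make pain S1 concrete, here is a minimal sketch of the kind of email canonicalization that exposes "case drift" and Gmail plus-sign duplicates. It is an illustration, not the Format Standardizer's actual rule set: the function names and the decision to strip `+tag` only for `gmail.com` are assumptions made for the example.

```python
from typing import Iterable

# Assumption: only strip "+tag" where plus-addressing is known to be an alias.
PLUS_ALIAS_DOMAINS = {"gmail.com"}


def canonical_email(raw: str) -> str:
    """Lower-case, trim, and drop a Gmail '+tag' so aliases compare equal."""
    cleaned = raw.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep or not local or not domain:
        return cleaned  # not a well-formed address; leave it for validation
    if domain in PLUS_ALIAS_DOMAINS:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def duplicate_rate(emails: Iterable[str]) -> float:
    """Share of rows that collapse into an already-seen canonical address."""
    rows = list(emails)
    if not rows:
        return 0.0
    unique = len({canonical_email(e) for e in rows})
    return (len(rows) - unique) / len(rows)


if __name__ == "__main__":
    sample = ["Ana@Gmail.com", "ana+promo@gmail.com", "ana@gmail.com ", "bob@example.com"]
    print(f"{duplicate_rate(sample):.0%} of rows are duplicates")
```

On a per-contact billing plan, that rate multiplied by the list size is the number of contacts being paid for twice.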
#### Bookkeeper / freelance accountant

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| B1 | **Bank-export month-overlap re-import.** Same transaction posts twice when Jan and Feb exports overlap at the boundary; client's books understate cash by 1–4 %. | 2–4 hr / month / client + reconciliation errors | Dedup with explicit Date+Amount+fuzzy Vendor strategy |
| B2 | **QBO / Xero vendor consolidation for 1099 reports.** "Amazon" / "amazon.com" / "AMAZON.COM*4F2X9" become 3 vendors; 1099 reports break, P&L by vendor unusable. | 1–2 hr / 1099 cycle + IRS-paper-trail risk | Format Standardize (name canonicalization) + Dedup |
| B3 | **Liability / professional indemnity.** Cannot use AI tools that don't show their work; client audit response window is 24–48 h. | Per-firm liability premium ≈ $500–2,500 / yr | Audit log built into every tool — every change row-logged |
| B4 | **Per-license-not-per-client economics.** Most cleanup tools are per-seat / per-client SaaS; bookkeepers managing 10–30 clients hit price walls fast. | $30/mo × N clients vs. $49 once | Desktop license, no per-client constraint |
| B5 | **Multi-currency books.** US-domiciled clients with EU customers; comma-decimal amounts (`€1.234,56`) crash standard parsers; parens-negative (`($89.50)`) treated as positive. | 30–60 min per multi-currency client per month | Format Standardize (`currency_decimal=auto`, parens-negative) |

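Pain B5 is easy to reproduce. The sketch below shows one plausible heuristic for reading comma-decimal and parens-negative amounts; it is an assumption-laden illustration and not the logic behind `currency_decimal="auto"`. The rule used here (the rightmost separator is the decimal mark unless it is the only separator type and is followed by exactly three digits) is hypothetical and will misread genuinely ambiguous inputs such as `1,234` meant as 1.234.

```python
import re
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Parse amounts such as '€1.234,56', '($89.50)' or 'R$ 1.234,56'."""
    text = raw.strip()
    # Accounting convention: parentheses (or a minus sign) mean negative.
    negative = (text.startswith("(") and text.endswith(")")) or "-" in text
    digits = re.sub(r"[^\d.,]", "", text)  # drop symbols, spaces, parens

    sep_pos = max(digits.rfind("."), digits.rfind(","))
    if sep_pos == -1:
        number = digits
    else:
        whole = re.sub(r"[.,]", "", digits[:sep_pos])
        tail = digits[sep_pos + 1:]
        both_kinds = "." in digits and "," in digits
        if both_kinds or len(tail) != 3:
            number = f"{whole}.{tail}"   # rightmost separator is the decimal mark
        else:
            number = whole + tail        # lone separator + 3 digits: thousands

    value = Decimal(number)
    return -value if negative else value


if __name__ == "__main__":
    for sample in ("€1.234,56", "($89.50)", "1,234.56", "1,234"):
        print(sample, "->", parse_amount(sample))
```

`Decimal` is used instead of `float` so that amounts survive round-trips without binary rounding error, which matters for reconciliation.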
#### Marketing / RevOps agency

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| R1 | **HubSpot / Marketo / Iterable per-contact tier pricing.** 10 k contacts → enterprise tier at $4–8 k/mo. Every duplicate is a recurring tax. | $200–800 / month per 1 k duplicate contacts — recurring | Dedup with cross-source merge + Pipeline |
| R2 | **Email-deliverability / sender reputation.** Sending to invalid or duplicate addresses tanks reputation; recovery takes weeks. | Catastrophic — entire email programme degraded | Format Standardize (email canonicalization) + Missing (sentinel detection) |
| R3 | **GDPR / contact-data privacy.** Uploading lead data to a third-party cleaning SaaS is itself a GDPR concern; legal review blocks adoption. | Compliance risk + 4–8 wk legal-review delay | Local-only desktop app, zero outbound calls |
| R4 | **Multi-vendor lead-source unification.** Apollo, ZoomInfo, LinkedIn Sales Nav, manual scrapes — each export has different headers, scoring, country format. | 1–3 days per campaign of manual unification | Column Mapper (alias matching) + Format Standardize (per-row country) + Dedup |
| R5 | **Suppression-list management across 5+ platforms.** Each platform has its own format; un-deduped suppression lists let opt-outs slip through, triggering CAN-SPAM / GDPR exposure. | Compliance risk + churn-back cost | Pipeline saved as JSON, re-run on each new suppression batch |

### 2.4 Operationalize the moat the docs already name.

Three durable advantages, each promoted from buried feature to
landing-page H1:

- **Quality**: 1 GB international standardization in ~2.5 minutes,
  locally. Excel can't do this; OpenRefine fights you for an hour.
- **Privacy**: "Your data never leaves this computer." Already in the
  GUI footer — promote to landing-page lead, screenshot the empty
  network tab.
- **Update cadence**: ship a v1.1 patch within 30 days of v1.0 launch.
  Not features — *evidence* the product is alive. "Added Czech Republic
  phone format support" beats "no updates in 6 months" every time.

### 2.5 Surface the audit-trail feature in sales copy.

Every tool has a structured audit log. Most cleaning tools do not.
Bookkeepers and consultants get fired if they can't show a client
what changed. The audit feature is currently invisible on every
proposed landing page and should be the **second-largest callout** —
right after "runs locally."

Copy seed: *"Every change auditable. Hand the audit CSV to your client
with the cleaned file."*

### 2.6 The Pipeline Runner is the retention multiplier.

A buyer with a saved pipeline isn't a one-off purchase — they're a
recurring user who recommends the product. This is exactly the
behavioural lever the no-touch constraint needs (DECISIONS.md §8
trigger). Build it now (see §2.1 exception).

### 2.7 Add a $199 "priority support" tier post-launch.

Same code, async-email SLA (24 h response). Targets the bookkeeper /
consultant persona whose own time is $300/hr. Zero new product work,
~3× ARPU on 5–10 % of buyers. Lock the SLA to **async only** so the
no-touch constraint isn't violated. Defer until $5 k/mo MRR (the same
trigger DECISIONS.md §8 already names).

### 2.8 Dependency-aware pipeline UX.

Tools have soft execution-order preferences (Text Clean before Format
Standardize, Format before Dedup, Missing before Dedup). The Pipeline
Runner *recommends* the order, *warns* on reversals, and **never
forces** — the user owns their workflow. Implementation: see
`src/core/pipeline.py` `SOFT_DEPENDENCIES`.

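The "recommend, warn, never force" behaviour can be sketched as follows. The real `SOFT_DEPENDENCIES` structure in `src/core/pipeline.py` is not shown in this document, so the dictionary shape, the tool keys, and the `order_warnings` helper below are all assumptions chosen to illustrate the three preferences named above.

```python
from typing import Dict, List, Set

# Hypothetical shape: tool -> tools that should preferably run before it.
SOFT_DEPENDENCIES: Dict[str, Set[str]] = {
    "format": {"text_clean"},
    "dedup": {"format", "missing"},
}


def order_warnings(steps: List[str]) -> List[str]:
    """Return advisory messages for reversed pairs; never reorder or reject."""
    position = {name: index for index, name in enumerate(steps)}
    warnings = []
    for tool, preferred_before in SOFT_DEPENDENCIES.items():
        if tool not in position:
            continue
        for earlier in sorted(preferred_before):
            # Only warn when both tools are present and the order is reversed.
            if earlier in position and position[earlier] > position[tool]:
                warnings.append(f"'{earlier}' usually runs before '{tool}'")
    return warnings


if __name__ == "__main__":
    print(order_warnings(["dedup", "format", "text_clean"]))
```

The function only returns strings. Whether to act on them stays with the caller, which is what keeps the dependency graph soft.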
## 3. 90-day execution sequence

| Week | Action | Done when |
|---|---|---|
| 1 | PyInstaller pipeline · Mac/Win unsigned installers · Apple Dev Program enrollment (1–2 wk lead) | `dist/datatools-mac.dmg` and `dist/datatools-win.exe` install on a clean machine |
| 2 | Demo deployed to Streamlit Cloud · landing page v1 with embedded demo · 3 persona datasets in the demo | Public URL serves a working pipeline run on a sample dataset in < 30 s |
| 3 | Gumroad listing live · share value-first in 3 niche communities (no pitch) · 1 long-tail SEO post for the lead persona | First listing impression captured · post not removed for self-promotion |
| 4 | Pipeline Runner v1.0 shipped (this week, 2026-05-01 — exception per §2.1) · v1.1 patch announced with Tool 09 + intl improvements | Pipeline saves/loads JSON · 3 demo pipelines preloaded |
| 5–8 | Bookkeeper landing page · agency landing page · second tool's promo cycle · priority-support tier added (defer purchase until §2.7 trigger) | Three live landing pages with distinct H1, demo dataset, conversion target |
| 9–13 | Tool 06–08 only **if** revenue trajectory supports continued investment · otherwise more market work on the existing 5 + 09 | Decision made on 13 Aug 2026 with revenue data, not feature ambition |

## 4. Decision triggers (re-evaluation prompts)

These flip the plan, not the underlying criteria:

| Trigger | Reaction |
|---|---|
| First paying customer in week 4–13 | Continue. Plan is working. |
| **Zero** paid in 90 days | Audit the funnel. Demo conversion? Niche fit? Price? Don't add features. |
| $5 k/mo MRR | DECISIONS.md §8 trigger fires: revisit async + priority-support tier. |
| Marketplace policy / shutdown | Switch to own-domain Stripe immediately; landing pages are already self-hosted. |
| Streamlit hard direction change | Low-probability re-lock per DECISIONS.md §8. Tk fallback is documented. |

## 5. Anti-temptations (things the plan refuses)

- **More tools before more buyers.** Locked. Exception only for Pipeline Runner per §2.1.
- **SaaS pivot.** Recurring infra conflicts with the lifestyle constraint (DECISIONS.md §4).
- **Live chat / sales calls.** Conflicts with no-touch (DECISIONS.md §1 #8).
- **Custom integrations / one-off consulting.** $300/hr looks tempting; breaks the "build once, sell many" model that justifies the entire strategy.
- **Going broad on personas.** "All small businesses" is a generic landing page that converts at 1 %; "Shopify pet-supply operators with 1k–50k customers" converts at 5–15 % in the right communities.

## 6. What this plan deliberately leaves open

- Whether tools 06–08 ever ship. Decided on revenue, not roadmap.
- Whether to add a fourth niche landing page. Decided on which of the
  three is producing.
- Whether to invest in own-domain SEO. Compounding 6–18 mo asset; not
  the early-stage channel. Revisit when marketplace + community
  produces baseline traffic to optimise.
- Whether to add a Notion / Slack support community. If support volume
  per 100 sales > 10 (BUSINESS.md §12 target), revisit; else leave async-email only.

158
docs/POST-LAUNCH.md
Normal file
@@ -0,0 +1,158 @@
# Post-launch — 90-day measurement plan

> Creator-only. The other half of `PLAN.md`: PLAN tells you what to
> build, this tells you what to measure once it's live and which
> numbers trigger which actions.
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

This is a runnable monthly checklist, not analytics theatre. Every
metric below has a **threshold** and an **action**. If you're not
willing to execute the action when the threshold trips, drop the
metric — measuring without responding is busywork.

## 1. The five numbers that matter

Every other dashboard, chart, or vanity stat is downstream of these
five. The funnel is short on purpose; pre-PMF traffic doesn't have
the resolution to support more.

| # | Metric | How to compute | Threshold | When tripped |
|---|--------|----------------|-----------|--------------|
| 1 | **Persona engagement** | `demo.run_completed / demo.page_view` per persona | < 30 % for 4 consecutive weeks | Demo isn't running or BEFORE preview isn't compelling. **Action:** check iframe loads; widen BEFORE preview to show pollution clearly; move demo above the fold. |
| 2 | **Demo→CTA intent** | `demo.cta_clicked / demo.run_completed` per persona | < 5 % for 4 consecutive weeks | Demo is impressive but the CTA isn't earning trust. **Action:** add network-tab privacy screenshot; soften the price callout; A-B test eyebrow copy on the CTA card. |
| 3 | **Purchase rate** | `gumroad.purchase / demo.cta_clicked` per persona | < 30 % for 4 consecutive weeks | Visitors click through but don't pull the card out. **Action:** check Gumroad listing renders cleanly; verify refund-policy copy; check that the screenshot on the listing matches the demo they just ran. |
| 4 | **Refund rate** | `gumroad.refunds / gumroad.purchase` rolling 30 days | > 5 % | Buyer expectation mismatch. **Action:** read every refund email; determine if it's a feature gap (build it), a positioning lie (rewrite), or a personal-fit miss (fine, ignore). |
| 5 | **Support load** | email tickets / 100 sales rolling 30 days | > 10 | The product isn't self-serve enough at this price. **Action:** find the top 3 questions; add to in-app onboarding + landing-page FAQ + the persona's saved pipeline. |

These five also map to BUSINESS.md §12 — that doc names the metrics;
this doc operationalises them.
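
The "for 4 consecutive weeks" rule in the table is easy to misapply by eye, so here is a minimal sketch of one way to check it. The `Threshold` class, the helper names, and the weekly counts are all illustrative assumptions, not part of the shipped tooling; it also assumes you have already bucketed the demo events by week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A floor on a weekly ratio that must be breached several weeks in a row."""
    floor: float          # e.g. 0.30 for "< 30 %"
    weeks_required: int   # e.g. 4 for "4 consecutive weeks"


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator; a week with no traffic counts as 0.0."""
    return numerator / denominator if denominator else 0.0


def is_tripped(weekly_ratios: list[float], threshold: Threshold) -> bool:
    """True when the most recent `weeks_required` weeks are all below the floor."""
    recent = weekly_ratios[-threshold.weeks_required:]
    return (
        len(recent) == threshold.weeks_required
        and all(r < threshold.floor for r in recent)
    )


# Made-up weekly (run_completed, page_view) counts for one persona.
weekly_counts = [(30, 100), (28, 100), (25, 100), (29, 100), (27, 100)]
engagement = [ratio(done, views) for done, views in weekly_counts]

ENGAGEMENT = Threshold(floor=0.30, weeks_required=4)
print(is_tripped(engagement, ENGAGEMENT))
```

Treating a zero-traffic week as `0.0` means an empty week counts toward the streak; if that is not what you want, skip such weeks before calling `is_tripped`.
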

## 2. Monthly review — 30-minute checklist

Block 30 minutes on the first Monday of every month for the first six
months. After month 6, if numbers are stable, drop to 15 minutes
quarterly.

```
[ ] Pull last 30 days of demo events from Cloudflare Web Analytics
[ ] Pull last 30 days of Gumroad sales + refunds export
[ ] Compute the five numbers in §1 per persona
[ ] Note which thresholds are tripped (if any)
[ ] Read every refund email since last review
[ ] Read every support email since last review
[ ] Decide ONE thing to change this month (only one)
[ ] Update CHANGELOG with what was changed and why
[ ] Schedule next review
```

The "decide ONE thing" rule is load-bearing. Pre-PMF traffic doesn't
have the volume to A/B-test multiple changes in parallel — you'll just
confuse yourself about what moved the number.

## 3. Per-persona scoreboard (template)

Maintain in a single text file or spreadsheet. The shape that fits in
a notebook page is the shape you'll actually update.

```
Month: 2026-06
─────────────────────────────────────────────────────────────────
                         Shopify  Bookkeeper    RevOps    Total
Page views                   420         180       290      890
Demo runs                    137          59        82      278
CTA clicks                     9           7         6       22
Purchases                      3           2         2        7

Metric 1 (engage)            33%         33%       28%      31%
Metric 2 (intent)             7%         12%        7%       8%
Metric 3 (purchase)          33%         29%       33%      32%
Metric 4 (refund)             0%          0%        0%       0%
Metric 5 (support)    3 tickets / 100 sales

Tripped thresholds:   RevOps engagement (28% < 30%);
                      Bookkeeper purchase (29% < 30%)

This-month change:    Move demo embed above the fold on revops
                      page; reduce hero text by 40%.

Last-month change:    Added network-tab screenshot to all 3
                      pages. Result: intent +1.5 percentage
                      points on Shopify, flat elsewhere.
```
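
As a cross-check on the template, the percentages can be recomputed from the four raw count rows. The sketch below is only an illustration: the dictionary layout and function names are assumptions, the counts are the sample figures from the scoreboard, and the floors are the §1 thresholds for the three funnel ratios.

```python
# Raw counts copied from the sample scoreboard.
COUNTS = {
    "Shopify":    {"page_views": 420, "demo_runs": 137, "cta_clicks": 9, "purchases": 3},
    "Bookkeeper": {"page_views": 180, "demo_runs": 59,  "cta_clicks": 7, "purchases": 2},
    "RevOps":     {"page_views": 290, "demo_runs": 82,  "cta_clicks": 6, "purchases": 2},
}

# (metric name, numerator key, denominator key, floor) for the funnel ratios in §1.
FUNNEL = [
    ("engage",   "demo_runs",  "page_views", 0.30),
    ("intent",   "cta_clicks", "demo_runs",  0.05),
    ("purchase", "purchases",  "cta_clicks", 0.30),
]


def scoreboard(counts):
    """Return {persona: {metric: ratio}} plus a 'Total' row summed over personas."""
    rows = dict(counts)
    rows["Total"] = {
        key: sum(c[key] for c in counts.values())
        for key in next(iter(counts.values()))
    }
    return {
        persona: {name: c[num] / c[den] for name, num, den, _ in FUNNEL}
        for persona, c in rows.items()
    }


def below_floor(board):
    """List (persona, metric) pairs whose ratio is under its floor this month."""
    floors = {name: floor for name, _, _, floor in FUNNEL}
    return [
        (persona, name)
        for persona, metrics in board.items()
        if persona != "Total"
        for name, value in metrics.items()
        if value < floors[name]
    ]


board = scoreboard(COUNTS)
for persona, metrics in board.items():
    print(persona, {name: f"{value:.0%}" for name, value in metrics.items()})
print(below_floor(board))
```

On the sample counts this flags RevOps engagement (82 / 290) and Bookkeeper purchase (2 / 7). With single-digit click counts a one-month reading like that is noisy, which is why §1 only acts on a floor that stays breached for four consecutive weeks.
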

## 4. Stage-gate triggers from PLAN.md

Reproduced here so the gate criteria sit beside the metrics that
fire them:

| Trigger | From | Action |
|---|------|--------|
| **First paying customer** | PLAN §4 | Continue. Plan is working. |
| **Zero paid in 90 days** | PLAN §4 | Audit the funnel. Don't add features. Run a small (1-week) outbound experiment to 30 niche-community contacts as a control, even though it stretches the no-touch constraint, to determine whether the bottleneck is reach or conversion. |
| **$5 k/mo MRR** | DECISIONS §8 | Re-evaluate async constraint. Add priority-support tier (PLAN §2.7). |
| **$10 k/mo MRR** | DECISIONS §8 | Revisit time-budget allocation. Decide on tools 06–08 vs. additional bundles. |
| **Marketplace shutdown** | PLAN §4 / DECISIONS §8 | Switch landing-page CTA to own-domain Stripe Checkout. Pre-built; one-line edit. |
| **Streamlit hard direction change** | DECISIONS §8 | Low-probability re-lock. Tk fallback documented. |
| **Burnout signal** | DECISIONS §8 | Stop. Triage. The constraint matters more than the revenue ramp. |

## 5. What we deliberately do NOT measure

These look productive but predict nothing pre-PMF. Don't add them.

- **Bounce rate** — single-page sites have artificially high bounce. Useless signal.
- **Time on page** — landing pages are *supposed* to be quick reads. Long time on page often means confusion, not engagement.
- **Heatmaps / scroll-depth** — no statistical resolution at <500 monthly visitors. Add when you cross 5 k/month.
- **Email open rates** — under PLAN §2.7, priority support is the only email channel; opens aren't a buying signal.
- **Social mentions** — vanity. The signal that matters is "did they buy" or "did they come back."

## 6. What we measure once, then trust

Do these once, then let them run for 6+ months without re-measuring:

- **Demo correctness** — once per pipeline release, run all 3 demos
  end-to-end via `tests/test_pipeline.py` and check the output looks
  reasonable. The CI pipeline already does this; nothing to add.
- **Cross-platform install** — once per release, verify the
  PyInstaller bundle launches on Mac / Windows / Linux. After three
  green releases, trust the build pipeline; spot-check on major OS
  updates only.
- **Privacy claim integrity** — once at launch, capture the network
  tab while running the cleaner and host that screenshot at a stable
  URL. Re-capture only when a new tool or dependency is added.

## 7. Per-persona attribution

The buy buttons on every landing page carry `?from=<persona>` query
parameters. Gumroad propagates that into the order metadata. Use it
to attribute purchases:

| persona key | landing page URL | Gumroad query | Source |
|---|---|---|---|
| `shopify-pet` | `/shopify-pet/` | `?from=shopify-pet` | Shopify operator |
| `bookkeeper` | `/bookkeeper/` | `?from=bookkeeper` | Bookkeeper / freelance accountant |
| `revops` | `/revops/` | `?from=revops` | Marketing / RevOps agency |
| `apex` | `/` | (no query — use `unknown` bucket) | Generic discovery |

When `unknown` exceeds 30 % of total, add UTM tagging to community
posts and SEO blog backlinks so you can break the bucket apart.
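
The bucketing and the 30 % check can be sketched in a few lines of standard-library Python. This is an illustration only: it assumes you can recover the purchase URL (or at least its query string) for each order, and the example URLs, host, and function names are placeholders rather than anything Gumroad or this project defines.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

KNOWN_PERSONAS = {"shopify-pet", "bookkeeper", "revops"}


def persona_from_url(url: str) -> str:
    """Return the persona in the `from` query parameter, or 'unknown'."""
    values = parse_qs(urlsplit(url).query).get("from", [])
    return values[0] if values and values[0] in KNOWN_PERSONAS else "unknown"


def unknown_share(urls: list[str]) -> float:
    """Fraction of purchases that land in the `unknown` bucket."""
    buckets = Counter(persona_from_url(u) for u in urls)
    total = sum(buckets.values())
    return buckets["unknown"] / total if total else 0.0


# Hypothetical purchase URLs; the host and product path are placeholders.
purchases = [
    "https://example.gumroad.com/l/product?from=shopify-pet",
    "https://example.gumroad.com/l/product?from=bookkeeper",
    "https://example.gumroad.com/l/product",
    "https://example.gumroad.com/l/product?from=typo",
]
share = unknown_share(purchases)
print(f"unknown share: {share:.0%}", "-> add UTM tagging" if share > 0.30 else "")
```

Unrecognised values such as a mistyped persona key fall into `unknown` on purpose, so a broken buy-button link shows up as growth in that bucket instead of silently creating a new one.
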

## 8. The four months that decide whether the plan works

Reading PLAN.md §3 + this doc together, the rough script:

| Month | What's running | What we expect to learn |
|---|---|---|
| **M1** (June) | Installers · demo · 3 landing pages · Gumroad live | Whether the funnel mechanically works. Numbers will be noisy; just look for one purchase. |
| **M2** (July) | M1 + community posts in 3 niches + 1 SEO post | Which persona converts. Re-allocate effort to the highest-converting niche. |
| **M3** (August) | M2 + landing-page changes from M2 review | Whether intent-rate moved on the change. Decide tools 06–08 go/no-go. |
| **M4** (September) | M3 + first repeat-buyer signals | Whether the Pipeline Runner is producing retention as designed. |

By the end of M4, the data tells you whether the plan is on track for
$1k–3k/mo (the BUSINESS.md §6 six-month target), judged by
extrapolating the trajectory rather than by the absolute number.

## 9. The hardest part of the plan to execute

Not the metrics. Not the build. **The "decide ONE thing per month"
rule** — operators with engineering backgrounds chronically pick
three changes per month and conclude nothing because their signal
is muddled. This doc says one. It means one.
@@ -46,7 +46,7 @@ Numbered support matrix. Updated with every shipped capability.
**Cell-level**:
- `smart_punctuation_in_data`, `nbsp_or_unicode_whitespace`, `zero_width_or_invisible`, `dirty_column_headers`, `whitespace_padding`, `null_like_sentinels`, `suspected_mojibake`, `mixed_case_email_column`, `inconsistent_date_format`, `near_duplicate_rows`, `leading_zero_ids`.

**Encoding integrity**: `encoding_uncertain`, `encoding_decode_failed`, `empty_input`.
**Encoding integrity**: `encoding_uncertain`, `encoding_decode_failed`, `encoding_lying_bom`, `empty_input`.

Sample size: 1,000 rows (configurable).

@@ -71,17 +71,37 @@ Sample size: 1,000 rows (configurable).
- Full-DataFrame `auto_fix`: ~5 min (~30 µs/cell).
- Output write: ~10 s.
- Recommended RAM: 4× input size for full-Apply path.
- Format standardizer (`standardize_file`): ~150k rows/sec on cache-warm
  international data; chunk-bounded RAM (~50 MB peak at default
  chunk_size=50,000). A 1 GB CSV with mixed phone+currency+address
  columns finishes in ~2.5–10 minutes depending on column count.
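
To illustrate what "chunk-bounded RAM" means in practice, here is a generic sketch of chunked CSV streaming using only the standard library. It is not the project's `standardize_file` implementation: `stream_transform`, `iter_chunks`, and the whitespace-trimming `strip_cells` stand-in are hypothetical names chosen for this example, and only the `chunk_size` default of 50,000 is taken from the text above.

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterable, Iterator, TextIO

Row = list[str]


def iter_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[list[Row]]:
    """Yield lists of at most `chunk_size` rows without materialising the input."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def stream_transform(
    src: TextIO,
    dst: TextIO,
    transform: Callable[[list[Row]], list[Row]],
    chunk_size: int = 50_000,
) -> int:
    """Copy a CSV from src to dst chunk by chunk; return the data-row count."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:          # empty input: nothing to write
        return 0
    writer.writerow(header)
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        writer.writerows(transform(chunk))   # only one chunk is held in memory
        total += len(chunk)
    return total


def strip_cells(chunk: list[Row]) -> list[Row]:
    """Stand-in transform: trim surrounding whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in chunk]


source = io.StringIO("name,phone\n Ada , 555-0100\nLin,555-0101 \nSam,555-0102\n")
target = io.StringIO()
print(stream_transform(source, target, strip_cells, chunk_size=2))
print(target.getvalue())
```

Because each chunk is written out before the next is read, peak memory scales with `chunk_size` and row width, not with file size, which is the property the performance note relies on.
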

## 11. Tools
1. Deduplicator — Ready
2. Text Cleaner — Ready
3. Format Standardizer — Ready
4. Missing Value Handler — Coming Soon
5. Column Mapper — Coming Soon
4. Missing Value Handler — Ready
5. Column Mapper — Ready
6. Outlier Detector — Coming Soon
7. Multi-File Merger — Coming Soon
8. Validator & Reporter — Coming Soon
9. Pipeline Runner — Coming Soon
9. Pipeline Runner — Ready

### 11.a Recommended pipeline order (soft, not enforced)

The Pipeline Runner ships with a `SOFT_DEPENDENCIES` table; the
following ordering is the default and the basis of the warning
surface. Re-ordering is allowed; the runner emits a warning string
and proceeds.

| # | Tool | Why this slot |
|---|------|---------------|
| 1 | column_map (optional, for header alignment) | Multi-vendor unification — rename early so downstream tools see canonical headers |
| 2 | text_clean | NBSP / smart quotes / zero-width pollution silently breaks downstream parsers |
| 3 | format_standardize | Phones / dates / currencies → canonical form before missing detection and dedup |
| 4 | missing | Sentinel detection, imputation, drop strategies — needs canonical types |
| 5 | column_map (optional, for schema enforcement) | Project to target schema, coerce, drop extras AFTER cleaning |
| 6 | dedup | Fuzzy matching is most accurate on canonicalised, sentinel-laundered data |
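
A "warn and proceed" check of this kind might look like the sketch below. The real shape of `SOFT_DEPENDENCIES` is not shown in this document, so the mapping here (tool name to the set of tools that should ideally run first) is a guess derived from the table's rationale column, and `order_warnings` is a hypothetical helper, not the runner's API.

```python
# Hypothetical shape: each tool maps to the tools that should ideally run before it.
SOFT_DEPENDENCIES: dict[str, set[str]] = {
    "format_standardize": {"text_clean"},
    "missing": {"text_clean", "format_standardize"},
    "dedup": {"text_clean", "format_standardize", "missing"},
}


def order_warnings(steps: list[str]) -> list[str]:
    """Return one warning per recommended predecessor that is scheduled too late."""
    position: dict[str, int] = {}
    for index, tool in enumerate(steps):
        position.setdefault(tool, index)   # remember the first occurrence only
    warnings = []
    for index, tool in enumerate(steps):
        for earlier in sorted(SOFT_DEPENDENCIES.get(tool, ())):
            # Warn only when the predecessor is in the pipeline but runs later;
            # a predecessor that is absent altogether is not an ordering problem.
            if earlier in position and position[earlier] > index:
                warnings.append(
                    f"{tool} runs before {earlier}; recommended order is {earlier} first"
                )
    return warnings


print(order_warnings(["text_clean", "format_standardize", "missing", "dedup"]))
print(order_warnings(["dedup", "text_clean", "missing"]))
```

The function only returns strings and never raises, which mirrors the "recommended, not enforced" behaviour described above: the caller can surface the warnings and run the pipeline in the user's order anyway.
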

## 12. Gate (Review & Normalize)
- Gates every tool page.
@@ -95,7 +115,7 @@ Sample size: 1,000 rows (configurable).

## 13. Interfaces
- **GUI**: Streamlit, browser-based, local, no internet.
- **CLI**: `python -m src.cli` (dedup) · `src.cli_text_clean` · `src.cli_analyze`.
- **CLI**: `python -m src.cli` (dedup) · `src.cli_text_clean` · `src.cli_format` · `src.cli_missing` · `src.cli_column_map` · `src.cli_pipeline` · `src.cli_analyze`.
- **Python API**: `from src.core import …` (analyze, repair_bytes, clean_dataframe, deduplicate, standardize_dataframe, …).
- **JSON output**: `--json` on `cli_analyze`.

@@ -113,8 +133,8 @@ Sample size: 1,000 rows (configurable).
- **Dev**: pytest, tox.

## 16. Test coverage
- 1,230 tests passing, 4 skipped (ftfy not installed), 17 xfailed (documented).
- Fixture corpora: text-cleaner (21), encodings (31), reference UTF-8 (9), format-cleaner (199 buyer cases).
- 1,729 tests passing, 0 skipped, 0 xfailed.
- Fixture corpora: text-cleaner (21), encodings (31), reference UTF-8 (9), format-cleaner (199 buyer cases + 20-row international stress fixture), missing-handler (3 use cases + 16 edge cases), column-mapper (3 use cases + 5 edge cases).
- Run: `python run_tests.py [--tool …] [--fixtures] [--coverage]`.

## 17. Privacy / data handling
