Replaces the Shopify / RevOps / Bookkeeper demo trio with three accounting personas that share one buyer, each entering through a workflow where a messy export costs money — all running the same saved 4-step pipeline:

- bank_reconciliation.csv (Bookkeeper): 26 -> 20 rows, 6 double-posted transactions caught after date+amount standardization.
- vendor_1099.csv (AP / 1099): 24 records -> 8 vendors, 7 missing EINs recovered via dedup merge — the 1099-complete story.
- ar_open_invoices.csv (AR): 26 -> 21 rows, 5 double-entered invoices removed, blank status backfilled from the twin row.

Every number is validated against the live engine and pinned by tests/test_demo_pipelines.py (read path mirrors app_demo._load_demo: dtype=str, keep_default_na=False).

Rewires src/gui/app_demo.py PERSONAS (keys bookkeeper / ap-1099 / ar-aging, accounting H1/sub/CTA) and rewrites docs/DEMO-PLAN.md sections 3/4/7 with the validated outcomes.

(Repo hygiene forced by a partial-clone gap: finalizes the already-deleted, unreferenced samples/messy_text.csv whose blob was unrecoverable.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Demo Plan — DataTools

> Creator-only. Implements PLAN.md §2.2 (the demo IS the product) and
> §2.3 (niche down — three landing pages, one engine).
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

The hosted demo is the single highest-leverage marketing asset in the
plan. This document defines exactly what loads, in what order, with
what data, for which buyer — so the operator builds it once and never
rebuilds it from a stale headline.

## 1. Goals

- Convert a cold visitor to a paid buyer in **under three minutes** of
  active interaction.
- Demonstrate the *full pipeline* (not one tool) on a dataset that
  *looks like the visitor's own work* — not a toy CSV.
- Survive zero attention to maintenance — once running, the demo
  should keep working as the engine evolves (the pre-saved pipeline
  JSONs use the same code path the paid product uses).
- Provide a shareable artifact for niche-community posts (a public URL
  the operator can drop into a subreddit reply with one sentence).

## 2. Constraints (non-negotiable)

| Constraint | Source | Implication |
|---|---|---|
| Free hosting at launch | BUSINESS.md §9 | Streamlit Community Cloud (1 GB RAM, sleeps after 7 days idle) |
| No login | BUSINESS.md §7 | No email gate, no signup wall, no "create account to continue" |
| Async / no-touch | DECISIONS.md §1 #8 | Cannot offer "schedule a demo with us" CTA |
| Runs locally on paid product | BUSINESS.md §11 | Demo can't expose the same engine to abuse — needs row caps |
| Friction kills conversion | BUSINESS.md §7 | Demo dataset preloaded; no "select a file" first-step |
| < $1,200/mo recurring | BUSINESS.md §9 | Migration plan to $5/mo VPS only after rate-limit signal |

## 3. The three personas — one audience: accounting (per PLAN.md §2.3)

We niche to **accounting** and enter through the three workflows where a
messy export costs real money. Same engine, three landing pages — each
is the same buyer at a different desk (bookkeeping, payables, receivables).

| Tag | Persona | Top-of-funnel keyword | Demo dataset | Pre-saved pipeline |
|---|---|---|---|---|
| `bookkeeper` | Bookkeeper — bank reconciliation | "reconcile bank export csv duplicates" | `samples/demo/bank_reconciliation.csv` | `bank_reconciliation_pipeline.json` |
| `ap-1099` | Accounts payable — 1099 vendor prep | "clean 1099 vendor list missing EIN" | `samples/demo/vendor_1099.csv` | `vendor_1099_pipeline.json` |
| `ar-aging` | Accounts receivable — open invoices | "remove duplicate invoices aging report" | `samples/demo/ar_open_invoices.csv` | `ar_open_invoices_pipeline.json` |

Each persona gets its **own landing page URL** (`?p=<tag>`), its **own
demo dataset loaded by default**, and its **own H1 + below-the-fold
copy** — wired in `src/gui/app_demo.py::PERSONAS`. The engine is
identical; only positioning differs.

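
As an illustration of the `?p=<tag>` switch, the lookup could be sketched as below. This is not the actual contents of `src/gui/app_demo.py`: the `Persona` dataclass, the `resolve_persona` helper, and the fallback to `bookkeeper` for a missing or unknown tag are all assumptions made for the example. Only the tags, dataset paths, and pipeline file names come from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical record layout; the real PERSONAS dict may differ."""
    label: str
    dataset: str   # path of the preloaded demo CSV
    pipeline: str  # file name of the pre-saved pipeline JSON


# Tags, datasets, and pipeline names are taken from the persona table.
PERSONAS = {
    "bookkeeper": Persona(
        "Bookkeeper",
        "samples/demo/bank_reconciliation.csv",
        "bank_reconciliation_pipeline.json",
    ),
    "ap-1099": Persona(
        "Accounts payable",
        "samples/demo/vendor_1099.csv",
        "vendor_1099_pipeline.json",
    ),
    "ar-aging": Persona(
        "Accounts receivable",
        "samples/demo/ar_open_invoices.csv",
        "ar_open_invoices_pipeline.json",
    ),
}

DEFAULT_TAG = "bookkeeper"  # assumption: fallback when ?p= is missing or unknown


def resolve_persona(query_params: dict) -> tuple[str, Persona]:
    """Map the ?p=<tag> query parameter to a persona, never raising."""
    raw = query_params.get("p", "")
    if isinstance(raw, list):  # some frameworks hand back a list per key
        raw = raw[0] if raw else ""
    tag = raw.strip().lower()
    if tag not in PERSONAS:
        tag = DEFAULT_TAG
    return tag, PERSONAS[tag]
```

Falling back to a default instead of erroring keeps the "visitor never sees an empty state" rule intact even when a community post mangles the link.
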
## 4. Demo dataset specifications

Each dataset is intentionally small (24–26 rows) so the full pipeline
runs in well under one second on Streamlit Community Cloud's free
hardware. Each dataset is a *plausible-looking* export from that
persona's tooling and contains every kind of pollution the bundle's
five tools fix, so a single demo run shows every tool earning its
keep.

### 4.0 Value-proof map

Each demo dataset is engineered so the buyer sees their **own top pain**
fixed in the AFTER preview, with one unmistakable headline number. All
three run the same saved 4-step pipeline (Clean Text → Standardize
Formats → Fix Missing Values → Find Duplicates). The numbers below are
**validated against the live engine** (`tests/test_demo_pipelines.py`
pins them) — refresh the dataset only if a number stops landing.

| Persona | Headline proof | What the visitor watches happen |
|---|---|---|
| Bookkeeper | **26 → 20 rows · 6 phantom duplicates removed** | The same payment posted twice (different date + amount format) collapses to one; dates go ISO, parens-negatives become real negatives |
| AP / 1099 | **24 records → 8 vendors · 7 missing EINs recovered** | Each vendor's scattered records merge into one complete row; `merge=true` backfills the EIN/address/phone that any single record was missing |
| AR aging | **26 → 21 rows · 5 double-entered invoices removed** | Duplicate invoice numbers collapse; a blank status is backfilled from its twin; invoice + due dates go ISO, amounts numeric |

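
One way to think about "pinning" a headline is as a small table of expected counts plus a consistency check. The sketch below is not the contents of `tests/test_demo_pipelines.py`; the `PINNED` layout and the `check_headline` helper are hypothetical, and the 16 duplicates for `ap-1099` is simply 24 minus 8. In a real test the observed counts would come from running the saved pipeline on the demo CSV.

```python
# Headline numbers copied from the value-proof table (24 - 8 = 16 for AP).
PINNED = {
    "bookkeeper": {"rows_in": 26, "rows_out": 20, "duplicates_removed": 6},
    "ap-1099": {"rows_in": 24, "rows_out": 8, "duplicates_removed": 16},
    "ar-aging": {"rows_in": 26, "rows_out": 21, "duplicates_removed": 5},
}


def check_headline(tag, rows_in, rows_out):
    """Return a list of mismatch messages; an empty list means the headline still lands."""
    pin = PINNED[tag]
    problems = []
    if rows_in != pin["rows_in"]:
        problems.append(f"{tag}: expected {pin['rows_in']} input rows, got {rows_in}")
    if rows_out != pin["rows_out"]:
        problems.append(f"{tag}: expected {pin['rows_out']} output rows, got {rows_out}")
    if rows_in - rows_out != pin["duplicates_removed"]:
        problems.append(
            f"{tag}: expected {pin['duplicates_removed']} duplicates removed, "
            f"got {rows_in - rows_out}"
        )
    return problems
```
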
### 4.1 `bank_reconciliation.csv` (26 rows) — Bookkeeper

**Looks like**: two months (Jan + Feb 2025) of business-checking activity
from a bank portal, where the Feb re-export overlaps Jan so the same
transaction posts twice. Columns: `Date, Description, Vendor, Category,
Amount, Account`.

**Pollution included**:
- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`, `1/27/25`, `Feb 5 2025`.
- Currency formats incl. negatives: `-$129.99`, `($89.50)` parens-negative, `+$3,450.00`, `- $599.88`, bare `-129.99`, `(50.00)`.
- Whitespace + NBSP padding; smart quotes and an em-dash inside descriptions.
- Vendor casing variety on *non-duplicate* rows: `Amazon` / `amazon.com` / `AMAZON.COM`, `Verizon` / `verizon`.
- Disguised nulls in Category: `—`, `(blank)`, `?`, `unknown`, `TBD`.
- **6 duplicate transactions** — each pair shares the same vendor + real value but a different date *and* amount format, so they collapse only after standardization.

**After running the pipeline** (validated): **26 → 20 rows, 6 duplicates
removed**, 36 date/amount cells standardized (0 unparseable), all dates
ISO, parens-negatives resolved (`($89.50)` → `-89.50`), disguised-null
categories flagged. The reconciliation ties out.

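
To see why the duplicates collapse only after standardization, here is a minimal standalone sketch. It does not use the DataTools engine: the function names, the dedup key (lowercased vendor, ISO date, numeric amount), and the sample rows are assumptions for illustration, and `%b` month parsing assumes an English locale.

```python
from datetime import datetime
from decimal import Decimal

# Layouts covering the date variants listed above.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%b %d %Y")


def to_iso_date(text):
    """Try each known layout and return YYYY-MM-DD."""
    cleaned = text.replace("\u00a0", " ").strip()  # NBSP padding -> plain space
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {text!r}")


def to_amount(text):
    """Parse currency text; parentheses or a minus sign mean negative."""
    cleaned = text.replace("\u00a0", " ").strip()
    in_parens = cleaned.startswith("(") and cleaned.endswith(")")
    negative = in_parens or "-" in cleaned
    digits = "".join(ch for ch in cleaned if ch.isdigit() or ch == ".")
    value = Decimal(digits)
    return -value if negative else value


def dedup_transactions(rows):
    """Keep the first row for each (vendor, ISO date, amount) key."""
    seen = set()
    kept = []
    for row in rows:
        iso = to_iso_date(row["Date"])
        amount = to_amount(row["Amount"])
        key = (row["Vendor"].strip().lower(), iso, amount)
        if key in seen:
            continue
        seen.add(key)
        kept.append({**row, "Date": iso, "Amount": str(amount)})
    return kept


# Illustrative rows, not the real demo file.
rows = [
    {"Date": "01/15/2025", "Vendor": "Stripe", "Amount": "+$3,450.00"},
    {"Date": "2025-01-15", "Vendor": "Stripe", "Amount": "3450.00"},
    {"Date": "Jan 18 2025", "Vendor": "Verizon", "Amount": "($89.50)"},
]
kept = dedup_transactions(rows)  # the two Stripe rows collapse into one
```

Comparing the raw strings would treat the two Stripe rows as different; only the normalized key makes them equal.
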
### 4.2 `vendor_1099.csv` (24 rows) — Accounts payable / 1099

**Looks like**: a 1099-NEC vendor master list where the same vendor was
entered 2–3 times across the year by different staff, each record holding
only *part* of the vendor's details. Columns: `Vendor, Contact, Email,
Phone, EIN, Address, Total_Paid`.

**Pollution included**:
- The duplicate records for a vendor share one email differing only by case/whitespace (the reliable dedup key, matched with the `email` normalizer).
- EIN / Phone / Address scattered across the duplicate set so no single record is complete but the union is — gaps marked `—`, `(blank)`, `TBD`, `unknown`, `N/A`.
- Vendor name casing/spelling variants, phone formats, EIN formats (`12-3456789` vs `123456789`), `Total_Paid` currency variants.

**After running the pipeline** (validated): **24 records → 8 vendors, 16
duplicates removed, 7 missing EINs recovered** by `merge=true` +
`most_complete` survivor, 35 disguised nulls caught, phones/emails/amounts
standardized (0 unparseable). One vendor genuinely has no EIN in any
record — it survives with a blank EIN as the realistic "flag for
follow-up" case.

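
The merge behavior described above can be approximated in a few lines. This is a sketch of the idea only, not the engine's `merge=true` / `most_complete` implementation: the helper names, the tie-break (first record wins), and the sample vendors are made up for illustration.

```python
# Disguised nulls from the list above ("\u2014" is the em-dash sentinel).
SENTINELS = {"", "\u2014", "(blank)", "tbd", "unknown", "n/a"}


def is_missing(value):
    return value.strip().lower() in SENTINELS


def merge_group(records):
    """Pick the most complete record, then backfill its gaps from the others."""
    survivor = dict(
        max(records, key=lambda r: sum(not is_missing(v) for v in r.values()))
    )
    for field in list(survivor):
        if is_missing(survivor[field]):
            # First non-missing value among the duplicates, else leave blank.
            survivor[field] = next(
                (r[field] for r in records if not is_missing(r[field])), ""
            )
    return survivor


def dedup_merge(records, key="Email"):
    """Group on the normalized email and merge each group into one row."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key].strip().lower(), []).append(rec)
    return [merge_group(group) for group in groups.values()]


# Illustrative records, not the real demo file.
records = [
    {"Vendor": "Acme Supply", "Email": "Billing@Acme.com", "EIN": "\u2014", "Phone": "555-0100"},
    {"Vendor": "ACME SUPPLY", "Email": " billing@acme.com ", "EIN": "12-3456789", "Phone": "TBD"},
    {"Vendor": "Birch LLC", "Email": "ap@birch.example", "EIN": "unknown", "Phone": "N/A"},
]
merged = dedup_merge(records)
```

The second vendor ends up with a blank EIN because no record in its group has one, which mirrors the "flag for follow-up" case.
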
### 4.3 `ar_open_invoices.csv` (26 rows) — Accounts receivable

**Looks like**: an open-invoices (unpaid AR) export where some invoices
were double-entered in different formats and client contacts are messy.
Columns: `Invoice, Client, Email, Invoice_Date, Due_Date, Amount, Status`.

**Pollution included**:
- Two date columns with mixed formats; currency variants incl. a credit memo `($300.00)` → `-300.00`.
- Client name casing variety; email case variants (`AP@Acme.com` vs `ap@acme.com`).
- Status disguised nulls: `—`, `?`, `(blank)`, `TBD`, `unknown`, `(none)`.
- **5 double-entered invoices** — same invoice number twice, dates/amount in different formats, one copy with a blank status the other fills.

**After running the pipeline** (validated): **26 → 21 rows, 5 duplicate
invoices removed**, both date columns ISO + amounts numeric + emails
lowercased (0 unparseable), 7 disguised-null statuses caught, and a blank
status backfilled from its twin via `merge=true`. The aging report stops
double-counting.

## 5. UX flow (per persona)

The demo is a single Streamlit page, served by the dedicated
`src/gui/app_demo.py` cloud build (the module that holds `PERSONAS`)
rather than a repurposed `src/gui/pages/0_Review.py`.

```
┌──────────────────────────────────────────────────────────┐
│  DataTools — for {Persona}                               │
│  "{Persona-specific H1}"                                 │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Sample dataset preloaded: bank_reconciliation.csv       │
│  [Replace with your own file (capped 100 rows)]          │
│                                                          │
│  ┌─ BEFORE preview (26 rows) ─────────────────────────┐  │
│  │ 01/15/2025 | Stripe | +$3,450.00 | …               │  │
│  │ 2025-01-15 | Stripe | 3450.00    | … (dup)         │  │
│  │ ...                                                │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  Pipeline (saved):                                       │
│    1. Clean Text → 2. Standardize Formats →              │
│    3. Fix Missing → 4. Find Duplicates                   │
│                                                          │
│  [▶ Run pipeline]                                        │
│                                                          │
│  ┌─ AFTER preview ────────────────────────────────────┐  │
│  │ 26 rows → 20 (6 duplicate transactions removed)    │  │
│  │ 36 cells standardized · 4 disguised nulls flagged  │  │
│  │                                                    │  │
│  │ 2025-01-15 | Stripe | 3450.00 | …                  │  │
│  │ ...                                                │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  [Download cleaned CSV (sample, watermarked)]            │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Like what you see?                                 │  │
│  │ Run this on YOUR 50,000-row export — locally.      │  │
│  │ No upload. Your data never leaves your machine.    │  │
│  │ [Get DataTools — $49 →]                            │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```

**Critical UX points**:
- Sample dataset is *already loaded* on page paint. Visitor never
  sees an empty state.
- BEFORE table is shown side-by-side with AFTER once the run
  completes. Hidden-character toggle on by default so the visitor
  *sees* what was hidden in their data.
- "Replace with your own file" is a secondary action below the BEFORE
  table — not the headline.
- Per-step metrics are shown in the AFTER block, e.g. for the
  Bookkeeper run: "36 cells standardized, 4 disguised nulls flagged,
  6 duplicates removed." Numbers sell more than narrative.
- Buy button is **inside** the AFTER block and **above the fold** when
  the run completes. Friction kills.

## 6. Free vs paid boundary

The demo runs the **same code** as the paid product. Caps are surface,
not engine.

| Limit | Free demo | Paid (downloaded) |
|---|---|---|
| Input rows | 100 | unlimited (1 GB+ via streaming) |
| File size | 5 MB | unlimited |
| Output | watermarked CSV ("DataTools demo — buy at <url>" appended as last row) | clean CSV |
| Pipeline editor | locked to the persona-saved pipeline | full edit / save / load JSON |
| Save pipeline JSON | disabled | enabled |
| International | enabled | enabled |
| Audit log download | disabled | enabled |
| Tools 06–09 | as they ship | as they ship |

The watermark is a **single trailing row**, not an in-cell tag — so
the demo's AFTER preview *visibly* reads as production-quality data,
not "demo crippled" data.

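
The two surface caps that touch the output, the 100-row input limit and the trailing watermark row, could be sketched with the standard library as follows. The function names are hypothetical, the buy URL stays a parameter because the source leaves it as `<url>`, and truncating (rather than rejecting) an oversized upload is an assumption.

```python
import csv
import io

DEMO_ROW_CAP = 100  # "Input rows" limit for the free demo


def cap_rows(rows, cap=DEMO_ROW_CAP):
    """Keep only the first `cap` data rows (assumption: truncate, not reject)."""
    return rows[:cap]


def to_watermarked_csv(header, rows, buy_url):
    """Serialize rows as CSV and append one trailing watermark row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    # Watermark text sits in the first column; remaining cells stay empty
    # so no data cell is ever modified.
    watermark = [f"DataTools demo \u2014 buy at {buy_url}"] + [""] * (len(header) - 1)
    writer.writerow(watermark)
    return buffer.getvalue()
```

Because the watermark is appended after serialization of the real rows, the AFTER preview can be rendered from `rows` directly and never shows the signature line.
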
## 7. CTA copy (per persona)

Copy lives in `src/gui/app_demo.py::PERSONAS` (H1 / sub / CTA per tag);
keep this section in sync with that dict.

### 7.1 Bookkeeper — bank reconciliation (`?p=bookkeeper`)

- **H1**: *Catch the transactions your bank export posted twice. Locally.*
- **Sub**: *When the Jan and Feb exports overlap, the same payment posts
  twice in two formats. DataTools standardizes every date and amount, then
  dedups on the real transaction so your reconciliation ties out — 26 rows
  → 20, six phantom duplicates gone.*
- **CTA**: *Get DataTools for Bookkeepers — $49 →*

### 7.2 Accounts payable — 1099 prep (`?p=ap-1099`)

- **H1**: *Build a clean 1099 vendor list — with the missing EINs filled in.*
- **Sub**: *The same vendor entered three times, each record holding only
  part of the details. DataTools consolidates to one row and backfills the
  gaps from the duplicates — 24 records → 8 vendors, 7 missing EINs
  recovered.*
- **CTA**: *Get DataTools for Accounting — $49 →*

### 7.3 Accounts receivable — open invoices (`?p=ar-aging`)

- **H1**: *Stop chasing the invoices your aging report counted twice. Locally.*
- **Sub**: *Double-entered invoices inflate your AR aging and your
  follow-ups. DataTools standardizes dates and amounts, lowercases client
  emails, and removes the duplicate invoice numbers — 26 rows → 21, five
  phantom invoices off the books.*
- **CTA**: *Get DataTools for Accounting — $49 →*

## 8. Telemetry / conversion tracking

Async + no-touch + free hosting limits what we can instrument. Use
event-only counters, no PII:

| Event | Source | Aggregate-only field |
|---|---|---|
| `demo.page_view` | landing page | persona tag |
| `demo.run_clicked` | demo page | persona tag |
| `demo.run_completed` | demo page | persona tag, rows_processed |
| `demo.cta_clicked` | demo page | persona tag |
| `gumroad.purchase` | Gumroad webhook | landing-page-source query param (`?from=bookkeeper`) |

Conversion = `cta_clicked / run_completed`. A demo-quality issue surfaces
when `run_completed / page_view` falls below 30 % (visitors are not
engaging).

Self-host counters on Cloudflare Pages (free, GDPR-friendly). No
Google Analytics: it adds a privacy banner and conflicts with the "your
data never leaves your computer" message.

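
The two ratios above reduce to a few lines of arithmetic. The sketch below assumes the counters arrive as a plain dict keyed by event name; that shape and the `funnel_ratios` name are illustrative, not part of any existing telemetry code.

```python
ENGAGEMENT_FLOOR = 0.30  # run_completed / page_view threshold from the text


def funnel_ratios(counts):
    """Compute engagement and conversion from aggregate event counters."""
    views = counts.get("demo.page_view", 0)
    completed = counts.get("demo.run_completed", 0)
    clicks = counts.get("demo.cta_clicked", 0)
    # None means "not enough data", which avoids dividing by zero.
    engagement = completed / views if views else None
    conversion = clicks / completed if completed else None
    return {
        "engagement": engagement,
        "conversion": conversion,
        "demo_quality_issue": engagement is not None and engagement < ENGAGEMENT_FLOOR,
    }
```

For example, 200 page views, 50 completed runs, and 5 CTA clicks give an engagement of 0.25 (flagged, since it is under 30 %) and a conversion of 0.10.
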
## 9. Maintenance plan

**Recurring**: zero. The demo runs on the same engine the paid
product ships, so any improvement to the engine improves the demo
automatically. The pre-saved pipeline JSONs reference column names
and tool names, both stable APIs.

**Triggers for revisit**:

| Trigger | Action |
|---|---|
| Streamlit Community Cloud rate-limits / sleeps too aggressively | Migrate to a $5–10/mo VPS (BUSINESS.md §9 contingency) |
| Demo dataset becomes stale (e.g. all phones standardize to no-op) | Refresh with a new pollution batch — *don't change the persona* |
| `run_completed / page_view < 30 %` for 4 consecutive weeks | Audit the demo: is the BEFORE preview showing the mess clearly? Is the AFTER too small to notice? |
| `cta_clicked / run_completed < 5 %` for 4 consecutive weeks | The demo is impressive but the CTA isn't earning trust — revise copy + add a screenshot of the network tab showing zero outbound calls (PLAN.md §2.4) |
| New tool ships (06–09) | Decide *per persona* whether to add it to that persona's saved pipeline. Not all tools belong on all personas |

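
The two ratio triggers share one shape: a weekly ratio staying under a threshold for four weeks in a row. A minimal sketch of that check follows, assuming the weekly ratios are kept oldest-first in a list and that "4 consecutive weeks" means the four most recent ones; both are assumptions, as is the function name.

```python
def below_for_consecutive_weeks(weekly_ratios, threshold, weeks=4):
    """True if each of the most recent `weeks` ratios is under `threshold`."""
    if len(weekly_ratios) < weeks:
        return False  # not enough history to fire the trigger
    return all(ratio < threshold for ratio in weekly_ratios[-weeks:])


# Engagement trigger: run_completed / page_view < 30 %
engagement_alert = below_for_consecutive_weeks([0.40, 0.28, 0.25, 0.20, 0.29], 0.30)

# CTA trigger: cta_clicked / run_completed < 5 %
cta_alert = below_for_consecutive_weeks([0.04, 0.03, 0.06, 0.02], 0.05)
```

Here `engagement_alert` is true because the last four weeks are all under 0.30, while `cta_alert` is false because one of the last four weeks reached 0.06.
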
## 10. Build sequence (drops into PLAN.md week 2)

| Day | Action |
|---|---|
| 1 | Demo build of Streamlit app: 3 personas, switch via query param `?p=<tag>` (e.g. `?p=bookkeeper`) |
| 2 | Pipeline JSONs wired in; row cap + watermark applied; download button |
| 3 | Deploy to Streamlit Community Cloud · 3 sub-paths or 3 separate apps |
| 4 | Persona landing pages: 3 static HTML pages on Cloudflare Pages, each with iframe embed of its persona demo + CTA |
| 5 | Telemetry counters wired (Cloudflare event API) · Gumroad webhook captures `?from=` |

End of day 5: three URLs the operator can drop into three different
niche-community threads, each performing its own conversion math.

## 11. Anti-temptations (things the demo deliberately refuses)

- **No "try it on your data first" gate that requires email.** The
  whole point is friction-free.
- **No "schedule a demo" CTA.** Locked by no-touch.
- **No live chat widget.** Same.
- **No A/B-test framework yet.** Single-arm copy, ship it, iterate
  monthly. A/B requires statistical traffic the funnel doesn't have
  pre-PMF.
- **No watermark inside cells.** The AFTER preview must look
  production-quality. Watermark goes on a single trailing row that's
  obviously the demo signature.
- **No animation / loader theatrics.** Pipeline runs in <1 s; a
  fake-progress bar lies about speed.