# Demo Plan — DataTools

> Creator-only. Implements PLAN.md §2.2 (the demo IS the product) and
> §2.3 (niche down — three landing pages, one engine).
>
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

The hosted demo is the single highest-leverage marketing asset in the plan. This document defines exactly what loads, in what order, with what data, for which buyer — so the operator builds it once and never rebuilds it from a stale headline.

## 1. Goals

- Convert a cold visitor to a paid buyer in **under three minutes** of active interaction.
- Demonstrate the *full pipeline* (not one tool) on a dataset that *looks like the visitor's own work* — not a toy CSV.
- Survive zero attention to maintenance — once running, the demo should keep working as the engine evolves (the pre-saved pipeline JSONs use the same code path the paid product uses).
- Provide a shareable artifact for niche-community posts (a public URL the operator can drop into a subreddit reply with one sentence).

## 2. Constraints (non-negotiable)

| Constraint | Source | Implication |
|---|---|---|
| Free hosting at launch | BUSINESS.md §9 | Streamlit Community Cloud (1 GB RAM, sleeps after 7 days idle) |
| No login | BUSINESS.md §7 | No email gate, no signup wall, no "create account to continue" |
| Async / no-touch | DECISIONS.md §1 #8 | Cannot offer a "schedule a demo with us" CTA |
| Paid product runs locally | BUSINESS.md §11 | The hosted demo exposes the same engine publicly, so it needs row caps to limit abuse |
| Friction kills conversion | BUSINESS.md §7 | Demo dataset preloaded; no "select a file" first step |
| < $1,200/mo recurring | BUSINESS.md §9 | Migration to a $5/mo VPS only after a rate-limit signal |

## 3. The three personas (per PLAN.md §2.3)

| Tag | Persona | Top-of-funnel keyword | Demo dataset | Pre-saved pipeline |
|---|---|---|---|---|
| `shopify-pet` | Shopify operator (priority: pet supplies) | "shopify customer cleanup" | `samples/demo/shopify_pet_customers.csv` | `shopify_pet_pipeline.json` |
| `bookkeeper` | Bookkeeper / freelance accountant | "reconcile bank export csv" | `samples/demo/bookkeeper_bank_reconcile.csv` | `bookkeeper_bank_pipeline.json` |
| `revops` | Marketing / RevOps agency | "dedupe lead list across vendors" | `samples/demo/agency_combined_leads.csv` | `agency_leads_pipeline.json` |

Each persona gets its **own landing page URL**, its **own demo dataset loaded by default**, and its **own H1 + below-the-fold copy**. The engine is identical; only positioning differs.

## 4. Demo dataset specifications

Each dataset is intentionally small (20–30 rows) so the full pipeline runs in well under one second on Streamlit Community Cloud's free hardware. Every row is a *plausible-looking* record from that persona's tooling. Each dataset contains every kind of pollution the bundle's five tools fix, so a single demo run shows every tool earning its keep.

### 4.0 Pain-point coverage map

Each demo dataset is engineered so the buyer sees their **own top pain** demonstrated in the AFTER preview. The mapping below pairs each pain from PLAN.md §2.3a with the rows / columns that exercise it. Refresh the dataset only when this coverage drops.
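
One way to notice coverage dropping is to scan each demo CSV for the pollution classes it is meant to exhibit before it ships. This is an illustrative sketch, not part of the bundle's tools. `pollution_coverage`, the class names, and the sentinel and hidden-character sets are assumptions drawn from the §4.1–§4.3 pollution lists. The email column name is a parameter because the real headers differ per persona.

```python
import csv
import io

# Disguised-null spellings from the section 4.1-4.3 lists, case-folded (assumed set).
SENTINELS = {"", "n/a", "#n/a", "(blank)", "(none)", "(unknown)", "?", "unknown", "tbd", "\u2014"}
# Characters that break feed imports: NBSP, BOM and curly quotes.
HIDDEN_CHARS = {"\u00a0", "\ufeff", "\u201c", "\u201d", "\u2018", "\u2019"}


def pollution_coverage(rows: list[dict[str, str]], email_column: str = "email") -> set[str]:
    """Return which pollution classes appear somewhere in a demo dataset (hypothetical helper)."""
    found: set[str] = set()
    first_spelling: dict[str, str] = {}  # case-folded email -> first spelling seen
    for row in rows:
        for value in row.values():
            if not isinstance(value, str):
                continue  # csv.DictReader uses None or lists for ragged rows
            if value.strip().casefold() in SENTINELS:
                found.add("disguised_null")
            elif value != value.strip():
                found.add("whitespace_padding")
            if any(ch in value for ch in HIDDEN_CHARS):
                found.add("hidden_chars")
        email = (row.get(email_column) or "").strip()
        # Same address seen before with a different spelling -> case-variant duplicate.
        if email and first_spelling.setdefault(email.casefold(), email) != email:
            found.add("email_case_duplicate")
    return found


REQUIRED = {"disguised_null", "whitespace_padding", "hidden_chars", "email_case_duplicate"}
sample = io.StringIO(
    "name,email,phone\n"
    " Alice ,alice@petshop.com,(415) 555-1234\n"
    "Alice,ALICE@petshop.com,N/A\n"
    "Bob\u00a0,bob@petshop.com,?\n"
)
rows = list(csv.DictReader(sample))
print(sorted(REQUIRED - pollution_coverage(rows)))  # → []
```

An empty result means every required class is still present. A non-empty one is the signal to refresh the dataset, which matches the §9 "dataset becomes stale" trigger.
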

| Persona | Pain (from PLAN §2.3a) | Demo coverage |
|---|---|---|
| Shopify pet | S1 — Klaviyo per-contact dupes | 5 dup pairs across rows 1–15 (case + format + address-twin variants) |
| Shopify pet | S2 — feed-rejection chars | smart-quote / NBSP / BOM in rows 1–6, 9, 11 |
| Shopify pet | S3 — multi-channel | partner-style customer IDs (`SHOP-`); column-level mapping is demonstrated in the RevOps demo |
| Shopify pet | S4 — subscription identity | rows 1+2, 7+8, 9+10 — same person, different format |
| Shopify pet | S5 — VAT-MOSS country drift | rows 16–18 (`United Kingdom` / `U.K.` / `UK`) + rows 19–20 (`Germany` / `Italia`) |
| Bookkeeper | B1 — month-overlap re-import | 7 dup pairs spanning Jan↔Feb and Mar boundaries |
| Bookkeeper | B2 — 1099 vendor consolidation | Amazon × 3 spellings, Verizon × 2, Acme Realty × 2, Adobe × 2, Costco × 2, Zoom × 2, Stripe × 4 |
| Bookkeeper | B3 — audit trail | every cell change in the run logged with old/new/rule; surfaced in the demo's audit tab |
| Bookkeeper | B4 — per-license economics | demonstrated by pricing copy, not data |
| Bookkeeper | B5 — multi-currency | rows 26 (EUR), 27 (GBP), 28 (BRL with comma decimal), 29 (parens-negative) |
| RevOps | R1 — per-contact tier | 6 cross-source dup pairs (HubSpot × LinkedIn × Manual Scrape) |
| RevOps | R2 — deliverability | rows 26–27 (`uma at uniform dot com`, `victor@@victorco.com` invalid emails) |
| RevOps | R3 — GDPR / privacy | demonstrated by the network-tab moat panel + zero-upload claim |
| RevOps | R4 — vendor unification | 3 source values (HubSpot / LinkedIn / Manual Scrape), 13 country codes, mixed-shape headers |
| RevOps | R5 — suppression list | rows 29–30 (`Suppressed`, `Opted Out` tags) |

### 4.1 `shopify_pet_customers.csv` (20 rows)

**Looks like**: a Shopify customer export filtered for the "Pet Supplies" sales channel, covering 12 months of activity.

**Pollution included**:

- Whitespace padding (" Alice ", "Sydney Opera House Drive ")
- Mixed phone formats: `(415) 555-1234`, `415.555.1234`, `5559876543`, `+1 555-111-1111`
- International phones: GB, ES, DE, AU, JP (15 demo rows span 6 countries)
- Currency variants: `$1,240.50`, `£890.25`, `€2.410,75` (EU comma decimal), `A$ 1,299.00`, `¥75000`
- Date formats: `2025-12-04`, `12/15/2025`, plus placeholder values `?`, `(blank)`, `(none)`, `#N/A` in the date column
- Disguised nulls: `N/A`, blank, `(blank)`, `?`, `#N/A`, `(none)`, `unknown`
- Name casing: `EVE MARTINEZ`, `henry`, `O'NEIL`, `noah`, mixed Title / ALL CAPS / lower
- Email case variants that *should* dedup: `Bob@PetShop.com` vs `alice@petshop.com`
- 4 fuzzy duplicates (Alice/Bob same address, Grace/Henry same phone, Carlos/Olivia same address, Ivy/Jack same address)

**After running the pipeline**: 20 rows → 15, ~29 cells canonicalized, ~45 sentinels standardized, 5 cross-row duplicates merged. The customer table is now Klaviyo-import-ready, and the country column (previously `UK` / `U.K.` / `United Kingdom` / `Germany` / `Italia`) now reads `GB` / `DE` / `IT`, so the VAT-MOSS report won't break.

### 4.2 `bookkeeper_bank_reconcile.csv` (30 rows)

**Looks like**: two months of business checking + credit-card activity exported from a bank portal, with the Feb export accidentally overlapping the Jan export at the month boundary.
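
The month-boundary overlap only becomes detectable once both exports agree on representation. For example, a transaction written as `01/31/2025` / `-$129.99` in one file could appear as `2025-01-31` / `-129.99` in the other. Below is a minimal sketch of that normalize-then-key idea, assuming each row is a (date, amount, vendor) tuple. The helper names are hypothetical, and the date and amount shapes come from the §4.2 pollution list. Comma-decimal currencies (EUR, BRL) and vendor aliasing (`Amazon` vs `amazon.com`) are deliberately out of scope.

```python
from datetime import date, datetime
from decimal import Decimal

# Date shapes from the section 4.2 list; %b assumes an English (C) locale.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%b %d %Y", "%m/%d/%y")


def parse_date(text: str) -> date:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next known shape
    raise ValueError(f"unrecognized date: {text!r}")


def parse_amount(text: str) -> Decimal:
    """US-style amounts only: -$1.00, ($1.00), +$1,000.00, - $1.00, 1.00."""
    t = text.strip().replace("$", "").replace(",", "").replace(" ", "")
    negative = t.startswith("-") or (t.startswith("(") and t.endswith(")"))
    magnitude = Decimal(t.strip("()+-"))
    return -magnitude if negative else magnitude


def drop_overlap(rows: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep the first row for each normalized (date, amount, vendor) key."""
    seen: set[tuple[date, Decimal, str]] = set()
    kept = []
    for raw_date, raw_amount, vendor in rows:
        key = (parse_date(raw_date), parse_amount(raw_amount), vendor.strip().casefold())
        if key not in seen:
            seen.add(key)
            kept.append((raw_date, raw_amount, vendor))
    return kept


jan_export = [("01/31/2025", "-$129.99", "Amazon"), ("1/30/25", "($89.50)", "Verizon")]
feb_export = [("2025-01-31", "-129.99", "AMAZON"), ("Feb 5 2025", "- $599.88", "Staples")]
print(len(drop_overlap(jan_export + feb_export)))  # → 3
```

The raw strings are kept in the output so that a downstream step, such as an audit log, can still show what each row originally looked like.
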

**Pollution included**:

- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`, `1/27/25`, `Feb 5 2025`
- Currency formats: `-$129.99`, `($89.50)` parens-negative, `+$3,450.00`, `- $599.88` (space after the sign), bare `-129.99`, `(50.00)`
- Header trailing whitespace: `"Date "`
- Smart quotes around descriptions: `"autopay"`
- Em-dash sentinels in Vendor: `—`
- Em-dashes inside descriptions: `STAPLES #4422 — paper, toner`
- Vendor casing inconsistency: `Amazon` / `amazon.com` / `AMAZON.COM`, `Verizon` / `verizon`
- 7 duplicate transactions (same date + amount + vendor recorded twice in different formats)

**After running the pipeline**: 30 rows → 23, ~84 cells normalized, 7 duplicates removed (month-overlap + VAT-MOSS dups). All dates ISO-formatted, all amounts numeric (including EUR/GBP/BRL with comma decimal), vendor casing canonical, parens-negative resolved.

### 4.3 `agency_combined_leads.csv` (30 rows)

**Looks like**: a marketing-ops worksheet combining lead exports from HubSpot + LinkedIn Sales Navigator + manual scraping, ready for campaign targeting.
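
Because this worksheet stacks three sources, the same lead can arrive twice with different gaps, which is why the pipeline merges duplicates rather than simply dropping them. The following is a minimal sketch of a fill-the-gaps merge. It assumes an exact, case-insensitive email key, so `bob@beta.com` / `Bob@Beta.com` collapse, while fuzzy pairs like `alice@acme.com` / `Alice.Johnson@acme.com` would need a real fuzzy matcher. `dedupe_with_merge` and its first-row-wins rule are illustrative, not the DataTools implementation.

```python
# Cell values treated as "no data" when filling gaps (assumed list, case-folded).
MISSING = {"", "n/a", "unknown", "(unknown)", "(blank)", "(none)", "?", "#n/a", "tbd", "\u2014"}


def is_missing(value: str) -> bool:
    return value.strip().casefold() in MISSING


def dedupe_with_merge(rows: list[dict[str, str]], key: str = "email") -> list[dict[str, str]]:
    """Collapse rows sharing a case-insensitive key; fill the survivor's gaps."""
    survivors: dict[str, dict[str, str]] = {}
    unkeyed: list[dict[str, str]] = []  # rows with no usable key pass through untouched
    for row in rows:
        match = row.get(key, "").strip().casefold()
        if is_missing(match):
            unkeyed.append(dict(row))
            continue
        if match not in survivors:
            survivors[match] = dict(row)  # first occurrence becomes the survivor
            continue
        survivor = survivors[match]
        for column, value in row.items():
            # Fill only cells the survivor lacks; never overwrite real data.
            if is_missing(survivor.get(column, "")) and not is_missing(value):
                survivor[column] = value
    return list(survivors.values()) + unkeyed


leads = [
    {"email": "bob@beta.com", "phone": "N/A", "title": "VP Sales", "source": "HubSpot"},
    {"email": "Bob@Beta.com", "phone": "+1 415 555 0100", "title": "?", "source": "LinkedIn"},
]
merged = dedupe_with_merge(leads)
print(merged)
# → [{'email': 'bob@beta.com', 'phone': '+1 415 555 0100', 'title': 'VP Sales', 'source': 'HubSpot'}]
```

Keeping the first row as the survivor and filling only its blanks means no real value is ever overwritten. Rows without a usable email are passed through individually instead of collapsing into one group.
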

**Pollution included**:

- Phone formats per region: US, UK, Spain, Germany, China, India, Australia, Mexico, Israel, Singapore, Hong Kong, Italy, South Korea — 13 country codes
- Country column inconsistent: `USA` / `US` / `United States`
- Disguised nulls: `N/A`, `unknown`, `(unknown)`, `(blank)`, `(none)`, `?`, `—`, `#N/A`, `TBD`
- Source column tags each row's origin (`HubSpot` / `LinkedIn` / `Manual Scrape`)
- Email duplicates across sources with case variants: `alice@acme.com` + `Alice.Johnson@acme.com`, `bob@beta.com` + `Bob@Beta.com`, `diana@delta.com` from two sources, `carlos@gamma.io` from two sources, `Frank@Foxtrot.de` + `frank@foxtrot.de`
- Name casing: `DIANA LEE`, `henry`, `IVY CHEN`, mixed
- 6 fuzzy / cross-source duplicates that an exact-match dedup would miss
- Score column with sentinel pollution that needs coercion to integer

**After running the pipeline**: 30 rows → 24, ~43 cells canonicalized, 14 sentinels resolved, 6 cross-source duplicates merged with `merge=true` so each survivor inherits the most-complete picture. Invalid-email rows (deliverability stress) and `Suppressed` / `Opted Out` tags (suppression-list use case) survive as flagged rows for the operator to review manually.

## 5. UX flow (per persona)

The demo is a single Streamlit page (likely `src/gui/pages/0_Review.py` repurposed for demo mode, or a dedicated `app_demo.py` for the cloud build).

```
┌──────────────────────────────────────────────────────────┐
│ DataTools — for {Persona}                                │
│ "{Persona-specific H1}"                                  │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Sample dataset preloaded: shopify_pet_customers.csv     │
│  [Replace with your own file (capped 100 rows)]          │
│                                                          │
│  ┌─ BEFORE preview (20 rows) ─────────────────────────┐  │
│  │ Alice | (415) 555-1234 | $1,240.50 | …             │  │
│  │ Bob   | 415.555.1234   | $1,240.50 | …             │  │
│  │ ...                                                │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  Pipeline (saved):                                       │
│    1. Text Clean → 2. Format Standardize →               │
│    3. Missing → 4. Deduplicate                           │
│                                                          │
│  [▶ Run pipeline]                                        │
│                                                          │
│  ┌─ AFTER preview ────────────────────────────────────┐  │
│  │ 20 rows → 15 (5 duplicates merged)                 │  │
│  │ 29 cells canonicalized · 45 sentinels resolved     │  │
│  │                                                    │  │
│  │ Alice Johnson | +14155551234 | 1240.50 | …         │  │
│  │ ...                                                │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  [Download cleaned CSV (sample, watermarked)]            │
│                                                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Like what you see?                                 │  │
│  │ Run this on YOUR 50,000-row export — locally.      │  │
│  │ No upload. Your data never leaves your machine.    │  │
│  │ [Get DataTools — $49 →]                            │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```

**Critical UX points**:

- Sample dataset is *already loaded* on page paint. The visitor never sees an empty state.
- The BEFORE table is shown side-by-side with AFTER once the run completes. The hidden-character toggle is on by default so the visitor *sees* what was hidden in the data.
- "Replace with your own file" is a secondary action below the BEFORE table — not the headline.
- Per-step metrics are shown in the AFTER block: "29 cells canonicalized, 45 sentinels resolved, 5 duplicates merged." Numbers sell more than narrative.
- The buy button is **inside** the AFTER block and **above the fold** when the run completes. Friction kills.

## 6. Free vs paid boundary

The demo runs the **same code** as the paid product. Caps live at the surface (input limits and the output watermark), not in the engine.
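
Because the caps sit at the surface, the demo can wrap the engine call instead of changing it. Below is a minimal sketch built on stated assumptions. `run_demo` is a hypothetical wrapper, and the `run_pipeline` callable stands in for the real engine entry point. The limits mirror the §6 table (100 rows, 5 MB). `buy_url` is a caller-supplied placeholder because the plan does not give the purchase URL.

```python
import csv
import io
from itertools import islice
from typing import Callable

# Free-demo limits mirroring the section 6 table; the paid build skips this wrapper.
MAX_ROWS = 100
MAX_BYTES = 5 * 1024 * 1024  # 5 MB
WATERMARK_PREFIX = "DataTools demo \u2014 buy at "  # text from the plan; URL supplied by caller

Rows = list[dict[str, str]]


def run_demo(raw: bytes, run_pipeline: Callable[[Rows], Rows], buy_url: str) -> str:
    """Cap the input, call the unchanged engine, append one watermark row."""
    if len(raw) > MAX_BYTES:
        raise ValueError("the demo accepts files up to 5 MB")
    # utf-8-sig drops a leading BOM, which spreadsheet exports often carry.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    rows = list(islice(reader, MAX_ROWS))  # rows past the cap never reach the engine
    cleaned = run_pipeline(rows)  # same call the paid product would make
    fieldnames = list(reader.fieldnames or [])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(cleaned)
    if fieldnames:
        # Watermark sits in its own trailing row; no data cell is touched.
        writer.writerow({fieldnames[0]: WATERMARK_PREFIX + buy_url})
    return out.getvalue()


sample = "name,email\n" + "".join(f"user{i},u{i}@example.com\n" for i in range(150))
output = run_demo(sample.encode(), run_pipeline=lambda rows: rows, buy_url="https://example.com")
lines = output.splitlines()
print(len(lines))  # → 102: header + 100 capped rows + 1 watermark row
```

Truncating before the engine call, rather than passing a demo flag into the engine, keeps the free and paid code paths identical.
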

| Limit | Free demo | Paid (downloaded) |
|---|---|---|
| Input rows | 100 | unlimited (1 GB+ via streaming) |
| File size | 5 MB | unlimited |
| Output | watermarked CSV ("DataTools demo — buy at " appended as last row) | clean CSV |
| Pipeline editor | locked to the persona-saved pipeline | full edit / save / load JSON |
| Save pipeline JSON | disabled | enabled |
| International | enabled | enabled |
| Audit log download | disabled | enabled |
| Tools 06–09 | as they ship | as they ship |

The watermark is a **single trailing row**, not an in-cell tag — so the demo's AFTER preview *visibly* reads as production-quality data, not "demo-crippled" data.

## 7. CTA copy (per persona)

### 7.1 Shopify pet operator

- **H1**: *Clean your customer / vendor / subscriber exports — locally.*
- **Sub**: *Klaviyo-import-ready in 30 seconds. Catches duplicates Excel misses. Your data never leaves your computer.*
- **CTA**: *Get DataTools for Shopify — $49 →*

### 7.2 Bookkeeper / freelance accountant

- **H1**: *Reconcile messy bank exports. Hand your client an audit trail.*
- **Sub**: *Catches the duplicate transaction QuickBooks imported twice. Standardizes dates, amounts, vendor casing. Every change auditable.*
- **CTA**: *Get DataTools for Bookkeepers — $49 →*

### 7.3 Marketing / RevOps agency

- **H1**: *Dedupe leads across HubSpot, LinkedIn, and manual scrapes.*
- **Sub**: *International phones, country normalization, fuzzy dedup with merge — one tool, one schema, no upload.*
- **CTA**: *Get DataTools for RevOps — $49 →*

## 8. Telemetry / conversion tracking

Async + no-touch + free hosting limit what we can instrument.
Use event-only counters, no PII:

| Event | Source | Aggregate-only field |
|---|---|---|
| `demo.page_view` | landing page | persona tag |
| `demo.run_clicked` | demo page | persona tag |
| `demo.run_completed` | demo page | persona tag, rows_processed |
| `demo.cta_clicked` | demo page | persona tag |
| `gumroad.purchase` | Gumroad webhook | landing-page-source query param (`?from=shopify-pet`) |

Conversion = `cta_clicked / run_completed`. A demo-quality issue surfaces when `run_completed / page_view` < 30 % (visitors not engaging).

Self-host counters on Cloudflare Pages (free, GDPR-friendly). No Google Analytics — it adds a privacy banner and conflicts with the "your data never leaves your computer" message.

## 9. Maintenance plan

**Recurring**: zero. The demo runs on the same engine the paid product ships, so any improvement to the engine improves the demo automatically. The pre-saved pipeline JSONs reference column names and tool names, both stable APIs.

**Triggers for revisit**:

| Trigger | Action |
|---|---|
| Streamlit Community Cloud rate-limits / sleeps too aggressively | Migrate to a $5–10/mo VPS (BUSINESS.md §9 contingency) |
| Demo dataset becomes stale (e.g. every phone standardization is a no-op) | Refresh with a new pollution batch — *don't change the persona* |
| `run_completed / page_view < 30 %` for 4 consecutive weeks | Audit the demo: is the BEFORE preview showing the mess clearly? Is the AFTER too small to notice? |
| `cta_clicked / run_completed < 5 %` for 4 consecutive weeks | The demo is impressive but the CTA isn't earning trust — revise copy + add a screenshot of the network tab showing zero outbound calls (PLAN.md §2.4) |
| New tool ships (06–09) | Decide *per persona* whether to add it to that persona's saved pipeline. Not all tools belong on all personas |

## 10. Build sequence (drops into PLAN.md week 2)

| Day | Action |
|---|---|
| 1 | Demo build of Streamlit app: 3 personas, switch via query param `?p=shopify-pet` |
| 2 | Pipeline JSONs wired in; row cap + watermark applied; download button |
| 3 | Deploy to Streamlit Community Cloud · 3 sub-paths or 3 separate apps |
| 4 | Persona landing pages: 3 static HTML pages on Cloudflare Pages, each with an iframe embed of its persona demo + CTA |
| 5 | Telemetry counters wired (Cloudflare event API) · Gumroad webhook captures `?from=` |

End of day 5: three URLs the operator can drop into three different niche-community threads, each performing its own conversion math.

## 11. Anti-temptations (things the demo deliberately refuses)

- **No "try it on your data first" gate that requires email.** The whole point is friction-free.
- **No "schedule a demo" CTA.** Locked by no-touch.
- **No live chat widget.** Same.
- **No A/B-test framework yet.** Single-arm copy, ship it, iterate monthly. A/B testing needs traffic volume the funnel doesn't have pre-PMF.
- **No watermark inside cells.** The AFTER preview must look production-quality. The watermark goes on a single trailing row that's obviously the demo signature.
- **No animation / loader theatrics.** The pipeline runs in <1 s; a fake progress bar lies about speed.