demo: reconstruct sales demos for an accounting audience

Replaces the Shopify / RevOps / Bookkeeper demo trio with three accounting
personas that share one buyer, each entering through a workflow where a
messy export costs money — all running the same saved 4-step pipeline:

- bank_reconciliation.csv (Bookkeeper): 26 -> 20 rows, 6 double-posted
  transactions caught after date+amount standardization.
- vendor_1099.csv (AP / 1099): 24 records -> 8 vendors, 7 missing EINs
  recovered via dedup merge — the 1099-complete story.
- ar_open_invoices.csv (AR): 26 -> 21 rows, 5 double-entered invoices
  removed, blank status backfilled from the twin row.

Every number is validated against the live engine and pinned by
tests/test_demo_pipelines.py (read path mirrors app_demo._load_demo:
dtype=str, keep_default_na=False). Rewires src/gui/app_demo.py PERSONAS
(keys bookkeeper / ap-1099 / ar-aging, accounting H1/sub/CTA) and rewrites
docs/DEMO-PLAN.md sections 3/4/7 with the validated outcomes.

(Repo hygiene forced by a partial-clone gap: finalizes the already-deleted,
unreferenced samples/messy_text.csv whose blob was unrecoverable.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:52:39 +00:00
parent 38616d69e2
commit 6df726e69e
16 changed files with 404 additions and 441 deletions


@@ -32,17 +32,22 @@ rebuilds it from a stale headline.
| Friction kills conversion | BUSINESS.md §7 | Demo dataset preloaded; no "select a file" first-step |
| < $1,200/mo recurring | BUSINESS.md §9 | Migration plan to $5/mo VPS only after rate-limit signal |
## 3. The three personas (per PLAN.md §2.3)
## 3. The three personas — one audience: accounting (per PLAN.md §2.3)
We niche to **accounting** and enter through the three workflows where a
messy export costs real money. Same engine, three landing pages — each
is the same buyer at a different desk (bookkeeping, payables, receivables).
| Tag | Persona | Top-of-funnel keyword | Demo dataset | Pre-saved pipeline |
|---|---|---|---|---|
| `shopify-pet` | Shopify operator (priority: pet supplies) | "shopify customer cleanup" | `samples/demo/shopify_pet_customers.csv` | `shopify_pet_pipeline.json` |
| `bookkeeper` | Bookkeeper / freelance accountant | "reconcile bank export csv" | `samples/demo/bookkeeper_bank_reconcile.csv` | `bookkeeper_bank_pipeline.json` |
| `revops` | Marketing / RevOps agency | "dedupe lead list across vendors" | `samples/demo/agency_combined_leads.csv` | `agency_leads_pipeline.json` |
| `bookkeeper` | Bookkeeper — bank reconciliation | "reconcile bank export csv duplicates" | `samples/demo/bank_reconciliation.csv` | `bank_reconciliation_pipeline.json` |
| `ap-1099` | Accounts payable — 1099 vendor prep | "clean 1099 vendor list missing EIN" | `samples/demo/vendor_1099.csv` | `vendor_1099_pipeline.json` |
| `ar-aging` | Accounts receivable — open invoices | "remove duplicate invoices aging report" | `samples/demo/ar_open_invoices.csv` | `ar_open_invoices_pipeline.json` |
Each persona gets its **own landing page URL**, its **own demo dataset
loaded by default**, and its **own H1 + below-the-fold copy.** The
engine is identical; only positioning differs.
Each persona gets its **own landing page URL** (`?p=<tag>`), its **own
demo dataset loaded by default**, and its **own H1 + below-the-fold
copy** — wired in `src/gui/app_demo.py::PERSONAS`. The engine is
identical; only positioning differs.
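The `?p=<tag>` routing amounts to a dictionary lookup keyed by the persona tag. The Python sketch below is illustrative only. The real `PERSONAS` dict in `src/gui/app_demo.py` is not shown in this diff, so these names are all assumptions:

- the field names `dataset` and `pipeline`,
- the helper `resolve_persona`,
- the fallback to `bookkeeper` for unknown or missing tags.

The dataset and pipeline file names come from the persona table above.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical shape: the real PERSONAS dict in src/gui/app_demo.py may differ.
PERSONAS = {
    "bookkeeper": {
        "dataset": "samples/demo/bank_reconciliation.csv",
        "pipeline": "bank_reconciliation_pipeline.json",
    },
    "ap-1099": {
        "dataset": "samples/demo/vendor_1099.csv",
        "pipeline": "vendor_1099_pipeline.json",
    },
    "ar-aging": {
        "dataset": "samples/demo/ar_open_invoices.csv",
        "pipeline": "ar_open_invoices_pipeline.json",
    },
}

DEFAULT_TAG = "bookkeeper"  # assumption: unknown or missing tags fall back here


def resolve_persona(url: str) -> tuple[str, dict]:
    """Return (tag, persona) for a landing-page URL such as '/?p=ap-1099'."""
    query = parse_qs(urlparse(url).query)
    tag = query.get("p", [DEFAULT_TAG])[0]
    if tag not in PERSONAS:
        tag = DEFAULT_TAG
    return tag, PERSONAS[tag]


tag, persona = resolve_persona("/?p=ap-1099")
print(tag, persona["dataset"])  # ap-1099 samples/demo/vendor_1099.csv
```

One possible design choice is to fall back instead of raising. Then a stale link using a retired tag such as `?p=shopify-pet` still lands on a working demo.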
## 4. Demo dataset specifications
@@ -53,114 +58,77 @@ persona's tooling. Each contains every kind of pollution the bundle's
five tools fix, so a single demo run shows every tool earning its
keep.
### 4.0 Pain-point coverage map
### 4.0 Value-proof map
Each demo dataset is engineered so the buyer sees their **own top
pain** demonstrated in the AFTER preview. The mapping below pairs
each pain from PLAN.md §2.3a with the rows / columns that exercise
it. Refresh the dataset only when this coverage drops.
Each demo dataset is engineered so the buyer sees their **own top pain**
fixed in the AFTER preview, with one unmistakable headline number. All
three run the same saved 4-step pipeline (Clean Text → Standardize
Formats → Fix Missing Values → Find Duplicates). The numbers below are
**validated against the live engine** (`tests/test_demo_pipelines.py`
pins them) — refresh the dataset only if a number stops landing.
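The commit message notes that the tests read each CSV the same way `app_demo._load_demo` does, with `dtype=str, keep_default_na=False`. The minimal pandas sketch below shows why that matters for the pinned numbers. With pandas' defaults, tokens such as `N/A`, `#N/A` and empty cells become `NaN` at read time. A disguised null therefore never reaches Fix Missing Values as text, and its count could silently drift. The three-row CSV is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample: the Category column holds disguised nulls.
RAW = "Vendor,Category\nStaples,N/A\nVerizon,#N/A\nAmazon,\n"

# pandas defaults: N/A, #N/A and the empty cell are all parsed as NaN.
default_df = pd.read_csv(io.StringIO(RAW))

# Demo read path: every cell stays a literal string and blanks become "".
demo_df = pd.read_csv(io.StringIO(RAW), dtype=str, keep_default_na=False)

print(default_df["Category"].isna().sum())  # 3
print(demo_df["Category"].tolist())         # ['N/A', '#N/A', '']
```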
| Persona | Pain (from PLAN §2.3a) | Demo coverage |
| Persona | Headline proof | What the visitor watches happen |
|---|---|---|
| Shopify pet | S1 — Klaviyo per-contact dupes | 5 dup pairs across rows 1–15 (case + format + address-twin variants) |
| Shopify pet | S2 — feed-rejection chars | smart-quote / NBSP / BOM in rows 16, 9, 11 |
| Shopify pet | S3 — multi-channel | partner-style customer IDs (`SHOP-`); demonstration of column-level mapping covered in RevOps demo |
| Shopify pet | S4 — subscription identity | rows 1+2, 7+8, 9+10 — same person, different format |
| Shopify pet | S5 — VAT-MOSS country drift | rows 16–18 (`United Kingdom` / `U.K.` / `UK`) + rows 19–20 (`Germany`/`Italia`) |
| Bookkeeper | B1 — month-overlap re-import | 7 dup pairs spanning Jan↔Feb and Mar boundaries |
| Bookkeeper | B2 — 1099 vendor consolidation | Amazon × 3 spellings, Verizon × 2, Acme Realty × 2, Adobe × 2, Costco × 2, Zoom × 2, Stripe × 4 |
| Bookkeeper | B3 — audit trail | every cell change in the run logged with old/new/rule — surface in the demo's audit tab |
| Bookkeeper | B4 — per-license economics | demonstrated by pricing copy, not data |
| Bookkeeper | B5 — multi-currency | rows 26 (EUR), 27 (GBP), 28 (BRL with comma decimal), 29 (parens-negative) |
| RevOps | R1 — per-contact tier | 6 cross-source dup pairs (HubSpot × LinkedIn × Manual Scrape) |
| RevOps | R2 — deliverability | rows 26–27 (`uma at uniform dot com`, `victor@@victorco.com` invalid emails) |
| RevOps | R3 — GDPR / privacy | demonstrated by the network-tab moat panel + zero-upload claim |
| RevOps | R4 — vendor unification | 3 source values (HubSpot / LinkedIn / Manual Scrape), 13 country codes, mixed-shape headers |
| RevOps | R5 — suppression list | rows 29–30 (`Suppressed`, `Opted Out` tags) |
| Bookkeeper | **26 → 20 rows · 6 phantom duplicates removed** | The same payment posted twice (different date + amount format) collapses to one; dates go ISO, parens-negatives become real negatives |
| AP / 1099 | **24 records → 8 vendors · 7 missing EINs recovered** | Each vendor's scattered records merge into one complete row; `merge=true` backfills the EIN/address/phone that any single record was missing |
| AR aging | **26 → 21 rows · 5 double-entered invoices removed** | Duplicate invoice numbers collapse; a blank status is backfilled from its twin; invoice + due dates go ISO, amounts numeric |
### 4.1 `shopify_pet_customers.csv` (20 rows)
### 4.1 `bank_reconciliation.csv` (26 rows) — Bookkeeper
**Looks like**: a Shopify customer export filtered for "Pet Supplies"
sales channel, 12 months activity.
**Looks like**: two months (Jan + Feb 2025) of business-checking activity
from a bank portal, where the Feb re-export overlaps Jan so the same
transaction posts twice. Columns: `Date, Description, Vendor, Category,
Amount, Account`.
**Pollution included**:
- Whitespace padding (" Alice ", "Sydney Opera House Drive ")
- Mixed phone formats: `(415) 555-1234`, `415.555.1234`, `5559876543`,
`+1 555-111-1111`
- International phones: GB, ES, DE, AU, JP (15 demo rows span 6
countries)
- Currency variants: `$1,240.50`, `£890.25`, `€2.410,75` (EU comma
decimal), `A$ 1,299.00`, `¥75000`
- Date formats: `2025-12-04`, `12/15/2025`, `?`, `(blank)`, `(none)`,
`#N/A`
- Disguised nulls: `N/A`, blank, `(blank)`, `?`, `#N/A`, `(none)`,
`unknown`
- Name casing: `EVE MARTINEZ`, `henry`, `O'NEIL`, `noah`, mixed Title /
ALL CAPS / lower
- Email case variants that *should* dedup: `Bob@PetShop.com` vs
`alice@petshop.com`
- 4 fuzzy duplicates (Alice/Bob same address, Grace/Henry same phone,
Carlos/Olivia same address, Ivy/Jack same address)
- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`, `1/27/25`, `Feb 5 2025`.
- Currency formats incl. negatives: `-$129.99`, `($89.50)` parens-negative, `+$3,450.00`, `- $599.88`, bare `-129.99`, `(50.00)`.
- Whitespace + NBSP padding; smart quotes and an em-dash inside descriptions.
- Vendor casing variety on *non-duplicate* rows: `Amazon` / `amazon.com` / `AMAZON.COM`, `Verizon` / `verizon`.
- Disguised nulls in Category: `—`, `(blank)`, `?`, `unknown`, `TBD`.
- **6 duplicate transactions** — each pair shares the same vendor + real value but a different date *and* amount format, so they collapse only after standardization.
**After running the pipeline**: 20 rows → 15, ~29 cells canonicalized,
~45 sentinels standardised, 5 cross-row duplicates merged. The
customer table is now Klaviyo-import-ready and the country column
(previously `UK` / `U.K.` / `United Kingdom` / `Germany` / `Italia`)
is GB / DE / IT — VAT MOSS report won't break.
**After running the pipeline** (validated): **26 → 20 rows, 6 duplicates
removed**, 36 date/amount cells standardized (0 unparseable), all dates
ISO, parens-negatives resolved (`($89.50)` → `-89.50`), disguised-null
categories flagged. The reconciliation ties out.
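A small Python sketch shows why the six pairs only collapse after standardization. It is not the engine's code, and several parts are assumptions:

- `parse_date`, `parse_amount` and `dedupe_transactions` are hypothetical helpers.
- The accepted date shapes are only the ones in the pollution list.
- Keying on (date, amount, case-folded vendor) is a guess at how Find Duplicates matches transactions.

The sample rows are made up, modeled on the BEFORE preview.

```python
from datetime import date, datetime
from decimal import Decimal

# Date shapes from the bank_reconciliation.csv pollution list (assumed exhaustive).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%b %d %Y")


def parse_date(raw: str) -> date:
    """Parse any listed date shape; call .isoformat() for the ISO string."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {raw!r}")


def parse_amount(raw: str) -> Decimal:
    """Turn '-$129.99', '($89.50)', '+$3,450.00' or '- $599.88' into a Decimal."""
    text = raw.replace("\u00a0", " ").strip()  # NBSP padding counts as whitespace
    negative = False
    if text.startswith("(") and text.endswith(")"):  # accounting-style negative
        negative = True
        text = text[1:-1]
    text = text.replace(" ", "")
    if text and text[0] in "+-":
        negative = negative or text[0] == "-"
        text = text[1:]
    value = Decimal(text.replace("$", "").replace(",", ""))
    return -value if negative else value


def dedupe_transactions(rows: list[dict]) -> list[dict]:
    """Keep the first row for each normalized (date, amount, vendor) key."""
    seen = set()
    kept = []
    for row in rows:
        key = (
            parse_date(row["Date"]),
            parse_amount(row["Amount"]),
            row["Vendor"].strip().casefold(),
        )
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"Date": "01/15/2025", "Vendor": "Stripe", "Amount": "+$3,450.00"},
    {"Date": "2025-01-15", "Vendor": "Stripe ", "Amount": "3450.00"},
    {"Date": "Jan 18 2025", "Vendor": "Staples", "Amount": "($89.50)"},
]
print(len(dedupe_transactions(rows)))  # 2
print(parse_amount("($89.50)"))        # -89.50
```

As raw strings, all three rows are distinct. Only the normalized key exposes the Stripe pair as one payment, which is why standardization has to run before dedup.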
### 4.2 `bookkeeper_bank_reconcile.csv` (30 rows)
### 4.2 `vendor_1099.csv` (24 rows) — Accounts payable / 1099
**Looks like**: two months of business checking + credit-card activity
exported from a bank portal, with the Feb export accidentally
overlapping the Jan export at the month boundary.
**Looks like**: a 1099-NEC vendor master list where the same vendor was
entered 2–3 times across the year by different staff, each record holding
only *part* of the vendor's details. Columns: `Vendor, Contact, Email,
Phone, EIN, Address, Total_Paid`.
**Pollution included**:
- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`,
`1/27/25`, `Feb 5 2025`
- Currency formats: `-$129.99`, `($89.50)` parens-negative,
`+$3,450.00`, `- $599.88` space, bare `-129.99`, `(50.00)`
- Header trailing whitespace: `"Date "`
- Smart quotes around descriptions: `"autopay"`
- Em-dash sentinels in Vendor: `—`
- Smart-em-dash inside descriptions: `STAPLES #4422 — paper, toner`
- Vendor casing inconsistency: `Amazon` / `amazon.com` / `AMAZON.COM`,
`Verizon` / `verizon`
- 6 duplicate transactions (same date+amount+vendor recorded twice
with different formats)
- The duplicate records for a vendor share one email differing only by case/whitespace (the reliable dedup key, matched with the `email` normalizer).
- EIN / Phone / Address scattered across the duplicate set so no single record is complete but the union is — gaps marked `—`, `(blank)`, `TBD`, `unknown`, `N/A`.
- Vendor name casing/spelling variants, phone formats, EIN formats (`12-3456789` vs `123456789`), `Total_Paid` currency variants.
**After running the pipeline**: 30 rows → 23, ~84 cells normalized, 7
duplicates removed (month-overlap + VAT-MOSS dups). All dates
ISO-formatted, all amounts numeric (including EUR/GBP/BRL with comma
decimal), vendor casing canonical, parens-negative resolved.
**After running the pipeline** (validated): **24 records → 8 vendors, 16
duplicates removed, 7 missing EINs recovered** by `merge=true` +
`most_complete` survivor, 35 disguised nulls caught, phones/emails/amounts
standardized (0 unparseable). One vendor genuinely has no EIN in any
record — it survives with a blank EIN as the realistic "flag for
follow-up" case.
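The `merge=true` + `most_complete` behavior works in three steps:

1. Group records on the normalized email.
2. Keep the record with the most non-missing fields as the survivor.
3. Backfill the survivor's gaps from the other records in its group.

This Python sketch illustrates that under assumptions and is not the engine's implementation. The helpers `merge_duplicates` and `is_missing` and the exact sentinel set are hypothetical, ties go to the earliest record, and the vendor rows are made up.

```python
# Disguised nulls named for vendor_1099.csv, compared case-insensitively
# (the exact sentinel set the engine uses is an assumption). \u2014 is the em-dash.
SENTINELS = {"", "\u2014", "(blank)", "tbd", "unknown", "n/a"}


def is_missing(value: str) -> bool:
    """True for blanks and disguised nulls such as 'TBD' or '(blank)'."""
    return value.strip().casefold() in SENTINELS


def merge_duplicates(records: list[dict], key_field: str = "Email") -> list[dict]:
    """Collapse records that share a normalized email into one backfilled row."""
    groups: dict[str, list[dict]] = {}
    for record in records:
        # Email normalizer: ignore case and surrounding whitespace.
        key = record[key_field].strip().lower()
        groups.setdefault(key, []).append(record)

    merged = []
    for group in groups.values():
        # most_complete survivor: most non-missing fields; max() keeps the first on ties.
        survivor = max(group, key=lambda r: sum(not is_missing(v) for v in r.values()))
        row = {}
        for field, value in survivor.items():
            if is_missing(value):
                # merge=true: backfill from the first duplicate that has the field.
                donors = [r[field] for r in group if not is_missing(r[field])]
                value = donors[0] if donors else ""  # missing everywhere: flag it
            row[field] = value.strip()
        merged.append(row)
    return merged


records = [
    {"Vendor": "Acme Design", "Email": "billing@acme.com", "EIN": "TBD", "Phone": "(415) 555-0100"},
    {"Vendor": "ACME DESIGN", "Email": " Billing@Acme.com", "EIN": "12-3456789", "Phone": "(blank)"},
    {"Vendor": "acme design", "Email": "billing@acme.com ", "EIN": "unknown", "Phone": "unknown"},
    {"Vendor": "Zed Consulting", "Email": "zed@zed.io", "EIN": "N/A", "Phone": "555-0199"},
]
merged = merge_duplicates(records)
print(len(merged))             # 2
print(merged[0]["EIN"])        # 12-3456789
print(repr(merged[1]["EIN"]))  # ''
```

Picking the most complete record first keeps backfill limited to the fields the survivor lacks. A vendor with no EIN in any record comes out blank, the same flag-for-follow-up case described above.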
### 4.3 `agency_combined_leads.csv` (30 rows)
### 4.3 `ar_open_invoices.csv` (26 rows) — Accounts receivable
**Looks like**: a marketing-ops worksheet combining lead exports from
HubSpot + LinkedIn Sales Navigator + manual scraping, ready for
campaign targeting.
**Looks like**: an open-invoices (unpaid AR) export where some invoices
were double-entered in different formats and client contacts are messy.
Columns: `Invoice, Client, Email, Invoice_Date, Due_Date, Amount, Status`.
**Pollution included**:
- Phone formats per region: US, UK, Spain, Germany, China, India,
Australia, Mexico, Israel, Singapore, Hong Kong, Italy, South
Korea — 13 country codes
- Country column inconsistent: `USA` / `US` / `United States`
- Disguised nulls: `N/A`, `unknown`, `(unknown)`, `(blank)`, `(none)`,
`?`, `—`, `#N/A`, `TBD`
- Source column tags origin (`HubSpot` / `LinkedIn` / `Manual Scrape`)
- Email duplicates across sources with case variants: `alice@acme.com`
+ `Alice.Johnson@acme.com`, `bob@beta.com` + `Bob@Beta.com`,
`diana@delta.com` from two sources, `carlos@gamma.io` from two
sources, `Frank@Foxtrot.de` + `frank@foxtrot.de`
- Name casing: `DIANA LEE`, `henry`, `IVY CHEN`, mixed
- 6 fuzzy / cross-source duplicates designed to survive the dedup
- Score column with sentinel pollution that needs coercion to integer
- Two date columns with mixed formats; currency variants incl. a credit memo `($300.00)` → `-300.00`.
- Client name casing variety; email case variants (`AP@Acme.com` vs `ap@acme.com`).
- Status disguised nulls: `—`, `?`, `(blank)`, `TBD`, `unknown`, `(none)`.
- **5 double-entered invoices** — same invoice number twice, dates/amount in different formats, one copy with a blank status the other fills.
**After running the pipeline**: 30 rows → 24, ~43 cells canonicalized,
14 sentinels resolved, 6 cross-source duplicates merged with `merge=true`
so each survivor inherits the most-complete picture. Invalid-email
rows (deliverability stress) and `Suppressed`/`Opted Out` tags
(suppression-list use case) survive as flagged rows the operator
manually reviews.
**After running the pipeline** (validated): **26 → 21 rows, 5 duplicate
invoices removed**, both date columns ISO + amounts numeric + emails
lowercased (0 unparseable), 7 disguised-null statuses caught, and a blank
status backfilled from its twin via `merge=true`. The aging report stops
double-counting.
## 5. UX flow (per persona)
@@ -174,26 +142,26 @@ dedicated `app_demo.py` for the cloud build).
│ "{Persona-specific H1}" │
├──────────────────────────────────────────────────────────┤
│ │
│ Sample dataset preloaded: shopify_pet_customers.csv │
│ Sample dataset preloaded: bank_reconciliation.csv
│ [Replace with your own file (capped 100 rows)] │
│ │
│ ┌─ BEFORE preview (15 rows) ─────────────────────────┐ │
│ │ Alice | (415) 555-1234 | $1,240.50 | … │ │
│ │ Bob | 415.555.1234 | $1,240.50 | … │ │
│ ┌─ BEFORE preview (26 rows) ─────────────────────────┐ │
│ │ 01/15/2025 | Stripe | +$3,450.00 | … │ │
│ │ 2025-01-15 | Stripe | 3450.00 | … (dup) │ │
│ │ ... │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ Pipeline (saved): │
│ 1. Text Clean → 2. Format Standardize → │
│ 3. Missing → 4. Deduplicate
│ 1. Clean Text → 2. Standardize Formats → │
│ 3. Fix Missing → 4. Find Duplicates
│ │
│ [▶ Run pipeline] │
│ │
│ ┌─ AFTER preview ───────────────────────────────────┐ │
│ │ 15 rows → 11 (4 duplicates merged) │ │
│ │ 27 cells canonicalized · 33 sentinels resolved │ │
│ │ 26 rows → 20 (6 duplicate transactions removed) │ │
│ │ 36 cells standardized · 4 disguised nulls flagged │ │
│ │ │ │
│ │ Alice Johnson | +14155551234 | 1240.50 | … │ │
│ │ 2025-01-15 | Stripe | 3450.00 | … │ │
│ │ ... │ │
│ └──────────────────────────────────────────────────┘ │
│ │
@@ -244,27 +212,35 @@ not "demo crippled" data.
## 7. CTA copy (per persona)
### 7.1 Shopify pet operator
Copy lives in `src/gui/app_demo.py::PERSONAS` (H1 / sub / CTA per tag);
keep this section in sync with that dict.
- **H1**: *Clean your customer / vendor / subscriber exports — locally.*
- **Sub**: *Klaviyo-import-ready in 30 seconds. Catches duplicates Excel
misses. Your data never leaves your computer.*
- **CTA**: *Get DataTools for Shopify — $49 →*
### 7.1 Bookkeeper — bank reconciliation (`?p=bookkeeper`)
### 7.2 Bookkeeper / freelance accountant
- **H1**: *Reconcile messy bank exports. Hand your client an audit
trail.*
- **Sub**: *Catches the duplicate transaction Quickbooks imported twice.
Standardizes dates, amounts, vendor casing. Every change auditable.*
- **H1**: *Catch the transactions your bank export posted twice. Locally.*
- **Sub**: *When the Jan and Feb exports overlap, the same payment posts
twice in two formats. DataTools standardizes every date and amount, then
dedups on the real transaction so your reconciliation ties out — 26 rows
→ 20, six phantom duplicates gone.*
- **CTA**: *Get DataTools for Bookkeepers — $49 →*
### 7.3 Marketing / RevOps agency
### 7.2 Accounts payable — 1099 prep (`?p=ap-1099`)
- **H1**: *Dedupe leads across HubSpot, LinkedIn, and manual scrapes.*
- **Sub**: *International phones, country normalization, fuzzy dedup
with merge — one tool, one schema, no upload.*
- **CTA**: *Get DataTools for RevOps — $49 →*
- **H1**: *Build a clean 1099 vendor list — with the missing EINs filled in.*
- **Sub**: *The same vendor entered three times, each record holding only
part of the details. DataTools consolidates to one row and backfills the
gaps from the duplicates — 24 records → 8 vendors, 7 missing EINs
recovered.*
- **CTA**: *Get DataTools for Accounting — $49 →*
### 7.3 Accounts receivable — open invoices (`?p=ar-aging`)
- **H1**: *Stop chasing the invoices your aging report counted twice. Locally.*
- **Sub**: *Double-entered invoices inflate your AR aging and your
follow-ups. DataTools standardizes dates and amounts, lowercases client
emails, and removes the duplicate invoice numbers — 26 rows → 21, five
phantom invoices off the books.*
- **CTA**: *Get DataTools for Accounting — $49 →*
## 8. Telemetry / conversion tracking