demo: reconstruct sales demos for an accounting audience

Replaces the Shopify / RevOps / Bookkeeper demo trio with three accounting
personas that share one buyer, each entering through a workflow where a
messy export costs money — all running the same saved 4-step pipeline:

- bank_reconciliation.csv (Bookkeeper): 26 -> 20 rows, 6 double-posted
  transactions caught after date+amount standardization.
- vendor_1099.csv (AP / 1099): 24 records -> 8 vendors, 7 missing EINs
  recovered via dedup merge — the 1099-complete story.
- ar_open_invoices.csv (AR): 26 -> 21 rows, 5 double-entered invoices
  removed, blank status backfilled from the twin row.

Every number is validated against the live engine and pinned by
tests/test_demo_pipelines.py (read path mirrors app_demo._load_demo:
dtype=str, keep_default_na=False).

Rewires src/gui/app_demo.py PERSONAS (keys bookkeeper / ap-1099 /
ar-aging, accounting H1/sub/CTA) and rewrites docs/DEMO-PLAN.md sections
3/4/7 with the validated outcomes.

(Repo hygiene forced by a partial-clone gap: finalizes the already-deleted,
unreferenced samples/messy_text.csv whose blob was unrecoverable.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
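
The bookkeeper claim above (duplicates only collapse after date+amount standardization) can be illustrated with a small standalone sketch. This is not the DataTools engine: the function names, the list of date formats, and the dedup key (ISO date + case-folded vendor + numeric amount) are all assumptions made for illustration, and the sample rows are invented in the style of the demo data.

```python
from datetime import datetime
from decimal import Decimal

# Assumed format list for this sketch; the real engine's parser may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%b %d %Y")


def to_iso_date(raw: str) -> str:
    """Try each known format in turn and return an ISO 8601 date string."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {raw!r}")


def to_amount(raw: str) -> Decimal:
    """Parse currency text, treating (x) and a leading '-' as negative."""
    text = raw.strip()
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    for ch in "$,+ ":
        text = text.replace(ch, "")
    if text.startswith("-"):
        negative = True
        text = text[1:]
    value = Decimal(text)
    return -value if negative else value


def dedupe(rows):
    """Keep the first row per (ISO date, vendor, amount); hypothetical key."""
    seen = set()
    kept = []
    for row in rows:
        key = (
            to_iso_date(row["Date"]),
            row["Vendor"].strip().casefold(),
            to_amount(row["Amount"]),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append({**row, "Date": key[0], "Amount": str(key[2])})
    return kept


# Illustrative rows only: the first two are the same payment in two formats.
rows = [
    {"Date": "01/15/2025", "Vendor": "Stripe", "Amount": "+$3,450.00"},
    {"Date": "2025-01-15", "Vendor": "Stripe", "Amount": "3450.00"},
    {"Date": "Jan 18 2025", "Vendor": "Staples", "Amount": "($89.50)"},
]
clean = dedupe(rows)
```

Comparing the raw strings would treat the two Stripe rows as different transactions; only the normalized key makes them equal, which is why the dedup step runs last in the saved pipeline.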
@@ -32,17 +32,22 @@ rebuilds it from a stale headline.
 | Friction kills conversion | BUSINESS.md §7 | Demo dataset preloaded; no "select a file" first-step |
 | < $1,200/mo recurring | BUSINESS.md §9 | Migration plan to $5/mo VPS only after rate-limit signal |

-## 3. The three personas (per PLAN.md §2.3)
+## 3. The three personas — one audience: accounting (per PLAN.md §2.3)

+We niche to **accounting** and enter through the three workflows where a
+messy export costs real money. Same engine, three landing pages — each
+is the same buyer at a different desk (bookkeeping, payables, receivables).
+
 | Tag | Persona | Top-of-funnel keyword | Demo dataset | Pre-saved pipeline |
 |---|---|---|---|---|
-| `shopify-pet` | Shopify operator (priority: pet supplies) | "shopify customer cleanup" | `samples/demo/shopify_pet_customers.csv` | `shopify_pet_pipeline.json` |
-| `bookkeeper` | Bookkeeper / freelance accountant | "reconcile bank export csv" | `samples/demo/bookkeeper_bank_reconcile.csv` | `bookkeeper_bank_pipeline.json` |
-| `revops` | Marketing / RevOps agency | "dedupe lead list across vendors" | `samples/demo/agency_combined_leads.csv` | `agency_leads_pipeline.json` |
+| `bookkeeper` | Bookkeeper — bank reconciliation | "reconcile bank export csv duplicates" | `samples/demo/bank_reconciliation.csv` | `bank_reconciliation_pipeline.json` |
+| `ap-1099` | Accounts payable — 1099 vendor prep | "clean 1099 vendor list missing EIN" | `samples/demo/vendor_1099.csv` | `vendor_1099_pipeline.json` |
+| `ar-aging` | Accounts receivable — open invoices | "remove duplicate invoices aging report" | `samples/demo/ar_open_invoices.csv` | `ar_open_invoices_pipeline.json` |

-Each persona gets its **own landing page URL**, its **own demo dataset
-loaded by default**, and its **own H1 + below-the-fold copy.** The
-engine is identical; only positioning differs.
+Each persona gets its **own landing page URL** (`?p=<tag>`), its **own
+demo dataset loaded by default**, and its **own H1 + below-the-fold
+copy** — wired in `src/gui/app_demo.py::PERSONAS`. The engine is
+identical; only positioning differs.

 ## 4. Demo dataset specifications

@@ -53,114 +58,77 @@ persona's tooling. Each contains every kind of pollution the bundle's
 five tools fix, so a single demo run shows every tool earning its
 keep.

-### 4.0 Pain-point coverage map
+### 4.0 Value-proof map

-Each demo dataset is engineered so the buyer sees their **own top
-pain** demonstrated in the AFTER preview. The mapping below pairs
-each pain from PLAN.md §2.3a with the rows / columns that exercise
-it. Refresh the dataset only when this coverage drops.
+Each demo dataset is engineered so the buyer sees their **own top pain**
+fixed in the AFTER preview, with one unmistakable headline number. All
+three run the same saved 4-step pipeline (Clean Text → Standardize
+Formats → Fix Missing Values → Find Duplicates). The numbers below are
+**validated against the live engine** (`tests/test_demo_pipelines.py`
+pins them) — refresh the dataset only if a number stops landing.

-| Persona | Pain (from PLAN §2.3a) | Demo coverage |
+| Persona | Headline proof | What the visitor watches happen |
 |---|---|---|
-| Shopify pet | S1 — Klaviyo per-contact dupes | 5 dup pairs across rows 1–15 (case + format + address-twin variants) |
-| Shopify pet | S2 — feed-rejection chars | smart-quote / NBSP / BOM in rows 1–6, 9, 11 |
-| Shopify pet | S3 — multi-channel | partner-style customer IDs (`SHOP-`); demonstration of column-level mapping covered in RevOps demo |
-| Shopify pet | S4 — subscription identity | rows 1+2, 7+8, 9+10 — same person, different format |
-| Shopify pet | S5 — VAT-MOSS country drift | rows 16–18 (`United Kingdom` / `U.K.` / `UK`) + rows 19–20 (`Germany`/`Italia`) |
-| Bookkeeper | B1 — month-overlap re-import | 7 dup pairs spanning Jan↔Feb and Mar boundaries |
-| Bookkeeper | B2 — 1099 vendor consolidation | Amazon × 3 spellings, Verizon × 2, Acme Realty × 2, Adobe × 2, Costco × 2, Zoom × 2, Stripe × 4 |
-| Bookkeeper | B3 — audit trail | every cell change in the run logged with old/new/rule — surface in the demo's audit tab |
-| Bookkeeper | B4 — per-license economics | demonstrated by pricing copy, not data |
-| Bookkeeper | B5 — multi-currency | rows 26 (EUR), 27 (GBP), 28 (BRL with comma decimal), 29 (parens-negative) |
-| RevOps | R1 — per-contact tier | 6 cross-source dup pairs (HubSpot × LinkedIn × Manual Scrape) |
-| RevOps | R2 — deliverability | rows 26–27 (`uma at uniform dot com`, `victor@@victorco.com` invalid emails) |
-| RevOps | R3 — GDPR / privacy | demonstrated by the network-tab moat panel + zero-upload claim |
-| RevOps | R4 — vendor unification | 3 source values (HubSpot / LinkedIn / Manual Scrape), 13 country codes, mixed-shape headers |
-| RevOps | R5 — suppression list | rows 29–30 (`Suppressed`, `Opted Out` tags) |
+| Bookkeeper | **26 → 20 rows · 6 phantom duplicates removed** | The same payment posted twice (different date + amount format) collapses to one; dates go ISO, parens-negatives become real negatives |
+| AP / 1099 | **24 records → 8 vendors · 7 missing EINs recovered** | Each vendor's scattered records merge into one complete row; `merge=true` backfills the EIN/address/phone that any single record was missing |
+| AR aging | **26 → 21 rows · 5 double-entered invoices removed** | Duplicate invoice numbers collapse; a blank status is backfilled from its twin; invoice + due dates go ISO, amounts numeric |

-### 4.1 `shopify_pet_customers.csv` (20 rows)
+### 4.1 `bank_reconciliation.csv` (26 rows) — Bookkeeper

-**Looks like**: a Shopify customer export filtered for "Pet Supplies"
-sales channel, 12 months activity.
+**Looks like**: two months (Jan + Feb 2025) of business-checking activity
+from a bank portal, where the Feb re-export overlaps Jan so the same
+transaction posts twice. Columns: `Date, Description, Vendor, Category,
+Amount, Account`.

 **Pollution included**:
-- Whitespace padding (" Alice ", "Sydney Opera House Drive ")
-- Mixed phone formats: `(415) 555-1234`, `415.555.1234`, `5559876543`,
-`+1 555-111-1111`
-- International phones: GB, ES, DE, AU, JP (15 demo rows span 6
-countries)
-- Currency variants: `$1,240.50`, `£890.25`, `€2.410,75` (EU comma
-decimal), `A$ 1,299.00`, `¥75000`
-- Date formats: `2025-12-04`, `12/15/2025`, `?`, `(blank)`, `(none)`,
-`#N/A`
-- Disguised nulls: `N/A`, blank, `(blank)`, `?`, `#N/A`, `(none)`,
-`unknown`
-- Name casing: `EVE MARTINEZ`, `henry`, `O'NEIL`, `noah`, mixed Title /
-ALL CAPS / lower
-- Email case variants that *should* dedup: `Bob@PetShop.com` vs
-`alice@petshop.com`
-- 4 fuzzy duplicates (Alice/Bob same address, Grace/Henry same phone,
-Carlos/Olivia same address, Ivy/Jack same address)
+- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`, `1/27/25`, `Feb 5 2025`.
+- Currency formats incl. negatives: `-$129.99`, `($89.50)` parens-negative, `+$3,450.00`, `- $599.88`, bare `-129.99`, `(50.00)`.
+- Whitespace + NBSP padding; smart quotes and an em-dash inside descriptions.
+- Vendor casing variety on *non-duplicate* rows: `Amazon` / `amazon.com` / `AMAZON.COM`, `Verizon` / `verizon`.
+- Disguised nulls in Category: `—`, `(blank)`, `?`, `unknown`, `TBD`.
+- **6 duplicate transactions** — each pair shares the same vendor + real value but a different date *and* amount format, so they collapse only after standardization.

-**After running the pipeline**: 20 rows → 15, ~29 cells canonicalized,
-~45 sentinels standardised, 5 cross-row duplicates merged. The
-customer table is now Klaviyo-import-ready and the country column
-(previously `UK` / `U.K.` / `United Kingdom` / `Germany` / `Italia`)
-is GB / DE / IT — VAT MOSS report won't break.
+**After running the pipeline** (validated): **26 → 20 rows, 6 duplicates
+removed**, 36 date/amount cells standardized (0 unparseable), all dates
+ISO, parens-negatives resolved (`($89.50)` → `-89.50`), disguised-null
+categories flagged. The reconciliation ties out.

-### 4.2 `bookkeeper_bank_reconcile.csv` (30 rows)
+### 4.2 `vendor_1099.csv` (24 rows) — Accounts payable / 1099

-**Looks like**: two months of business checking + credit-card activity
-exported from a bank portal, with the Feb export accidentally
-overlapping the Jan export at the month boundary.
+**Looks like**: a 1099-NEC vendor master list where the same vendor was
+entered 2–3 times across the year by different staff, each record holding
+only *part* of the vendor's details. Columns: `Vendor, Contact, Email,
+Phone, EIN, Address, Total_Paid`.

 **Pollution included**:
-- Mixed date formats: `01/15/2025`, `2025-01-15`, `Jan 18 2025`,
-`1/27/25`, `Feb 5 2025`
-- Currency formats: `-$129.99`, `($89.50)` parens-negative,
-`+$3,450.00`, `- $599.88` space, bare `-129.99`, `(50.00)`
-- Header trailing whitespace: `"Date "`
-- Smart quotes around descriptions: `"autopay"`
-- Em-dash sentinels in Vendor: `—`
-- Smart-em-dash inside descriptions: `STAPLES #4422 — paper, toner`
-- Vendor casing inconsistency: `Amazon` / `amazon.com` / `AMAZON.COM`,
-`Verizon` / `verizon`
-- 6 duplicate transactions (same date+amount+vendor recorded twice
-with different formats)
+- The duplicate records for a vendor share one email differing only by case/whitespace (the reliable dedup key, matched with the `email` normalizer).
+- EIN / Phone / Address scattered across the duplicate set so no single record is complete but the union is — gaps marked `—`, `(blank)`, `TBD`, `unknown`, `N/A`.
+- Vendor name casing/spelling variants, phone formats, EIN formats (`12-3456789` vs `123456789`), `Total_Paid` currency variants.

-**After running the pipeline**: 30 rows → 23, ~84 cells normalized, 7
-duplicates removed (month-overlap + VAT-MOSS dups). All dates
-ISO-formatted, all amounts numeric (including EUR/GBP/BRL with comma
-decimal), vendor casing canonical, parens-negative resolved.
+**After running the pipeline** (validated): **24 records → 8 vendors, 16
+duplicates removed, 7 missing EINs recovered** by `merge=true` +
+`most_complete` survivor, 35 disguised nulls caught, phones/emails/amounts
+standardized (0 unparseable). One vendor genuinely has no EIN in any
+record — it survives with a blank EIN as the realistic "flag for
+follow-up" case.

-### 4.3 `agency_combined_leads.csv` (30 rows)
+### 4.3 `ar_open_invoices.csv` (26 rows) — Accounts receivable

-**Looks like**: a marketing-ops worksheet combining lead exports from
-HubSpot + LinkedIn Sales Navigator + manual scraping, ready for
-campaign targeting.
+**Looks like**: an open-invoices (unpaid AR) export where some invoices
+were double-entered in different formats and client contacts are messy.
+Columns: `Invoice, Client, Email, Invoice_Date, Due_Date, Amount, Status`.

 **Pollution included**:
-- Phone formats per region: US, UK, Spain, Germany, China, India,
-Australia, Mexico, Israel, Singapore, Hong Kong, Italy, South
-Korea — 13 country codes
-- Country column inconsistent: `USA` / `US` / `United States`
-- Disguised nulls: `N/A`, `unknown`, `(unknown)`, `(blank)`, `(none)`,
-`?`, `—`, `#N/A`, `TBD`
-- Source column tags origin (`HubSpot` / `LinkedIn` / `Manual Scrape`)
-- Email duplicates across sources with case variants: `alice@acme.com`
-+ `Alice.Johnson@acme.com`, `bob@beta.com` + `Bob@Beta.com`,
-`diana@delta.com` from two sources, `carlos@gamma.io` from two
-sources, `Frank@Foxtrot.de` + `frank@foxtrot.de`
-- Name casing: `DIANA LEE`, `henry`, `IVY CHEN`, mixed
-- 6 fuzzy / cross-source duplicates designed to survive the dedup
-- Score column with sentinel pollution that needs coercion to integer
+- Two date columns with mixed formats; currency variants incl. a credit memo `($300.00)` → `-300.00`.
+- Client name casing variety; email case variants (`AP@Acme.com` vs `ap@acme.com`).
+- Status disguised nulls: `—`, `?`, `(blank)`, `TBD`, `unknown`, `(none)`.
+- **5 double-entered invoices** — same invoice number twice, dates/amount in different formats, one copy with a blank status the other fills.

-**After running the pipeline**: 30 rows → 24, ~43 cells canonicalized,
-14 sentinels resolved, 6 cross-source duplicates merged with `merge=true`
-so each survivor inherits the most-complete picture. Invalid-email
-rows (deliverability stress) and `Suppressed`/`Opted Out` tags
-(suppression-list use case) survive as flagged rows the operator
-manually reviews.
+**After running the pipeline** (validated): **26 → 21 rows, 5 duplicate
+invoices removed**, both date columns ISO + amounts numeric + emails
+lowercased (0 unparseable), 7 disguised-null statuses caught, and a blank
+status backfilled from its twin via `merge=true`. The aging report stops
+double-counting.

 ## 5. UX flow (per persona)

@@ -174,26 +142,26 @@ dedicated `app_demo.py` for the cloud build).
 │ "{Persona-specific H1}" │
 ├──────────────────────────────────────────────────────────┤
 │ │
-│ Sample dataset preloaded: shopify_pet_customers.csv │
+│ Sample dataset preloaded: bank_reconciliation.csv │
 │ [Replace with your own file (capped 100 rows)] │
 │ │
-│ ┌─ BEFORE preview (15 rows) ─────────────────────────┐ │
-│ │ Alice | (415) 555-1234 | $1,240.50 | … │ │
-│ │ Bob | 415.555.1234 | $1,240.50 | … │ │
+│ ┌─ BEFORE preview (26 rows) ─────────────────────────┐ │
+│ │ 01/15/2025 | Stripe | +$3,450.00 | … │ │
+│ │ 2025-01-15 | Stripe | 3450.00 | … (dup) │ │
 │ │ ... │ │
 │ └──────────────────────────────────────────────────┘ │
 │ │
 │ Pipeline (saved): │
-│ 1. Text Clean → 2. Format Standardize → │
-│ 3. Missing → 4. Deduplicate │
+│ 1. Clean Text → 2. Standardize Formats → │
+│ 3. Fix Missing → 4. Find Duplicates │
 │ │
 │ [▶ Run pipeline] │
 │ │
 │ ┌─ AFTER preview ───────────────────────────────────┐ │
-│ │ 15 rows → 11 (4 duplicates merged) │ │
-│ │ 27 cells canonicalized · 33 sentinels resolved │ │
+│ │ 26 rows → 20 (6 duplicate transactions removed) │ │
+│ │ 36 cells standardized · 4 disguised nulls flagged │ │
 │ │ │ │
-│ │ Alice Johnson | +14155551234 | 1240.50 | … │ │
+│ │ 2025-01-15 | Stripe | 3450.00 | … │ │
 │ │ ... │ │
 │ └──────────────────────────────────────────────────┘ │
 │ │
@@ -244,27 +212,35 @@ not "demo crippled" data.

 ## 7. CTA copy (per persona)

-### 7.1 Shopify pet operator
+Copy lives in `src/gui/app_demo.py::PERSONAS` (H1 / sub / CTA per tag);
+keep this section in sync with that dict.

-- **H1**: *Clean your customer / vendor / subscriber exports — locally.*
-- **Sub**: *Klaviyo-import-ready in 30 seconds. Catches duplicates Excel
-misses. Your data never leaves your computer.*
-- **CTA**: *Get DataTools for Shopify — $49 →*
+### 7.1 Bookkeeper — bank reconciliation (`?p=bookkeeper`)

-### 7.2 Bookkeeper / freelance accountant
-
-- **H1**: *Reconcile messy bank exports. Hand your client an audit
-trail.*
-- **Sub**: *Catches the duplicate transaction Quickbooks imported twice.
-Standardizes dates, amounts, vendor casing. Every change auditable.*
+- **H1**: *Catch the transactions your bank export posted twice. Locally.*
+- **Sub**: *When the Jan and Feb exports overlap, the same payment posts
+twice in two formats. DataTools standardizes every date and amount, then
+dedups on the real transaction so your reconciliation ties out — 26 rows
+→ 20, six phantom duplicates gone.*
 - **CTA**: *Get DataTools for Bookkeepers — $49 →*

-### 7.3 Marketing / RevOps agency
+### 7.2 Accounts payable — 1099 prep (`?p=ap-1099`)

-- **H1**: *Dedupe leads across HubSpot, LinkedIn, and manual scrapes.*
-- **Sub**: *International phones, country normalization, fuzzy dedup
-with merge — one tool, one schema, no upload.*
-- **CTA**: *Get DataTools for RevOps — $49 →*
+- **H1**: *Build a clean 1099 vendor list — with the missing EINs filled in.*
+- **Sub**: *The same vendor entered three times, each record holding only
+part of the details. DataTools consolidates to one row and backfills the
+gaps from the duplicates — 24 records → 8 vendors, 7 missing EINs
+recovered.*
+- **CTA**: *Get DataTools for Accounting — $49 →*
+
+### 7.3 Accounts receivable — open invoices (`?p=ar-aging`)
+
+- **H1**: *Stop chasing the invoices your aging report counted twice. Locally.*
+- **Sub**: *Double-entered invoices inflate your AR aging and your
+follow-ups. DataTools standardizes dates and amounts, lowercases client
+emails, and removes the duplicate invoice numbers — 26 rows → 21, five
+phantom invoices off the books.*
+- **CTA**: *Get DataTools for Accounting — $49 →*

 ## 8. Telemetry / conversion tracking

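
The `merge=true` + `most_complete` survivor behaviour described for `vendor_1099.csv` in the diff above can be sketched roughly as follows. This is a guess at the shape of the technique, not the engine's code: `merge_records`, `is_missing`, the sentinel set, and the tie-breaking rule (first record wins) are all hypothetical, and the sample records are invented.

```python
# Sentinels treated as "missing", compared after strip + casefold.
# "\u2014" is the em-dash placeholder listed among the demo's disguised nulls.
DISGUISED_NULLS = {"", "\u2014", "(blank)", "tbd", "unknown", "n/a"}


def is_missing(value: str) -> bool:
    """True when a cell is blank or holds a disguised-null sentinel."""
    return value.strip().casefold() in DISGUISED_NULLS


def merge_records(records, key="Email"):
    """Group by normalized key, keep the fullest record, backfill its gaps."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key].strip().casefold(), []).append(rec)

    merged = []
    for group in groups.values():
        # Survivor = record with the most filled cells (first one on a tie).
        survivor = dict(
            max(group, key=lambda r: sum(not is_missing(v) for v in r.values()))
        )
        for field in list(survivor):
            if is_missing(survivor[field]):
                # Take the first real value any sibling record holds, else blank.
                survivor[field] = next(
                    (r[field] for r in group if not is_missing(r[field])), ""
                )
        merged.append(survivor)
    return merged


# Invented records: two partial entries for one vendor, one vendor with no EIN.
records = [
    {"Vendor": "Acme Realty", "Email": "Billing@acme.example ",
     "EIN": "\u2014", "Phone": "(415) 555-1234"},
    {"Vendor": "ACME REALTY", "Email": "billing@acme.example",
     "EIN": "12-3456789", "Phone": "TBD"},
    {"Vendor": "Zoom", "Email": "ap@zoom.example",
     "EIN": "unknown", "Phone": "N/A"},
]
merged = merge_records(records)
```

In this sketch the first vendor ends up with both EIN and phone even though neither source record had both, while the second keeps a blank EIN, mirroring the "flag for follow-up" case the section describes.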