feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py + hook-streamlit.py + README); folder-mode bundle locked to 127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing string (landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad listing, banned phrases)
- marketing/community-posts/: 9 drafts (3 posts × 3 niches: bookkeeper, revops, shopify-pet); story / tip / soft-offer per niche
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch onboarding × 3 niches), plus per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow the `build/` ignore so the PyInstaller spec, launcher, and hooks get tracked; only generated artifacts (build/build/, build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions
--- a/marketing/emails/README.md
+++ b/marketing/emails/README.md
@@ -0,0 +1,60 @@
+# Email sequences
+
+Per niche (`bookkeeper/`, `revops/`, `shopify-pet/`):
+
+- **`00-delivery.md`** — Day 0 Gumroad delivery email. Triggered when
+  Gumroad confirms the purchase. Job #1: get the buyer to download
+  and open the app inside the first 24h. Buyers who don't open within
+  72h refund at ~3× the rate of buyers who do.
+- **`01-day1.md`** — Day 1 nudge with a sample file matched to the
+  niche. The Day-1 email is the highest-leverage one in the
+  sequence; it converts "I bought it" into "I used it".
+- **`02-day3.md`** — Day 3 deep-dive on one specific feature the
+  niche cares about most.
+- **`03-day7.md`** — Day 7 workflow framing. "Use it every {month /
+  campaign / sync}, not as a one-off."
+- **`04-day14.md`** — Day 14 power-user tip. Surfaces a non-obvious
+  feature; converts "I use it" into "I rely on it".
+- **`05-day30.md`** — Day 30 referral / review ask.
+
+## Sender setup
+
+- **From:** `support@datatools.app` (single-sender to keep replies in
+  one inbox; don't fan out to per-niche aliases until volume warrants)
+- **Reply-To:** same — every email expects a reply pathway
+- **List provider:** Gumroad's built-in for delivery; Buttondown or
+  ConvertKit for the 5-touch sequence (Gumroad's drip is too crude
+  for niche segmentation)
+- **Segmentation:** customers self-tag at checkout (Gumroad custom
+  field "What do you do?"). Map: `bookkeeper`, `revops`,
+  `shopify-pet`, `other`. `other` gets a generic sequence (not
+  drafted yet — Tier C).
+
+## Variables
+
+All emails use these placeholders. Set them at sequence-import time,
+not per-email:
+
+- `{{first_name}}` — Gumroad provides; fall back to "there" if blank
+- `{{download_url}}` — niche-specific download URL from Gumroad
+- `{{sample_file_url}}` — niche-specific sample CSV (`samples/demo/...`)
+- `{{landing_page}}` — niche-specific landing page URL
+- `{{support_email}}` — `support@datatools.app`
+
+## Cadence and quiet rules
+
+- Don't send between 10pm and 7am in the buyer's local time (Buttondown supports
+  TZ-aware send; ConvertKit doesn't out of the box)
+- If the buyer replies to *any* email in the sequence, pause the
+  remaining touches until you've replied to them. A drip that
+  ignores a customer reply reads as worse than no drip.
+- If the buyer requests a refund, kill the sequence immediately.
+- Day 14 + Day 30 emails are skippable if the buyer has already
+  emailed support with a feature request or bug report — they're
+  engaged enough; don't pile on.
+
+## Subject lines
+
+Subjects are owned by `marketing/COPY.md` § 4. Don't edit subjects
+in-line in the email files; edit COPY.md and re-propagate. Same
+discipline applies to the closing CTA — owned by COPY.md § 0.
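The placeholder and cadence rules in the email README above are concrete enough to encode as a pre-send check. Here is a minimal Python sketch, assuming a hypothetical `BuyerState` record and touch IDs taken from the file names; it is not tied to Buttondown's, ConvertKit's, or Gumroad's actual APIs.

```python
import re
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo

# Quiet window from the README: no sends between 10pm and 7am buyer-local time.
QUIET_START = time(22, 0)
QUIET_END = time(7, 0)

# Touches the README says can be skipped once a buyer has contacted support.
SKIPPABLE_IF_ENGAGED = {"04-day14", "05-day30"}


def render(template: str, values: dict) -> str:
    """Fill {{name}} placeholders. A blank first_name falls back to "there";
    any other missing value raises instead of sending a half-filled email."""
    def fill(match: re.Match) -> str:
        key = match.group(1)
        if key == "first_name":
            return (values.get(key) or "").strip() or "there"
        if key not in values:
            raise KeyError(f"no value for placeholder '{key}'")
        return values[key]
    return re.sub(r"\{\{(\w+)\}\}", fill, template)


def in_quiet_hours(now: datetime, buyer_tz: tzinfo) -> bool:
    """True if the timezone-aware `now` falls in the buyer's 10pm-7am window."""
    local = now.astimezone(buyer_tz).time()
    return local >= QUIET_START or local < QUIET_END


@dataclass
class BuyerState:
    refunded: bool = False            # refund requested: kill the sequence
    awaiting_our_reply: bool = False  # buyer replied: pause until we answer
    contacted_support: bool = False   # sent a bug report or feature request


def should_send(touch: str, state: BuyerState) -> bool:
    if state.refunded or state.awaiting_our_reply:
        return False
    if touch in SKIPPABLE_IF_ENGAGED and state.contacted_support:
        return False
    return True


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))  # fixed offset, just for the demo
    print(render("Hi {{first_name}},", {"first_name": ""}))
    print(in_quiet_hours(datetime(2026, 5, 4, 4, 0, tzinfo=timezone.utc), eastern))
```

The difference between a pause and a kill lives with the caller: a paused buyer resumes once `awaiting_our_reply` clears, while a refunded one never does.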
--- a/marketing/emails/bookkeeper/00-delivery.md
+++ b/marketing/emails/bookkeeper/00-delivery.md
@@ -0,0 +1,34 @@
+# Bookkeeper · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** buyer downloads + opens the app within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes so you don't lose this email under the next 200:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link above auto-detects.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine, no network calls. If your browser doesn't open automatically, the terminal window shows the URL.
+
+**3. Drop in a real bank export.** Don't bother with the bundled samples — DataTools is built for messy real-world files. Pull last month's bank export from any client, drag it into the analyzer, and click "Run all". You'll see what the pipeline catches in about 20 seconds.
+
+If something doesn't work: just reply to this email. I read every reply (it goes to my own inbox, not a queue).
+
+If you want a refund: also just reply. It's a 30-day, no-questions-asked refund; no form to fill out.
+
+Tomorrow I'll send a sample bank export with a few of the tricky cases pre-built in, so you can see what the gate report looks like on a known input. After that you'll get four more short emails over the next month, most with a single tip; feel free to unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. If you have a bookkeeper friend who'd find this useful, the share-friendly landing page is {{landing_page}}.
--- a/marketing/emails/bookkeeper/01-day1.md
+++ b/marketing/emails/bookkeeper/01-day1.md
@@ -0,0 +1,31 @@
+# Bookkeeper · Day 1 — Try it on this messy bank export first
+
+**Subject:** Try it on this messy bank export first
+**Send:** Day 1, ~9am buyer-local-time
+**Goal:** convert "I bought it" → "I ran it on something"
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a sample bank export I built specifically to break things.
+
+→ **{{sample_file_url}}** (260 KB CSV, 1,400 rows of synthetic data — no real account info)
+
+It's modeled after real exports I've seen from US, UK, and Canadian banks. Hidden in there:
+
+- Mixed date formats (some `MM/DD/YYYY`, some `DD-MM-YY`, one row in `YYYY-MM-DD`)
+- Six different spellings of "Amazon" across the merchant column
+- Trailing whitespace + non-breaking spaces in the description column
+- Three obvious duplicate transactions and two non-obvious ones (different timestamps, same amount + merchant)
+- A totals row at the bottom that's not a transaction
+- One row with currency in `€` instead of `$`
+
+Drop it into DataTools, click **"Run all"** in the analyzer, and look at the gate report. It'll catch all of the above and tell you exactly what changed and why.
+
+The audit trail (a sidecar CSV called `<filename>.audit.csv`) is the part most bookkeepers are surprised by. Open it in Excel: every change has a row with the original value, new value, rule that fired, confidence score, and timestamp. That's the file you hand to your client when they ask, "Wait, why did you reclassify that?"
+
+Try it once on the sample, then once on a real client export. Reply and tell me what it caught (or missed) — I'm building the v1.1 detector list from real-world feedback.
+
+— Michael
+{{support_email}}
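Two of the Day 1 traps, mixed date formats and stray non-breaking spaces, are easy to illustrate outside the app. A minimal Python sketch, assuming the three date layouts the bookkeeper Day 1 email lists and the convention that slash dates with a 4-digit year are month-first while dash dates with a 2-digit year are day-first; `normalize_date` and `clean_text` are hypothetical names, not DataTools functions.

```python
from datetime import datetime
from typing import Optional

# The three layouts from the Day 1 email, tried in order. Assumption: slashes with
# a 4-digit year are US month-first; dashes with a 2-digit year are day-first.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%m-%y")


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. a "TOTAL" cell: leave it for the gate to quarantine


def clean_text(raw: str) -> str:
    """Trim and collapse whitespace. str.split() with no argument splits on any
    Unicode whitespace, including U+00A0 non-breaking spaces."""
    return " ".join(raw.split())
```

Anything `normalize_date` can't place should go to review rather than be guessed: `01/02/26` from the Day 3 audit example returns `None` here because it matches none of the three layouts.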
--- a/marketing/emails/bookkeeper/02-day3.md
+++ b/marketing/emails/bookkeeper/02-day3.md
@@ -0,0 +1,35 @@
+# Bookkeeper · Day 3 — The audit trail your client will actually open
+
+**Subject:** The audit trail your client will actually open
+**Send:** Day 3
+**Goal:** deepen feature understanding around the audit trail (the
+real differentiator vs. spreadsheet workflow)
+
+---
+
+Hi {{first_name}},
+
+Most "data cleaning" tools spit out a clean file and call it done. The thing your *client* needs — and what protects you in a year when they ask "why did you change that?" — is the audit trail.
+
+Here's the file DataTools writes alongside every cleaned export. It's a CSV called `<filename>.audit.csv` and it sits next to the cleaned file in your output folder.
+
+Five columns, append-only:
+
+| original_value | new_value | rule_applied | confidence | timestamp |
+|----------------|-----------|--------------|------------|-----------|
+| `AMZN Mktp` | `Amazon` | `merchant_canonicalize` | 0.94 | 2026-05-04T09:12:03 |
+| `  Starbucks  ` | `Starbucks` | `whitespace_strip` | 1.00 | 2026-05-04T09:12:03 |
+| `01/02/26` | `2026-02-01` | `date_normalize_dmy` | 0.88 | 2026-05-04T09:12:03 |
+
+Why this matters in a real client conversation:
+
+- **The client asks "why is this Amazon when my statement says AMZN Mktp?"** — open the audit CSV, point at the `merchant_canonicalize` row. Done in 10 seconds.
+- **A reviewer (auditor, accountant, you in 6 months) asks "what changed?"** — the audit CSV is the answer. Diffable, openable in Excel, no proprietary format.
+- **You spot a wrong rule firing** — the `confidence` column tells you which rules to tune. Anything <0.90 is worth eyeballing.
+
+One workflow change worth making: when you import the cleaned file into QuickBooks, send the audit CSV to the client at the same time, in a folder labeled "month-end audit trail". Most clients won't open it. The 10% that do will trust you forever.
+
+Reply if you want me to walk through the audit format on a call — happy to do a quick screen-share for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
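The Day 3 rule of thumb (anything under 0.90 is worth eyeballing) can be applied to an audit file with Python's standard `csv` module. A minimal sketch, assuming the audit CSV has exactly the five column headers shown in the Day 3 table; the threshold constant and function names are illustrative.

```python
import csv
import io
from collections import Counter

REVIEW_BELOW = 0.90  # "Anything <0.90 is worth eyeballing"


def rows_to_eyeball(audit_file) -> list:
    """Return audit rows whose confidence is below the review threshold."""
    reader = csv.DictReader(audit_file)
    return [row for row in reader if float(row["confidence"]) < REVIEW_BELOW]


def rules_to_tune(rows) -> Counter:
    """Count low-confidence changes per rule, so the noisiest rule surfaces first."""
    return Counter(row["rule_applied"] for row in rows)


SAMPLE = """original_value,new_value,rule_applied,confidence,timestamp
AMZN Mktp,Amazon,merchant_canonicalize,0.94,2026-05-04T09:12:03
  Starbucks  ,Starbucks,whitespace_strip,1.00,2026-05-04T09:12:03
01/02/26,2026-02-01,date_normalize_dmy,0.88,2026-05-04T09:12:03
"""

if __name__ == "__main__":
    # With a real file: open("<filename>.audit.csv", newline="", encoding="utf-8")
    flagged = rows_to_eyeball(io.StringIO(SAMPLE))
    print(rules_to_tune(flagged).most_common())
```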
--- a/marketing/emails/bookkeeper/03-day7.md
+++ b/marketing/emails/bookkeeper/03-day7.md
@@ -0,0 +1,32 @@
+# Bookkeeper · Day 7 — One pipeline, every client, every month
+
+**Subject:** One pipeline, every client, every month
+**Send:** Day 7
+**Goal:** reframe from one-off tool to monthly workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on 1-2 client exports and confirmed it does what the landing page promised.
+
+The thing buyers tell me they wish they'd done from day one: **set it up as a workflow, not a one-off.**
+
+The pattern that works:
+
+**1. Make a folder per client.** Inside each client folder, a subfolder per month: `Acme Co/2026-05/`. Drop the raw export here.
+
+**2. Save your DataTools settings as a per-client preset.** The "Save settings" button in the analyzer drops a `.datatools-preset.json` file. Stash that in the client folder. Next month, load the preset and the analyzer pre-configures with the rules you tuned for that client (e.g., your "Amazon Marketplace" canonical name, your client's specific merchant aliases).
+
+**3. Run the pipeline. Get three files back:** the cleaned CSV, the audit CSV, the gate report. Move them into `Acme Co/2026-05/cleaned/`.
+
+**4. Import the cleaned CSV to QuickBooks. Email the audit CSV to the client.**
+
+Total elapsed time per client per month, after the first: 3-5 minutes. The first month per client is longer (~15 min) because you're tuning the preset.
+
+The buyers who do this are the ones still emailing me 3 months later — usually with feature requests for the next client they want to onboard. The buyers who only ever run it ad-hoc tend to drift back to spreadsheets within 2 months.
+
+If you want, reply with a sanitized export and I'll show you what your starting preset should look like — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/bookkeeper/04-day14.md
+++ b/marketing/emails/bookkeeper/04-day14.md
@@ -0,0 +1,35 @@
+# Bookkeeper · Day 14 — Two-minute trick: the gate report
+
+**Subject:** Two-minute trick: the gate report
+**Send:** Day 14
+**Goal:** surface the gate tool — non-obvious, high-value once seen
+
+---
+
+Hi {{first_name}},
+
+The tool inside DataTools that buyers find last is the **gate** — and it's the one that quietly does the most for you.
+
+What it does: before any row gets written to the cleaned CSV, the gate runs a pass/fail check on each row. Rows that fail get *quarantined* into a separate file (`<filename>.quarantine.csv`) instead of being silently dropped or silently passed through.
+
+Default rules (you can add your own):
+
+- Missing required fields (date, amount)
+- Amount in unexpected currency without a flag
+- Date outside the export's stated range (catches the "totals row" issue from Day 1)
+- Duplicate of another row already in the file (per the dedupe pass)
+- Confidence below your threshold on a field that got auto-corrected
+
+The 2-minute workflow:
+
+1. Run the pipeline as usual.
+2. Open `<filename>.quarantine.csv`. (It'll be tiny — typically 0-5% of rows.)
+3. Eyeball it. If a row is a real transaction, fix it and re-include it manually. If it's a totals row, blank row, or corrupt row, confirm it was correctly quarantined and delete it.
+4. Re-run the pipeline on the fixed-up version (or just append the manually-fixed rows to the cleaned CSV).
+
+The reason this matters: silent drops are the worst possible failure mode for a bookkeeper. You'd rather a row come out wrong (you'll catch it on review) than disappear (you won't catch it for months). The gate makes the silent-drop case impossible.
+
+Set the gate's confidence threshold to `0.85` for client work. Lower (0.75) for personal / exploratory; higher (0.92+) only if you've spent time tuning your client's preset.
+
+— Michael
+{{support_email}}
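What makes the gate in the Day 14 email safe is its contract: every row ends up either in the cleaned output or in quarantine with a reason, and none simply disappears. A minimal Python sketch of that contract using the default rules the email lists; the `Row` fields, the duplicate key (date + amount + merchant), and the default threshold and currency are assumptions for illustration, not DataTools' internals.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Row:
    posted: Optional[date]
    amount: Optional[float]
    currency: str
    merchant: str
    min_confidence: float = 1.0     # lowest confidence among auto-corrected fields
    currency_flagged: bool = False  # True if a foreign currency was marked as expected


def gate(rows, start: date, end: date, expected_currency: str = "USD",
         threshold: float = 0.85) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Split rows into (passed, quarantined-with-reason). Nothing is dropped."""
    passed, quarantined = [], []
    seen = set()
    for row in rows:
        key = (row.posted, row.amount, row.merchant)
        if row.posted is None or row.amount is None:
            reason = "missing required field"
        elif row.currency != expected_currency and not row.currency_flagged:
            reason = "unexpected currency"
        elif not start <= row.posted <= end:
            reason = "date outside export range"
        elif key in seen:
            reason = "duplicate"
        elif row.min_confidence < threshold:
            reason = "low-confidence correction"
        else:
            reason = None
        if reason is None:
            seen.add(key)
            passed.append(row)
        else:
            quarantined.append((row, reason))
    return passed, quarantined
```

Because `len(passed) + len(quarantined)` always equals the input row count, a mismatch is itself a bug signal worth asserting in any pipeline built this way.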
--- a/marketing/emails/bookkeeper/05-day30.md
+++ b/marketing/emails/bookkeeper/05-day30.md
@@ -0,0 +1,26 @@
+# Bookkeeper · Day 30 — Heard from a fellow bookkeeper?
+
+**Subject:** Heard from a fellow bookkeeper?
+**Send:** Day 30
+**Goal:** referral / review ask. Last touch in the sequence.
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one (very small) favor?
+
+**Pick one of these. Whichever is easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — even a single line helps the next bookkeeper trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the bookkeeper landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one bookkeeper friend who'd benefit: {{landing_page}}. No referral commission scheme, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or what's broken. The 30-day refund window is still open and I'd rather refund a buyer who didn't get value than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/00-delivery.md
+++ b/marketing/emails/revops/00-delivery.md
@@ -0,0 +1,34 @@
+# RevOps · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** download + first run within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. (Yes, even if you're on the corporate VPN. Especially then.)
+
+**3. Drop in a real lead list.** Don't bother with the bundled samples — the gate report only gets interesting when the data is real. Pull last quarter's webform export, or your most recent Apollo / LinkedIn pull, drag it into the analyzer, and click **"Run all"**. You'll see what the dedupe + format pipeline does in about 30 seconds.
+
+If something doesn't work: just reply. I read every reply.
+
+Refund: also just reply. 30 days, no questions asked; no form.
+
+Tomorrow I'll send a sample 3-vendor lead list (HubSpot + LinkedIn + Apollo, synthetic data) so you can see the dedupe confidence tiers in action on a known input. After that you'll get four more short emails over the next month: practical tips, no upsell. Unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. If you have a RevOps friend who'd find this useful: {{landing_page}}.
--- a/marketing/emails/revops/01-day1.md
+++ b/marketing/emails/revops/01-day1.md
@@ -0,0 +1,36 @@
+# RevOps · Day 1 — Try it on this 3-vendor lead list first
+
+**Subject:** Try it on this 3-vendor lead list first
+**Send:** Day 1, ~9am buyer-local-time
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
+
+→ **{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
+
+What's hidden in there:
+
+- The same person from 3 sources, with intentionally inconsistent fields:
+  - HubSpot row: full email + company; no LinkedIn URL
+  - LinkedIn row: name + title + LinkedIn URL; no email
+  - Apollo row: email + phone + company; misspelled name
+- ~120 obvious duplicates (same email, different case)
+- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
+- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
+- One row in every 200 with a hidden zero-width space in the email address
+
+Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.
+
+Look at three things in the output:
+
+1. **The cleaned CSV** — what your import would look like
+2. **The audit CSV** — every change, every rule, confidence per change
+3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
+
+Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
+
+— Michael
+{{support_email}}
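The first two duplicate types in the RevOps Day 1 sample, case differences and invisible zero-width characters, collapse once email addresses are normalized before comparison. A minimal Python sketch, assuming it is acceptable for matching to strip Unicode format characters (category `Cf`, which includes U+200B ZERO WIDTH SPACE) and to lowercase the whole address; strictly, the local part of an address may be case-sensitive, though in practice most mailbox providers ignore case.

```python
import unicodedata
from collections import defaultdict


def email_key(raw: str) -> str:
    """Normalize an email for duplicate matching (not for sending)."""
    text = unicodedata.normalize("NFKC", raw)
    # Drop invisible format characters such as U+200B ZERO WIDTH SPACE.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.strip().lower()


def duplicate_groups(rows) -> dict:
    """Group rows sharing a normalized email; rows without an email are skipped."""
    groups = defaultdict(list)
    for row in rows:
        key = email_key(row.get("email", ""))
        if key:
            groups[key].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Rows without an email (like the LinkedIn rows in the sample) are deliberately left out here; they need the fuzzy, cross-source pass instead.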
--- a/marketing/emails/revops/02-day3.md
+++ b/marketing/emails/revops/02-day3.md
@@ -0,0 +1,36 @@
+# RevOps · Day 3 — The dedupe rule that catches LinkedIn drift
+
+**Subject:** The dedupe rule that catches LinkedIn drift
+**Send:** Day 3
+**Goal:** deepen feature understanding around the cross-source dedupe
+
+---
+
+Hi {{first_name}},
+
+The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: **cross-source matching**, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.
+
+The rule that does the work is in the dedupe tool's **"Block by domain, fuzzy on name+title"** mode. Here's what it does:
+
+**Step 1 — Block.** Group rows by email domain. (LinkedIn rows with no email get bucketed by the domain of the company website on their profile, if they listed one.) This avoids the O(n²) explosion of comparing every row with every other row, and rules out cross-company false positives.
+
+**Step 2 — Within each block, fuzzy-match on `first_name + last_name + title`.** Token-set ratio at 0.85 default. Catches:
+
+- "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
+- "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
+- "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
+
+**Step 3 — Confidence-tier the merge.** ≥0.95 auto-merges. 0.85-0.95 goes to `<filename>.review.csv` for you to eyeball. <0.85 stays unmerged.
+
+**Step 4 — Field-precedence on merge.** When records merge, you choose which source wins per field. Default precedence (configurable):
+
+- `title`, `company`, `linkedin_url` → LinkedIn wins (more recent)
+- `email`, `phone` → Webform wins (verified)
+- `lifecycle_stage`, `owner` → HubSpot wins (your CRM is canonical)
+
+**One trap to avoid:** don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run **format → analyzer → dedupe → gate** in that order. The pipeline UI enforces this; the per-tool runs don't.
+
+Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
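Steps 1 to 3 of the RevOps Day 3 email (block, fuzzy-match within the block, tier by confidence) can be sketched with Python's standard library. This is an illustration, not DataTools' matcher: `token_set_ratio` here is a `difflib`-based approximation of the token-set idea (libraries such as RapidFuzz ship a tuned version), and the `company_site` column used for email-less LinkedIn rows is a hypothetical field name.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional
from urllib.parse import urlparse


def tokens(text: str) -> set:
    # Drop apostrophes first so "O'Brien" and "obrien" produce the same token.
    return set(re.findall(r"[a-z0-9]+", text.lower().replace("'", "")))


def token_set_ratio(a: str, b: str) -> float:
    """Score 0..1 comparing the shared tokens against each side's extras."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    shared = sorted(ta & tb)
    common = " ".join(shared)
    with_a = " ".join(shared + sorted(ta - tb))
    with_b = " ".join(shared + sorted(tb - ta))

    def ratio(x: str, y: str) -> float:
        return SequenceMatcher(None, x, y).ratio()

    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))


def block_key(row: dict) -> Optional[str]:
    """Email domain, else the (hypothetical) company_site domain, else None."""
    email = row.get("email", "")
    if "@" in email:
        return email.rsplit("@", 1)[1].strip().lower()
    site = row.get("company_site", "").strip()
    if not site:
        return None
    host = urlparse(site if "//" in site else "//" + site).hostname or ""
    return host.removeprefix("www.") or None


def tier(score: float, auto: float = 0.95, review: float = 0.85) -> str:
    if score >= auto:
        return "auto_merge"
    if score >= review:
        return "review"
    return "unmerged"


def candidate_pairs(rows):
    """Yield (i, j, score, tier) for row pairs that share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        key = block_key(row)
        if key:
            blocks[key].append(i)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            a, b = rows[i], rows[j]
            score = token_set_ratio(
                f"{a['first_name']} {a['last_name']} {a['title']}",
                f"{b['first_name']} {b['last_name']} {b['title']}",
            )
            yield i, j, score, tier(score)
```

Only pairs inside a block are ever scored, which is where the savings over all-pairs comparison come from. The initials case from the email ("only with a strong company-name match") would need an extra company check that this sketch leaves out.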
--- a/marketing/emails/revops/03-day7.md
+++ b/marketing/emails/revops/03-day7.md
@@ -0,0 +1,34 @@
+# RevOps · Day 7 — Run it before every HubSpot import
+
+**Subject:** Run it before every HubSpot import
+**Send:** Day 7
+**Goal:** reframe from one-off tool to per-campaign workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on a real list once or twice and confirmed the dedupe catches more than HubSpot's native check.
+
+The thing that turns DataTools from a one-off purchase into something that saves you money every month: **make it the gate on every import.**
+
+The pattern that works:
+
+**1. One DataTools run per campaign source.** Webform pull → DataTools. LinkedIn scrape → DataTools. Apollo export → DataTools. Each run produces a "clean" CSV.
+
+**2. Concatenate the cleaned CSVs.** A pandas `concat` works, or just paste them together in Excel.
+
+**3. One more DataTools run on the concatenation.** This is the cross-source dedupe pass — the one that catches the same person across the three sources.
+
+**4. Compare against your current HubSpot export.** Run dedupe with your existing CRM export as the second source. It catches the people you already paid for last quarter and don't need to import again.
+
+**5. Import only the residue** (the rows that survived every step above) into HubSpot.
+
+The buyers running this pipeline tell me they've cut their HubSpot marketing-contact bill 15-25% within two months. Not because their pipeline got smaller — because they stopped paying for duplicates.
+
+**One thing to set up once:** save your dedupe settings as a `.datatools-preset.json` and commit it to your RevOps team's repo (or a shared Drive folder). Same preset every campaign means consistent results across whoever's running it that week.
+
+If you want, reply with a sanitized lead list and I'll suggest a starting preset for your sources — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
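Steps 2, 4, and 5 of the RevOps Day 7 workflow amount to a concatenate-then-anti-join: stack the per-source files, then drop anyone already in the CRM export or already seen. A minimal sketch with Python's `csv` module, assuming every file has an `email` column and matching only on a trimmed, lowercased email; the cross-source fuzzy pass from step 3 is not shown.

```python
import csv
import io
from itertools import chain


def residue(source_files, crm_file) -> list:
    """Rows from the campaign sources that are new to the CRM and not repeated."""
    def key(row):
        return (row.get("email") or "").strip().lower()

    crm_emails = {key(row) for row in csv.DictReader(crm_file)} - {""}
    # Step 2: concatenate. Each DictReader uses its own header, so sources
    # with different column sets can still be stacked.
    combined = chain.from_iterable(csv.DictReader(f) for f in source_files)
    kept, seen = [], set()
    for row in combined:
        k = key(row)
        if k and (k in crm_emails or k in seen):
            continue  # already in the CRM, or a repeat across sources
        if k:
            seen.add(k)
        kept.append(row)  # rows without an email are kept for fuzzy review
    return kept


if __name__ == "__main__":
    webform = io.StringIO("email,name\nA@x.com,Ann\nnew@y.com,Ned\n")
    apollo = io.StringIO("email,name\nnew@y.com,Ned D\n,No Email\n")
    crm = io.StringIO("email\na@x.com\n")
    print([row["name"] for row in residue([webform, apollo], crm)])
```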
--- a/marketing/emails/revops/04-day14.md
+++ b/marketing/emails/revops/04-day14.md
@@ -0,0 +1,34 @@
+# RevOps · Day 14 — Two-minute trick: the confidence tiers
+
+**Subject:** Two-minute trick: the confidence tiers
+**Send:** Day 14
+**Goal:** surface the manual-review queue — non-obvious, high-value
+
+---
+
+Hi {{first_name}},
+
+The single most-skipped feature in DataTools is also the one with the highest payoff per minute: the **manual-review queue**.
+
+Here's what's happening under the hood: every dedupe decision DataTools makes has a confidence score (0.0 to 1.0). The dedupe tool by default puts decisions into three buckets:
+
+- **≥0.95** → auto-merge (cleaned CSV)
+- **0.85 - 0.95** → manual-review queue (`<filename>.review.csv`)
+- **<0.85** → unmerged (kept as separate rows)
+
+The 0.85-0.95 bucket is the magic. It's the range where a tuned algorithm catches *most* duplicates but where the wrong choice is a real cost (merging two genuinely different people = lost prospect; not merging two duplicates = paid contact you didn't need).
+
+The 2-minute workflow:
+
+1. Run dedupe.
+2. Open `<filename>.review.csv`. Each row is a candidate merge with: confidence, the two records side-by-side, the rule that fired.
+3. Eyeball each row. Mark `keep_merge` (Y/N) in the rightmost column.
+4. Re-run dedupe with the `--apply-review-decisions <filename>.review.csv` flag (or click "Apply review decisions" in the GUI).
+5. Final cleaned CSV reflects your manual choices.
+
+For a 5,000-row lead list, the review queue is typically 20-60 rows. ~3 minutes of work. The output is dramatically better than auto-merge-everything-≥0.85, which is what most tools (including HubSpot's) do silently.
+
+**Pro move:** save your `keep_merge` decisions over time. After 3-4 campaigns you'll have a corpus of "yes-merges" and "no-merges" you can use to retune the auto-merge threshold for *your* data. Most teams find their sweet spot is somewhere in 0.88-0.92.
+
+— Michael
+{{support_email}}
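The "pro move" in the RevOps Day 14 email, retuning the auto-merge threshold from saved `keep_merge` decisions, can be framed as finding the lowest confidence at which past reviewed merges were still almost always approved. A minimal Python sketch, assuming you've collected `(confidence, approved)` pairs from earlier review files; the 98% precision target and 20-decision minimum are illustrative defaults, not product recommendations.

```python
from typing import Iterable, Tuple


def suggest_threshold(decisions: Iterable[Tuple[float, bool]],
                      target_precision: float = 0.98,
                      min_support: int = 20,
                      current: float = 0.95) -> float:
    """Lowest confidence t such that reviewed candidates scoring >= t were
    approved at least `target_precision` of the time. Falls back to `current`
    when no threshold has enough evidence."""
    ordered = sorted(decisions, key=lambda d: d[0], reverse=True)
    best = current
    approved = 0
    for n, (confidence, ok) in enumerate(ordered, start=1):
        approved += int(ok)
        # Only evaluate at the end of a run of equal confidences, so a
        # threshold never splits candidates that scored the same.
        if n < len(ordered) and ordered[n][0] == confidence:
            continue
        if n >= min_support and approved / n >= target_precision:
            best = confidence
    return best
```

Review files only contain the middle band, so this can only suggest moving the threshold down within that band; candidates above the current threshold were auto-merged and never labeled.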
--- a/marketing/emails/revops/05-day30.md
+++ b/marketing/emails/revops/05-day30.md
@@ -0,0 +1,26 @@
+# RevOps · Day 30 — Heard from another RevOps lead?
+
+**Subject:** Heard from another RevOps lead?
+**Send:** Day 30
+**Goal:** referral / review ask
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one small favor?
+
+**Pick the one that's easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next RevOps lead trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the RevOps landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one RevOps friend who'd benefit: {{landing_page}}. No referral commission, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/00-delivery.md
+++ b/marketing/emails/shopify-pet/00-delivery.md
@@ -0,0 +1,34 @@
+# Shopify-pet · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** download + first run within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. Your customer list never goes to a server.
+
+**3. Drop in a real Shopify customer export.** Don't bother with the bundled samples. Customers > Export > "All customers" > CSV in Shopify admin. Drag it into DataTools' analyzer, click **"Run all"**. You'll see what it catches — typically a few hundred phone-format issues, some hidden-character emails, and a handful of cross-row duplicates — in about 30 seconds.
+
+If something doesn't work: reply to this email. Goes to my inbox.
+
+Refund: also reply. 30 days, no questions asked; no form.
+
+Tomorrow I'll send a sample Shopify customer export with the tricky cases pre-built in, so you can see what the cleanup catches on a known input. After that you'll get four more short emails over the next month, most with a single tip. Unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. Got a fellow store owner who'd find this useful? {{landing_page}}.
--- a/marketing/emails/shopify-pet/01-day1.md
+++ b/marketing/emails/shopify-pet/01-day1.md
@@ -0,0 +1,32 @@
+# Shopify-pet · Day 1 — Try it on this Shopify customer export first
+
+**Subject:** Try it on this Shopify customer export first
+**Send:** Day 1, ~9am buyer-local-time
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a synthetic Shopify customer export I built specifically to break things Klaviyo silently chokes on.
+
+→ **{{sample_file_url}}** (480 KB CSV, 2,200 rows — fully synthetic, no real customer data)
+
+What's hidden in there:
+
+- Phone numbers in 6 different formats (`(415) 555-0143`, `415.555.0143`, `4155550143`, `+44 20 7946 0958` without country field, `+1-415-555-0143 ext 12`, `415 555 0143`)
+- Email addresses with embedded zero-width spaces (they look identical to a clean address, but Klaviyo treats them as different addresses)
+- ~80 obvious customer duplicates (same email, different case)
+- ~40 cross-row duplicates (different email, same name + same shipping address — usually the same person ordering with two emails)
+- Shipping addresses with mixed `St.` / `Street` / `St` / `STREET` for the same street name
+- 12 customers from outside North America with country field blank
+
+Drop it into DataTools. Click **"Run all"** in the analyzer. Then run **format → text-clean → dedupe → gate** in that order.
+
+Look at the **gate report** at the end — it'll tell you exactly which rows would have broken Klaviyo, with a one-line "why" per row.
+
+If you want to see the difference: import the **raw** file to a test Klaviyo list, then import the **cleaned** file to a different test list. Compare the SMS-deliverable count. The delta is what you've been losing every month.
+
+Reply and tell me what it caught (or missed) — v1.1 detector improvements come from real-world feedback.
+
+— Michael
+{{support_email}}
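The address-variant duplicates in the Shopify Day 1 sample (`St.` / `Street` / `St` / `STREET`) only match once the street suffix is canonicalized. A minimal Python sketch, assuming a small hypothetical suffix table and touching only the final word of the address line, so a leading "St." meaning Saint is left alone; a real table would cover far more suffixes and unit designators.

```python
# Hypothetical suffix table; extend for the markets you ship to.
SUFFIXES = {
    "st": "St", "street": "St",
    "ave": "Ave", "avenue": "Ave",
    "rd": "Rd", "road": "Rd",
}


def normalize_street(line: str) -> str:
    """Canonicalize the trailing street suffix and collapse whitespace."""
    words = line.split()
    if words:
        last = words[-1].rstrip(".").lower()
        if last in SUFFIXES:
            words[-1] = SUFFIXES[last]
    return " ".join(words)


def customer_key(name: str, address1: str, zip_code: str) -> tuple:
    """Key for the "same name + same shipping address" duplicate check."""
    return (" ".join(name.lower().split()),
            normalize_street(address1).lower(),
            zip_code.strip())
```

Addresses ending in a unit (`12 Oak St Apt 4`) slip past this sketch because the suffix is no longer the last word.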
--- a/marketing/emails/shopify-pet/02-day3.md
+++ b/marketing/emails/shopify-pet/02-day3.md
@@ -0,0 +1,33 @@
+# Shopify-pet · Day 3 — The phone-format step Klaviyo cares about
+
+**Subject:** The phone-format step Klaviyo cares about
+**Send:** Day 3
+**Goal:** deepen feature understanding around the format standardizer
+
+---
+
+Hi {{first_name}},
+
+The single biggest source of "Klaviyo dropped this customer silently" is phone formatting. DataTools fixes this in one tool — the **format standardizer** — but the *settings* matter.
+
+Klaviyo (and basically every modern SMS platform) wants phones in **E.164** format: `+` then country code then number, no spaces, no dashes, no extension. Like: `+14155550143`.
+
+Three settings in DataTools' format standardizer that get this right:
+
+**1. Set "Phone output format" to `E.164`.** Default is `national` (`(415) 555-0143`) — fine for display, broken for Klaviyo. Change it once; the preset remembers.
+
+**2. Set "Default country" per row, not per file.** This is the non-obvious one. For each customer:
+- If the `country` field has a value (e.g., "Canada", "CA", "Canadá"), use it.
+- If blank, fall back to the country in the *shipping address*.
+- If still blank, fall back to the file-level default (you set this — typically your store's primary market).
+
+DataTools does this automatically when you check "Use per-row country detection". *Skip this and ~30% of international customers will end up with US country codes prepended to their numbers — which Klaviyo accepts but routes wrong, and your SMS never arrives.*
+
+**3. Set "Quarantine un-parseable phones" to ON.** Don't drop them silently; don't pass them to Klaviyo broken. Send them to `<filename>.quarantine.csv` so you can fix the worst 10-20 by hand and re-include them.
+
+The combination of E.164, per-row country detection, and quarantine typically takes a Shopify export from "60-70% of phones survive Klaviyo's import" to "97-99%". On a 10,000-customer list where every customer has a phone on file, that's roughly 2,700-3,900 more customers reachable by SMS per campaign.
+
+Reply if you want me to walk through these settings on a screen-share — happy to do this for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
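The per-row fallback in setting 2 of the Shopify Day 3 email (country field, then shipping country, then the file default) is the part most worth seeing in code. A minimal Python sketch, assuming a tiny hypothetical country table and validating length only for NANP (+1) numbers; for production, Google's libphonenumber (the `phonenumbers` package in Python) handles real numbering plans, and nothing here is DataTools' own code.

```python
import re
from typing import Optional

CALLING_CODES = {"US": "1", "CA": "1", "GB": "44", "AU": "61"}  # illustrative subset
ALIASES = {"united states": "US", "usa": "US", "canada": "CA", "canadá": "CA",
           "united kingdom": "GB", "uk": "GB", "australia": "AU"}


def to_iso(value: str) -> Optional[str]:
    v = value.strip()
    if v.upper() in CALLING_CODES:
        return v.upper()
    return ALIASES.get(v.lower())


def to_e164(raw: str, country_field: str = "", shipping_country: str = "",
            file_default: str = "US") -> Optional[str]:
    """Return +<country code><number>, or None so the row can be quarantined."""
    # Drop a trailing extension such as "ext 12", "x12", or "#12".
    number = re.sub(r"(?i)\s*(?:ext\.?|x|#)\s*\d+$", "", raw.strip())
    digits = re.sub(r"\D", "", number)
    if number.startswith("+"):
        result = "+" + digits  # already international: keep its country code
    else:
        # Per-row fallback: country field, then shipping address, then file default.
        iso = to_iso(country_field) or to_iso(shipping_country) or to_iso(file_default)
        if iso is None:
            return None
        code = CALLING_CODES[iso]
        if code == "1":
            if len(digits) == 11 and digits.startswith("1"):
                digits = digits[1:]  # leading trunk "1"
            if len(digits) != 10:
                return None
        else:
            digits = digits.removeprefix("0")  # national trunk "0" (GB, AU)
        result = "+" + code + digits
    # E.164 caps numbers at 15 digits after the "+".
    return result if 1 < len(result) <= 16 else None
```

Rows that come back as `None` are the ones setting 3 sends to quarantine. A UK number written without `+` and with no country anywhere on the row falls back to US, fails the 10-digit NANP check, and is quarantined instead of being mis-prefixed.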
--- a/marketing/emails/shopify-pet/03-day7.md
+++ b/marketing/emails/shopify-pet/03-day7.md
@@ -0,0 +1,35 @@
+# Shopify-pet · Day 7 — Run it before every Klaviyo sync
+
+**Subject:** Run it before every Klaviyo sync
+**Send:** Day 7
+**Goal:** reframe from one-off tool to per-sync workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on a real customer export once or twice and seen the cleanup catch things you'd been losing in Klaviyo for months.
+
+The thing that turns DataTools into a recurring win instead of a one-off purchase: **run it before every sync, not just the first time.**
+
+The pattern that works for most stores:
+
+**1. Pick a cadence.** Most stores I talk to do this monthly; high-volume stores do it weekly. The cadence should match your "I'm planning a campaign" rhythm.
+
+**2. The Sunday-morning ritual:**
+- Pull a fresh customer export from Shopify (Customers > Export > "All customers")
+- Drop it into DataTools
+- Run the pipeline (analyzer → format → text-clean → dedupe → gate)
+- Review the gate quarantine file (typically 0.5-2% of rows)
+- Push the cleaned CSV to Klaviyo (via their CSV import or their API)
+
+**3. Save your settings as a preset.** The "Save settings" button writes a `.datatools-preset.json` file. Keep it in your store's Drive, Notion, or wherever your shop docs live. Next time: load the preset, run the pipeline, done in about 4 minutes.
+
+**4. After 3 months, retune the preset.** Look at your manual-review queue across those runs. If you're consistently approving 0.86-confidence merges, drop the auto-merge threshold to 0.85 so pairs like that merge automatically. If spot-checks turn up wrong auto-merges at 0.92, raise the threshold to 0.94 so those pairs come back to review instead. The preset improves with use.
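Here's the idea behind that retune as a small Python sketch. It assumes a simple two-threshold setup: pairs at or above the auto-merge threshold merge on their own, pairs between a review floor and the threshold land in your manual-review queue, and anything lower stays separate. The `route` function and the 0.70 floor are hypothetical; your preset's actual settings may be named and structured differently.

```python
def route(confidence: float, auto_merge_at: float, review_floor: float = 0.70) -> str:
    """Decide what happens to one candidate duplicate pair (hypothetical)."""
    if confidence >= auto_merge_at:
        return "auto-merge"
    if confidence >= review_floor:
        return "review"
    return "keep separate"

scores = [0.65, 0.86, 0.92, 0.96]
for threshold in (0.90, 0.85, 0.94):
    print(threshold, [route(s, threshold) for s in scores])
```

With the threshold at 0.85 the 0.86 pair merges automatically; at 0.94 the 0.92 pair goes back to the review queue.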
+
+The store owners doing this monthly tell me their open rates go up 8-15% in the first 90 days — not from new content, just from the email actually reaching the inbox.
+
+If you want, reply with a sanitized export and I'll suggest a starting preset for your store — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/04-day14.md
+++ b/marketing/emails/shopify-pet/04-day14.md
@@ -0,0 +1,32 @@
+# Shopify-pet · Day 14 — Two-minute trick: hidden-character cleanup
+
+**Subject:** Two-minute trick: hidden-character cleanup
+**Send:** Day 14
+**Goal:** surface the text cleaner — non-obvious, high-value
+
+---
+
+Hi {{first_name}},
+
+The tool inside DataTools that buyers tend to discover last is the **text cleaner**, and on Shopify customer exports it's usually the one with the most "wait, that was a problem?" moments.
+
+What it catches: invisible characters that slip into customer data when customers type on their phones or paste text from other apps. The most common offenders:
+
+- **Zero-width space** (`U+200B`) inside emails — Klaviyo treats `sarah@acme.com` (with hidden char) and `sarah@acme.com` (without) as different addresses
+- **Non-breaking space** (`U+00A0`) inside addresses — Shopify accepts it, Klaviyo accepts it, but USPS address validation fails on it
+- **Byte order mark** (BOM, `U+FEFF`) at the start of a CSV cell, usually from a customer pasting from Word or a PDF
+- **Right-to-left mark** (`U+200F`) — rare, but appears in customer names from Hebrew/Arabic locales
+
+The 2-minute workflow:
+
+1. After the format standardizer pass, run the text cleaner.
+2. It produces an additional sidecar file: `<filename>.hidden-chars.csv` — every cell where it found a hidden char, with a "what was hidden where" annotation.
+3. Skim it. Most are fine to silently strip (zero-width spaces, BOMs). For rare ones (right-to-left marks in a name), confirm before stripping — sometimes they're load-bearing.
+4. Click "Apply cleanup". The text cleaner replaces the hidden chars in the cleaned CSV.
+
+The reason this matters: **dedupe runs after text-clean.** Two emails that differ only by a hidden character look identical in the GUI but get treated as two separate customers, and your dedupe pass won't catch them unless the text cleaner ran first.
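To see why the order matters, here's a minimal Python sketch: one email address in three copies that differ only by a zero-width space or a BOM counts as three distinct values until it's cleaned. The character list and function names are assumptions for illustration, not the text cleaner's actual implementation; it only strips the two characters that are usually safe to drop silently.

```python
# Zero-width space and BOM: the two "usually safe to strip silently" cases.
# A real cleaner would cover more characters, ask before removing
# right-to-left marks, and turn non-breaking spaces into normal spaces.
STRIP = str.maketrans({"\u200b": None, "\ufeff": None})

def find_hidden(value: str) -> list:
    """List each hidden character and its position, like a sidecar note."""
    return [f"U+{ord(ch):04X} at {i}" for i, ch in enumerate(value) if ord(ch) in STRIP]

def clean(value: str) -> str:
    """Strip hidden characters, then trim and lowercase for comparison."""
    return value.translate(STRIP).strip().lower()

emails = ["sarah@acme.com", "sarah@\u200bacme.com", "\ufeffsarah@acme.com"]
print(len(set(emails)))                 # 3: dedupe on raw values sees three customers
print(len({clean(e) for e in emails}))  # 1: after cleaning, they collapse to one
print(find_hidden(emails[1]))           # ['U+200B at 6']
```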
+
+The pipeline order baked into the GUI is: `analyzer → format → text-clean → dedupe → gate`. Stick to it; per-tool runs out of order are the most common source of "wait, why didn't dedupe catch this?".
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/05-day30.md
+++ b/marketing/emails/shopify-pet/05-day30.md
@@ -0,0 +1,26 @@
+# Shopify-pet · Day 30 — Heard from another store owner?
+
+**Subject:** Heard from another store owner?
+**Send:** Day 30
+**Goal:** referral / review ask
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49, would you do me one small favor?
+
+**Pick the one that's easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next Shopify owner trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one fellow store owner who'd benefit: {{landing_page}}. No referral commission, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or when you reply to one of the earlier emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}