feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions

@@ -0,0 +1,60 @@
# Email sequences
Per niche (`bookkeeper/`, `revops/`, `shopify-pet/`):
- **`00-delivery.md`** — Day 0 Gumroad delivery email. Triggered when
Gumroad confirms the purchase. Job #1: get the buyer to download
and open the app inside the first 24h. Buyers who don't open within
72h refund at ~3× the rate of buyers who do.
- **`01-day1.md`** — Day 1 nudge with a sample file matched to the
niche. The Day-1 email is the highest-leverage one in the
sequence; it converts "I bought it" into "I used it".
- **`02-day3.md`** — Day 3 deep-dive on one specific feature the
niche cares about most.
- **`03-day7.md`** — Day 7 workflow framing. "Use it every {month /
campaign / sync}, not as a one-off."
- **`04-day14.md`** — Day 14 power-user tip. Surfaces a non-obvious
feature; converts "I use it" into "I rely on it".
- **`05-day30.md`** — Day 30 referral / review ask.
## Sender setup
- **From:** `support@datatools.app` (single-sender to keep replies in
one inbox; don't fan out to per-niche aliases until volume warrants)
- **Reply-To:** same as From; every email should have a working reply path
- **List provider:** Gumroad's built-in for delivery; Buttondown or
ConvertKit for the 5-touch sequence (Gumroad's drip is too crude
for niche segmentation)
- **Segmentation:** customers self-tag at checkout (Gumroad custom
field "What do you do?"). Map: `bookkeeper`, `revops`,
`shopify-pet`, `other`. `other` gets a generic sequence (not
drafted yet — Tier C).
## Variables
All emails use these placeholders. Set them at sequence-import time,
not per-email:
- `{{first_name}}` — Gumroad provides; fall back to "there" if blank
- `{{download_url}}` — niche-specific download URL from Gumroad
- `{{sample_file_url}}` — niche-specific sample CSV (`samples/demo/...`)
- `{{landing_page}}` — niche-specific landing page URL
- `{{support_email}}` — `support@datatools.app`
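
A minimal sketch of import-time substitution, assuming plain `{{name}}` placeholders. The `render` helper and the `DEFAULTS` table are hypothetical; only the "there" fallback and the support address come from this README.

```python
import re

# Hypothetical defaults table; only these two values are stated in this README.
DEFAULTS = {"first_name": "there", "support_email": "support@datatools.app"}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; blank or missing values use DEFAULTS."""

    def substitute(match):
        key = match.group(1)
        value = (values.get(key) or "").strip() or DEFAULTS.get(key)
        if value is None:
            # Fail loudly rather than ship an email containing a raw placeholder.
            raise KeyError(f"no value for placeholder {key!r}")
        return value

    return PLACEHOLDER.sub(substitute, template)


print(render("Hi {{first_name}},", {"first_name": ""}))
```

Raising on an unknown placeholder is a design choice for this sketch: a failed import is cheaper than a delivered email that still reads `{{download_url}}`.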
## Cadence and quiet rules
- Don't send between 10pm and 7am buyer-local time (Buttondown supports
TZ-aware send; ConvertKit doesn't out of the box)
- If the buyer replies to *any* email in the sequence, pause the
remaining touches until you've replied to them. A drip that
ignores a customer reply reads as worse than no drip.
- If the buyer requests a refund, kill the sequence immediately.
- Day 14 + Day 30 emails are skippable if the buyer has already
emailed support with a feature request or bug report — they're
engaged enough; don't pile on.
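
If the list provider can't enforce these rules itself, they are small enough to check in a pre-send hook. The sketch below is an assumption about how such a hook might look, not something either provider exposes: `should_send` and its flag names are hypothetical, and it treats the "skippable" Day 14 / Day 30 rule as "skip".

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

QUIET_START = time(22, 0)  # 10pm buyer-local
QUIET_END = time(7, 0)     # 7am buyer-local


def in_quiet_hours(send_at: datetime, buyer_tz: tzinfo) -> bool:
    """True if the (timezone-aware) send time falls in 10pm-7am for the buyer."""
    local = send_at.astimezone(buyer_tz).time()
    return local >= QUIET_START or local < QUIET_END


def should_send(day: int, send_at: datetime, buyer_tz: tzinfo, *,
                refunded: bool = False,
                awaiting_our_reply: bool = False,
                contacted_support: bool = False) -> bool:
    if refunded:
        return False  # refund requested: kill the sequence
    if awaiting_our_reply:
        return False  # buyer replied and we haven't answered yet: pause
    if day in (14, 30) and contacted_support:
        return False  # already engaged; don't pile on
    return not in_quiet_hours(send_at, buyer_tz)


# Fixed UTC-5 offset for illustration; zoneinfo.ZoneInfo objects work the same way.
utc_minus_5 = timezone(timedelta(hours=-5))
nine_am_local = datetime(2026, 5, 4, 14, 0, tzinfo=timezone.utc)
print(should_send(1, nine_am_local, utc_minus_5))
```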
## Subject lines
Subjects are owned by `marketing/COPY.md` § 4. Don't edit subjects
in-line in the email files; edit COPY.md and re-propagate. Same
discipline applies to the closing CTA — owned by COPY.md § 0.

@@ -0,0 +1,34 @@
# Bookkeeper · Day 0 — Delivery email
**Subject:** Your DataTools download (start here)
**Send:** immediately on Gumroad purchase confirmation
**Goal:** buyer downloads + opens the app within 24h
---
Hi {{first_name}},
Thanks for buying DataTools. Your download:
**{{download_url}}**
Three things to do in the next 5 minutes so you don't lose this email under the next 200:
**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link above auto-detects.
**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine, no network calls. If your browser doesn't open automatically, the terminal window shows the URL.
**3. Drop in a real bank export.** Don't bother with the bundled samples — DataTools is built for messy real-world files. Pull last month's bank export from any client, drag it into the analyzer, and click "Run all". You'll see what the pipeline catches in about 20 seconds.
If something doesn't work: just reply to this email. I read every reply (it goes to my own inbox, not a queue).
If you want to refund: also just reply. 30-day no-questions; no form to fill out.
Tomorrow I'll send a sample bank export with a few of the tricky cases pre-built in, so you can see what the gate report looks like on a known input. After that you'll get four more emails spread over the next month, mostly one practical tip each. Feel free to unsubscribe at the bottom of any of them.
Welcome aboard.
— Michael
{{support_email}}
P.S. If you have a bookkeeper friend who'd find this useful, the share-friendly landing page is {{landing_page}}.

@@ -0,0 +1,31 @@
# Bookkeeper · Day 1 — Try it on this messy bank export first
**Subject:** Try it on this messy bank export first
**Send:** Day 1, ~9am buyer-local-time
**Goal:** convert "I bought it" → "I ran it on something"
---
Hi {{first_name}},
Yesterday's email had your download. Today's email has a *file* — a sample bank export I built specifically to break things.
**{{sample_file_url}}** (260 KB CSV, 1,400 rows of synthetic data — no real account info)
It's modeled after real exports I've seen from US, UK, and Canadian banks. Hidden in there:
- Mixed date formats (some `MM/DD/YYYY`, some `DD-MM-YY`, one row in `YYYY-MM-DD`)
- Six different spellings of "Amazon" across the merchant column
- Trailing whitespace + non-breaking spaces in the description column
- Three obvious duplicate transactions and two non-obvious ones (different timestamps, same amount + merchant)
- A totals row at the bottom that's not a transaction
- One row with currency in `€` instead of `$`
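
For the curious, the mixed-date case is the easiest one to picture in code. This is a rough Python sketch that assumes only the three formats listed above; it is not DataTools' actual detector, and `normalize_date` is a name made up for illustration.

```python
from datetime import datetime
from typing import Optional

# The three formats named in the list above, tried in order.
FORMATS = ("%m/%d/%Y", "%d-%m-%y", "%Y-%m-%d")


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO date string, or None if no known format matches."""
    text = raw.replace("\u00a0", " ").strip()  # non-breaking spaces hide here too
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. a totals row; the caller decides what to do with it


for value in ("03/14/2026", "14-03-26", "2026-03-14", "Total"):
    print(repr(value), "->", normalize_date(value))
```

The separators do the disambiguation here (slashes for month-first, dashes for day-first), which only holds for this sample; a real export needs a per-file decision.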
Drop it into DataTools, click **"Run all"** in the analyzer, and look at the gate report. It'll catch all of the above and tell you exactly what changed and why.
The audit trail (a sidecar CSV called `<filename>.audit.csv`) is the part most bookkeepers are surprised by. Open it in Excel — every change has a row: original value, new value, rule that fired, timestamp. That's the file you hand to your client when they ask "wait, why did you re-classify that?".
Try it once on the sample, then once on a real client export. Reply and tell me what it caught (or missed) — I'm building the v1.1 detector list from real-world feedback.
— Michael
{{support_email}}

@@ -0,0 +1,35 @@
# Bookkeeper · Day 3 — The audit trail your client will actually open
**Subject:** The audit trail your client will actually open
**Send:** Day 3
**Goal:** deepen feature understanding around the audit trail (the
real differentiator vs. spreadsheet workflow)
---
Hi {{first_name}},
Most "data cleaning" tools spit out a clean file and call it done. The thing your *client* needs — and what protects you in a year when they ask "why did you change that?" — is the audit trail.
Here's the file DataTools writes alongside every cleaned export. It's a CSV called `<filename>.audit.csv` and it sits next to the cleaned file in your output folder.
Five columns, append-only:
| original_value | new_value | rule_applied | confidence | timestamp |
|----------------|-----------|--------------|------------|-----------|
| `AMZN Mktp` | `Amazon` | `merchant_canonicalize` | 0.94 | 2026-05-04T09:12:03 |
| ` Starbucks ` | `Starbucks` | `whitespace_strip` | 1.00 | 2026-05-04T09:12:03 |
| `01/02/26` | `2026-02-01` | `date_normalize_dmy` | 0.88 | 2026-05-04T09:12:03 |
Why this matters in a real client conversation:
- **The client asks "why is this Amazon when my statement says AMZN Mktp?"** — open the audit CSV, point at the `merchant_canonicalize` row. Done in 10 seconds.
- **A reviewer (auditor, accountant, you in 6 months) asks "what changed?"** — the audit CSV is the answer. Diffable, openable in Excel, no proprietary format.
- **You spot a wrong rule firing** — the `confidence` column tells you which rules to tune. Anything <0.90 is worth eyeballing.
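
If you'd rather script that last check than scroll in Excel, here is a small Python sketch. It assumes the audit file is a plain comma-separated CSV with the five column names from the table above; `rows_to_review` is a hypothetical helper, not part of DataTools.

```python
import csv
import io


def rows_to_review(audit_csv, threshold: float = 0.90):
    """Return audit rows whose confidence is below the threshold."""
    reader = csv.DictReader(audit_csv)
    return [row for row in reader if float(row["confidence"]) < threshold]


# In-memory stand-in for <filename>.audit.csv, using the rows from the table above.
sample = io.StringIO(
    "original_value,new_value,rule_applied,confidence,timestamp\n"
    "AMZN Mktp,Amazon,merchant_canonicalize,0.94,2026-05-04T09:12:03\n"
    " Starbucks ,Starbucks,whitespace_strip,1.00,2026-05-04T09:12:03\n"
    "01/02/26,2026-02-01,date_normalize_dmy,0.88,2026-05-04T09:12:03\n"
)
flagged = rows_to_review(sample)
for row in flagged:
    print(row["rule_applied"], row["original_value"], "->", row["new_value"])
```

On a real file, pass `open("<filename>.audit.csv", newline="")` instead of the in-memory sample.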
One workflow change worth making: when you send the cleaned file to QuickBooks, send the audit CSV to the client at the same time, in a folder labeled "month-end audit trail". Most clients won't open it. The 10% that do will trust you forever.
Reply if you want me to walk through the audit format on a call — happy to do a quick screen-share for any buyer in the first 30 days.
— Michael
{{support_email}}

@@ -0,0 +1,32 @@
# Bookkeeper · Day 7 — One pipeline, every client, every month
**Subject:** One pipeline, every client, every month
**Send:** Day 7
**Goal:** reframe from one-off tool to monthly workflow
---
Hi {{first_name}},
A week in. By now you've probably run DataTools on 1-2 client exports and confirmed it does what the landing page promised.
The thing buyers tell me they wish they'd done from day one: **set it up as a workflow, not a one-off.**
The pattern that works:
**1. Make a folder per client.** Inside each client folder, a subfolder per month: `Acme Co/2026-05/`. Drop the raw export here.
**2. Save your DataTools settings as a per-client preset.** The "Save settings" button in the analyzer drops a `.datatools-preset.json` file. Stash that in the client folder. Next month, load the preset and the analyzer pre-configures with the rules you tuned for that client (e.g., your "Amazon Marketplace" canonical name, your client's specific merchant aliases).
**3. Run the pipeline. Get three files back:** the cleaned CSV, the audit CSV, the gate report. Move them into `Acme Co/2026-05/cleaned/`.
**4. Import the cleaned CSV to QuickBooks. Email the audit CSV to the client.**
Total elapsed time per client per month, after the first: 3-5 minutes. The first month per client is longer (~15 min) because you're tuning the preset.
The buyers who do this are the ones still emailing me 3 months later — usually with feature requests for the next client they want to onboard. The buyers who only ever run it ad-hoc tend to drift back to spreadsheets within 2 months.
If you want, reply with a sanitized export and I'll show you what your starting preset should look like — happy to do this for the first 50 buyers.
— Michael
{{support_email}}

@@ -0,0 +1,35 @@
# Bookkeeper · Day 14 — Two-minute trick: the gate report
**Subject:** Two-minute trick: the gate report
**Send:** Day 14
**Goal:** surface the gate tool — non-obvious, high-value once seen
---
Hi {{first_name}},
The tool inside DataTools that buyers find last is the **gate** — and it's the one that quietly does the most for you.
What it does: before any row gets written to the cleaned CSV, the gate runs a per-row pass/fail check. Rows that fail get *quarantined* into a separate file (`<filename>.quarantine.csv`) instead of being silently dropped or silently passed.
Default rules (you can add your own):
- Missing required fields (date, amount)
- Amount in unexpected currency without a flag
- Date outside the export's stated range (catches the "totals row" issue from Day 1)
- Duplicate of another row already in the file (per the dedupe pass)
- Confidence below your threshold on a field that got auto-corrected
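
To make the idea concrete, here is a simplified Python sketch of a gate along these lines. Every name in it (`gate`, the row keys, the reason labels) is hypothetical, and the duplicate check is an exact-match stand-in for the real dedupe pass. The point it illustrates is that every input row lands in exactly one of two outputs.

```python
from datetime import date


def gate(rows, *, range_start, range_end,
         expected_currency="USD", min_confidence=0.85):
    """Split rows into (passed, quarantined); no row is ever dropped."""
    passed, quarantined = [], []
    seen = set()
    for row in rows:
        reasons = []
        if row.get("date") is None or row.get("amount") is None:
            reasons.append("missing_required_field")
        elif not (range_start <= row["date"] <= range_end):
            reasons.append("date_out_of_range")
        if row.get("currency", expected_currency) != expected_currency:
            reasons.append("unexpected_currency")
        key = (row.get("date"), row.get("amount"), row.get("merchant"))
        if key in seen:
            reasons.append("duplicate")
        seen.add(key)
        if row.get("confidence", 1.0) < min_confidence:
            reasons.append("low_confidence")
        if reasons:
            quarantined.append({**row, "reasons": reasons})
        else:
            passed.append(row)
    return passed, quarantined


rows = [
    {"date": date(2026, 4, 3), "amount": -12.5, "merchant": "Amazon"},
    {"date": None, "amount": 4210.0, "merchant": "TOTAL"},
    {"date": date(2026, 4, 3), "amount": -12.5, "merchant": "Amazon"},
    {"date": date(2026, 4, 9), "amount": -30.0, "merchant": "Cafe", "currency": "EUR"},
]
passed, quarantined = gate(rows, range_start=date(2026, 4, 1), range_end=date(2026, 4, 30))
for row in quarantined:
    print(row["merchant"], row["reasons"])
```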
The 2-minute workflow:
1. Run the pipeline as usual.
2. Open `<filename>.quarantine.csv`. (It'll be tiny — typically 0-5% of rows.)
3. Eyeball it. Anything that's a real transaction, fix-and-re-include manually. Anything that's a totals row / blank row / corrupt row — confirm it's correctly quarantined and delete it.
4. Re-run the pipeline on the fixed-up version (or just append the manually-fixed rows to the cleaned CSV).
The reason this matters: silent drops are the worst possible failure mode for a bookkeeper. You'd rather a row come out wrong (you'll catch it on review) than disappear (you won't catch it for months). The gate makes the silent-drop case impossible.
Set the gate's confidence threshold to `0.85` for client work. Lower (0.75) for personal / exploratory; higher (0.92+) only if you've spent time tuning your client's preset.
— Michael
{{support_email}}

@@ -0,0 +1,26 @@
# Bookkeeper · Day 30 — Heard from a fellow bookkeeper?
**Subject:** Heard from a fellow bookkeeper?
**Send:** Day 30
**Goal:** referral / review ask. Last touch in the sequence.
---
Hi {{first_name}},
A month in. If DataTools earned its $49 — would you do me one (very small) favor?
**Pick one of these. Whichever is easiest.**
1. **Gumroad review** (60 seconds): {{download_url}}#reviews — even a single line helps the next bookkeeper trust the listing enough to click "buy".
2. **Reply to this email with one sentence I can quote** on the bookkeeper landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
3. **Share the landing page** with one bookkeeper friend who'd benefit: {{landing_page}}. No referral commission scheme, just a link.
If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or what's broken. The 30-day refund window is still open and I'd rather refund a buyer who didn't get value than have an unhappy customer in the wild.
Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
Thanks for being an early buyer — the first 50 customers shape the next 5,000.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# RevOps · Day 0 — Delivery email
**Subject:** Your DataTools download (start here)
**Send:** immediately on Gumroad purchase confirmation
**Goal:** download + first run within 24h
---
Hi {{first_name}},
Thanks for buying DataTools. Your download:
**{{download_url}}**
Three things to do in the next 5 minutes:
**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects.
**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. (Yes, even if you're on the corporate VPN. Especially then.)
**3. Drop in a real lead list.** Don't bother with the bundled samples — the gate report only gets interesting when the data is real. Pull last quarter's webform export, or your most recent Apollo / LinkedIn pull, drag it into the analyzer, and click **"Run all"**. You'll see what the dedupe + format pipeline does in about 30 seconds.
If something doesn't work: just reply. I read every reply.
Refund: also just reply. 30-day no-questions; no form.
Tomorrow I'll send a sample 3-vendor lead list (HubSpot + LinkedIn + Apollo, synthetic data) so you can see the dedupe confidence tiers in action on a known input. After that you'll get four more emails spread over the next month: practical tips, no upsell. Unsubscribe at the bottom of any of them.
Welcome aboard.
— Michael
{{support_email}}
P.S. If you have a RevOps friend who'd find this useful: {{landing_page}}.

@@ -0,0 +1,36 @@
# RevOps · Day 1 — Try it on this 3-vendor lead list first
**Subject:** Try it on this 3-vendor lead list first
**Send:** Day 1, ~9am buyer-local-time
---
Hi {{first_name}},
Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
**{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
What's hidden in there:
- The same person from 3 sources, with intentionally inconsistent fields:
- HubSpot row: full email + company; no LinkedIn URL
- LinkedIn row: name + title + LinkedIn URL; no email
- Apollo row: email + phone + company; misspelled name
- ~120 obvious duplicates (same email, different case)
- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
- One row per 200 with a hidden zero-width space in the email
Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.
Look at three things in the output:
1. **The cleaned CSV** — what your import would look like
2. **The audit CSV** — every change, every rule, confidence per change
3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
— Michael
{{support_email}}

@@ -0,0 +1,36 @@
# RevOps · Day 3 — The dedupe rule that catches LinkedIn drift
**Subject:** The dedupe rule that catches LinkedIn drift
**Send:** Day 3
**Goal:** deepen feature understanding around the cross-source dedupe
---
Hi {{first_name}},
The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: **cross-source matching**, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.
The rule that does the work is in the dedupe tool's **"Block by domain, fuzzy on name+title"** mode. Here's what it does:
**Step 1 — Block.** Group rows by email domain. (LinkedIn rows with no email get bucketed by `domain(linkedin_url)` — usually their company website if they listed it.) This avoids the O(n²) explosion and rules out cross-company false positives.
**Step 2 — Within each block, fuzzy-match on `first_name + last_name + title`.** Token-set ratio at 0.85 default. Catches:
- "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
- "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
- "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
**Step 3 — Confidence-tier the merge.** ≥0.95 auto-merges. 0.85-0.95 goes to `<filename>.review.csv` for you to eyeball. <0.85 stays unmerged.
**Step 4 — Field-precedence on merge.** When records merge, you choose which source wins per field. Default precedence (configurable):
- `title`, `company`, `linkedin_url` → LinkedIn wins (more recent)
- `email`, `phone` → Webform wins (verified)
- `lifecycle_stage`, `owner` → HubSpot wins (your CRM is canonical)
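
If it helps to see Steps 1 through 3 as code, here is a stripped-down Python sketch. It is an illustration under stated assumptions, not the shipped implementation: `difflib.SequenceMatcher` over sorted, de-duplicated tokens stands in for the token-set ratio, and the `company_domain` field and all function names are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(row):
    """Step 1: block on email domain, else on a (hypothetical) company_domain field."""
    email = (row.get("email") or "").strip().lower()
    if "@" in email:
        return email.rsplit("@", 1)[1]
    return (row.get("company_domain") or "").strip().lower()


def tokens(row):
    text = " ".join(row.get(k) or "" for k in ("first_name", "last_name", "title"))
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))))


def similarity(a, b):
    """Step 2: rough stand-in for a token-set ratio, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()


def tier(score, auto=0.95, review=0.85):
    """Step 3: confidence tiers with the default thresholds."""
    if score >= auto:
        return "auto_merge"
    if score >= review:
        return "review"
    return "unmerged"


def candidate_pairs(rows):
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        key = block_key(row)
        if key:  # rows with no usable key are never compared
            blocks[key].append(index)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = similarity(rows[i], rows[j])
            yield i, j, score, tier(score)


rows = [
    {"first_name": "Sarah", "last_name": "O'Brien", "title": "VP Marketing", "email": "sarah@acme.com"},
    {"first_name": "sarah", "last_name": "obrien", "title": "VP Marketing", "email": "S.OBRIEN@ACME.COM"},
    {"first_name": "Sarah", "last_name": "O'Brien", "title": "VP Marketing", "email": "sarah@other.io"},
]
pairs = list(candidate_pairs(rows))
print(pairs)
```

Only the two `acme.com` rows are ever compared; the `other.io` row sits alone in its block, which is how blocking rules out cross-company false positives.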
**One trap to avoid:** don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run **analyzer → format → text-clean → dedupe → gate** in that order. The pipeline UI enforces this; the per-tool runs don't.
Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# RevOps · Day 7 — Run it before every HubSpot import
**Subject:** Run it before every HubSpot import
**Send:** Day 7
**Goal:** reframe from one-off tool to per-campaign workflow
---
Hi {{first_name}},
A week in. By now you've probably run DataTools on a real list once or twice and confirmed the dedupe catches more than HubSpot's native check.
The thing that turns DataTools into a recurring cost saver instead of a one-off purchase: **make it the gate on every import.**
The pattern that works:
**1. One DataTools run per campaign source.** Webform pull → DataTools. LinkedIn scrape → DataTools. Apollo export → DataTools. Each run produces a "clean" CSV.
**2. Concatenate the cleaned CSVs.** Standard pandas `concat` or just paste in Excel.
**3. One more DataTools run on the concatenation.** This is the cross-source dedupe pass — the one that catches the same person across the three sources.
**4. Compare against your current HubSpot export.** DataTools' dedupe against your existing CRM as the second source catches the people you already paid for last quarter and don't need to import again.
**5. Import only the residue** — the rows that survived all four passes — into HubSpot.
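
For steps 2 and 5, here is a dependency-free Python sketch (`pandas.concat` does the same job as `concat_csvs` if you already use pandas). The function names are hypothetical, and the exact-email match in `residue` is a crude stand-in for the DataTools dedupe pass against your CRM export.

```python
import csv
import io


def concat_csvs(sources):
    """sources maps a label to an open CSV file; adds a 'source' column to each row."""
    combined = []
    for label, handle in sources.items():
        for row in csv.DictReader(handle):
            row["source"] = label
            combined.append(row)
    return combined


def residue(rows, existing_emails):
    """Keep only rows whose email is not already in the CRM (exact match, case-insensitive)."""
    known = {email.strip().lower() for email in existing_emails}
    return [r for r in rows if (r.get("email") or "").strip().lower() not in known]


def write_csv(rows, handle):
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


webform = io.StringIO("email,name\na@x.com,Ann\n")
apollo = io.StringIO("email,phone\nb@y.com,+14155550143\n")
rows = concat_csvs({"webform": webform, "apollo": apollo})
new_rows = residue(rows, existing_emails=["A@X.com"])
out = io.StringIO()
write_csv(rows, out)
print(out.getvalue())
```

Tagging each row with its source before concatenating is what makes per-field precedence possible later.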
The buyers running this pipeline tell me they've cut their HubSpot marketing-contact bill 15-25% within two months. Not because their pipeline got smaller — because they stopped paying for duplicates.
**One thing to set up once:** save your dedupe settings as a `.datatools-preset.json` and commit it to your RevOps team's repo (or a shared Drive folder). Same preset every campaign means consistent results across whoever's running it that week.
If you want, reply with a sanitized lead list and I'll suggest a starting preset for your sources — happy to do this for the first 50 buyers.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# RevOps · Day 14 — Two-minute trick: the confidence tiers
**Subject:** Two-minute trick: the confidence tiers
**Send:** Day 14
**Goal:** surface the manual-review queue — non-obvious, high-value
---
Hi {{first_name}},
The single most-skipped feature in DataTools is also the one with the highest payoff per minute: the **manual-review queue**.
Here's what's happening under the hood: every dedupe decision DataTools makes has a confidence score (0.0 to 1.0). The dedupe tool by default puts decisions into three buckets:
- **≥0.95** → auto-merge (cleaned CSV)
- **0.85 - 0.95** → manual-review queue (`<filename>.review.csv`)
- **<0.85** → unmerged (kept as separate rows)
The 0.85-0.95 bucket is the magic. It's the range where a tuned algorithm catches *most* duplicates but where the wrong choice is a real cost (merging two genuinely different people = lost prospect; not merging two duplicates = paid contact you didn't need).
The 2-minute workflow:
1. Run dedupe.
2. Open `<filename>.review.csv`. Each row is a candidate merge with: confidence, the two records side-by-side, the rule that fired.
3. Eyeball each row. Mark `keep_merge` (Y/N) in the rightmost column.
4. Re-run dedupe with the `--apply-review-decisions <filename>.review.csv` flag (or click "Apply review decisions" in the GUI).
5. Final cleaned CSV reflects your manual choices.
For a 5,000-row lead list, the review queue is typically 20-60 rows. ~3 minutes of work. The output is dramatically better than auto-merge-everything-≥0.85, which is what most tools (including HubSpot's) do silently.
**Pro move:** save your `keep_merge` decisions over time. After 3-4 campaigns you'll have a corpus of "yes-merges" and "no-merges" you can use to retune the auto-merge threshold for *your* data. Most teams find their sweet spot is somewhere in 0.88-0.92.
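
One hedged way to do that retuning, as a Python sketch: scan your saved decisions from the highest confidence down and keep the lowest score at which the share of approved merges still meets a target. `suggest_threshold` and the 0.98 target are assumptions for illustration, not a DataTools feature, and with only a few dozen decisions the answer is a starting point rather than a verdict.

```python
def suggest_threshold(decisions, target_precision=0.98):
    """decisions: iterable of (confidence, keep_merge) pairs from past reviews.

    Returns the lowest confidence at which the approval rate of all
    decisions at or above it still meets the target, or None.
    """
    ordered = sorted(decisions, key=lambda d: d[0], reverse=True)
    best = None
    kept = 0
    for seen, (confidence, keep) in enumerate(ordered, start=1):
        kept += bool(keep)
        if kept / seen >= target_precision:
            best = confidence
    return best


history = [
    (0.94, True), (0.93, True), (0.91, True),
    (0.89, False), (0.88, True), (0.86, False),
]
print(suggest_threshold(history))
```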
— Michael
{{support_email}}

@@ -0,0 +1,26 @@
# RevOps · Day 30 — Heard from another RevOps lead?
**Subject:** Heard from another RevOps lead?
**Send:** Day 30
**Goal:** referral / review ask
---
Hi {{first_name}},
A month in. If DataTools earned its $49 — would you do me one small favor?
**Pick the one that's easiest.**
1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next RevOps lead trust the listing enough to click "buy".
2. **Reply to this email with one sentence I can quote** on the RevOps landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
3. **Share the landing page** with one RevOps friend who'd benefit: {{landing_page}}. No referral commission, just a link.
If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
Thanks for being an early buyer — the first 50 customers shape the next 5,000.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# Shopify-pet · Day 0 — Delivery email
**Subject:** Your DataTools download (start here)
**Send:** immediately on Gumroad purchase confirmation
**Goal:** download + first run within 24h
---
Hi {{first_name}},
Thanks for buying DataTools. Your download:
**{{download_url}}**
Three things to do in the next 5 minutes:
**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects.
**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. Your customer list never goes to a server.
**3. Drop in a real Shopify customer export.** Don't bother with the bundled samples. Customers > Export > "All customers" > CSV in Shopify admin. Drag it into DataTools' analyzer, click **"Run all"**. You'll see what it catches — typically a few hundred phone-format issues, some hidden-character emails, and a handful of cross-row duplicates — in about 30 seconds.
If something doesn't work: reply to this email. Goes to my inbox.
Refund: also reply. 30-day no-questions; no form.
Tomorrow I'll send a sample Shopify customer export with the tricky cases pre-built in, so you can see what the cleanup catches on a known input. After that you'll get four more emails spread over the next month, mostly one practical tip each. Unsubscribe at the bottom of any of them.
Welcome aboard.
— Michael
{{support_email}}
P.S. Got a fellow store owner who'd find this useful? {{landing_page}}.

@@ -0,0 +1,32 @@
# Shopify-pet · Day 1 — Try it on this Shopify customer export first
**Subject:** Try it on this Shopify customer export first
**Send:** Day 1, ~9am buyer-local-time
---
Hi {{first_name}},
Yesterday's email had your download. Today's email has a *file* — a synthetic Shopify customer export I built specifically to break things Klaviyo silently chokes on.
**{{sample_file_url}}** (480 KB CSV, 2,200 rows — fully synthetic, no real customer data)
What's hidden in there:
- Phone numbers in 6 different formats (`(415) 555-0143`, `415.555.0143`, `4155550143`, `+44 20 7946 0958` without country field, `+1-415-555-0143 ext 12`, `415 555 0143`)
- Email addresses with embedded zero-width spaces (they look identical to clean emails, but Klaviyo treats them as different addresses)
- ~80 obvious customer duplicates (same email, different case)
- ~40 cross-row duplicates (different email, same name + same shipping address — usually the same person ordering with two emails)
- Shipping addresses with mixed `St.` / `Street` / `St` / `STREET` for the same street name
- 12 customers from outside North America with country field blank
Drop it into DataTools. Click **"Run all"** in the analyzer. Then run **format → text-clean → dedupe → gate** in that order.
Look at the **gate report** at the end — it'll tell you exactly which rows would have broken Klaviyo, with a one-line "why" per row.
If you want to see the difference: import the **raw** file to a test Klaviyo list, then import the **cleaned** file to a different test list. Compare the SMS-deliverable count. The delta is what you've been losing every month.
Reply and tell me what it caught (or missed) — v1.1 detector improvements come from real-world feedback.
— Michael
{{support_email}}

@@ -0,0 +1,33 @@
# Shopify-pet · Day 3 — The phone-format step Klaviyo cares about
**Subject:** The phone-format step Klaviyo cares about
**Send:** Day 3
**Goal:** deepen feature understanding around the format standardizer
---
Hi {{first_name}},
The single biggest source of "Klaviyo dropped this customer silently" is phone formatting. DataTools fixes this in one tool — the **format standardizer** — but the *settings* matter.
Klaviyo (and basically every modern SMS platform) wants phones in **E.164** format: `+` then country code then number, no spaces, no dashes, no extension. Like: `+14155550143`.
Three settings in DataTools' format standardizer that get this right:
**1. Set "Phone output format" to `E.164`.** Default is `national` (`(415) 555-0143`) — fine for display, broken for Klaviyo. Change it once; the preset remembers.
**2. Set "Default country" per row, not per file.** This is the non-obvious one. For each customer:
- If the `country` field has a value (e.g., "Canada", "CA", "Canadá"), use it.
- If blank, fall back to the country in the *shipping address*.
- If still blank, fall back to the file-level default (you set this — typically your store's primary market).
DataTools does this automatically when you check "Use per-row country detection". *Skip this and ~30% of international customers will end up with US country codes prepended to their numbers — which Klaviyo accepts but routes wrong, and your SMS never arrives.*
**3. Set "Quarantine un-parseable phones" to ON.** Don't drop them silently; don't pass them to Klaviyo broken. Send them to `<filename>.quarantine.csv` so you can fix the worst 10-20 by hand and re-include them.
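
For readers who like to see the logic, here is a deliberately tiny Python sketch of the three settings working together. It is an illustration only: the lookup tables cover four countries, the `shipping_country` field and the function names are hypothetical, and production code would lean on a full phone-number library (for example the `phonenumbers` package) rather than on these rules.

```python
import re

# Hypothetical, deliberately incomplete lookup tables.
CALLING_CODES = {"US": "1", "CA": "1", "GB": "44", "AU": "61"}
COUNTRY_ALIASES = {
    "united states": "US", "usa": "US", "us": "US",
    "canada": "CA", "canadá": "CA", "ca": "CA",
    "united kingdom": "GB", "uk": "GB", "gb": "GB",
    "australia": "AU", "au": "AU",
}
EXTENSION = re.compile(r"\s*(?:ext\.?|x)\s*\d+\s*$", re.IGNORECASE)


def resolve_country(row, file_default):
    """Per-row country: country field, then shipping country, then file default."""
    for field in ("country", "shipping_country"):
        code = COUNTRY_ALIASES.get((row.get(field) or "").strip().lower())
        if code:
            return code
    return file_default


def to_e164(raw, country):
    """Return '+<digits>' or None; None means 'send this row to quarantine'."""
    number = EXTENSION.sub("", raw.strip())      # E.164 has no extensions
    digits = re.sub(r"\D", "", number)
    if number.startswith("+"):
        candidate = digits                        # already carries a country code
    else:
        code = CALLING_CODES.get(country)
        if code is None:
            return None
        candidate = code + digits.lstrip("0")     # drop a national trunk prefix
    if not 8 <= len(candidate) <= 15:             # E.164 allows at most 15 digits
        return None
    return "+" + candidate


row = {"phone": "020 7946 0958", "country": "", "shipping_country": "UK"}
print(to_e164(row["phone"], resolve_country(row, file_default="US")))
```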
The combination (E.164 + per-row country + quarantine) typically takes a Shopify export from "60-70% of phones survive Klaviyo's import" to "97-99%". On a 10,000-customer list where every customer has a phone number on file, that's roughly 2,700 to 3,900 more customers reachable per campaign.
Reply if you want me to walk through these settings on a screen-share — happy to do this for any buyer in the first 30 days.
— Michael
{{support_email}}

@@ -0,0 +1,35 @@
# Shopify-pet · Day 7 — Run it before every Klaviyo sync
**Subject:** Run it before every Klaviyo sync
**Send:** Day 7
**Goal:** reframe from one-off tool to per-sync workflow
---
Hi {{first_name}},
A week in. By now you've probably run DataTools on a real customer export once or twice and seen the cleanup catch things you'd been losing in Klaviyo for months.
The thing that turns DataTools into a recurring win instead of a one-off purchase: **run it before every sync, not just the first time.**
The pattern that works for most stores:
**1. Pick a cadence.** Most stores I talk to do this monthly; high-volume stores do it weekly. The cadence should match your "I'm planning a campaign" rhythm.
**2. The Sunday-morning ritual:**
- Pull a fresh customer export from Shopify (Customers > Export > "All customers")
- Drop into DataTools
- Run the pipeline (analyzer → format → text-clean → dedupe → gate)
- Review the gate quarantine file (typically 0.5-2% of rows)
- Push the cleaned CSV to Klaviyo (their CSV import or via their API)
**3. Save your settings as a preset.** The "Save settings" button writes a `.datatools-preset.json`. Keep it in your store's Drive / Notion / wherever your shop docs live. Next month, load preset, run pipeline, done in 4 minutes.
**4. After 3 months, retune the preset.** Look at your manual-review queue across the 3 runs. If you're consistently approving 0.86-confidence merges, drop the auto-merge threshold to 0.85. If you're rejecting 0.92 merges, raise it to 0.94. The preset improves with use.
The store owners doing this monthly tell me their open rates go up 8-15% in the first 90 days — not from new content, just from the email actually reaching the inbox.
If you want, reply with a sanitized export and I'll suggest a starting preset for your store — happy to do this for the first 50 buyers.
— Michael
{{support_email}}

@@ -0,0 +1,32 @@
# Shopify-pet · Day 14 — Two-minute trick: hidden-character cleanup
**Subject:** Two-minute trick: hidden-character cleanup
**Send:** Day 14
**Goal:** surface the text cleaner — non-obvious, high-value
---
Hi {{first_name}},
The tool inside DataTools that buyers find last is the **text cleaner** — and on Shopify customer exports it's usually the one with the most "wait, that was a problem?" moments.
What it catches: invisible characters that got into your customer data when customers typed on their phones. The most common offenders:
- **Zero-width space** (`U+200B`) inside emails — Klaviyo treats `sarah@acme.com` (with hidden char) and `sarah@acme.com` (without) as different addresses
- **Non-breaking space** (`U+00A0`) inside addresses — Shopify accepts it, Klaviyo accepts it, but USPS address validation fails on it
- **BOM marker** (`U+FEFF`) at the start of CSV cells — usually from a customer pasting from Word or a PDF
- **Right-to-left mark** (`U+200F`) — rare, but appears in customer names from Hebrew/Arabic locales
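
If you want to see why these slip past the eye, here is a short Python sketch that finds and strips the four characters above. The helper names are hypothetical and this is not the text cleaner's actual code; right-to-left marks are left alone unless you opt in.

```python
HIDDEN = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u00a0": "NO-BREAK SPACE",
    "\ufeff": "BYTE ORDER MARK",
    "\u200f": "RIGHT-TO-LEFT MARK",
}


def find_hidden(cell):
    """Return (position, character name) for every hidden character in a cell."""
    return [(i, HIDDEN[ch]) for i, ch in enumerate(cell) if ch in HIDDEN]


def clean(cell, strip_rtl=False):
    """Strip zero-width spaces and BOMs; turn non-breaking spaces into plain spaces."""
    out = cell.replace("\u00a0", " ")
    for ch in ("\u200b", "\ufeff"):
        out = out.replace(ch, "")
    if strip_rtl:  # opt-in only: confirm before stripping these from names
        out = out.replace("\u200f", "")
    return out


dirty = "sarah\u200b@acme.com"
print(dirty == "sarah@acme.com")         # False, though both print the same
print(find_hidden(dirty))
print(clean(dirty) == "sarah@acme.com")  # True
```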
The 2-minute workflow:
1. After the format standardizer pass, run the text cleaner.
2. It produces an additional sidecar file: `<filename>.hidden-chars.csv` — every cell where it found a hidden char, with a "what was hidden where" annotation.
3. Skim it. Most are fine to silently strip (zero-width spaces, BOMs). For rare ones (right-to-left marks in a name), confirm before stripping — sometimes they're load-bearing.
4. Click "Apply cleanup". The text cleaner replaces the hidden chars in the cleaned CSV.
The reason this matters: **dedupe runs after text-clean.** Two emails with a hidden char difference look identical in the GUI but get treated as two separate customers — and your dedupe pass won't catch them unless the text cleaner ran first.
The pipeline order baked into the GUI is: `analyzer → format → text-clean → dedupe → gate`. Stick to it; per-tool runs out of order are the most common source of "wait, why didn't dedupe catch this?".
— Michael
{{support_email}}

@@ -0,0 +1,26 @@
# Shopify-pet · Day 30 — Heard from another store owner?
**Subject:** Heard from another store owner?
**Send:** Day 30
**Goal:** referral / review ask
---
Hi {{first_name}},
A month in. If DataTools earned its $49 — would you do me one small favor?
**Pick the one that's easiest.**
1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next Shopify owner trust the listing enough to click "buy".
2. **Reply to this email with one sentence I can quote** on the landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
3. **Share the landing page** with one fellow store owner who'd benefit: {{landing_page}}. No referral commission, just a link.
If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
Thanks for being an early buyer — the first 50 customers shape the next 5,000.
— Michael
{{support_email}}