feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions

@@ -0,0 +1,34 @@
# RevOps · Day 0 — Delivery email
**Subject:** Your DataTools download (start here)
**Send:** immediately on Gumroad purchase confirmation
**Goal:** download + first run within 24h
---
Hi {{first_name}},
Thanks for buying DataTools. Your download:
**{{download_url}}**
Three things to do in the next 5 minutes:
**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link detects your OS automatically.
**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. (Yes, even if you're on the corporate VPN. Especially then.)
**3. Drop in a real lead list.** Don't bother with the bundled samples — the gate report only gets interesting when the data is real. Pull last quarter's webform export, or your most recent Apollo / LinkedIn pull, drag it into the analyzer, and click **"Run all"**. You'll see what the dedupe + format pipeline does in about 30 seconds.
If something doesn't work: just reply. I read every reply.
Refund: also just reply. 30-day no-questions; no form.
Tomorrow I'll send a sample 3-vendor lead list (HubSpot + LinkedIn + Apollo, synthetic data) so you can see the dedupe confidence tiers in action on a known input. After that you'll get one email a week for the next month — practical tips, no upsell. Unsubscribe at the bottom of any of them.
Welcome aboard.
— Michael
{{support_email}}
P.S. If you have a RevOps friend who'd find this useful: {{landing_page}}.

@@ -0,0 +1,36 @@
# RevOps · Day 1 — Try it on this 3-vendor lead list first
**Subject:** Try it on this 3-vendor lead list first
**Send:** Day 1, ~9am buyer-local-time
---
Hi {{first_name}},
Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
**{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
What's hidden in there:
- The same person from 3 sources, with intentionally inconsistent fields:
  - HubSpot row: full email + company; no LinkedIn URL
  - LinkedIn row: name + title + LinkedIn URL; no email
  - Apollo row: email + phone + company; misspelled name
- ~120 obvious duplicates (same email, different case)
- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
- One row per 200 with a hidden zero-width space in the email
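To make the "same email, different case" and zero-width-space cases concrete, here is a minimal Python sketch of the kind of normalization that exposes them. It is an illustration only, not DataTools' actual code; the function names and the list of invisible characters are assumptions.

```python
# Invisible characters that can hide inside an exported email address.
# (Assumed list for illustration; a real tool may cover more.)
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_email(raw: str) -> str:
    """Build a comparison key: drop invisible characters, trim, lowercase."""
    cleaned = "".join(ch for ch in raw if ch not in INVISIBLE)
    return cleaned.strip().lower()


def exact_duplicate_groups(emails):
    """Group raw emails that collapse to the same normalized key."""
    groups = {}
    for email in emails:
        groups.setdefault(normalize_email(email), []).append(email)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}
```

Two rows that look different to a plain string comparison, such as `Jane@Example.com` and `jane\u200b@example.com`, end up under the same key.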
Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.
Look at three things in the output:
1. **The cleaned CSV** — what your import would look like
2. **The audit CSV** — every change, every rule, confidence per change
3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
— Michael
{{support_email}}

@@ -0,0 +1,36 @@
# RevOps · Day 3 — The dedupe rule that catches LinkedIn drift
**Subject:** The dedupe rule that catches LinkedIn drift
**Send:** Day 3
**Goal:** deepen feature understanding around the cross-source dedupe
---
Hi {{first_name}},
The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: **cross-source matching**, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.
The rule that does the work is in the dedupe tool's **"Block by domain, fuzzy on name+title"** mode. Here's what it does:
**Step 1 — Block.** Group rows by email domain. (LinkedIn rows with no email get bucketed by `domain(linkedin_url)` — usually their company website if they listed it.) This avoids the O(n²) explosion and rules out cross-company false positives.
**Step 2 — Within each block, fuzzy-match on `first_name + last_name + title`.** Token-set ratio at 0.85 default. Catches:
- "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
- "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
- "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
**Step 3 — Confidence-tier the merge.** ≥0.95 auto-merges. 0.85-0.95 goes to `<filename>.review.csv` for you to eyeball. <0.85 stays unmerged.
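Steps 1 to 3 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's implementation: the score uses the standard library's `difflib.SequenceMatcher` over sorted, de-duplicated tokens as a stand-in for a true token-set ratio, rows without an email are skipped rather than bucketed by LinkedIn URL, and the field names are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(row):
    """Step 1: block on the lowercased email domain (None if no email)."""
    email = row.get("email") or ""
    return email.rsplit("@", 1)[-1].lower() if "@" in email else None


def canon(row):
    """Lowercase, strip punctuation, then sort and de-duplicate the tokens."""
    text = " ".join(row.get(f) or "" for f in ("first_name", "last_name", "title"))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(sorted(set(text.split())))


def similarity(a, b):
    """Step 2: stand-in fuzzy score in [0.0, 1.0]."""
    return SequenceMatcher(None, canon(a), canon(b)).ratio()


def tier(score, auto=0.95, review=0.85):
    """Step 3: map a score to one of the three confidence tiers."""
    if score >= auto:
        return "auto_merge"
    if score >= review:
        return "review"
    return "unmerged"


def candidate_pairs(rows):
    """Compare rows only within the same block; yield (i, j, score, tier)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        key = block_key(row)
        if key:
            blocks[key].append(i)
    for indexes in blocks.values():
        for i, j in combinations(indexes, 2):
            score = similarity(rows[i], rows[j])
            yield i, j, score, tier(score)
```

Because pairs are only generated inside a block, two people with the same name at different companies are never scored against each other.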
**Step 4 — Field-precedence on merge.** When records merge, you choose which source wins per field. Default precedence (configurable):
- `title`, `company`, `linkedin_url` → LinkedIn wins (more recent)
- `email`, `phone` → Webform wins (verified)
- `lifecycle_stage`, `owner` → HubSpot wins (your CRM is canonical)
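As a rough illustration of the default precedence above, the following Python sketch merges one record per source. Only the "which source wins" mapping comes from the list; the source labels, the `merge_records` name, and the fallback rule (take the first non-empty value from any other source) are assumptions made for the example.

```python
# Winning source per field, taken from the default precedence list.
WINNER = {
    "title": "linkedin",
    "company": "linkedin",
    "linkedin_url": "linkedin",
    "email": "webform",
    "phone": "webform",
    "lifecycle_stage": "hubspot",
    "owner": "hubspot",
}


def merge_records(records):
    """Merge {source_name: row_dict} into one row using per-field precedence.

    Assumed fallback: if the preferred source has no value for a field,
    use the first non-empty value from the remaining sources.
    """
    merged = {}
    fields = {field for row in records.values() for field in row}
    for field in sorted(fields):
        preferred = WINNER.get(field)
        ordered = [s for s in records if s == preferred]
        ordered += [s for s in records if s != preferred]
        merged[field] = next(
            (records[s][field] for s in ordered if records[s].get(field)), ""
        )
    return merged
```

Keeping the mapping in a plain dictionary mirrors the "configurable" note: changing which source wins is a one-line edit.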
**One trap to avoid:** don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run **format → analyzer → dedupe → gate** in that order. The pipeline UI enforces this; the per-tool runs don't.
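The phone example shows why format has to run first. A minimal Python sketch, assuming a default country code of `1` for ten-digit numbers (the function name and that default are illustrative; a production pipeline would more likely rely on a dedicated phone-parsing library):

```python
import re


def normalize_phone(raw, default_country_code="1"):
    """Reduce a phone string to '+<digits>' so equal numbers compare equal.

    Assumption: a 10-digit number without a leading '+' is national format
    and receives the default country code.
    """
    digits = re.sub(r"\D", "", raw)
    if not raw.strip().startswith("+") and len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


# Both spellings from the example collapse to the same key.
same = normalize_phone("+14155550143") == normalize_phone("(415) 555-0143")
```

Without this step the two strings are different keys, so an exact-match rule never gets the chance to pair them.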
Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# RevOps · Day 7 — Run it before every HubSpot import
**Subject:** Run it before every HubSpot import
**Send:** Day 7
**Goal:** reframe from one-off tool to per-campaign workflow
---
Hi {{first_name}},
A week in. By now you've probably run DataTools on a real list once or twice and confirmed the dedupe catches more than HubSpot's native check.
What turns DataTools from a one-off purchase into a recurring monthly saving: **make it the gate on every import.**
The pattern that works:
**1. One DataTools run per campaign source.** Webform pull → DataTools. LinkedIn scrape → DataTools. Apollo export → DataTools. Each run produces a "clean" CSV.
**2. Concatenate the cleaned CSVs.** Standard pandas `concat` or just paste in Excel.
**3. One more DataTools run on the concatenation.** This is the cross-source dedupe pass — the one that catches the same person across the three sources.
**4. Compare against your current HubSpot export.** Run dedupe with your existing CRM export as the second source; it catches the people you already paid for last quarter and don't need to import again.
**5. Import only the residue** — the rows that survived all four passes — into HubSpot.
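The concatenate-then-filter part of this pattern (steps 2, 4 and 5) can be sketched with the Python standard library. This is a simplified stand-in: it matches on exact lowercased email only, whereas the cross-source fuzzy pass is what the dedupe tool itself is for. The helper names and the `email` column are assumptions.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def concat(*tables):
    """Step 2: stack the cleaned per-source tables into one list."""
    return [row for table in tables for row in table]


def residue(candidates, existing, key="email"):
    """Steps 4-5: keep only candidates whose key is absent from the CRM export."""
    known = {row[key].strip().lower() for row in existing if row.get(key)}
    return [
        row for row in candidates
        if (row.get(key) or "").strip().lower() not in known
    ]
```

With pandas installed, `pandas.concat` does the stacking step on DataFrames; the residue filter is the same idea as an anti-join on the key column.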
The buyers running this pipeline tell me they've cut their HubSpot marketing-contact bill 15-25% within two months. Not because their pipeline got smaller — because they stopped paying for duplicates.
**One thing to set up once:** save your dedupe settings as a `.datatools-preset.json` and commit it to your RevOps team's repo (or a shared Drive folder). Same preset every campaign means consistent results across whoever's running it that week.
If you want, reply with a sanitized lead list and I'll suggest a starting preset for your sources — happy to do this for the first 50 buyers.
— Michael
{{support_email}}

@@ -0,0 +1,34 @@
# RevOps · Day 14 — Two-minute trick: the confidence tiers
**Subject:** Two-minute trick: the confidence tiers
**Send:** Day 14
**Goal:** surface the manual-review queue — non-obvious, high-value
---
Hi {{first_name}},
The single most-skipped feature in DataTools is also the one with the highest payoff per minute: the **manual-review queue**.
Here's what's happening under the hood: every dedupe decision DataTools makes has a confidence score (0.0 to 1.0). By default, the dedupe tool sorts decisions into three buckets:
- **≥0.95** → auto-merge (cleaned CSV)
- **0.85 - 0.95** → manual-review queue (`<filename>.review.csv`)
- **<0.85** → unmerged (kept as separate rows)
The 0.85-0.95 bucket is the magic. It's the range where a tuned algorithm catches *most* duplicates but where the wrong choice is a real cost (merging two genuinely different people = lost prospect; not merging two duplicates = paid contact you didn't need).
The 2-minute workflow:
1. Run dedupe.
2. Open `<filename>.review.csv`. Each row is a candidate merge with: confidence, the two records side-by-side, the rule that fired.
3. Eyeball each row. Mark `keep_merge` (Y/N) in the rightmost column.
4. Re-run dedupe with the `--apply-review-decisions <filename>.review.csv` flag (or click "Apply review decisions" in the GUI).
5. Final cleaned CSV reflects your manual choices.
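Conceptually, applying review decisions means reading the `keep_merge` column and merging only the approved pairs. The Python sketch below shows that idea; the `row_a` and `row_b` column names are hypothetical (only `keep_merge` is described above), and the union-find grouping is one possible way to combine approved pairs, not necessarily how the tool does it.

```python
import csv
import io


def approved_merges(review_csv_text):
    """Return (row_a, row_b) pairs the reviewer marked with keep_merge = Y."""
    reader = csv.DictReader(io.StringIO(review_csv_text))
    return [
        (r["row_a"], r["row_b"])
        for r in reader
        if (r.get("keep_merge") or "").strip().upper() == "Y"
    ]


def clusters(pairs):
    """Group approved pairs into merge clusters with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())
```

Grouping matters when the same record appears in more than one approved pair: rows 1-2 and 2-3 should end up as one merged record, not two.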
For a 5,000-row lead list, the review queue is typically 20-60 rows, which is a few minutes of work. The output is dramatically better than auto-merging everything ≥0.85, which is what most tools (including HubSpot's) do silently.
**Pro move:** save your `keep_merge` decisions over time. After 3-4 campaigns you'll have a corpus of "yes-merges" and "no-merges" you can use to retune the auto-merge threshold for *your* data. Most teams find their sweet spot is somewhere in 0.88-0.92.
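One way to turn that corpus into a threshold is to find the lowest confidence at which your past "yes" rate stays above a target. The Python sketch below is a rough heuristic under assumptions: the input is a list of `(confidence, kept)` pairs collected from earlier review files, and `min_precision` is a target you pick yourself. Because the review queue only covers the 0.85-0.95 range, the estimate only speaks to that range.

```python
def suggest_auto_merge_threshold(decisions, min_precision=0.98):
    """Lowest confidence t such that the share of approved merges among
    all reviewed pairs with confidence >= t is at least min_precision.

    decisions: iterable of (confidence, kept) pairs, kept being True/False.
    Returns None if no threshold meets the target.
    """
    ranked = sorted(decisions, key=lambda d: d[0], reverse=True)
    best = None
    kept_so_far = 0
    for i, (confidence, kept) in enumerate(ranked):
        kept_so_far += bool(kept)
        # Only evaluate once every pair sharing this confidence is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != confidence
        if last_of_tie and kept_so_far / (i + 1) >= min_precision:
            best = confidence
    return best
```

A stricter target pushes the suggested threshold up; a looser one pulls it down toward the bottom of the review range.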
— Michael
{{support_email}}

@@ -0,0 +1,26 @@
# RevOps · Day 30 — Heard from another RevOps lead?
**Subject:** Heard from another RevOps lead?
**Send:** Day 30
**Goal:** referral / review ask
---
Hi {{first_name}},
A month in. If DataTools earned its $49 — would you do me one small favor?
**Pick the one that's easiest.**
1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next RevOps lead trust the listing enough to click "buy".
2. **Reply to this email with one sentence I can quote** on the RevOps landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
3. **Share the landing page** with one RevOps friend who'd benefit: {{landing_page}}. No referral commission, just a link.
If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
Either way, this is the last automated email you'll get from me. After this, you'll only hear from me when there's a v1.x update or when you reply to one of the previous emails.
Thanks for being an early buyer — the first 50 customers shape the next 5,000.
— Michael
{{support_email}}