feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py + hook-streamlit.py + README) — folder-mode bundle, locked 127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches: bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher + hooks get tracked, only generated artifacts (build/build/, build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions
--- a/marketing/emails/revops/00-delivery.md
+++ b/marketing/emails/revops/00-delivery.md
@@ -0,0 +1,34 @@
+# RevOps · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** download + first run within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects your OS.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. (Yes, even if you're on the corporate VPN. Especially then.)
+
+**3. Drop in a real lead list.** Don't bother with the bundled samples — the gate report only gets interesting when the data is real. Pull last quarter's webform export, or your most recent Apollo / LinkedIn pull, drag it into the analyzer, and click **"Run all"**. You'll see what the dedupe + format pipeline does in about 30 seconds.
+
+If something doesn't work: just reply. I read every reply.
+
+Refund: also just reply. 30-day no-questions; no form.
+
+Tomorrow I'll send a sample 3-vendor lead list (HubSpot + LinkedIn + Apollo, synthetic data) so you can see the dedupe confidence tiers in action on a known input. After that you'll get four more emails over the next month (days 3, 7, 14, and 30): practical tips, no upsell. Unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. If you have a RevOps friend who'd find this useful: {{landing_page}}.
--- a/marketing/emails/revops/01-day1.md
+++ b/marketing/emails/revops/01-day1.md
@@ -0,0 +1,36 @@
+# RevOps · Day 1 — Try it on this 3-vendor lead list first
+
+**Subject:** Try it on this 3-vendor lead list first
+**Send:** Day 1, ~9am buyer-local-time
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
+
+→ **{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
+
+What's hidden in there:
+
+- The same person from 3 sources, with intentionally inconsistent fields:
+  - HubSpot row: full email + company; no LinkedIn URL
+  - LinkedIn row: name + title + LinkedIn URL; no email
+  - Apollo row: email + phone + company; misspelled name
+- ~120 obvious duplicates (same email, different case)
+- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
+- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
+- One row in every 200 has a hidden zero-width space in the email
+
+Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.
+
+Look at three things in the output:
+
+1. **The cleaned CSV** — what your import would look like
+2. **The audit CSV** — every change, every rule, confidence per change
+3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
+
+Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
+
+— Michael
+{{support_email}}
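Review note on the Day 1 sample: the "same email, different case" and "hidden zero-width space" traps listed above come down to key normalization. The sketch below is only an illustration of that idea, not DataTools' actual implementation; `normalize_email`, `normalize_phone`, `find_email_duplicates`, and the `email` field name are hypothetical, and the phone helper assumes a default country code when no `+` prefix is present.

```python
import re
from collections import defaultdict

# Zero-width and BOM characters that can hide inside pasted email addresses.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_email(raw: str) -> str:
    """Drop zero-width characters, trim whitespace, and lowercase."""
    return raw.translate(ZERO_WIDTH).strip().lower()


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to '+<digits>'.

    Assumption: numbers without a leading '+' belong to the default country.
    This is illustrative only and does not validate the number.
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    return "+" + default_country_code + digits


def find_email_duplicates(rows):
    """Map each normalized email to the row indexes that share it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        email = normalize_email(row.get("email", ""))
        if email:  # rows without an email cannot be matched on this key
            groups[email].append(index)
    return {email: idxs for email, idxs in groups.items() if len(idxs) > 1}
```

Rows that differ only by case or by an invisible character collapse onto the same key, which is why they count as the "obvious" duplicates; the cross-source cases need fuzzy matching instead.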
--- a/marketing/emails/revops/02-day3.md
+++ b/marketing/emails/revops/02-day3.md
@@ -0,0 +1,36 @@
+# RevOps · Day 3 — The dedupe rule that catches LinkedIn drift
+
+**Subject:** The dedupe rule that catches LinkedIn drift
+**Send:** Day 3
+**Goal:** deepen feature understanding around the cross-source dedupe
+
+---
+
+Hi {{first_name}},
+
+The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: **cross-source matching**, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.
+
+The rule that does the work is in the dedupe tool's **"Block by domain, fuzzy on name+title"** mode. Here's what it does:
+
+**Step 1 — Block.** Group rows by email domain. (LinkedIn rows with no email get bucketed by `domain(linkedin_url)` — usually their company website if they listed it.) This avoids the O(n²) explosion and rules out cross-company false positives.
+
+**Step 2 — Within each block, fuzzy-match on `first_name + last_name + title`.** Token-set ratio at 0.85 default. Catches:
+
+- "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
+- "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
+- "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
+
+**Step 3 — Confidence-tier the merge.** ≥0.95 auto-merges. 0.85-0.95 goes to `<filename>.review.csv` for you to eyeball. <0.85 stays unmerged.
+
+**Step 4 — Field-precedence on merge.** When records merge, you choose which source wins per field. Default precedence (configurable):
+
+- `title`, `company`, `linkedin_url` → LinkedIn wins (more recent)
+- `email`, `phone` → Webform wins (verified)
+- `lifecycle_stage`, `owner` → HubSpot wins (your CRM is canonical)
+
+**One trap to avoid:** don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run **format → analyzer → dedupe → gate** in that order. The pipeline UI enforces this; the per-tool runs don't.
+
+Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
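Review note on the Day 3 email: steps 1 to 3 (block, fuzzy-match, confidence-tier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's code: all function names are hypothetical, rows without an email are simply skipped rather than bucketed by LinkedIn URL, and `difflib.SequenceMatcher` over sorted unique tokens stands in for a true token-set ratio, so scores will differ from what DataTools reports.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

AUTO_MERGE = 0.95  # thresholds quoted in the email
REVIEW = 0.85


def block_key(row):
    """Step 1: use the email domain as the blocking key (None if no email)."""
    email = row.get("email", "")
    return email.rsplit("@", 1)[1].lower() if "@" in email else None


def tokens(row):
    """Lowercase name + title, strip punctuation, sort the unique tokens."""
    text = " ".join(row.get(f, "") for f in ("first_name", "last_name", "title"))
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(sorted(set(text.split())))


def similarity(a, b):
    """Step 2: stand-in similarity score between two rows, in [0.0, 1.0]."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def tier(score):
    """Step 3: map a score onto the three confidence tiers."""
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= REVIEW:
        return "review"
    return "unmerged"


def candidate_pairs(rows):
    """Yield (i, j, score, tier) for every pair inside the same block."""
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        key = block_key(row)
        if key:
            blocks[key].append(index)
    for indexes in blocks.values():
        for i, j in combinations(indexes, 2):
            score = similarity(rows[i], rows[j])
            yield i, j, score, tier(score)
```

Because pairs are only generated inside a block, two people with the same name at different companies are never compared, which is the cross-company safeguard the email describes.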
--- a/marketing/emails/revops/03-day7.md
+++ b/marketing/emails/revops/03-day7.md
@@ -0,0 +1,34 @@
+# RevOps · Day 7 — Run it before every HubSpot import
+
+**Subject:** Run it before every HubSpot import
+**Send:** Day 7
+**Goal:** reframe from one-off tool to per-campaign workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on a real list once or twice and confirmed the dedupe catches more than HubSpot's native check.
+
+What turns DataTools from a one-off purchase into a recurring monthly saving: **make it the gate on every import.**
+
+The pattern that works:
+
+**1. One DataTools run per campaign source.** Webform pull → DataTools. LinkedIn scrape → DataTools. Apollo export → DataTools. Each run produces a "clean" CSV.
+
+**2. Concatenate the cleaned CSVs.** Standard pandas `concat` or just paste in Excel.
+
+**3. One more DataTools run on the concatenation.** This is the cross-source dedupe pass — the one that catches the same person across the three sources.
+
+**4. Compare against your current HubSpot export.** Run dedupe with your existing CRM export as the second source to catch the people you already paid for last quarter and don't need to import again.
+
+**5. Import only the residue** — the rows that survived every pass above — into HubSpot.
+
+The buyers running this pipeline tell me they've cut their HubSpot marketing-contact bill 15-25% within two months. Not because their pipeline got smaller — because they stopped paying for duplicates.
+
+**One thing to set up once:** save your dedupe settings as a `.datatools-preset.json` and commit it to your RevOps team's repo (or a shared Drive folder). Same preset every campaign means consistent results across whoever's running it that week.
+
+If you want, reply with a sanitized lead list and I'll suggest a starting preset for your sources — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
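Review note on the Day 7 email: the concatenate, compare, and import-the-residue steps amount to an anti-join against the CRM export. A rough sketch follows, assuming rows are dicts with an `email` field and using exact normalized-email matching as a simplified stand-in for the tool's cross-source fuzzy dedupe; `email_key` and `import_residue` are hypothetical names.

```python
def email_key(row):
    """Normalized email used as the comparison key ('' if missing)."""
    return row.get("email", "").strip().lower()


def import_residue(cleaned_sources, crm_rows):
    """Concatenate per-source cleaned rows and keep only new contacts.

    A row is dropped when its email already exists in the CRM export or
    was already seen in an earlier source. Rows without an email are kept,
    since an exact-key comparison cannot judge them.
    """
    seen = {email_key(r) for r in crm_rows if email_key(r)}
    residue = []
    for rows in cleaned_sources:
        for row in rows:
            key = email_key(row)
            if key and key in seen:
                continue
            if key:
                seen.add(key)
            residue.append(row)
    return residue
```

Passing the sources in a fixed order keeps the result deterministic: the first source to contribute a contact wins, and later copies are dropped.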
--- a/marketing/emails/revops/04-day14.md
+++ b/marketing/emails/revops/04-day14.md
@@ -0,0 +1,34 @@
+# RevOps · Day 14 — Two-minute trick: the confidence tiers
+
+**Subject:** Two-minute trick: the confidence tiers
+**Send:** Day 14
+**Goal:** surface the manual-review queue — non-obvious, high-value
+
+---
+
+Hi {{first_name}},
+
+The single most-skipped feature in DataTools is also the one with the highest payoff per minute: the **manual-review queue**.
+
+Here's what's happening under the hood: every dedupe decision DataTools makes has a confidence score (0.0 to 1.0). The dedupe tool by default puts decisions into three buckets:
+
+- **≥0.95** → auto-merge (cleaned CSV)
+- **0.85 - 0.95** → manual-review queue (`<filename>.review.csv`)
+- **<0.85** → unmerged (kept as separate rows)
+
+The 0.85-0.95 bucket is the magic. It's the range where a tuned algorithm catches *most* duplicates but where the wrong choice is a real cost (merging two genuinely different people = lost prospect; not merging two duplicates = paid contact you didn't need).
+
+The 2-minute workflow:
+
+1. Run dedupe.
+2. Open `<filename>.review.csv`. Each row is a candidate merge with: confidence, the two records side-by-side, the rule that fired.
+3. Eyeball each row. Mark `keep_merge` (Y/N) in the rightmost column.
+4. Re-run dedupe with the `--apply-review-decisions <filename>.review.csv` flag (or click "Apply review decisions" in the GUI).
+5. Final cleaned CSV reflects your manual choices.
+
+For a 5,000-row lead list, the review queue is typically 20-60 rows: a couple of minutes of work. The output is dramatically better than auto-merge-everything-≥0.85, which is what most tools (including HubSpot's) do silently.
+
+**Pro move:** save your `keep_merge` decisions over time. After 3-4 campaigns you'll have a corpus of "yes-merges" and "no-merges" you can use to retune the auto-merge threshold for *your* data. Most teams find their sweet spot is somewhere in 0.88-0.92.
+
+— Michael
+{{support_email}}
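Review note on the Day 14 email: the "pro move" of retuning the auto-merge threshold from saved `keep_merge` decisions could look something like the sketch below. The CSV layout (a `confidence` column plus a `keep_merge` Y/N column) is an assumption based on the description above, and both function names are hypothetical; the threshold rule shown, picking the cut that misclassifies the fewest past decisions, is one simple option rather than the tool's method.

```python
import csv
import io


def split_review_decisions(review_csv_text):
    """Return (approved_scores, rejected_scores) from a review CSV.

    Rows whose keep_merge cell is neither Y nor N are treated as
    undecided and skipped.
    """
    approved, rejected = [], []
    for row in csv.DictReader(io.StringIO(review_csv_text)):
        decision = (row.get("keep_merge") or "").strip().upper()
        if decision not in ("Y", "N"):
            continue
        score = float(row["confidence"])
        (approved if decision == "Y" else rejected).append(score)
    return approved, rejected


def suggest_threshold(approved, rejected):
    """Pick the observed score that misclassifies the fewest past decisions.

    An error is an approved merge scoring below the threshold or a rejected
    merge scoring at or above it. Ties resolve to the lowest candidate.
    Returns None when there are no decisions to learn from.
    """
    candidates = sorted(set(approved) | set(rejected))
    if not candidates:
        return None

    def errors(threshold):
        missed = sum(score < threshold for score in approved)
        wrong = sum(score >= threshold for score in rejected)
        return missed + wrong

    return min(candidates, key=errors)
```

With only a handful of decisions the suggestion is noisy, which fits the email's advice to collect three or four campaigns before retuning.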
--- a/marketing/emails/revops/05-day30.md
+++ b/marketing/emails/revops/05-day30.md
@@ -0,0 +1,26 @@
+# RevOps · Day 30 — Heard from another RevOps lead?
+
+**Subject:** Heard from another RevOps lead?
+**Send:** Day 30
+**Goal:** referral / review ask
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one small favor?
+
+**Pick the one that's easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next RevOps lead trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the RevOps landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one RevOps friend who'd benefit: {{landing_page}}. No referral commission, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you'll only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}