feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions


@@ -0,0 +1,39 @@
# RevOps · Post 1 — Story
**Where to post:** r/revops, r/sales, RevGenius Slack, Modern Sales Pros,
Pavilion communities, LinkedIn (your own feed).
**Format:** ~400 words. Tactical war-story style. Don't pitch in the body.
---
## Title
We were paying HubSpot for 4,200 duplicate contacts. Here's the dedupe pipeline that caught them.
## Body
Last quarter I ran a count on our HubSpot instance: ~4,200 contacts that were almost certainly the same person as another contact already in the system. HubSpot bills per marketing contact, so those duplicates were real money. ($X/month, pick your tier.)
The problem is that HubSpot's native "find duplicates" tool is exact-match-only on a small set of fields. It misses:
- "Sarah O'Brien" vs "Sarah Obrien" (apostrophe / no-apostrophe)
- "+1 (415) 555-0143" vs "415-555-0143" vs "4155550143" (phone formats)
- "sarah@acme.com" vs "Sarah@acme.com" (case)
- Same person from a LinkedIn scrape (no phone) + a webform fill (no LinkedIn URL) + a trade-show import (only email + company)
Here's the 4-step pipeline I run before *every* HubSpot import now. You can build the first 3 with Python + pandas + rapidfuzz; the 4th is the one that matters and is the easiest to skip:
**Step 1 — Normalize before comparing.** Lowercase emails, strip phone formatting to E.164, trim whitespace, normalize unicode (NFKC). This alone catches ~40% of dupes.
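A minimal sketch of Step 1 using only the Python standard library. The function names are illustrative, and the phone handling assumes any bare 10-digit number is US/Canada (`+1`); for international lists a dedicated phone-parsing library would be the safer choice.

```python
import re
import unicodedata


def normalize_email(email: str) -> str:
    """NFKC-normalize, trim, and lowercase an email address."""
    return unicodedata.normalize("NFKC", email).strip().lower()


def normalize_name(name: str) -> str:
    """NFKC-normalize a name and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", name).split())


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Reduce a phone string to E.164-style '+<digits>'.

    Assumption: a bare 10-digit number belongs to default_country_code.
    Returns an empty string when no digits are present.
    """
    digits = re.sub(r"\D", "", phone)
    if not digits:
        return ""
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits
```

With this, the three phone formats from the list above all collapse to the same value, so an exact-match comparison on the normalized column is enough to catch them.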
**Step 2 — Fuzzy-match on name + company, blocked by email domain.** Don't fuzzy-match across the whole list (O(n²) and full of false positives). Block by email domain first — only compare contacts within the same company. Use rapidfuzz token-set ratio at threshold 85.
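Here is a rough sketch of the blocking idea in Step 2. To keep it dependency-free it uses `difflib.SequenceMatcher` as a stand-in for rapidfuzz, so the scores will not be identical to `rapidfuzz.fuzz.token_set_ratio`; in a real pipeline you would swap that call in. The contact dict keys (`id`, `name`, `company`, `email`) are assumptions for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def token_set_similarity(a: str, b: str) -> float:
    """Stdlib approximation of a token-set ratio, scaled 0-100."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    combined_a = " ".join([common, *sorted(tokens_a - tokens_b)]).strip()
    combined_b = " ".join([common, *sorted(tokens_b - tokens_a)]).strip()
    pairs = [(common, combined_a), (common, combined_b), (combined_a, combined_b)]
    return 100 * max(SequenceMatcher(None, x, y).ratio() for x, y in pairs)


def candidate_pairs(contacts, threshold=85.0):
    """Compare contacts only within the same email-domain block."""
    blocks = defaultdict(list)
    for contact in contacts:
        email = contact.get("email", "")
        if "@" in email:  # contacts without an email are not blocked here
            blocks[email.rsplit("@", 1)[1].lower()].append(contact)

    matches = []
    for block in blocks.values():
        for left, right in combinations(block, 2):
            score = token_set_similarity(
                f"{left['name']} {left['company']}",
                f"{right['name']} {right['company']}",
            )
            if score >= threshold:
                matches.append((left["id"], right["id"], round(score, 1)))
    return matches
```

The pairwise comparison is still quadratic, but only inside each domain block, which is what keeps both the runtime and the false positives manageable.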
**Step 3 — Cross-source merge logic.** When LinkedIn-source and webform-source records match, *the LinkedIn one wins on title/company* (more recent), *the webform one wins on phone/email* (verified). Document this rule somewhere your team can read it.
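One way to write the Step 3 rule down where the team can read it is as a per-field priority table in code. This is a sketch only: the source labels (`linkedin`, `webform`) and field names are assumed for illustration, and an empty value falls through to the next source in line.

```python
# Per-field source precedence: first source listed wins if it has a value.
FIELD_PRIORITY = {
    "title": ("linkedin", "webform"),    # LinkedIn assumed more recent
    "company": ("linkedin", "webform"),
    "phone": ("webform", "linkedin"),    # webform assumed verified
    "email": ("webform", "linkedin"),
}


def merge_records(records):
    """Merge matched records into one dict using FIELD_PRIORITY."""
    by_source = {record["source"]: record for record in records}
    merged = {}
    for field, priority in FIELD_PRIORITY.items():
        merged[field] = next(
            (
                by_source[source][field]
                for source in priority
                if source in by_source and by_source[source].get(field)
            ),
            None,  # no source had a value for this field
        )
    return merged
```

Keeping the rule in a single table means a change in precedence is a one-line diff rather than a hunt through merge code.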
**Step 4 — Confidence tiers, not yes/no.** Auto-merge only at 95% confidence or higher. Queue anything from 85 up to (but not including) 95 for manual review. Treat everything below 85 as a non-match and leave both records alone. The manual queue is the magic: it catches the cases the algorithm doesn't dare touch and trains you on what your data actually looks like.
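The tiering in Step 4 fits in a few lines. This sketch assumes a score of exactly 95 goes to auto-merge and exactly 85 goes to the review queue; the tier names are made up for illustration.

```python
def confidence_tier(score: float, auto_merge_at: float = 95.0,
                    review_at: float = 85.0) -> str:
    """Map a 0-100 match score to an action tier."""
    if score >= auto_merge_at:
        return "auto_merge"
    if score >= review_at:
        return "manual_review"
    return "no_merge"
```

Making the thresholds parameters lets you tighten or loosen them per import once the manual queue shows you where your data's real false positives sit.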
I eventually wrapped all this into a desktop tool I called DataTools because I got tired of re-running the script every campaign. Local-only, $49 if anyone wants it: datatools.app/revops. But the 4-step framework above is the real takeaway — works regardless of what tool you use.
What's your dedupe pipeline look like?
— {{your-name}}