datatools-dev/marketing/emails/revops/02-day3.md
Michael e1f364f010 feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails
Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00


RevOps · Day 3 — The dedupe rule that catches LinkedIn drift

Subject: The dedupe rule that catches LinkedIn drift

Send: Day 3

Goal: deepen feature understanding around the cross-source dedupe


Hi {{first_name}},

The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: cross-source matching, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.

The rule that does the work is in the dedupe tool's "Block by domain, fuzzy on name+title" mode. Here's what it does:

Step 1 — Block. Group rows by email domain. (LinkedIn rows with no email get bucketed by domain(linkedin_url), usually their company website if they listed it.) Because rows are only compared within their own block, this avoids the O(n²) all-pairs explosion and rules out cross-company false positives.
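If it helps to see the blocking step as code, here is a rough sketch of the idea. It is not the tool's actual implementation: the `email` and `linkedin_url` field names, the `block_key` and `build_blocks` helpers, and the empty-string bucket for rows with no usable key are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def block_key(row):
    """Return a blocking key: the email domain, else the host of linkedin_url."""
    email = (row.get("email") or "").strip().lower()
    if "@" in email:
        return email.rsplit("@", 1)[1]
    # Assumed fallback for rows with no email, mirroring domain(linkedin_url).
    url = (row.get("linkedin_url") or "").strip().lower()
    if url:
        host = urlparse(url if "//" in url else "//" + url).netloc
        return host[4:] if host.startswith("www.") else host
    return ""  # no usable key: these rows land in one bucket for manual handling


def build_blocks(rows):
    """Group rows by blocking key so fuzzy matching only runs inside a block."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[block_key(row)].append(row)
    return dict(blocks)


rows = [
    {"email": "sarah@acme.com"},
    {"email": "", "linkedin_url": "https://www.acme.com"},
    {"email": "mike@other.io"},
]
blocks = build_blocks(rows)
```

With three rows split into blocks of two and one, only one pair needs a fuzzy comparison instead of three.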

Step 2 — Within each block, fuzzy-match on first_name + last_name + title. Token-set ratio at 0.85 default. Catches:

  • "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
  • "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
  • "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
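For the curious, here is a minimal sketch of a token-set style score, built on Python's standard-library `difflib`. It is a simplified stand-in, not the scorer the dedupe tool ships with, so its numbers will not line up exactly with the 0.85 and 0.78 thresholds above. The `tokens` and `token_set_ratio` helpers are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase, drop apostrophes, and split into a set of alphanumeric tokens."""
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    return set(re.findall(r"[a-z0-9]+", cleaned))


def ratio(a, b):
    """Similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def token_set_ratio(a, b):
    """Compare the shared tokens against each side's full sorted token string."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    common = " ".join(sorted(ta & tb))
    only_a = " ".join(sorted(ta - tb))
    only_b = " ".join(sorted(tb - ta))
    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()
    return max(
        ratio(common, combined_a),
        ratio(common, combined_b),
        ratio(combined_a, combined_b),
    )


sarah = token_set_ratio("Sarah O'Brien, VP Marketing", "sarah obrien, vp of marketing")
mike = token_set_ratio("Mike Chen, Head of Sales", "Michael Chen, Sales Lead")
```

The Sarah pair scores 1.0 here because one side's tokens are a subset of the other's, which is why case, punctuation, and a stray "of" do not matter. The Mike pair lands somewhere below 1.0, which is the kind of match a lower threshold is for.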

Step 3 — Confidence-tier the merge. Scores ≥0.95 auto-merge. Scores from 0.85 up to (but not including) 0.95 go to <filename>.review.csv for you to eyeball. Anything below 0.85 stays unmerged.
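The tiering boils down to two cutoffs. A minimal sketch, assuming a hypothetical `tier` function and tier labels of my own choosing (the tool's internal names may differ):

```python
def tier(score, auto=0.95, review=0.85):
    """Map a match score to a merge tier using the default cutoffs."""
    if score >= auto:
        return "auto_merge"
    if score >= review:
        return "review"  # would be written to <filename>.review.csv
    return "unmerged"


decisions = [tier(s) for s in (0.97, 0.95, 0.90, 0.85, 0.80)]
```

Lowering `review` to 0.78 is how a pair like the Mike Chen example would reach the review file instead of staying unmerged.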

Step 4 — Field-precedence on merge. When records merge, you choose which source wins per field. Default precedence (configurable):

  • title, company, linkedin_url → LinkedIn wins (more recent)
  • email, phone → Webform wins (verified)
  • lifecycle_stage, owner → HubSpot wins (your CRM is canonical)
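To make the precedence idea concrete, here is a rough sketch of a per-field merge. The source labels (`linkedin`, `webform`, `hubspot`), the `merge_records` helper, and the fallback to any non-empty value are assumptions for illustration. The real precedence config format is not shown in this email.

```python
DEFAULT_PRECEDENCE = {
    "title": "linkedin",
    "company": "linkedin",
    "linkedin_url": "linkedin",
    "email": "webform",
    "phone": "webform",
    "lifecycle_stage": "hubspot",
    "owner": "hubspot",
}


def merge_records(records_by_source, precedence=DEFAULT_PRECEDENCE):
    """Merge one record per source, letting the preferred source win per field."""
    merged = {}
    fields = {field for rec in records_by_source.values() for field in rec}
    for field in sorted(fields):
        preferred = precedence.get(field)
        value = (records_by_source.get(preferred) or {}).get(field)
        if not value:
            # Assumed fallback: take the first non-empty value from any source.
            value = next(
                (rec[field] for rec in records_by_source.values() if rec.get(field)),
                None,
            )
        merged[field] = value
    return merged


records = {
    "linkedin": {"title": "VP Marketing", "company": "Acme", "email": ""},
    "webform": {"title": "Marketing", "email": "sarah@acme.com"},
    "hubspot": {"title": "VP", "email": "old@acme.com", "owner": "Dana"},
}
merged = merge_records(records)
```

In this example the title comes from LinkedIn, the email from the webform, and the owner from HubSpot, matching the default precedence listed above.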

One trap to avoid: don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run format → analyzer → dedupe → gate in that order. The pipeline UI enforces this; the per-tool runs don't.
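Here is a small sketch of why the format step has to come first. It assumes US numbers only, and `normalize_us_phone` is a hypothetical helper, not the format tool's actual code.

```python
import re


def normalize_us_phone(raw):
    """Return a US number as +1XXXXXXXXXX, or None if it does not look like one."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = "1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None


a = normalize_us_phone("+14155550143")
b = normalize_us_phone("(415) 555-0143")
```

Both inputs normalize to the same string, so after formatting the dedupe step sees one key instead of two.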

Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.

— Michael {{support_email}}