Michael e1f364f010 feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails
Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00


RevOps · Day 1 — Try it on this 3-vendor lead list first

Subject: Try it on this 3-vendor lead list first
Send: Day 1, ~9am buyer-local-time


Hi {{first_name}},

Yesterday's email had your download. Today's email has a file — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.

{{sample_file_url}} (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)

What's hidden in there:

  • The same person from 3 sources, with intentionally inconsistent fields:
    • HubSpot row: full email + company; no LinkedIn URL
    • LinkedIn row: name + title + LinkedIn URL; no email
    • Apollo row: email + phone + company; misspelled name
  • ~120 obvious duplicates (same email, different case)
  • ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
  • ~40 phone numbers across three country codes (+1, +44, +61), in 5 different formats per country
  • One row in every 200 has a hidden zero-width space in the email
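
If you want to see why a plain exact-match dedupe misses the case and zero-width-space duplicates above, here is a minimal Python sketch. It is an illustration under my own assumptions, not how DataTools works internally: the function names `normalize_email` and `find_duplicate_emails`, the list of invisible characters, and the example addresses are all hypothetical.

```python
# Invisible characters that often ride along in pasted or scraped emails
# (assumed list for illustration; a real tool may strip more).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_email(raw: str) -> str:
    """Build a comparison key: drop invisible characters, trim, lowercase."""
    cleaned = "".join(ch for ch in raw if ch not in INVISIBLE)
    return cleaned.strip().lower()


def find_duplicate_emails(emails):
    """Group row indexes by normalized email; keep groups with 2+ rows."""
    groups = {}
    for index, raw in enumerate(emails):
        groups.setdefault(normalize_email(raw), []).append(index)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hypothetical rows: a case variant and a zero-width-space variant
# of the same address, plus one unrelated address.
rows = [
    "Dana.Lee@example.com",
    "dana.lee@example.com",
    "dana.lee\u200b@example.com",
    "sam@example.com",
]
duplicates = find_duplicate_emails(rows)
print(duplicates)
```

Comparing the raw strings would treat the first three rows as three different people; comparing the normalized keys groups them as one.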

Drop it into DataTools, click "Run all" in the analyzer, then run the dedupe tool with the default 0.85 threshold.

Look at three things in the output:

  1. The cleaned CSV — what your import would look like
  2. The audit CSV — every change, every rule, confidence per change
  3. The manual-review queue (<filename>.review.csv) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
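
To make the review band concrete, here is a rough Python sketch of routing a candidate pair by its match score. Treat every detail as an assumption: the 0.95 auto-merge cutoff is inferred from the upper edge of the review range, the boundary handling is a guess, and `difflib.SequenceMatcher` is a stand-in scorer from the standard library, not the matcher DataTools uses.

```python
from difflib import SequenceMatcher

REVIEW_THRESHOLD = 0.85      # the default dedupe threshold
AUTO_MERGE_THRESHOLD = 0.95  # assumed upper edge of the review band


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1] using case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(score: float) -> str:
    """Decide what happens to a candidate duplicate pair."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge"
    if score >= REVIEW_THRESHOLD:
        return "review"  # a human decides; this is the .review.csv band
    return "keep-separate"


# Hypothetical misspelled-name pair, like the Apollo rows in the sample.
score = similarity("Jonathan Smith", "Jonathon Smith")
print(round(score, 3), route(score))
```

A one-letter misspelling like this lands in the review band rather than being merged automatically, which is the behavior you want for pairs that are probably, but not certainly, the same person.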

Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.

— Michael {{support_email}}