RevOps · Day 1 — Try it on this 3-vendor lead list first
Subject: Try it on this 3-vendor lead list first
Send: Day 1, ~9am buyer local time
Hi {{first_name}},
Yesterday's email had your download. Today's email has a file — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
→ {{sample_file_url}} (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
What's hidden in there:
- The same person from 3 sources, with intentionally inconsistent fields:
  - HubSpot row: full email + company; no LinkedIn URL
  - LinkedIn row: name + title + LinkedIn URL; no email
  - Apollo row: email + phone + company; misspelled name
- ~120 obvious duplicates (same email, different case)
- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
- ~40 phone numbers across 3 country codes (+1, +44, +61), in 5 different formats per country
- One row in every 200 has a hidden zero-width space in the email
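To see why the case, zero-width-space, and phone-format traps break exact-match dedupe, here is a minimal Python sketch of key normalization. It is an illustration only, not DataTools' actual rules: the function names and the set of invisible characters are assumptions made for this example.

```python
import re

# Invisible characters that commonly hide in pasted emails.
# This set is an assumption for the sketch, not DataTools' own list.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_email(raw: str) -> str:
    """Delete zero-width characters, trim whitespace, and lowercase."""
    return raw.translate(ZERO_WIDTH).strip().lower()


def normalize_phone(raw: str) -> str:
    """Keep digits only, preserving a leading '+' if one was present."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits


# Two rows that look different to an exact-match comparison
# but collapse to the same key once normalized.
a = normalize_email("Jane.Doe@Example.com")
b = normalize_email("jane.doe\u200b@example.com ")
print(a == b)
print(normalize_phone("+1 (415) 555-0100"))
```

Normalizing before comparing is what turns the "obvious" duplicates into exact matches. The cross-source duplicates still need fuzzy matching, because no single key is shared across all three rows.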
Drop it into DataTools, click "Run all" in the analyzer, then run the dedupe tool with the default 0.85 threshold.
Look at three things in the output:
- The cleaned CSV — what your import would look like
- The audit CSV — every change, every rule, confidence per change
- The manual-review queue (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
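The three outputs map onto a simple routing rule, sketched below in Python. This is a hedged illustration, not how DataTools computes confidence: the `similarity` function uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, the constant and function names are hypothetical, and treating scores of 0.95 and above as auto-merge with inclusive boundaries is an assumption inferred from the review range above.

```python
from difflib import SequenceMatcher

REVIEW_FLOOR = 0.85      # default threshold mentioned in this email
AUTO_MERGE_FLOOR = 0.95  # assumed upper edge of the manual-review range


def similarity(a: str, b: str) -> float:
    """Stand-in scorer: case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(score: float) -> str:
    """Decide what happens to a candidate pair given its confidence score."""
    if score >= AUTO_MERGE_FLOOR:
        return "merge"          # lands in the cleaned CSV as one record
    if score >= REVIEW_FLOOR:
        return "review"         # lands in the manual-review queue
    return "keep_separate"      # treated as two different people


# Score a hypothetical pair of names and route it.
score = similarity("Katherine Alvarez", "Katherine Alvares")
print(round(score, 2), route(score))
```

Lowering `AUTO_MERGE_FLOOR` toward `REVIEW_FLOOR` shrinks the review queue but merges more borderline pairs without a human look, which is the failure mode the review file is meant to prevent.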
Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
— Michael
{{support_email}}