datatools-dev/marketing/emails/revops/01-day1.md

# RevOps · Day 1 — Try it on this 3-vendor lead list first

**Subject:** Try it on this 3-vendor lead list first
**Send:** Day 1, ~9am in the buyer's local time

---

Hi {{first_name}},

Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.

→ **{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)

What's hidden in there:

- The same person from 3 sources, with intentionally inconsistent fields:
  - HubSpot row: full email + company; no LinkedIn URL
  - LinkedIn row: name + title + LinkedIn URL; no email
  - Apollo row: email + phone + company; misspelled name
- ~120 obvious duplicates (same email, different case)
- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
- ~40 phone numbers, written in 5 different formats for each country code (+1, +44, +61)
- One row in every 200 (about 24 rows in total) has a hidden zero-width space in the email address
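
If you want to sanity-check the email traps by hand before running anything, here is a rough Python sketch. It is only an illustration, not how DataTools works internally, and it assumes the email column is literally named `email` (adjust `email_column` to match the header in the file). It covers just the case-difference and zero-width-character traps, not the cross-source duplicates.

```python
import csv
import io
from collections import Counter

# Zero-width characters that commonly hide inside copied email addresses.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize_email(raw: str) -> str:
    """Strip zero-width characters and surrounding whitespace, then lowercase."""
    return raw.translate(ZERO_WIDTH).strip().lower()


def email_report(rows, email_column="email"):
    """Count rows with hidden characters and rows that collide after normalizing.

    `email_column` is an assumed header name, not one confirmed by the sample file.
    """
    hidden = 0
    seen = Counter()
    for row in rows:
        raw = row.get(email_column) or ""
        if not raw:
            continue  # e.g. rows from a source that carries no email
        clean = normalize_email(raw)
        if clean != raw.strip().lower():
            hidden += 1  # normalizing removed something invisible
        seen[clean] += 1
    # Every occurrence beyond the first of a normalized email is a duplicate row.
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {"hidden_chars": hidden, "duplicate_rows": duplicates}


# Tiny made-up stand-in for the real CSV, so the sketch runs on its own.
sample = io.StringIO(
    "email,source\n"
    "Ana@Example.com,hubspot\n"
    "ana@example.com,apollo\n"
    "bo\u200b@example.com,apollo\n"
    ",linkedin\n"
)
report = email_report(csv.DictReader(sample))
print(report)
```

To run it on the downloaded file, open the CSV with `encoding="utf-8"` and `newline=""` and pass `csv.DictReader(f)` to `email_report` in place of the stand-in.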

Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.

Look at three things in the output:

1. **The cleaned CSV** — what your import would look like
2. **The audit CSV** — every change, every rule, confidence per change
3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
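
One way to picture the three confidence bands is the small Python sketch below. It is a hypothetical illustration, not DataTools code: the function name, the band labels, the idea that pairs at 0.95 and above are merged automatically, and the handling of the exact boundary values are all assumptions on my part.

```python
def route_match(confidence: float,
                threshold: float = 0.85,
                auto_merge_at: float = 0.95) -> str:
    """Sort a candidate duplicate pair into one of three bands.

    Assumed behavior: below `threshold` the rows stay separate, from
    `threshold` up to (not including) `auto_merge_at` a human reviews the
    pair, and anything at or above `auto_merge_at` is merged automatically.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence < threshold:
        return "keep_separate"
    if confidence < auto_merge_at:
        return "manual_review"
    return "auto_merge"


# Made-up scores for three candidate pairs.
for score in (0.62, 0.91, 0.99):
    print(score, route_match(score))
```

Lowering `auto_merge_at` toward 0.85 shrinks the review band, which is the auto-merge shortcut that causes bad merges.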

Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.

— Michael
{{support_email}}