# RevOps · Day 1 — Try it on this 3-vendor lead list first

**Subject:** Try it on this 3-vendor lead list first
**Send:** Day 1, ~9am buyer-local-time

---

Hi {{first_name}},

Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.

→ **{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)

What's hidden in there:

- The same person from 3 sources, with intentionally inconsistent fields:
  - HubSpot row: full email + company; no LinkedIn URL
  - LinkedIn row: name + title + LinkedIn URL; no email
  - Apollo row: email + phone + company; misspelled name
- ~120 obvious duplicates (same email, different case)
- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
- One row per 200 with a hidden zero-width space in the email

Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold. Look at three things in the output:

1. **The cleaned CSV** — what your import would look like
2. **The audit CSV** — every change, every rule, confidence per change
3. **The manual-review queue** (`.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.

Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.

— Michael
{{support_email}}