Files
datatools-dev/layout-review/01_deduplicator.html
Michael dd0942d710 feat(layout-review): journey-level redesign — front door, taught order, consistency
Addresses the journey-level review (the app felt like 12 tools sharing a
stylesheet, not one guided product). File-partitioned changes:

Navigation (shell.js): rename Home -> "Start here" with front-door
emphasis (.dt-nav-start); reorder Data Cleaners into pipeline order
(Clean Text -> Standardize -> Fix Missing -> Find Duplicates); new
"Finance" group (Reconcile, PDF to CSV); all stubs moved to a bottom
"Coming soon" group, no longer interleaved with working tools.

Front door (home.html): a prominent primary "Clean these files for me"
that runs the recommended pipeline in order, above the existing
per-finding cards (reframed as "fix one thing at a time").

Shared tokens (app.css): .dt-next-step suggestion strip + .dt-nav-start.

Teach the order: a slim .dt-next-step strip at the end of each linear
cleaner page points to the next pipeline step (Map Columns -> Start here;
orchestrator/Finance pages correctly omit it).

Local-first: the green "Runs 100% locally" pill now sits in every working
tool page's header (home + 8 tools), where client data is entered.

Plain English: jargon relabeled on input controls (coerce, E.164,
NFC/NFKC, sentinels, survivor rule), technical terms kept in tooltips and
audit/output cells only.

Stubs (06/08/07): rebuilt to one identical skeleton — info line + plain
feature list + a real "Notify me when this ships" button; every disabled
control and uploader removed (a dimmed dropzone reads as broken).

Intake: full dropzone+chip replaced with the compact "Using <file>" banner
on Clean Text, Fix Missing, Find Duplicates, and both Reconcile sides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:44:11 +00:00

193 lines
10 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Layout review — Find Duplicates</title>
<link rel="stylesheet" href="app.css">
</head>
<body data-page="01_deduplicator">
<div class="dt-app">
<aside class="dt-sidebar" id="dt-sidebar"></aside>
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Find Duplicates</strong>, shown with a file imported and a completed run (results + match-group review). <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Find Duplicates</h1>
<div style="display:flex;align-items:center;gap:12px;margin-top:6px">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn" style="margin-top:0"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Find rows that repeat, then keep one and remove the extras.</p>
<div class="dt-spacer"></div>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>customers_export.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Delimiter selector — delimited-text only (CSV/TSV); omitted for XLSX/XLS.
Shown here because the staged file is customers_export.csv. -->
<div class="dt-field" style="max-width:320px">
<label class="dt-label">Delimiter</label>
<div class="dt-select">Comma (,)</div>
<div class="dt-help-text">Auto-detected on upload. Change if the preview looks wrong.</div>
</div>
<!-- Preview expander (collapsed after a result exists) -->
<details class="dt-expander">
<summary>Preview: customers_export.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">18,442 rows, 6 columns</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td class="idx">1</td><td>jane doe</td><td>JANE@ACME.IO</td><td>austin</td><td>(512) 555-0190</td><td>01/04/2024</td></tr>
<tr><td class="idx">2</td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td class="idx">3</td><td>R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<!-- Basic controls (visible by default) -->
<div class="dt-cols-2">
<div class="dt-field"><label class="dt-label">Match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:70%"></div><div class="knob" style="left:70%"></div></div><div class="val">85</div></div>
<div class="dt-help-text">Higher means rows must look more alike to count as a duplicate.</div></div>
<div class="dt-field"><label class="dt-label">When duplicates are found, keep</label>
<div class="dt-select">the most-complete row</div>
<div class="dt-help-text">Which row survives in each group of duplicates.</div></div>
</div>
<!-- Advanced options (single expander; basics live above) -->
<details class="dt-expander">
<summary>Advanced options</summary>
<div class="dt-expander-body">
<p class="dt-help-text" style="margin-top:0">Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match <strong>exactly</strong> and the ones that only need to match <strong>approximately</strong> — together these are the columns used to find duplicates.</p>
<div class="dt-cols-2">
<div>
<div class="dt-field"><label class="dt-label">Columns that must match exactly</label>
<div class="dt-multiselect"><span class="dt-ms-chip">email <span class="x"></span></span></div></div>
<div class="dt-field"><label class="dt-label">Columns to match approximately</label>
<div class="dt-multiselect"><span class="dt-ms-chip">name <span class="x"></span></span></div></div>
</div>
<div>
<div class="dt-field"><label class="dt-label">Approximate-match algorithm</label><div class="dt-select">jaro_winkler</div></div>
</div>
</div>
<div class="dt-check on" style="margin-top:6px"><span class="box"><span class="dt-mi">check</span></span> Merge mode — fill missing fields in the surviving row</div>
</div>
</details>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block">Find Duplicates</button>
<hr class="dt-divider">
<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Original rows</div><div class="value">18,442</div></div>
<div class="dt-metric"><div class="label">Duplicate rows</div><div class="value">312</div><div class="delta down">312 removed</div></div>
<div class="dt-metric"><div class="label">Match groups</div><div class="value">147</div></div>
<div class="dt-metric"><div class="label">Rows kept</div><div class="value">18,130</div></div>
</div>
<p class="dt-caption">Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.</p>
<div class="dt-btn-row" style="max-width:560px">
<button class="dt-btn">Download auto-resolved CSV</button>
<button class="dt-btn">Download removed rows</button>
</div>
<hr class="dt-divider">
<!-- Match groups -->
<h2>Match Groups</h2>
<div class="dt-cols-3" style="max-width:520px">
<button class="dt-btn">Accept All</button>
<button class="dt-btn">Reject All</button>
<button class="dt-btn">Clear Decisions</button>
</div>
<p class="dt-caption" style="margin-top:8px">Differing columns are highlighted. The survivor row is kept; uncheck a row to split it out of the group.</p>
<!-- Match group card 1 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 1 · 2 rows</span>
<span class="conf"><span class="dt-count-pill success">98% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">jane doe</td><td class="dt-cell-flag">JANE@ACME.IO</td><td class="dt-cell-flag">austin</td><td>(512) 555-0190</td><td class="dt-cell-flag">01/04/2024</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<!-- Match group card 2 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 2 · 2 rows</span>
<span class="conf"><span class="dt-count-pill warn">87% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<p class="dt-caption" style="margin-top:14px">Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.</p>
<button class="dt-btn dt-btn-primary dt-btn-block" style="margin-top:8px">Apply Review Decisions &amp; Download Final CSV</button>
<!-- Processing log -->
<details class="dt-expander" style="margin-top:18px">
<summary>Processing Log</summary>
<div class="dt-expander-body">
<div class="dt-code">[00:00.01] Loaded 18,442 rows from customers_export.csv
[00:00.04] Strategy: exact(email) + fuzzy(name, jaro_winkler ≥ 85)
[00:00.91] Compared 18,442 rows → 147 match groups
[00:01.02] Survivor rule: most-complete · merge=on
[00:01.05] 312 rows flagged for removal</div>
</div>
</details>
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Duplicates handled — your file is cleaned. Review the result or <a href="home.html">Back to Start here →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>
<footer class="dt-footer" id="dt-footer"></footer>
<script src="shell.js"></script>
</body>
</html>