Find Duplicates (01_deduplicator):
- Delete the redundant outer Options wrapper; surface threshold + survivor rule directly, push the rest behind a single Advanced pane.
- Disambiguate competing primaries: top result is an auto-resolved preview (secondary download), review decisions are the single primary.
- Plain-English match labels (exact / approximate); clarify the third.
- Lift the match-card caption to a one-time instruction; note delimiter is delimited-text-only.

Quality Check (08_validator_reporter) — stub:
- Remove the dead disabled "Load rules file (JSON)" uploader so the stub invites a single action; keep the informative feature list.

Map Columns (05_column_mapper):
- Regroup schema -> mapping -> strategy/advanced (core task contiguous).
- Make preset-vs-Advanced precedence legible (Custom + modified marker).
- Adopt the compact file-intake banner; drop the duplicate resolved-mapping table; fix the add-row gutter style.

Combine Files (07_multi_file_merger) — stub:
- Actually disable the Merge CTA (add the disabled attribute).

PDF to CSV (10_pdf_extractor):
- Drop page/raw from the default preview to match export + fix the horizontal clip; surface raw via per-row affordance + overflow-x.
- Move the column selector above the download button; give auto-excluded rows a reason; align the files card to Home; de-dupe the row count.

Automated Workflows (09_pipeline_runner):
- Replace hand-edited JSON step config with per-step control expanders; JSON moved behind Advanced import/export.
- Editing the table marks the mode modified; fold the empty error column into the status pill; render summaries as plain English; collapse the explainer by default.

Cross-cutting items (stub standardization on page 10, shared disabled-field token, remaining intake rollout) deferred to a holistic pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
191 lines
9.9 KiB
HTML
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Layout review — Find Duplicates</title>
<link rel="stylesheet" href="app.css">
</head>
<body data-page="01_deduplicator">
<div class="dt-app">
<aside class="dt-sidebar" id="dt-sidebar"></aside>
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Find Duplicates</strong>, shown with a file imported and a completed run (results + match-group review). <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">

<!-- Tool header -->
<div class="dt-tool-header">
<h1>Find Duplicates</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
<p class="dt-tool-caption">Find rows that repeat, then keep one and remove the extras.</p>

<div class="dt-spacer"></div>

<!-- Upload (file staged) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding &amp; delimiter auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">customers_export.csv</span>
<span class="size">2.1 MB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove">✕</button>
</div>

<!-- Delimiter selector — delimited-text only (CSV/TSV); omitted for XLSX/XLS.
Shown here because the staged file is customers_export.csv. -->
<div class="dt-field" style="max-width:320px">
<label class="dt-label">Delimiter</label>
<div class="dt-select">Comma (,)</div>
<div class="dt-help-text">Auto-detected on upload. Change if the preview looks wrong.</div>
</div>

<!-- Preview expander (collapsed after a result exists) -->
<details class="dt-expander">
<summary>Preview: customers_export.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">18,442 rows, 5 columns</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td class="idx">1</td><td>jane doe</td><td>JANE@ACME.IO</td><td>austin</td><td>(512) 555-0190</td><td>01/04/2024</td></tr>
<tr><td class="idx">2</td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td class="idx">3</td><td>R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</details>

<!-- Basic controls (visible by default) -->
<div class="dt-cols-2">
<div class="dt-field"><label class="dt-label">Match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:70%"></div><div class="knob" style="left:70%"></div></div><div class="val">85</div></div>
<div class="dt-help-text">Higher means rows must look more alike to count as a duplicate.</div></div>
<div class="dt-field"><label class="dt-label">When duplicates are found, keep</label>
<div class="dt-select">the most-complete row</div>
<div class="dt-help-text">Which row survives in each group of duplicates.</div></div>
</div>

<!-- Advanced options (single expander; basics live above) -->
<details class="dt-expander">
<summary>Advanced options</summary>
<div class="dt-expander-body">
<p class="dt-help-text" style="margin-top:0">Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match <strong>exactly</strong> and the ones that only need to match <strong>approximately</strong> — together these are the columns used to find duplicates.</p>
<div class="dt-cols-2">
<div>
<div class="dt-field"><label class="dt-label">Columns that must match exactly</label>
<div class="dt-multiselect"><span class="dt-ms-chip">email <span class="x">✕</span></span></div></div>
<div class="dt-field"><label class="dt-label">Columns to match approximately</label>
<div class="dt-multiselect"><span class="dt-ms-chip">name <span class="x">✕</span></span></div></div>
</div>
<div>
<div class="dt-field"><label class="dt-label">Approximate-match algorithm</label><div class="dt-select">jaro_winkler</div></div>
</div>
</div>
<div class="dt-check on" style="margin-top:6px"><span class="box"><span class="dt-mi">check</span></span> Merge mode — fill missing fields in the surviving row</div>
</div>
</details>

<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block">Find Duplicates</button>

<hr class="dt-divider">

<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Original rows</div><div class="value">18,442</div></div>
<div class="dt-metric"><div class="label">Duplicate rows</div><div class="value">312</div><div class="delta down">−312 removed</div></div>
<div class="dt-metric"><div class="label">Match groups</div><div class="value">147</div></div>
<div class="dt-metric"><div class="label">Rows kept</div><div class="value">18,130</div></div>
</div>
<p class="dt-caption">Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.</p>
<div class="dt-btn-row" style="max-width:560px">
<button class="dt-btn">Download auto-resolved CSV</button>
<button class="dt-btn">Download removed rows</button>
</div>

<hr class="dt-divider">

<!-- Match groups -->
<h2>Match Groups</h2>
<div class="dt-cols-3" style="max-width:520px">
<button class="dt-btn">Accept All</button>
<button class="dt-btn">Reject All</button>
<button class="dt-btn">Clear Decisions</button>
</div>
<p class="dt-caption" style="margin-top:8px">Differing cells are highlighted. The survivor row is kept; uncheck a row to split it out of the group.</p>

<!-- Match group card 1 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 1 · 2 rows</span>
<span class="conf"><span class="dt-count-pill success">98% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">jane doe</td><td class="dt-cell-flag">JANE@ACME.IO</td><td class="dt-cell-flag">austin</td><td class="dt-cell-flag">(512) 555-0190</td><td class="dt-cell-flag">01/04/2024</td></tr>
</tbody>
</table>
</div>
</div>
</div>

<!-- Match group card 2 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 2 · 2 rows</span>
<span class="conf"><span class="dt-count-pill warn">87% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</div>

<p class="dt-caption" style="margin-top:14px">Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.</p>
<button class="dt-btn dt-btn-primary dt-btn-block" style="margin-top:8px">Apply Review Decisions &amp; Download Final CSV</button>

<!-- Processing log -->
<details class="dt-expander" style="margin-top:18px">
<summary>Processing Log</summary>
<div class="dt-expander-body">
<div class="dt-code">[00:00.01] Loaded 18,442 rows from customers_export.csv
[00:00.04] Strategy: exact(email) + fuzzy(name, jaro_winkler ≥ 85)
[00:00.91] Compared 18,442 rows → 147 match groups
[00:01.02] Survivor rule: most-complete · merge=on
[00:01.05] 312 rows flagged for removal</div>
</div>
</details>

</div>
</main>
</div>
<footer class="dt-footer" id="dt-footer"></footer>
<script src="shell.js"></script>
</body>
</html>