datatools-dev/layout-review/01_deduplicator.html
Michael cf31d9ef14 feat(layout-review): address review findings on pages 7-12
Find Duplicates (01_deduplicator):
- Delete the redundant outer Options wrapper; surface threshold +
  survivor rule directly, push the rest behind a single Advanced pane.
- Disambiguate competing primaries: top result is an auto-resolved
  preview (secondary download), review decisions are the single primary.
- Plain-English match labels (exact / approximate); clarify the third.
- Lift the match-card caption to a one-time instruction; note delimiter
  is delimited-text-only.

Quality Check (08_validator_reporter) — stub:
- Remove the dead disabled "Load rules file (JSON)" uploader so the
  stub invites a single action; keep the informative feature list.

Map Columns (05_column_mapper):
- Regroup schema -> mapping -> strategy/advanced (core task contiguous).
- Make preset-vs-Advanced precedence legible (Custom + modified marker).
- Adopt the compact file-intake banner; drop the duplicate resolved-
  mapping table; fix the add-row gutter style.

Combine Files (07_multi_file_merger) — stub:
- Actually disable the Merge CTA (add the disabled attribute).

PDF to CSV (10_pdf_extractor):
- Drop page/raw from the default preview to match export + fix the
  horizontal clip; surface raw via per-row affordance + overflow-x.
- Move the column selector above the download button; give auto-excluded
  rows a reason; align the files card to Home; de-dupe the row count.

Automated Workflows (09_pipeline_runner):
- Replace hand-edited JSON step config with per-step control expanders;
  JSON moved behind Advanced import/export.
- Editing the table marks the mode modified; fold the empty error column
  into the status pill; render summaries as plain English; collapse the
  explainer by default.

Cross-cutting items (stub standardization on page 10, shared disabled-
field token, remaining intake rollout) deferred to a holistic pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:35:46 +00:00

191 lines
9.9 KiB
HTML
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Layout review — Find Duplicates</title>
<link rel="stylesheet" href="app.css">
</head>
<body data-page="01_deduplicator">
<div class="dt-app">
<aside class="dt-sidebar" id="dt-sidebar"></aside>
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Find Duplicates</strong>, shown with a file imported and a completed run (results + match-group review). <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Find Duplicates</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
<p class="dt-tool-caption">Find rows that repeat, then keep one and remove the extras.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding &amp; delimiter auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">customers_export.csv</span>
<span class="size">2.1 MB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove" aria-label="Remove"></button>
</div>
<!-- Delimiter selector — delimited-text only (CSV/TSV); omitted for XLSX/XLS.
Shown here because the staged file is customers_export.csv. -->
<div class="dt-field" style="max-width:320px">
<label class="dt-label">Delimiter</label>
<div class="dt-select">Comma (,)</div>
<div class="dt-help-text">Auto-detected on upload. Change if the preview looks wrong.</div>
</div>
<!-- Preview expander (collapsed after a result exists) -->
<details class="dt-expander">
<summary>Preview: customers_export.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">18,442 rows, 6 columns</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td class="idx">1</td><td>jane doe</td><td>JANE@ACME.IO</td><td>austin</td><td>(512) 555-0190</td><td>01/04/2024</td></tr>
<tr><td class="idx">2</td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td class="idx">3</td><td>R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<!-- Basic controls (visible by default) -->
<div class="dt-cols-2">
<div class="dt-field"><label class="dt-label">Match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:70%"></div><div class="knob" style="left:70%"></div></div><div class="val">85</div></div>
<div class="dt-help-text">A higher threshold means rows must look more alike to count as duplicates.</div></div>
<div class="dt-field"><label class="dt-label">When duplicates are found, keep</label>
<div class="dt-select">the most-complete row</div>
<div class="dt-help-text">Which row survives in each group of duplicates.</div></div>
</div>
<!-- Advanced options (single expander; basics live above) -->
<details class="dt-expander">
<summary>Advanced options</summary>
<div class="dt-expander-body">
<p class="dt-help-text" style="margin-top:0">Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match <strong>exactly</strong> and the ones that only need to match <strong>approximately</strong> — together these are the columns used to find duplicates.</p>
<div class="dt-cols-2">
<div>
<div class="dt-field"><label class="dt-label">Columns that must match exactly</label>
<div class="dt-multiselect"><span class="dt-ms-chip">email <span class="x"></span></span></div></div>
<div class="dt-field"><label class="dt-label">Columns to match approximately</label>
<div class="dt-multiselect"><span class="dt-ms-chip">name <span class="x"></span></span></div></div>
</div>
<div>
<div class="dt-field"><label class="dt-label">Approximate-match algorithm</label><div class="dt-select">jaro_winkler</div></div>
</div>
</div>
<div class="dt-check on" style="margin-top:6px"><span class="box"><span class="dt-mi">check</span></span> Merge mode — fill missing fields in the surviving row</div>
</div>
</details>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block">Find Duplicates</button>
<hr class="dt-divider">
<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Original rows</div><div class="value">18,442</div></div>
<div class="dt-metric"><div class="label">Duplicate rows</div><div class="value">312</div><div class="delta down">312 removed</div></div>
<div class="dt-metric"><div class="label">Match groups</div><div class="value">147</div></div>
<div class="dt-metric"><div class="label">Rows kept</div><div class="value">18,130</div></div>
</div>
<p class="dt-caption">Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.</p>
<div class="dt-btn-row" style="max-width:560px">
<button class="dt-btn">Download auto-resolved CSV</button>
<button class="dt-btn">Download removed rows</button>
</div>
<hr class="dt-divider">
<!-- Match groups -->
<h2>Match Groups</h2>
<div class="dt-cols-3" style="max-width:520px">
<button class="dt-btn">Accept All</button>
<button class="dt-btn">Reject All</button>
<button class="dt-btn">Clear Decisions</button>
</div>
<p class="dt-caption" style="margin-top:8px">Differing columns are highlighted. The survivor row is kept; uncheck a row to split it out of the group.</p>
<!-- Match group card 1 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 1 · 2 rows</span>
<span class="conf"><span class="dt-count-pill success">98% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">jane doe</td><td class="dt-cell-flag">JANE@ACME.IO</td><td class="dt-cell-flag">austin</td><td class="dt-cell-flag">(512) 555-0190</td><td class="dt-cell-flag">01/04/2024</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<!-- Match group card 2 -->
<div class="dt-match-card">
<div class="dt-match-head">
<span class="title">Group 2 · 2 rows</span>
<span class="conf"><span class="dt-count-pill warn">87% match</span></span>
</div>
<div class="dt-match-body">
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>keep</th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr class="dt-keep-row"><td><span class="dt-keep-tag">keep</span></td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
<tr><td><span class="dt-caption">remove</span></td><td class="dt-cell-flag">R. Smith</td><td>bob@globex.com</td><td>Denver</td><td>720-555-7781</td><td>2024-02-11</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<p class="dt-caption" style="margin-top:14px">Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.</p>
<button class="dt-btn dt-btn-primary dt-btn-block" style="margin-top:8px">Apply Review Decisions &amp; Download Final CSV</button>
<!-- Processing log -->
<details class="dt-expander" style="margin-top:18px">
<summary>Processing Log</summary>
<div class="dt-expander-body">
<div class="dt-code">[00:00.01] Loaded 18,442 rows from customers_export.csv
[00:00.04] Strategy: exact(email) + fuzzy(name, jaro_winkler ≥ 85)
[00:00.91] Compared 18,442 rows → 147 match groups
[00:01.02] Survivor rule: most-complete · merge=on
[00:01.05] 312 rows flagged for removal</div>
</div>
</details>
</div>
</main>
</div>
<footer class="dt-footer" id="dt-footer"></footer>
<script src="shell.js"></script>
</body>
</html>