feat(layout-review): address review findings on pages 7-12

Find Duplicates (01_deduplicator):
- Delete the redundant outer Options wrapper; surface threshold +
  survivor rule directly, push the rest behind a single Advanced pane.
- Disambiguate competing primaries: top result is an auto-resolved
  preview (secondary download), review decisions are the single primary.
- Plain-English match labels (exact / approximate); clarify the third.
- Lift the match-card caption to a one-time instruction; note delimiter
  is delimited-text-only.

Quality Check (08_validator_reporter) — stub:
- Remove the dead disabled "Load rules file (JSON)" uploader so the
  stub invites a single action; keep the informative feature list.

Map Columns (05_column_mapper):
- Regroup schema -> mapping -> strategy/advanced (core task contiguous).
- Make preset-vs-Advanced precedence legible (Custom + modified marker).
- Adopt the compact file-intake banner; drop the duplicate resolved-
  mapping table; fix the add-row gutter style.

Combine Files (07_multi_file_merger) — stub:
- Actually disable the Merge CTA (add the disabled attribute).

PDF to CSV (10_pdf_extractor):
- Drop page/raw from the default preview to match export + fix the
  horizontal clip; surface raw via per-row affordance + overflow-x.
- Move the column selector above the download button; give auto-excluded
  rows a reason; align the files card to Home; de-dupe the row count.

Automated Workflows (09_pipeline_runner):
- Replace hand-edited JSON step config with per-step control expanders;
  JSON moved behind Advanced import/export.
- Editing the table marks the mode modified; fold the empty error column
  into the status pill; render summaries as plain English; collapse the
  explainer by default.

Cross-cutting items (stub standardization on page 10, shared disabled-
field token, remaining intake rollout) deferred to a holistic pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-08 16:35:46 +00:00
parent 563d845b70
commit cf31d9ef14
6 changed files with 302 additions and 183 deletions

View File

@@ -67,69 +67,192 @@
<summary>Options</summary>
<div class="dt-expander-body">
<!-- Mode radio -->
<!-- Mode radio. Editing the steps below auto-switches the mode from the
recommended default to "Build interactively" (same precedence-visibility
pattern as Fix Missing Values: the active state is made legible, and the
default it superseded is marked "· modified"). -->
<div class="dt-field">
<label class="dt-label">How would you like to define the pipeline?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:9px">
<span class="dt-radio on"><span class="dot"></span> Use the recommended default (text-clean → format → missing → dedup)</span>
<span class="dt-radio"><span class="dot"></span> Build interactively</span>
<span class="dt-radio"><span class="dot"></span> Use the recommended default (text-clean → format → missing → dedup) <span class="dt-count-pill warn" style="margin-left:4px">· modified</span></span>
<span class="dt-radio on"><span class="dot"></span> Build interactively</span>
<span class="dt-radio"><span class="dot"></span> Import a saved pipeline JSON</span>
</div>
</div>
<div class="dt-precedence">
<span class="dt-mi">edit</span>
<span>You started from the recommended default and edited a step, so the mode switched to <strong>Build interactively</strong>. The steps below are now yours to change — pick <strong>recommended default</strong> again to discard your edits and restore the suggested order.</span>
</div>
<p class="dt-caption" style="margin:10px 0">
Edit the table to add, remove, reorder (drag the row index), enable, or configure each step.
Add, remove, reorder (drag the row index), enable, or configure each step.
Open a step's <strong>Configure</strong> panel to set its options in plain language.
Tool order is recommended, not enforced — violations surface as warnings below the table.
</p>
<!-- Pipeline editor (st.data_editor: Tool selectbox · Enabled checkbox · Options JSON) -->
<!-- Pipeline editor. Each step row carries an enable toggle + a "Configure"
expander that reveals that tool's OWN controls as the editing surface
(built from .dt-* form classes). Raw per-row JSON has been removed;
JSON survives only as import/export under "Advanced" below. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr>
<th class="idx"></th>
<th>Tool</th>
<th>Enabled</th>
<th>Options (JSON)</th>
<th>Step</th>
<th style="text-align:center">Enabled</th>
<th style="text-align:right">Configure</th>
</tr>
</thead>
<tbody>
<tr>
<td class="idx">≡ 0</td>
<td>text_clean <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td>text_clean</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"trim": true, "collapse_whitespace": true}</td>
</tr>
<tr>
<td class="idx">≡ 1</td>
<td>format_standardize <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"column_types": {"phone": "phone", "signup_date": "date"}}</td>
</tr>
<tr>
<td class="idx">≡ 2</td>
<td>missing <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"strategy": "flag", "sentinels": ["N/A", "—"]}</td>
</tr>
<tr>
<td class="idx">≡ 3</td>
<td>dedup <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"survivor_rule": "most_complete", "merge": true}</td>
</tr>
<tr>
<td class="idx" style="color:var(--ink-tertiary)"></td>
<td colspan="3" style="color:var(--ink-tertiary);font-family:var(--font-sans)">Add row</td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">expand_more</span></td>
</tr>
</tbody>
</table>
</div>
<!-- text_clean config panel (open to show the per-step editing surface) -->
<details class="dt-expander" open style="margin:6px 0 10px">
<summary>Configure: text_clean</summary>
<div class="dt-expander-body">
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Trim leading &amp; trailing whitespace</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Collapse repeated spaces to one</div>
<div class="dt-check"><span class="box"></span> Normalize smart quotes &amp; dashes to plain ASCII</div>
<div class="dt-field">
<label class="dt-label">Letter case</label>
<div class="dt-select">Leave as-is</div>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 1</td>
<td>format_standardize</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- format_standardize config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: format_standardize</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">Choose a target format for each column. Columns left as &ldquo;Leave as-is&rdquo; are untouched.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>Column</th><th>Format as</th></tr></thead>
<tbody>
<tr><td>name</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>email</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>phone</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Phone number</span></td></tr>
<tr><td>signup_date</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Date</span></td></tr>
</tbody>
</table>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 2</td>
<td>missing</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- missing config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: missing</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">What should happen to blank cells?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:8px">
<span class="dt-radio on"><span class="dot"></span> Flag them (mark blanks, change nothing)</span>
<span class="dt-radio"><span class="dot"></span> Fill them in (numbers → median, text → most common)</span>
<span class="dt-radio"><span class="dot"></span> Drop rows that have any blank</span>
</div>
</div>
<div class="dt-field">
<label class="dt-label">Treat these as blank (comma-separated)</label>
<div class="dt-input">N/A, —</div>
<div class="dt-help-text">Matched case-insensitively after stripping whitespace.</div>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 3</td>
<td>dedup</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
<tr>
<td class="idx" style="color:var(--ink-tertiary)"></td>
<td colspan="3" style="color:var(--ink-tertiary);font-family:var(--font-sans)">Add step</td>
</tr>
</tbody>
</table>
</div>
<!-- dedup config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: dedup</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">When rows match, which one survives?</label>
<div class="dt-select">Keep the most complete row</div>
<div class="dt-help-text">Other options: keep the first seen, keep the last seen.</div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Merge matched rows (fill each survivor's blanks from its duplicates)</div>
<div class="dt-field">
<label class="dt-label">Match on these columns</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">email <span class="x"></span></span>
<span class="dt-ms-chip">phone <span class="x"></span></span>
</div>
</div>
</div>
</details>
<!-- Validation: pipeline is in recommended order, so no warning shown (warning block omitted) -->
<!-- Advanced: JSON is import/export only, never the per-step editing surface -->
<details class="dt-expander" style="margin-top:14px">
<summary>Advanced — import / export pipeline as JSON</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">For sharing or version control. Editing is done in the step panels above — this is just the saved form of the same settings.</p>
<div class="dt-code">{
"version": 1,
"steps": [
{"tool": "text_clean", "enabled": true, "options": {"trim": true, "collapse_whitespace": true}},
{"tool": "format_standardize", "enabled": true, "options": {"column_types": {"phone": "phone", "signup_date": "date"}}},
{"tool": "missing", "enabled": true, "options": {"strategy": "flag", "sentinels": ["N/A", "—"]}},
{"tool": "dedup", "enabled": true, "options": {"survivor_rule": "most_complete", "merge": true, "keys": ["email", "phone"]}}
]
}</div>
<div class="dt-btn-row" style="margin-top:10px">
<button class="dt-btn"><span class="dt-mi">upload</span> Import JSON</button>
<button class="dt-btn"><span class="dt-mi">download</span> Export JSON</button>
</div>
</div>
</details>
<!-- Nested explainer expander -->
<details class="dt-expander" open style="margin-top:14px">
<details class="dt-expander" style="margin-top:14px">
<summary>Recommended tool order — why each step belongs where it does</summary>
<div class="dt-expander-body">
<p><strong>text_clean</strong> before <strong>format_standardize</strong> — format parsers (phone / currency / date) fail on smart-quote-contaminated or NBSP-padded input — clean text first</p>
@@ -161,39 +284,49 @@
</div>
<h4>Per-step summary</h4>
<!-- Standalone error column removed: status is one pill per step. A failed step
turns the pill danger and surfaces its message in a detail row directly below
that step (shown only on failure); successful steps just show a green pill.
Summaries are plain-English phrases, not raw JSON. Demo: this run completed
cleanly (all four ok, matching the metrics above) — the format_standardize
row carries a warn pill + detail row to illustrate how a non-fatal step issue
surfaces inline without a dedicated always-empty column. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr><th>step</th><th>status</th><th>elapsed_ms</th><th>summary</th><th>error</th></tr>
<tr><th>step</th><th>status</th><th>elapsed</th><th>summary</th></tr>
</thead>
<tbody>
<tr>
<td>text_clean</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>214</td>
<td>{"cells_changed": 1204, "columns": ["name", "city"]}</td>
<td></td>
<td>214 ms</td>
<td style="font-family:var(--font-sans)">1,204 cells changed in name &amp; city</td>
</tr>
<tr>
<td>format_standardize</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>388</td>
<td>{"phone": 18301, "signup_date": 17996}</td>
<td><span class="dt-count-pill warn"><span class="dt-mi" style="font-size:13px;margin-right:3px">warning</span> ok · 141 skipped</span></td>
<td>388 ms</td>
<td style="font-family:var(--font-sans)">18,301 phones and 17,996 dates standardized</td>
</tr>
<tr style="background:var(--warn-fill)">
<td></td>
<td colspan="3" style="font-family:var(--font-sans);color:var(--warn);white-space:normal">
<span class="dt-mi" style="font-size:15px;vertical-align:-3px;margin-right:4px">info</span>
141 phone values didn't match any known pattern and were left unchanged. The step still completed — review them in the output preview if needed.
</td>
</tr>
<tr>
<td>missing</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>121</td>
<td>{"flagged_cells": 642, "sentinels_found": ["—"]}</td>
<td></td>
<td>121 ms</td>
<td style="font-family:var(--font-sans)">642 blank cells flagged (sentinel &ldquo;&rdquo;)</td>
</tr>
<tr>
<td>dedup</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>911</td>
<td>{"input_rows": 18442, "output_rows": 18130, "duplicates_removed": 312, "groups": 147}</td>
<td></td>
<td>911 ms</td>
<td style="font-family:var(--font-sans)">312 duplicates removed across 147 groups (18,442 → 18,130 rows)</td>
</tr>
</tbody>
</table>