datatools-dev/layout-review/09_pipeline_runner.html
Michael cf31d9ef14 feat(layout-review): address review findings on pages 7-12
Find Duplicates (01_deduplicator):
- Delete the redundant outer Options wrapper; surface threshold +
  survivor rule directly, push the rest behind a single Advanced pane.
- Disambiguate competing primaries: top result is an auto-resolved
  preview (secondary download), review decisions are the single primary.
- Plain-English match labels (exact / approximate); clarify the third.
- Lift the match-card caption to a one-time instruction; note delimiter
  is delimited-text-only.

Quality Check (08_validator_reporter) — stub:
- Remove the dead disabled "Load rules file (JSON)" uploader so the
  stub invites a single action; keep the informative feature list.

Map Columns (05_column_mapper):
- Regroup schema -> mapping -> strategy/advanced (core task contiguous).
- Make preset-vs-Advanced precedence legible (Custom + modified marker).
- Adopt the compact file-intake banner; drop the duplicate resolved-
  mapping table; fix the add-row gutter style.

Combine Files (07_multi_file_merger) — stub:
- Actually disable the Merge CTA (add the disabled attribute).

PDF to CSV (10_pdf_extractor):
- Drop page/raw from the default preview to match export + fix the
  horizontal clip; surface raw via per-row affordance + overflow-x.
- Move the column selector above the download button; give auto-excluded
  rows a reason; align the files card to Home; de-dupe the row count.

Automated Workflows (09_pipeline_runner):
- Replace hand-edited JSON step config with per-step control expanders;
  JSON moved behind Advanced import/export.
- Editing the table marks the mode modified; fold the empty error column
  into the status pill; render summaries as plain English; collapse the
  explainer by default.

Cross-cutting items (stub standardization on page 10, shared disabled-
field token, remaining intake rollout) deferred to a holistic pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:35:46 +00:00

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Layout review — Automated Workflows</title>
<link rel="stylesheet" href="app.css">
</head>
<body data-page="09_pipeline_runner">
<div class="dt-app">
<aside class="dt-sidebar" id="dt-sidebar"></aside>
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Automated Workflows</strong> (Pipeline Runner), shown with a file imported, a four-step pipeline configured, and a completed run (results + per-step summary). <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Automated Workflows</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
<p class="dt-tool-caption">Run several tools in a row — save the steps once, reuse them anytime.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding &amp; delimiter auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">customers_export.csv</span>
<span class="size">2.1 MB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove" aria-label="Remove"><span class="dt-mi">close</span></button>
</div>
<!-- Preview expander (collapsed once a result exists) -->
<details class="dt-expander">
<summary>Preview: customers_export.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">18,442 rows, 5 columns</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td> Jane Doe </td><td>jane@acme.io</td><td>Austin</td><td>512-555-0190</td><td>2024-01-04</td></tr>
<tr><td class="idx">1</td><td>jane doe</td><td>JANE@ACME.IO</td><td>austin</td><td>(512) 555-0190</td><td>01/04/2024</td></tr>
<tr><td class="idx">2</td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td>720.555.7781</td><td>2024-02-11</td></tr>
<tr><td class="idx">3</td><td>R. Smith</td><td>bob@globex.com</td><td></td><td>720-555-7781</td><td>Feb 11 2024</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<hr class="dt-divider">
<!-- Options: pipeline builder (collapsed once a result exists; opened here to show structure) -->
<details class="dt-expander" open>
<summary>Options</summary>
<div class="dt-expander-body">
<!-- Mode radio. Editing the steps below auto-switches the mode from the
recommended default to "Build interactively" (same precedence-visibility
pattern as Fix Missing Values: the active state is made legible, and the
default it superseded is marked "· modified"). -->
<div class="dt-field">
<label class="dt-label">How would you like to define the pipeline?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:9px">
<span class="dt-radio"><span class="dot"></span> Use the recommended default (text-clean → format → missing → dedup) <span class="dt-count-pill warn" style="margin-left:4px">· modified</span></span>
<span class="dt-radio on"><span class="dot"></span> Build interactively</span>
<span class="dt-radio"><span class="dot"></span> Import a saved pipeline JSON</span>
</div>
</div>
<div class="dt-precedence">
<span class="dt-mi">edit</span>
<span>You started from the recommended default and edited a step, so the mode switched to <strong>Build interactively</strong>. The steps below are now yours to change — pick <strong>recommended default</strong> again to discard your edits and restore the suggested order.</span>
</div>
<p class="dt-caption" style="margin:10px 0">
Add, remove, reorder (drag the row index), enable, or configure each step.
Open a step's <strong>Configure</strong> panel to set its options in plain language.
Tool order is recommended, not enforced — violations surface as warnings below the table.
</p>
<!-- Pipeline editor. Each step row carries an enable toggle + a "Configure"
expander that reveals that tool's OWN controls as the editing surface
(built from .dt-* form classes). Raw per-row JSON has been removed;
JSON survives only as import/export under "Advanced" below. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr>
<th class="idx"></th>
<th>Step</th>
<th style="text-align:center">Enabled</th>
<th style="text-align:right">Configure</th>
</tr>
</thead>
<tbody>
<tr>
<td class="idx">≡ 0</td>
<td>text_clean</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">expand_more</span></td>
</tr>
</tbody>
</table>
</div>
<!-- text_clean config panel (open to show the per-step editing surface) -->
<details class="dt-expander" open style="margin:6px 0 10px">
<summary>Configure: text_clean</summary>
<div class="dt-expander-body">
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Trim leading &amp; trailing whitespace</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Collapse repeated spaces to one</div>
<div class="dt-check"><span class="box"></span> Normalize smart quotes &amp; dashes to plain ASCII</div>
<div class="dt-field">
<label class="dt-label">Letter case</label>
<div class="dt-select">Leave as-is</div>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 1</td>
<td>format_standardize</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- format_standardize config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: format_standardize</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">Choose a target format for each column. Columns left as &ldquo;Leave as-is&rdquo; are untouched.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>Column</th><th>Format as</th></tr></thead>
<tbody>
<tr><td>name</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>email</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>phone</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Phone number</span></td></tr>
<tr><td>signup_date</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Date</span></td></tr>
</tbody>
</table>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 2</td>
<td>missing</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- missing config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: missing</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">What should happen to blank cells?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:8px">
<span class="dt-radio on"><span class="dot"></span> Flag them (mark blanks, change nothing)</span>
<span class="dt-radio"><span class="dot"></span> Fill them in (numbers → median, text → most common)</span>
<span class="dt-radio"><span class="dot"></span> Drop rows that have any blank</span>
</div>
</div>
<div class="dt-field">
<label class="dt-label">Treat these as blank (comma-separated)</label>
<div class="dt-input">N/A, —</div>
<div class="dt-help-text">Matched case-insensitively after stripping whitespace.</div>
</div>
</div>
</details>
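<!-- Illustrative sketch only (not loaded by this page). The help text in the
"missing" panel says sentinels are matched case-insensitively after stripping
whitespace; the snippet below shows one way that rule could be expressed.
The names parseSentinels and makeBlankMatcher are hypothetical, and treating
an empty cell as blank is an assumption rather than documented behavior.

```javascript
// Hypothetical helpers; the real tool's implementation is not shown in this file.

// Split the comma-separated field value into normalized sentinel strings.
function parseSentinels(text) {
  return text
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
}

// Build a predicate that reports whether a cell should be treated as blank.
function makeBlankMatcher(sentinelText) {
  const sentinels = new Set(parseSentinels(sentinelText));
  return function isBlank(cell) {
    if (cell === null || cell === undefined) return true;
    // Strip whitespace, then compare case-insensitively.
    const value = String(cell).trim().toLowerCase();
    // Assumption: a cell that is empty after trimming also counts as blank.
    return value === "" || sentinels.has(value);
  };
}

// "\u2014" is the dash sentinel shown in the input field above.
const isBlank = makeBlankMatcher("N/A, \u2014");
```

With the field value shown in the panel, isBlank("  n/a ") and
isBlank("\u2014") would both be true, while isBlank("Austin") would be false. -->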
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 3</td>
<td>dedup</td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
<tr>
<td class="idx" style="color:var(--ink-tertiary)"></td>
<td colspan="3" style="color:var(--ink-tertiary);font-family:var(--font-sans)">Add step</td>
</tr>
</tbody>
</table>
</div>
<!-- dedup config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: dedup</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">When rows match, which one survives?</label>
<div class="dt-select">Keep the most complete row</div>
<div class="dt-help-text">Other options: keep the first seen, keep the last seen.</div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Merge matched rows (fill each survivor's blanks from its duplicates)</div>
<div class="dt-field">
<label class="dt-label">Match on these columns</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">email <span class="x"></span></span>
<span class="dt-ms-chip">phone <span class="x"></span></span>
</div>
</div>
</div>
</details>
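<!-- Illustrative sketch only (not loaded by this page). It shows how the two
dedup settings above could combine: "Keep the most complete row" picks the
survivor, and "Merge matched rows" fills the survivor's blanks from its
duplicates. Assumptions: rows match only when every key column is exactly
equal (the real tool may match approximately), ties keep the first row seen,
and the names completeness and dedupeMostComplete are hypothetical.

```javascript
// A field is empty when it is null, undefined, or whitespace only.
function isEmpty(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

// Number of non-empty fields in a row.
function completeness(row) {
  return Object.values(row).filter((v) => !isEmpty(v)).length;
}

function dedupeMostComplete(rows, keys) {
  // Group rows by the exact values of the key columns (assumption: exact match).
  const groups = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(keys.map((k) => row[k]));
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(row);
  }

  const result = [];
  for (const group of groups.values()) {
    // Survivor: the row with the most non-empty fields; ties keep the first seen.
    let survivor = group[0];
    for (const row of group) {
      if (completeness(row) > completeness(survivor)) survivor = row;
    }
    // Merge: copy the survivor, then fill its blanks from the other rows.
    const merged = { ...survivor };
    for (const row of group) {
      for (const [field, value] of Object.entries(row)) {
        if (isEmpty(merged[field]) && !isEmpty(value)) merged[field] = value;
      }
    }
    result.push(merged);
  }
  return result;
}
```

Called with keys ["email", "phone"], two rows that share both values collapse
into one row that carries every non-empty field from either of them. -->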
<!-- Validation: pipeline is in recommended order, so no warning shown (warning block omitted) -->
<!-- Advanced: JSON is import/export only, never the per-step editing surface -->
<details class="dt-expander" style="margin-top:14px">
<summary>Advanced — import / export pipeline as JSON</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">For sharing or version control. Editing is done in the step panels above — this is just the saved form of the same settings.</p>
<div class="dt-code">{
"version": 1,
"steps": [
{"tool": "text_clean", "enabled": true, "options": {"trim": true, "collapse_whitespace": true}},
{"tool": "format_standardize", "enabled": true, "options": {"column_types": {"phone": "phone", "signup_date": "date"}}},
{"tool": "missing", "enabled": true, "options": {"strategy": "flag", "sentinels": ["N/A", "—"]}},
{"tool": "dedup", "enabled": true, "options": {"survivor_rule": "most_complete", "merge": true, "keys": ["email", "phone"]}}
]
}</div>
<div class="dt-btn-row" style="margin-top:10px">
<button class="dt-btn"><span class="dt-mi">upload</span> Import JSON</button>
<button class="dt-btn"><span class="dt-mi">download</span> Export JSON</button>
</div>
</div>
</details>
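<!-- Illustrative sketch only (not loaded by this page). It shows how an
imported pipeline file with the shape in the Advanced pane above
({"version": 1, "steps": [...]}) might be checked before it is handed to the
step panels. Assumptions: only the four tools used on this page are
recognized, a missing "enabled" defaults to true, a missing "options"
defaults to an empty object, and the name parsePipeline is hypothetical.

```javascript
// Assumption: the tool names that appear on this page are the only valid ones.
const KNOWN_TOOLS = ["text_clean", "format_standardize", "missing", "dedup"];

// Parse pipeline JSON text and return a normalized list of steps.
// Throws an Error with a readable message when the file is not usable.
function parsePipeline(text) {
  const doc = JSON.parse(text);
  if (doc === null || typeof doc !== "object" || Array.isArray(doc)) {
    throw new Error("Pipeline file must be a JSON object");
  }
  if (doc.version !== 1) {
    throw new Error(`Unsupported pipeline version: ${doc.version}`);
  }
  if (!Array.isArray(doc.steps)) {
    throw new Error("Pipeline file must contain a steps array");
  }
  return doc.steps.map((step, index) => {
    if (!KNOWN_TOOLS.includes(step.tool)) {
      throw new Error(`Step ${index}: unknown tool "${step.tool}"`);
    }
    return {
      tool: step.tool,
      enabled: step.enabled !== false, // assumption: steps are on unless disabled
      options: step.options || {},
    };
  });
}
```

Because the JSON is only the saved form of the panel settings, a check like
this would run once on import, after which editing continues in the panels. -->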
<!-- Nested explainer expander -->
<details class="dt-expander" style="margin-top:14px">
<summary>Recommended tool order — why each step belongs where it does</summary>
<div class="dt-expander-body">
<p><strong>text_clean</strong> before <strong>format_standardize</strong>: format parsers (phone / currency / date) fail on input that contains smart quotes or NBSP padding, so clean text first.</p>
<p><strong>text_clean</strong> before <strong>missing</strong>: sentinel detection misses cells padded with NBSP or zero-width characters, so clean text first.</p>
<p><strong>text_clean</strong> before <strong>dedup</strong>: fuzzy matching treats NBSP-padded values as different from their unpadded counterparts, so clean text first.</p>
<p><strong>format_standardize</strong> before <strong>missing</strong>: numeric imputation needs numeric dtypes, and canonical phones / currencies improve sentinel detection.</p>
<p><strong>format_standardize</strong> before <strong>dedup</strong>: canonical phones and lowercase emails enable cross-format duplicate matching.</p>
<p style="margin-bottom:0"><strong>missing</strong> before <strong>dedup</strong>: deduping rows with mixed NaN sentinels produces brittle merges, so resolve missing values first.</p>
</div>
</details>
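<!-- Illustrative sketch only (not loaded by this page). The explainer above
lists six "A before B" pairs, and the caption says order is recommended, not
enforced, with violations surfacing as warnings. The snippet below shows one
way those pairs could drive the warning list. Assumptions: disabled steps are
ignored, and the names RECOMMENDED_ORDER and orderWarnings are hypothetical.

```javascript
// Each [earlier, later] pair mirrors one line of the explainer.
const RECOMMENDED_ORDER = [
  ["text_clean", "format_standardize"],
  ["text_clean", "missing"],
  ["text_clean", "dedup"],
  ["format_standardize", "missing"],
  ["format_standardize", "dedup"],
  ["missing", "dedup"],
];

// Return one warning per violated pair. Nothing is thrown or blocked,
// because the order is a recommendation rather than a rule.
function orderWarnings(steps) {
  // Assumption: only enabled steps take part in the check.
  const enabled = steps.filter((s) => s.enabled).map((s) => s.tool);
  const warnings = [];
  for (const [earlier, later] of RECOMMENDED_ORDER) {
    const earlierAt = enabled.indexOf(earlier);
    const laterAt = enabled.indexOf(later);
    if (earlierAt !== -1 && laterAt !== -1 && earlierAt > laterAt) {
      warnings.push(`${later} runs before ${earlier}; the recommended order puts ${earlier} first`);
    }
  }
  return warnings;
}
```

For the four-step pipeline configured on this page the list would be empty,
which is why no warning block is rendered below the table. -->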
</div>
</details>
<hr class="dt-divider">
<!-- Run -->
<button class="dt-btn dt-btn-primary dt-btn-block">Run Pipeline</button>
<hr class="dt-divider">
<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Initial rows</div><div class="value">18,442</div></div>
<div class="dt-metric"><div class="label">Final rows</div><div class="value">18,130</div></div>
<div class="dt-metric"><div class="label">Steps run</div><div class="value">4</div></div>
<div class="dt-metric"><div class="label">Elapsed</div><div class="value">1.84 s</div></div>
</div>
<h4>Per-step summary</h4>
<!-- Standalone error column removed: status is one pill per step. A failed step
turns the pill danger and surfaces its message in a detail row directly below
that step (shown only on failure); successful steps just show a green pill.
Summaries are plain-English phrases, not raw JSON. Demo: this run completed
cleanly (all four ok, matching the metrics above) — the format_standardize
row carries a warn pill + detail row to illustrate how a non-fatal step issue
surfaces inline without a dedicated always-empty column. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr><th>step</th><th>status</th><th>elapsed</th><th>summary</th></tr>
</thead>
<tbody>
<tr>
<td>text_clean</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>214 ms</td>
<td style="font-family:var(--font-sans)">1,204 cells changed in name &amp; city</td>
</tr>
<tr>
<td>format_standardize</td>
<td><span class="dt-count-pill warn"><span class="dt-mi" style="font-size:13px;margin-right:3px">warning</span> ok · 141 skipped</span></td>
<td>388 ms</td>
<td style="font-family:var(--font-sans)">18,301 phones and 17,996 dates standardized</td>
</tr>
<tr style="background:var(--warn-fill)">
<td></td>
<td colspan="3" style="font-family:var(--font-sans);color:var(--warn);white-space:normal">
<span class="dt-mi" style="font-size:15px;vertical-align:-3px;margin-right:4px">info</span>
141 phone values didn't match any known pattern and were left unchanged. The step still completed — review them in the output preview if needed.
</td>
</tr>
<tr>
<td>missing</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>121 ms</td>
<td style="font-family:var(--font-sans)">642 blank cells flagged (sentinel &ldquo;&rdquo;)</td>
</tr>
<tr>
<td>dedup</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>911 ms</td>
<td style="font-family:var(--font-sans)">312 duplicates removed across 147 groups (18,442 → 18,130 rows)</td>
</tr>
</tbody>
</table>
</div>
<h4>Output preview (first 5 rows)</h4>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>city</th><th>phone</th><th>signup_date</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td>Jane Doe</td><td>jane@acme.io</td><td>Austin</td><td class="dt-cell-add">+1 512-555-0190</td><td class="dt-cell-add">2024-01-04</td></tr>
<tr><td class="idx">1</td><td>Bob Smith</td><td>bob@globex.com</td><td>Denver</td><td class="dt-cell-add">+1 720-555-7781</td><td class="dt-cell-add">2024-02-11</td></tr>
<tr><td class="idx">2</td><td>Carla Reyes</td><td>carla@initech.co</td><td>Phoenix</td><td class="dt-cell-add">+1 480-555-3320</td><td class="dt-cell-add">2024-03-02</td></tr>
<tr><td class="idx">3</td><td>Dan Okafor</td><td>dan@umbrella.net</td><td><span class="dt-cell-flag">⚑ missing</span></td><td class="dt-cell-add">+1 206-555-7745</td><td class="dt-cell-add">2024-03-18</td></tr>
<tr><td class="idx">4</td><td>Emily Tran</td><td>emily@hooli.com</td><td>Seattle</td><td class="dt-cell-add">+1 206-555-1182</td><td class="dt-cell-add">2024-04-05</td></tr>
</tbody>
</table>
</div>
<hr class="dt-divider">
<!-- Downloads (3 columns) -->
<div class="dt-cols-3">
<button class="dt-btn dt-btn-primary"><span class="dt-mi">download</span> Download cleaned CSV</button>
<button class="dt-btn"><span class="dt-mi">download</span> Download pipeline JSON</button>
<button class="dt-btn"><span class="dt-mi">download</span> Download run audit</button>
</div>
</div>
</main>
</div>
<footer class="dt-footer" id="dt-footer"></footer>
<script src="shell.js"></script>
</body>
</html>