datatools-dev/layout-review/02_text_cleaner.html
Michael 48251b625f refactor(layout-review): consolidate tool-header actions + align reconcile downloads
Consistency pass over the parallel-agent work:
- Replace 4 divergent inline header wrappers (flex/inline-flex, gap
  10/12px, margin-top present/absent across 8 tool pages) with one shared
  .dt-tool-header-actions class; strip the now-redundant per-button
  margin-top:0. Every tool header now aligns the local-first pill + Help
  button identically.
- Reconcile downloads row: reorder to the page's exceptions-first order
  (Review, Unmatched left, Unmatched right, Matched) to match the tabs and
  metric strip, and drop the lone competing primary — the four are
  parallel exports of equal weight.

Audited and confirmed already-consistent: compact intake banner, privacy
pill markup, .dt-next-step strips, the three coming-soon stubs, primary
CTAs, and the 3-download CSV/audit/config pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:50:25 +00:00

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Layout review — Clean Text</title>
<link rel="stylesheet" href="app.css">
<style>
/* Hidden-character badges. Mirrors src/core/text_clean.py:hidden_char_css();
these rules are not in app.css, so they are reproduced inline with the same palette. */
.hidden-char { display: inline-block; padding: 0 2px; margin: 0 1px; border-radius: 3px; font-family: var(--font-mono); font-size: 0.85em; cursor: help; }
.hidden-char.hidden-whitespace { background: #fff3cd; color: #856404; border: 1px solid #ffeaa7; }
.hidden-char.hidden-special { background: #d1ecf1; color: #0c5460; border: 1px solid #bee5eb; }
.hidden-char.hidden-control { background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb; }
</style>
</head>
<body data-page="02_text_cleaner">
<div class="dt-app">
<aside class="dt-sidebar" id="dt-sidebar"></aside>
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Clean Text</strong>, shown with a file imported and a completed run (results metrics, changes-by-column, before/after examples, cleaned preview, downloads). <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Clean Text</h1>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Trim extra spaces and strip out odd characters.</p>
<div class="dt-spacer"></div>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>contacts_messy.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Preview expander (collapsed once a result exists) -->
<details class="dt-expander">
<summary>Preview: contacts_messy.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">4,120 rows, 4 columns</p>
<div class="dt-check on" style="margin-top:2px"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters</div>
<div style="display:flex;flex-wrap:wrap;align-items:center;gap:14px;margin-top:6px;font-size:12px;color:var(--ink-secondary)">
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-whitespace" style="cursor:default">·</span> Whitespace</span>
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-special" style="cursor:default"></span> Smart / special</span>
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-control" style="cursor:default"></span> Control</span>
</div>
<div class="dt-table-wrap" style="margin-top:8px">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>company</th><th>notes</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td><span class="hidden-char hidden-whitespace" title="U+0020 SP LEAD">·</span>Jane Doe<span class="hidden-char hidden-whitespace" title="U+0020 SP TRAIL">·</span></td><td>jane@acme.io</td><td>Acme<span class="hidden-char hidden-whitespace" title="U+00A0 NBSP">·</span>Inc.</td><td>VIP<span class="hidden-char hidden-special" title="U+201D RIGHT DOUBLE QUOTE"></span></td></tr>
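<tr><td class="idx">1</td><td>Bob<span class="hidden-char hidden-whitespace" title="U+0020 SP">·</span><span class="hidden-char hidden-whitespace" title="U+0020 SP">·</span>Smith</td><td>bob@globex.com<span class="hidden-char hidden-special" title="U+200B ZWSP"></span></td><td>Globex</td><td><span class="hidden-char hidden-control" title="U+0007 CTRL"></span></td></tr>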
<tr><td class="idx">2</td><td>Ana López</td><td>ana@initech.com</td><td>Initech<span class="hidden-char hidden-whitespace" title="U+0020 SP TRAIL">·</span></td><td>follow up</td></tr>
<tr><td class="idx">3</td><td><span class="hidden-char hidden-whitespace" title="U+0009 TAB"></span>Wei Chen</td><td>WEI@umbrella.co</td><td>Umbrella</td><td>“key<span class="hidden-char hidden-special" title="U+2014 EM DASH"></span>account”</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<hr class="dt-divider">
<!-- Options expander (collapsed once a result exists) -->
<details class="dt-expander">
<summary>Options</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">Preset</label>
<div class="dt-radio-row">
<span class="dt-radio on"><span class="dot"></span> excel-hygiene (recommended)</span>
<span class="dt-radio"><span class="dot"></span> minimal</span>
<span class="dt-radio"><span class="dot"></span> paranoid</span>
</div>
<div class="dt-help-text">
minimal: trim and collapse whitespace only — no character substitutions.<br>
excel-hygiene: trim, collapse whitespace, fold smart quotes, strip invisible chars, normalize line endings, and normalize accented characters.<br>
paranoid: everything in excel-hygiene plus strip control characters, strip BOM, and normalize accented and look-alike characters (lossy).
</div>
</div>
<details class="dt-expander">
<summary>Advanced options</summary>
<div class="dt-expander-body">
<div class="dt-cols-2">
<div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Trim leading/trailing whitespace</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Collapse internal whitespace</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Normalize line endings (\r\n → \n)</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Strip control characters</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Strip BOM</div>
</div>
<div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Fold smart characters (curly quotes, em-dash, NBSP)</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Strip zero-width / invisible characters</div>
<div class="dt-check on" title="Unicode NFC normalization"><span class="box"><span class="dt-mi">check</span></span> Normalize accented characters (NFC)</div>
<div class="dt-check" title="Unicode NFKC compatibility fold"><span class="box"></span> Normalize accented and look-alike characters (lossy: ① → 1, ﬁ → fi)</div>
</div>
</div>
<h4>Scope</h4>
<div class="dt-field">
<label class="dt-label">Columns to clean (default: all string columns)</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">name <span class="x"></span></span>
<span class="dt-ms-chip">email <span class="x"></span></span>
<span class="dt-ms-chip">company <span class="x"></span></span>
<span class="dt-ms-chip">notes <span class="x"></span></span>
</div>
</div>
<div class="dt-field">
<label class="dt-label">Columns to skip even if they look like text</label>
<div class="dt-multiselect"><span class="dt-ms-placeholder">Choose columns to leave untouched</span></div>
</div>
<h4>Case conversion</h4>
<div class="dt-field" style="max-width:360px">
<label class="dt-label">Apply case conversion to selected columns</label>
<div class="dt-select">None</div>
</div>
</div>
</details>
</div>
</details>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block">Clean Text</button>
<hr class="dt-divider">
<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Cells scanned</div><div class="value">16,480</div></div>
<div class="dt-metric"><div class="label">Cells changed</div><div class="value">3,947</div></div>
<div class="dt-metric"><div class="label">% changed</div><div class="value">24.0%</div></div>
<div class="dt-metric"><div class="label">Columns processed</div><div class="value">4</div></div>
</div>
<div class="dt-field">
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)</div>
<div class="dt-help-text">Same setting as “Show hidden characters” in the preview above — toggling either updates both.</div>
</div>
<h4>Changes by column</h4>
<div class="dt-table-wrap" style="max-width:360px">
<table class="dt-table">
<thead><tr><th>column</th><th>cells_changed</th></tr></thead>
<tbody>
<tr><td>company</td><td>1,604</td></tr>
<tr><td>name</td><td>1,210</td></tr>
<tr><td>notes</td><td>982</td></tr>
<tr><td>email</td><td>151</td></tr>
</tbody>
</table>
</div>
<h4>Examples (first 25 changes)</h4>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>Row</th><th>Column</th><th>Before</th><th>After</th><th>Ops applied</th></tr></thead>
<tbody>
<tr><td>1</td><td>name</td><td><span class="hidden-char hidden-whitespace" title="U+0020 SP LEAD">·</span>Jane Doe<span class="hidden-char hidden-whitespace" title="U+0020 SP TRAIL">·</span></td><td>Jane Doe</td><td>trim</td></tr>
<tr><td>1</td><td>company</td><td>Acme<span class="hidden-char hidden-whitespace" title="U+00A0 NBSP">·</span>Inc.</td><td>Acme Inc.</td><td>fold_smart</td></tr>
<tr><td>1</td><td>notes</td><td>VIP<span class="hidden-char hidden-special" title="U+201D RIGHT DOUBLE QUOTE"></span></td><td>VIP"</td><td>fold_smart</td></tr>
<tr><td>2</td><td>name</td><td>Bob<span class="hidden-char hidden-whitespace" title="U+0020 SP">·</span><span class="hidden-char hidden-whitespace" title="U+0020 SP">·</span>Smith</td><td>Bob Smith</td><td>collapse_ws</td></tr>
<tr><td>2</td><td>email</td><td>bob@globex.com<span class="hidden-char hidden-special" title="U+200B ZWSP"></span></td><td>bob@globex.com</td><td>strip_zero_width</td></tr>
<tr><td>2</td><td>notes</td><td><span class="hidden-char hidden-control" title="U+0007 CTRL"></span></td><td></td><td>strip_control</td></tr>
<tr><td>3</td><td>company</td><td>Initech<span class="hidden-char hidden-whitespace" title="U+0020 SP TRAIL">·</span></td><td>Initech</td><td>trim</td></tr>
<tr><td>4</td><td>name</td><td><span class="hidden-char hidden-whitespace" title="U+0009 TAB"></span>Wei Chen</td><td>Wei Chen</td><td>trim</td></tr>
<tr><td>4</td><td>notes</td><td>“key<span class="hidden-char hidden-special" title="U+2014 EM DASH"></span>account”</td><td>"key-account"</td><td>fold_smart, nfc</td></tr>
</tbody>
</table>
</div>
<h4>Cleaned preview (first 10 rows)</h4>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>company</th><th>notes</th></tr></thead>
<tbody>
<tr><td class="idx">0</td><td class="dt-cell-add">Jane Doe</td><td>jane@acme.io</td><td class="dt-cell-add">Acme Inc.</td><td class="dt-cell-add">VIP"</td></tr>
<tr><td class="idx">1</td><td class="dt-cell-add">Bob Smith</td><td class="dt-cell-add">bob@globex.com</td><td>Globex</td><td class="dt-cell-add"></td></tr>
<tr><td class="idx">2</td><td>Ana López</td><td>ana@initech.com</td><td class="dt-cell-add">Initech</td><td>follow up</td></tr>
<tr><td class="idx">3</td><td class="dt-cell-add">Wei Chen</td><td>WEI@umbrella.co</td><td>Umbrella</td><td class="dt-cell-add">"key-account"</td></tr>
</tbody>
</table>
</div>
<p class="dt-caption">Changed cells are highlighted. Toggle “Show hidden characters” to inspect the invisible characters being removed.</p>
<hr class="dt-divider">
<!-- Downloads -->
<div class="dt-cols-3">
<button class="dt-btn dt-btn-primary">Download cleaned CSV</button>
<button class="dt-btn">Download changes audit</button>
<button class="dt-btn">Download config JSON</button>
</div>
<!-- Next-step suggestion -->
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Text cleaned. Next, most files need: <a href="03_format_standardizer.html">Standardize Formats →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>
<footer class="dt-footer" id="dt-footer"></footer>
<script src="shell.js"></script>
</body>
</html>