Compare commits

11 Commits

Author SHA1 Message Date
28ab51a869 Merge ui-redesign: journey-level UX redesign + live-app port
Brings the design-review mockups and the highest-leverage live-app
changes into main:
- layout-review/ mockups: 12-page review addressed; front door, taught
  pipeline order, consistent intake, coming-soon stubs, shared tokens.
- Live src/gui/: nav reordered to pipeline order with new Finance +
  Coming-soon groups; Home is the "Start here" front door with a
  one-click "Clean these files for me" pipeline runner; local-first
  pill on every working tool header.
- DECISIONS.md: PDF to CSV + Reconcile kept in-bundle under Finance.

Full suite green: 2441 passed, 91 skipped, 0 failed.

Follow-ups tracked (not blockers): streamlit-run visual verification of
the live UI; i18n keys for the front-door copy (English literals today);
rebuild the live coming-soon stub page bodies.
2026-06-08 17:41:30 +00:00
1895074b8f test+fix(gui): retire the now-empty "analysis" nav section
The journey-level nav restructure moved Home to a standalone "Start
here" entry and Reconcile into the "Finance" group, leaving the
"analysis" section with zero tools. Two registry tests encoded the old
layout and failed:
- test_every_section_has_at_least_one_tool[analysis] (empty section)
- test_reconciler_present (asserted section == "analysis")

Drop "analysis" from the Section literal, SECTION_LABELS, and app.py's
by_section bucket — it's genuinely dead now (home isn't a registry Tool).
Update the presence tests to assert Reconcile + PDF to CSV live in
"finance". The section-invariant tests (every section non-empty, has a
label, no orphan labels) are preserved and pass.

Full suite: 2441 passed, 91 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 17:11:02 +00:00
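The section invariants that commit 1895074b8f preserves (every section non-empty, every section labeled, no orphan labels) can be sketched roughly as below. This is an illustration under assumptions, not the repository's test code: `Tool`, `TOOLS`, `section_problems`, the label strings, and the section keys other than `finance` and `coming_soon` are hypothetical stand-ins. Only the names `Section` and `SECTION_LABELS` come from the commit message.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Assumed shape of the Section literal; the real one lives in tools_registry.py.
Section = Literal["cleaners", "transformations", "automations", "finance", "coming_soon"]

# Assumed label text; "analysis" is intentionally absent after the retirement.
SECTION_LABELS = {
    "cleaners": "Data Cleaners",
    "transformations": "Transformations",
    "automations": "Automations",
    "finance": "Finance",
    "coming_soon": "Coming soon",
}


@dataclass(frozen=True)
class Tool:  # hypothetical stand-in for the registry's tool record
    name: str
    section: str


TOOLS = [
    Tool("Clean Text", "cleaners"),
    Tool("Standardize Formats", "cleaners"),
    Tool("Fix Missing Values", "cleaners"),
    Tool("Find Duplicates", "cleaners"),
    Tool("Map Columns", "transformations"),      # placement assumed
    Tool("Automated Workflows", "automations"),  # placement assumed
    Tool("Reconcile", "finance"),
    Tool("PDF to CSV", "finance"),
    Tool("Find Unusual", "coming_soon"),
    Tool("Quality Check", "coming_soon"),
    Tool("Combine Files", "coming_soon"),
]


def section_problems(tools, labels):
    """Return a list of invariant violations; an empty list means all hold."""
    sections = set(get_args(Section))
    used = {tool.section for tool in tools}
    label_keys = set(labels)
    problems = []
    problems += [f"empty section: {s}" for s in sorted(sections - used)]
    problems += [f"missing label: {s}" for s in sorted(sections - label_keys)]
    problems += [f"orphan label: {s}" for s in sorted(label_keys - sections)]
    return problems
```

Leaving a stale `"analysis"` key in the labels would surface as an orphan label, which is the kind of drift the commit removes.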
d807d3c11b feat(gui): add the one-click "Clean these files for me" front door
Issue #1 (the make-or-break UX fix): after the analyzer runs, Home now
leads with a primary "Clean these files for me" CTA that runs the
recommended pipeline (Clean Text -> Standardize -> Fix Missing -> Find
Duplicates, in order) on every imported file and hands back a cleaned
CSV per file — collapsing "which tool, what order" to one click. The
existing per-finding cards remain, reframed as "Or fix issues one at a
time" for users who want manual control.

- Reuses the core API verbatim (recommended_pipeline + run_pipeline);
  reader mirrors 9_Pipeline_Runner._read_uploaded so files load the same
  way the standalone orchestrator loads them.
- Per-file errors are captured so one bad file doesn't kill the batch;
  cleaned CSVs are cached in session_state so downloads survive reruns
  and are pruned when a file is removed or re-analyzed.

Verified: the read -> run_pipeline -> CSV data path executes correctly
(compile + a non-Streamlit functional smoke test). The Streamlit UI
scaffolding (button / download_button / progress / session_state)
mirrors the proven runner page but still needs a `streamlit run` check.
Front-door copy is English literals for now; i18n keys are a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 17:06:30 +00:00
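The batch behavior described in commit d807d3c11b (per-file errors captured so one bad file does not kill the batch, cached results pruned when a file is removed) might look roughly like the sketch below. Everything here is hypothetical: `clean_all`, `prune_cache`, and `demo_clean` are illustrative names, and `clean_one` stands in for the real read, `recommended_pipeline`, `run_pipeline`, and CSV export chain, whose signatures are not shown in this diff. A plain dict stands in for Streamlit's `session_state`.

```python
from typing import Callable


def clean_all(
    files: dict[str, bytes],
    clean_one: Callable[[bytes], bytes],
) -> tuple[dict[str, bytes], dict[str, str]]:
    """Run clean_one on every file, collecting failures instead of raising."""
    cleaned: dict[str, bytes] = {}
    errors: dict[str, str] = {}
    for name, payload in files.items():
        try:
            cleaned[name] = clean_one(payload)
        except Exception as exc:  # deliberately broad: isolate each file
            errors[name] = f"{type(exc).__name__}: {exc}"
    return cleaned, errors


def prune_cache(cache: dict[str, bytes], current_names: set[str]) -> None:
    """Drop cached outputs for files that are no longer imported."""
    for stale in set(cache) - current_names:
        del cache[stale]


def demo_clean(payload: bytes) -> bytes:
    """Toy cleaner: decode as UTF-8 and strip whitespace from each line."""
    text = payload.decode("utf-8")
    return "\n".join(line.strip() for line in text.splitlines()).encode("utf-8")


if __name__ == "__main__":
    batch = {"good.csv": b"a,b \n 1,2\n", "bad.csv": b"\xff\xfe"}
    cleaned, errors = clean_all(batch, demo_clean)
    print(sorted(cleaned), sorted(errors))
```

Returning the errors alongside the results lets the UI report failures per file while still offering downloads for the files that succeeded.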
09ec01e98b feat(gui): port journey-level nav + local-first pill to the live app
Brings the live Streamlit app in line with the finalized layout-review
mockups (structural/low-risk changes; verified by compile + registry
sanity, still pending a streamlit-run visual check):

- tools_registry: Data Cleaners now in pipeline order (Clean Text ->
  Standardize -> Fix Missing -> Find Duplicates); new "finance" section
  (Reconcile, PDF to CSV) and "coming_soon" section (Find Unusual,
  Quality Check, Combine Files). Adds those to the Section type +
  SECTION_LABELS.
- app.py: Home becomes the "Start here" front door — a standalone,
  unlabeled top entry (play_circle icon) ahead of the hidden
  Activate/Logs/Close pages; nav groups reordered cleaners ->
  transformations -> automations -> finance -> coming soon.
- _legacy.py: render_tool_header now shows the "Runs 100% locally"
  privacy pill (right-aligned, Ready tools only — omitted on Coming
  Soon stubs); accent emphasis CSS for the Start-here nav link.
- i18n: add nav.start_here_title, nav.section_finance,
  nav.section_coming_soon to en + es packs.
- DECISIONS.md: log the PDF/Reconcile in-bundle (Finance group) call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 17:01:57 +00:00
48251b625f refactor(layout-review): consolidate tool-header actions + align reconcile downloads
Consistency pass over the parallel-agent work:
- Replace 4 divergent inline header wrappers (flex/inline-flex, gap
  10/12px, margin-top present/absent across 8 tool pages) with one shared
  .dt-tool-header-actions class; strip the now-redundant per-button
  margin-top:0. Every tool header now aligns the local-first pill + Help
  button identically.
- Reconcile downloads row: reorder to the page's exceptions-first order
  (Review, Unmatched left, Unmatched right, Matched) to match the tabs and
  metric strip, and drop the lone competing primary — the four are
  parallel exports of equal weight.

Audited and confirmed already-consistent: compact intake banner, privacy
pill markup, .dt-next-step strips, the three coming-soon stubs, primary
CTAs, and the 3-download CSV/audit/config pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:50:25 +00:00
dd0942d710 feat(layout-review): journey-level redesign — front door, taught order, consistency
Addresses the journey-level review (the app felt like 12 tools sharing a
stylesheet, not one guided product). File-partitioned changes:

Navigation (shell.js): rename Home -> "Start here" with front-door
emphasis (.dt-nav-start); reorder Data Cleaners into pipeline order
(Clean Text -> Standardize -> Fix Missing -> Find Duplicates); new
"Finance" group (Reconcile, PDF to CSV); all stubs moved to a bottom
"Coming soon" group, no longer interleaved with working tools.

Front door (home.html): a prominent primary "Clean these files for me"
that runs the recommended pipeline in order, above the existing
per-finding cards (reframed as "fix one thing at a time").

Shared tokens (app.css): .dt-next-step suggestion strip + .dt-nav-start.

Teach the order: a slim .dt-next-step strip at the end of each linear
cleaner page points to the next pipeline step (Map Columns -> Start here;
orchestrator/Finance pages correctly omit it).

Local-first: the green "Runs 100% locally" pill now sits in every working
tool page's header (home + 8 tools), where client data is entered.

Plain English: jargon relabeled on input controls (coerce, E.164,
NFC/NFKC, sentinels, survivor rule), technical terms kept in tooltips and
audit/output cells only.

Stubs (06/08/07): rebuilt to one identical skeleton — info line + plain
feature list + a real "Notify me when this ships" button; every disabled
control and uploader removed (a dimmed dropzone reads as broken).

Intake: full dropzone+chip replaced with the compact "Using <file>" banner
on Clean Text, Fix Missing, Find Duplicates, and both Reconcile sides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:44:11 +00:00
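For readers unfamiliar with the NFC/NFKC terms that commit dd0942d710 moves into tooltips, the difference can be shown with Python's standard `unicodedata` module. The two helper names below are hypothetical and only mirror the relabeled controls; how Clean Text actually applies normalization is not shown in this diff.

```python
import unicodedata


def normalize_accents(text: str) -> str:
    """NFC: compose base letters with combining marks. Look-alikes are kept."""
    return unicodedata.normalize("NFC", text)


def fold_lookalikes(text: str) -> str:
    """NFKC: NFC plus compatibility folding. Lossy: distinct symbols collapse."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    decomposed = "Cafe\u0301"          # 'e' followed by a combining acute accent
    sample = "\u2460 \ufb01le"         # circled digit one, then the 'fi' ligature
    print(normalize_accents(decomposed) == "Caf\u00e9")  # True
    print(normalize_accents(sample) == sample)           # True
    print(fold_lookalikes(sample))                       # 1 file
```

NFC leaves the circled digit and the ligature alone, while NFKC rewrites them to plain ASCII, which is why the second control is marked lossy.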
cf31d9ef14 feat(layout-review): address review findings on pages 7-12
Find Duplicates (01_deduplicator):
- Delete the redundant outer Options wrapper; surface threshold +
  survivor rule directly, push the rest behind a single Advanced pane.
- Disambiguate competing primaries: top result is an auto-resolved
  preview (secondary download), review decisions are the single primary.
- Plain-English match labels (exact / approximate); clarify the third.
- Lift the match-card caption to a one-time instruction; note delimiter
  is delimited-text-only.

Quality Check (08_validator_reporter) — stub:
- Remove the dead disabled "Load rules file (JSON)" uploader so the
  stub invites a single action; keep the informative feature list.

Map Columns (05_column_mapper):
- Regroup schema -> mapping -> strategy/advanced (core task contiguous).
- Make preset-vs-Advanced precedence legible (Custom + modified marker).
- Adopt the compact file-intake banner; drop the duplicate resolved-
  mapping table; fix the add-row gutter style.

Combine Files (07_multi_file_merger) — stub:
- Actually disable the Merge CTA (add the disabled attribute).

PDF to CSV (10_pdf_extractor):
- Drop page/raw from the default preview to match export + fix the
  horizontal clip; surface raw via per-row affordance + overflow-x.
- Move the column selector above the download button; give auto-excluded
  rows a reason; align the files card to Home; de-dupe the row count.

Automated Workflows (09_pipeline_runner):
- Replace hand-edited JSON step config with per-step control expanders;
  JSON moved behind Advanced import/export.
- Editing the table marks the mode modified; fold the empty error column
  into the status pill; render summaries as plain English; collapse the
  explainer by default.

Cross-cutting items (stub standardization on page 10, shared disabled-
field token, remaining intake rollout) deferred to a holistic pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:35:46 +00:00
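The "Custom + modified marker" pattern from commit cf31d9ef14 amounts to comparing the current control values against the preset they started from. A minimal sketch, assuming presets are flat dictionaries; the function names and keys are hypothetical, and the example values borrow the Standardize Formats demo state (European base with two controls changed):

```python
def changed_controls(base: dict[str, str], current: dict[str, str]) -> list[str]:
    """List the controls whose current value differs from the base preset."""
    return [key for key in base if current.get(key, base[key]) != base[key]]


def preset_state(
    base_name: str, base: dict[str, str], current: dict[str, str]
) -> tuple[str, list[str]]:
    """Return (preset name to display, changed controls).

    Any diverging control switches the displayed preset to "Custom";
    the controls' own values are what would actually run.
    """
    changed = changed_controls(base, current)
    if not changed:
        return base_name, []
    return "Custom", changed


if __name__ == "__main__":
    european = {
        "ambiguous_input_order": "DMY",
        "decimal_separator": "comma",
        "phone_format": "INTL",
    }
    current = dict(european, ambiguous_input_order="MDY", decimal_separator="dot")
    print(preset_state("European", european, current))
```

Keeping the base preset's name next to the changed list is what lets the UI say which preset the custom state was derived from.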
563d845b70 feat(layout-review): address review findings on pages 4-6
Find Unusual Values (06_outlier_detector) — coming-soon stub:
- Anchor the disabled Method on IQR (multiplier 1.5), not Z-score, per
  the logged robustness decision.
- Drop the redundant feature bullet list (kept alert + greyed controls
  + disabled button); also fixes the MAD-only-in-bullets mismatch.
- Remove the live uploader that dead-ended into disabled controls.

Clean Text (02_text_cleaner):
- Add an inline hidden-character legend (3 swatches reusing the actual
  badge classes) beside the canonical "Show hidden characters" toggle.
- Unify the two hidden-char toggles: preview one is canonical; the
  Results bare checkbox is wrapped in a field + bound note.
- Describe all three presets (minimal / excel-hygiene / paranoid).
- Give "Changes by column" a real "column" header instead of the
  grey index-gutter style.

Standardize Formats (03_format_standardizer):
- Make preset-vs-control precedence legible: preset shows Custom with a
  "modified" marker + base tag, diverging controls flag the winning
  value (same pattern as Fix Missing Values).
- Replace the dead-end unparseable alert with a real "Unparseable
  cells (47)" expander the alert now points to.
- Honest preview caption: "5 of 6 columns (notes skipped)".
Intake pattern (the cross-page reference) left untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:27:42 +00:00
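Commit 563d845b70 anchors the Find Unusual Values stub on the IQR method with a multiplier of 1.5. The textbook rule flags values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`, sketched below with the standard library. This is an assumption about the eventual behavior, not the tool's code: the stub is not implemented in this diff, the function names are hypothetical, and the quartile convention (here the default of `statistics.quantiles`) may differ in the real detector.

```python
import statistics


def iqr_fences(values: list[float], multiplier: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences from the interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def iqr_outliers(values: list[float], multiplier: float = 1.5) -> list[float]:
    """Return the values that fall outside the IQR fences, in input order."""
    lower, upper = iqr_fences(values, multiplier)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(iqr_fences(data))
    print(iqr_outliers(data))
```

Because quartiles are less sensitive to extreme values than the mean and standard deviation, the fences move less when a few outliers are present, which is consistent with the robustness reasoning the commit cites for preferring IQR over Z-score.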
be1e263223 feat(layout-review): address Fix Missing Values review findings
- Pin down strategy precedence: add a resolution-order legend
  (per-column -> global -> preset), dim/strike the preset radios when
  a global strategy overrides them, and add a "Resolves to" column to
  the per-column override table so the winning value is legible.
- Make the demo state honest: Global strategy = median is what drives
  the 1,043 fills, resolving the detect-only contradiction.
- Surface the missingness profile as an always-visible block above the
  (now-open) Options expander — diagnostic before configuration.
- Stop highlighting unchanged before/after cells (respondent_id 0->0);
  show "(global)" placeholders in unset per-column override cells.
- Fold the standalone "Strategy applied per column" table into the
  before/after table as a strategy column; inset maxed slider knobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:23:32 +00:00
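The resolution order that commit be1e263223 makes visible (per-column, then global, then preset) can be expressed as a small lookup. A minimal sketch with hypothetical names; `median` and `detect-only` appear in the commit message, `mean` and the column names other than `respondent_id` are illustrative only:

```python
from typing import Optional


def resolve_strategy(
    column: str,
    per_column: dict[str, Optional[str]],
    global_strategy: Optional[str],
    preset_strategy: str,
) -> tuple[str, str]:
    """Return (winning strategy, source) for one column.

    A per-column override wins over the global strategy, which wins
    over the preset default. Unset values are represented as None.
    """
    override = per_column.get(column)
    if override is not None:
        return override, "per-column"
    if global_strategy is not None:
        return global_strategy, "global"
    return preset_strategy, "preset"


if __name__ == "__main__":
    overrides = {"age": "mean", "income": None}
    for col in ("age", "income", "respondent_id"):
        print(col, resolve_strategy(col, overrides, "median", "detect-only"))
```

Returning the source along with the strategy is what would feed a "Resolves to" column and the "(global)" placeholder for unset overrides.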
7ebfd0f153 feat(layout-review): address Reconcile page review findings
- Fix doubled "Invert right amount sign" label: keep the field label,
  strip the checkbox caption to the box only (also evens the 3-up row).
- Reorder results exceptions-first: tabs and metric strip both run
  Review -> Unmatched left -> Unmatched right -> Matched, with Review
  the default active tab and its table as the inline content; Matched
  demoted to a trailing context expander.
- Surface the "references must match left count" rule with an inline
  validation indicator under the right reference field instead of a
  label note alone.
- Mark the required Amount join key with the .req accent star on both
  sides so it reads distinct from the optional date/description pickers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:17:20 +00:00
2592604067 feat(layout-review): address Home page review findings
- Findings card no longer truncates silently: panel #1 gains a
  .dt-finding-more overflow control ("Show all 8 findings · 5 more").
- Replace the dead "Files analyzed: 3" stat (restated the section meta
  + visible rows) with "Rows scanned" — info not already on screen.
- Collapsed findings panels use a real .is-collapsed state variant
  instead of inline margin-bottom:-16px hacks, so states can't drift.
- Action bar buttons are content-sized; drop the 340px island that
  jarred against the full-width divider/stats below it.

Branding kept as deliberate landing-style treatment on Home (per
review decision); interior tool pages remain title-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:14:04 +00:00
22 changed files with 1047 additions and 614 deletions

DECISIONS.md

@@ -0,0 +1,33 @@
# Product & architecture decisions

A running log of decisions that aren't obvious from the code and would
otherwise be re-litigated. Newest first.

## 2026-06-08 — PDF to CSV and Reconcile stay in the bundle, under a "Finance" group

**Decision:** `10_pdf_extractor` (PDF to CSV) and `11_reconciler` (Reconcile
Two Files) remain part of the DataTools suite. In the sidebar they are
segregated into their own **Finance** section, distinct from the
file-cleaning tools.

**Context / why this needed deciding:**
- Both tools sit outside the documented 9-script cleaning architecture
  (TECHNICAL.md / USER-GUIDE.md stop at the orchestrator).
- They occupy the "reconciliation / manual data-entry" territory the
  product's honest-positioning note explicitly placed outside a
  file-cleaning tool's scope.
- A journey-level UX review flagged that every extra tool in the main
  sidebar raises the "which tool do I need?" load for a non-technical
  buyer, so tools serving a different job should live in a clearly
  different place.

**Resolution:** Keep them in-bundle (they're built, useful, and ship
today) but group them under "Finance" so the cleaning flow stays
uncluttered. Revisit only if a separate finance-focused product emerges.

**Implications:**
- `tools_registry.py`: Reconcile + PDF to CSV carry a `finance` section.
- Sidebar order: Start here → Data Cleaners → Transformations →
  Automations → Finance → Coming soon.
- This is the source-of-truth realization of the `layout-review/`
  mockups (see `layout-review/shell.js`).

View File

@@ -19,29 +19,30 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Find Duplicates</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Find rows that repeat, then keep one and remove the extras.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding &amp; delimiter auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">customers_export.csv</span>
<span class="size">2.1 MB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove"></button>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>customers_export.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Delimiter selector (CSV) -->
<!-- Delimiter selector — delimited-text only (CSV/TSV); omitted for XLSX/XLS.
Shown here because the staged file is customers_export.csv. -->
<div class="dt-field" style="max-width:320px">
<label class="dt-label">Delimiter</label>
<div class="dt-select">Comma (,)</div>
@@ -67,32 +68,33 @@
</div>
</details>
<!-- Options expander -->
<!-- Basic controls (visible by default) -->
<div class="dt-cols-2">
<div class="dt-field"><label class="dt-label">Match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:70%"></div><div class="knob" style="left:70%"></div></div><div class="val">85</div></div>
<div class="dt-help-text">Higher means rows must look more alike to count as a duplicate.</div></div>
<div class="dt-field"><label class="dt-label">When duplicates are found, keep</label>
<div class="dt-select">the most-complete row</div>
<div class="dt-help-text">Which row survives in each group of duplicates.</div></div>
</div>
<!-- Advanced options (single expander; basics live above) -->
<details class="dt-expander">
<summary>Options</summary>
<summary>Advanced options</summary>
<div class="dt-expander-body">
<details class="dt-expander" style="margin-top:0">
<summary>Advanced Options</summary>
<div class="dt-expander-body">
<div class="dt-cols-2">
<div>
<div class="dt-field"><label class="dt-label">Match on columns</label>
<div class="dt-multiselect"><span class="dt-ms-placeholder">Leave empty to auto-detect</span></div></div>
<div class="dt-field"><label class="dt-label">Strong keys</label>
<div class="dt-multiselect"><span class="dt-ms-chip">email <span class="x"></span></span></div></div>
<div class="dt-field"><label class="dt-label">Fuzzy columns</label>
<div class="dt-multiselect"><span class="dt-ms-chip">name <span class="x"></span></span></div></div>
</div>
<div>
<div class="dt-field"><label class="dt-label">Fuzzy algorithm</label><div class="dt-select">jaro_winkler</div></div>
<div class="dt-field"><label class="dt-label">Similarity threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:70%"></div><div class="knob" style="left:70%"></div></div><div class="val">85</div></div></div>
<div class="dt-field"><label class="dt-label">Survivor rule</label><div class="dt-select">most-complete</div></div>
</div>
</div>
<div class="dt-check on" style="margin-top:6px"><span class="box"><span class="dt-mi">check</span></span> Merge mode — fill missing fields in the surviving row</div>
<p class="dt-help-text" style="margin-top:0">Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match <strong>exactly</strong> and the ones that only need to match <strong>approximately</strong> — together these are the columns used to find duplicates.</p>
<div class="dt-cols-2">
<div>
<div class="dt-field"><label class="dt-label">Columns that must match exactly</label>
<div class="dt-multiselect"><span class="dt-ms-chip">email <span class="x"></span></span></div></div>
<div class="dt-field"><label class="dt-label">Columns to match approximately</label>
<div class="dt-multiselect"><span class="dt-ms-chip">name <span class="x"></span></span></div></div>
</div>
</details>
<div>
<div class="dt-field"><label class="dt-label">Approximate-match algorithm</label><div class="dt-select">jaro_winkler</div></div>
</div>
</div>
<div class="dt-check on" style="margin-top:6px"><span class="box"><span class="dt-mi">check</span></span> Merge mode — fill missing fields in the surviving row</div>
</div>
</details>
@@ -109,8 +111,9 @@
<div class="dt-metric"><div class="label">Match groups</div><div class="value">147</div></div>
<div class="dt-metric"><div class="label">Rows kept</div><div class="value">18,130</div></div>
</div>
<p class="dt-caption">Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.</p>
<div class="dt-btn-row" style="max-width:560px">
<button class="dt-btn dt-btn-primary">Download deduplicated CSV</button>
<button class="dt-btn">Download auto-resolved CSV</button>
<button class="dt-btn">Download removed rows</button>
</div>
@@ -123,6 +126,7 @@
<button class="dt-btn">Reject All</button>
<button class="dt-btn">Clear Decisions</button>
</div>
<p class="dt-caption" style="margin-top:8px">Differing columns are highlighted. The survivor row is kept; uncheck a row to split it out of the group.</p>
<!-- Match group card 1 -->
<div class="dt-match-card">
@@ -140,7 +144,6 @@
</tbody>
</table>
</div>
<p class="dt-caption">Differing columns highlighted. The survivor row is kept; uncheck rows to split the group.</p>
</div>
</div>
@@ -163,8 +166,8 @@
</div>
</div>
<p class="dt-caption" style="margin-top:14px">Decisions: 1 merged, 1 pending</p>
<button class="dt-btn dt-btn-primary dt-btn-block" style="margin-top:8px">Apply Review Decisions &amp; Download</button>
<p class="dt-caption" style="margin-top:14px">Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.</p>
<button class="dt-btn dt-btn-primary dt-btn-block" style="margin-top:8px">Apply Review Decisions &amp; Download Final CSV</button>
<!-- Processing log -->
<details class="dt-expander" style="margin-top:18px">
@@ -178,6 +181,8 @@
</div>
</details>
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Duplicates handled — your file is cleaned. Review the result or <a href="home.html">Back to Start here →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>

View File

@@ -27,34 +27,39 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Clean Text</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Trim extra spaces and strip out odd characters.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">contacts_messy.csv</span>
<span class="size">684 KB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove"></button>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>contacts_messy.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Preview expander (collapsed once a result exists) -->
<details class="dt-expander">
<summary>Preview: contacts_messy.csv</summary>
<div class="dt-expander-body">
<p class="dt-caption">4,120 rows, 4 columns</p>
<div class="dt-check on" style="margin-top:2px"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters in preview</div>
<div class="dt-check on" style="margin-top:2px"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters</div>
<div style="display:flex;flex-wrap:wrap;align-items:center;gap:14px;margin-top:6px;font-size:12px;color:var(--ink-secondary)">
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-whitespace" style="cursor:default">·</span> Whitespace</span>
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-special" style="cursor:default"></span> Smart / special</span>
<span style="display:inline-flex;align-items:center;gap:6px"><span class="hidden-char hidden-control" style="cursor:default"></span> Control</span>
</div>
<div class="dt-table-wrap" style="margin-top:8px">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>name</th><th>email</th><th>company</th><th>notes</th></tr></thead>
@@ -82,7 +87,11 @@
<span class="dt-radio"><span class="dot"></span> minimal</span>
<span class="dt-radio"><span class="dot"></span> paranoid</span>
</div>
<div class="dt-help-text">excel-hygiene: trim, collapse whitespace, fold smart quotes, strip invisible chars, normalize line endings, NFC.</div>
<div class="dt-help-text">
minimal: trim and collapse whitespace only — no character substitutions.<br>
excel-hygiene: trim, collapse whitespace, fold smart quotes, strip invisible chars, normalize line endings, and normalize accented characters.<br>
paranoid: everything in excel-hygiene plus strip control characters, strip BOM, and normalize accented and look-alike characters (lossy).
</div>
</div>
<details class="dt-expander">
@@ -99,8 +108,8 @@
<div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Fold smart characters (curly quotes, em-dash, NBSP)</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Strip zero-width / invisible characters</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Unicode NFC normalization</div>
<div class="dt-check"><span class="box"></span> Unicode NFKC compat fold (lossy: ① → 1, ﬁ → fi)</div>
<div class="dt-check on" title="Unicode NFC normalization"><span class="box"><span class="dt-mi">check</span></span> Normalize accented characters (NFC)</div>
<div class="dt-check" title="Unicode NFKC compatibility fold"><span class="box"></span> Normalize accented and look-alike characters (lossy: ① → 1, ﬁ → fi)</div>
</div>
</div>
@@ -143,17 +152,20 @@
<div class="dt-metric"><div class="label">Columns processed</div><div class="value">4</div></div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)</div>
<div class="dt-field">
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)</div>
<div class="dt-help-text">Same setting as “Show hidden characters” in the preview above — toggling either updates both.</div>
</div>
<h4>Changes by column</h4>
<div class="dt-table-wrap" style="max-width:360px">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>cells_changed</th></tr></thead>
<thead><tr><th>column</th><th>cells_changed</th></tr></thead>
<tbody>
<tr><td class="idx">company</td><td>1,604</td></tr>
<tr><td class="idx">name</td><td>1,210</td></tr>
<tr><td class="idx">notes</td><td>982</td></tr>
<tr><td class="idx">email</td><td>151</td></tr>
<tr><td>company</td><td>1,604</td></tr>
<tr><td>name</td><td>1,210</td></tr>
<tr><td>notes</td><td>982</td></tr>
<tr><td>email</td><td>151</td></tr>
</tbody>
</table>
</div>
@@ -199,6 +211,9 @@
<button class="dt-btn">Download config JSON</button>
</div>
<!-- Next-step suggestion -->
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Text cleaned. Next, most files need: <a href="03_format_standardizer.html">Standardize Formats →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>

View File

@@ -19,7 +19,16 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Standardize Formats</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Make dates, phones, currency, and names look the same throughout.</p>
@@ -76,18 +85,23 @@
<hr class="dt-divider">
<h3>Format options</h3>
<!-- Standards preset radio (vertical) -->
<!-- Standards preset radio (vertical). Demo state: preset has auto-switched
to Custom because individual controls below diverge from the European base. -->
<div class="dt-field">
<label class="dt-label">Standards preset</label>
<div style="display:flex;flex-direction:column;gap:8px;margin-top:4px">
<span class="dt-radio on"><span class="dot"></span> US (default) — ISO 8601 dates · E.164 phones · USD</span>
<span class="dt-radio"><span class="dot"></span> European — DMY input · INTL phones · EUR comma decimal</span>
<span class="dt-radio" title="E.164 phones"><span class="dot"></span> US (default) — ISO 8601 dates · international-format phones (+1…) · USD</span>
<span class="dt-radio"><span class="dot"></span> European — DMY input · INTL phones · EUR comma decimal <span class="dt-count-pill info" style="margin-left:4px">base</span></span>
<span class="dt-radio"><span class="dot"></span> UK — DD/MM/YYYY · GB phones · Yes/No booleans</span>
<span class="dt-radio"><span class="dot"></span> ISO Strict — ISO 8601 · bare-number currency · true/false</span>
<span class="dt-radio"><span class="dot"></span> Legacy US — MM/DD/YYYY · National phones · Yes/No</span>
<span class="dt-radio"><span class="dot"></span> Custom — keep current settings</span>
<span class="dt-radio on"><span class="dot"></span> Custom — based on <strong>European</strong>, 2 controls changed <span class="dt-count-pill warn" style="margin-left:4px">modified</span></span>
</div>
<div class="dt-help-text">Pick a published standard or regional convention as the baseline. Every option below is still individually overridable.</div>
<div class="dt-precedence" style="margin-top:10px">
<span class="dt-mi">rule</span>
<span>Individual controls win over the preset. You started from <strong>European</strong>, then changed <strong>Ambiguous input order</strong> and <strong>Decimal separator</strong> below — so the preset is now <strong>Custom</strong>. The controls' current values are what actually run.</span>
</div>
<div class="dt-help-text">Pick a published standard or regional convention as the baseline. Every option below is still individually overridable; overriding any one switches the preset to Custom.</div>
</div>
<!-- Two-column format options -->
@@ -97,15 +111,16 @@
<h4 style="margin-top:0"><strong>Dates</strong></h4>
<div class="dt-field"><label class="dt-label">Output format</label><div class="dt-select">YYYY-MM-DD (ISO)</div></div>
<div class="dt-field">
<label class="dt-label">Ambiguous input order (e.g. 01/02/2024)</label>
<label class="dt-label">Ambiguous input order (e.g. 01/02/2024) <span class="dt-count-pill warn" style="margin-left:4px">changed</span></label>
<div class="dt-radio-row">
<span class="dt-radio on"><span class="dot"></span> MDY (US)</span>
<span class="dt-radio"><span class="dot"></span> DMY (EU)</span>
</div>
<div class="dt-help-text">Winning value: <strong>MDY</strong>. Overrides the European base (DMY) — <code>01/02/2024</code> reads as <strong>2024-01-02</strong>.</div>
</div>
<h4><strong>Phones</strong></h4>
<div class="dt-field"><label class="dt-label">Output format</label><div class="dt-select">E.164 (+15551234567)</div></div>
<div class="dt-field"><label class="dt-label" title="E.164">Output format</label><div class="dt-select" title="E.164">Standard international format (+15551234567)</div></div>
<div class="dt-field">
<label class="dt-label">Default region (ISO-2)</label>
<div class="dt-input">US</div>
@@ -117,11 +132,12 @@
<div>
<h4 style="margin-top:0"><strong>Currency</strong></h4>
<div class="dt-field">
<label class="dt-label">Decimal separator in input</label>
<label class="dt-label">Decimal separator in input <span class="dt-count-pill warn" style="margin-left:4px">changed</span></label>
<div class="dt-radio-row">
<span class="dt-radio on"><span class="dot"></span> dot (1,234.56)</span>
<span class="dt-radio"><span class="dot"></span> comma (1.234,56)</span>
</div>
<div class="dt-help-text">Winning value: <strong>dot</strong>. Overrides the European base (comma) — <code>$1,234.5</code> reads as <strong>1234.50</strong>.</div>
</div>
<div class="dt-field" style="max-width:200px"><label class="dt-label">Round to decimals</label><div class="dt-input">2</div></div>
<div class="dt-check"><span class="box"></span> Preserve original precision (don't round)</div>
@@ -154,9 +170,30 @@
<div class="dt-alert info">
<span class="dt-mi">info</span>
<span>47 cell(s) in typed columns didn't match a recognizable shape and were left as-is. Check the changes audit below to find them, or re-classify the column to <strong>(skip)</strong>.</span>
<span>47 cell(s) in typed columns didn't match a recognizable shape and were left as-is. See <strong>Unparseable cells</strong> below to review them, or re-classify the column to <strong>(skip)</strong>. (They aren't in the changes audit — nothing was changed.)</span>
</div>
<!-- Unparseable cells surface (the alert points here; these are left-as-is, so they never appear in the CHANGES audit) -->
<details class="dt-expander">
<summary>Unparseable cells (47)</summary>
<div class="dt-expander-body">
<p class="dt-caption">Cells in typed columns that didn't match a recognizable shape and were left unchanged.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>row</th><th>column</th><th>field_type</th><th>value (left as-is)</th></tr></thead>
<tbody>
<tr><td>318</td><td>signup_date</td><td>date</td><td class="dt-cell-flag">soon</td></tr>
<tr><td>902</td><td>phone</td><td>phone</td><td class="dt-cell-flag">ext. 4471</td></tr>
<tr><td>1,544</td><td>amount</td><td>currency</td><td class="dt-cell-flag">TBD</td></tr>
<tr><td>2,087</td><td>active</td><td>boolean</td><td class="dt-cell-flag">maybe</td></tr>
<tr><td>3,610</td><td>signup_date</td><td>date</td><td class="dt-cell-flag">00/00/0000</td></tr>
</tbody>
</table>
</div>
<p class="dt-caption" style="margin-top:8px">… and 42 more.</p>
</div>
</details>
<!-- Changes by column -->
<p style="margin-bottom:6px"><strong>Changes by column</strong></p>
<div class="dt-table-wrap" style="max-width:520px">
@@ -194,6 +231,7 @@
<!-- Standardized preview -->
<p style="margin:14px 0 6px"><strong>Standardized preview (first 10 rows)</strong></p>
<p class="dt-caption" style="margin:0 0 6px">Showing 5 of 6 columns — <code>notes</code> is set to (skip), so it's omitted here.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th class="idx"></th><th>full_name</th><th>phone</th><th>amount</th><th>signup_date</th><th>active</th></tr></thead>
@@ -215,6 +253,9 @@
<button class="dt-btn">Download config JSON</button>
</div>
<!-- Next-step suggestion -->
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Formats standardized. Next, most files need: <a href="04_missing_handler.html">Fix Missing Values →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>
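Review note on the precedence rule shown in this mockup ("Individual controls win over the preset", and overriding any one switches the preset to Custom). A minimal sketch of that rule, assuming a flat dictionary of control values per preset. `PRESETS`, `resolve_preset`, and the control keys are hypothetical names for illustration, not identifiers from the live app.

```python
from __future__ import annotations

# Hypothetical preset table; keys and values are illustrative only.
PRESETS = {
    "European": {"date_order": "DMY", "decimal_separator": "comma"},
    "Legacy US": {"date_order": "MDY", "decimal_separator": "dot"},
}


def resolve_preset(base: str, controls: dict[str, str]) -> tuple[str, list[str]]:
    """Return the preset label to display and the controls that differ from the base.

    The controls' current values are what actually run; the label only
    reports whether they still match the chosen baseline.
    """
    baseline = PRESETS[base]
    changed = sorted(k for k, v in controls.items() if baseline.get(k) != v)
    label = "Custom" if changed else base
    return label, changed


label, changed = resolve_preset(
    "European", {"date_order": "MDY", "decimal_separator": "dot"}
)
print(label, changed)
```

Deriving the label from the controls, rather than storing "Custom" as separate state, keeps the radio group and the "changed" pills from drifting apart.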


@@ -19,28 +19,27 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Fix Missing Values</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Find blank cells (even hidden ones) and fill them in or remove them.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<p class="dt-caption">Tip: files imported on the Home screen are picked up here automatically.</p>
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">survey_responses.csv</span>
<span class="size">684 KB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove"></button>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>survey_responses.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Preview expander (collapsed after a result exists) -->
<details class="dt-expander">
@@ -63,39 +62,44 @@
<hr class="dt-divider">
<!-- Options expander (Missingness profile + Strategy) -->
<details class="dt-expander">
<!-- Missingness profile — always visible: see the damage before configuring -->
<h2>Missingness profile</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Rows</div><div class="value">2,150</div></div>
<div class="dt-metric"><div class="label">Cells missing</div><div class="value">1,043</div></div>
<div class="dt-metric"><div class="label">% cells missing</div><div class="value">8.1%</div></div>
<div class="dt-metric"><div class="label">Complete rows</div><div class="value">1,388</div></div>
</div>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>column</th><th>dtype</th><th>missing</th><th>missing_pct</th><th>disguised</th><th>has_missing</th></tr></thead>
<tbody>
<tr><td>respondent_id</td><td>object</td><td>0</td><td>0.0%</td><td>0</td><td>False</td></tr>
<tr><td>age</td><td>float64</td><td>187</td><td>8.7%</td><td>61</td><td>True</td></tr>
<tr><td>region</td><td>object</td><td>142</td><td>6.6%</td><td>142</td><td>True</td></tr>
<tr><td>income</td><td>float64</td><td>329</td><td>15.3%</td><td>118</td><td>True</td></tr>
<tr><td>satisfaction</td><td>float64</td><td>95</td><td>4.4%</td><td>40</td><td>True</td></tr>
<tr><td>comments</td><td>object</td><td>290</td><td>13.5%</td><td>290</td><td>True</td></tr>
</tbody>
</table>
</div>
<hr class="dt-divider">
<!-- Options expander (Strategy) — configuration follows the diagnostic -->
<details class="dt-expander" open>
<summary>Options</summary>
<div class="dt-expander-body">
<h3>Missingness profile</h3>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Rows</div><div class="value">2,150</div></div>
<div class="dt-metric"><div class="label">Cells missing</div><div class="value">1,043</div></div>
<div class="dt-metric"><div class="label">% cells missing</div><div class="value">8.1%</div></div>
<div class="dt-metric"><div class="label">Complete rows</div><div class="value">1,388</div></div>
</div>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>column</th><th>dtype</th><th>missing</th><th>missing_pct</th><th>disguised</th><th>has_missing</th></tr></thead>
<tbody>
<tr><td>respondent_id</td><td>object</td><td>0</td><td>0.0%</td><td>0</td><td>False</td></tr>
<tr><td>age</td><td>float64</td><td>187</td><td>8.7%</td><td>61</td><td>True</td></tr>
<tr><td>region</td><td>object</td><td>142</td><td>6.6%</td><td>142</td><td>True</td></tr>
<tr><td>income</td><td>float64</td><td>329</td><td>15.3%</td><td>118</td><td>True</td></tr>
<tr><td>satisfaction</td><td>float64</td><td>95</td><td>4.4%</td><td>40</td><td>True</td></tr>
<tr><td>comments</td><td>object</td><td>290</td><td>13.5%</td><td>290</td><td>True</td></tr>
</tbody>
</table>
</div>
<hr class="dt-divider">
<h3>Strategy</h3>
<div class="dt-precedence">
<span class="dt-mi">layers</span>
<span>Resolution order: <strong>per-column override</strong> → <strong>global strategy</strong> → <strong>preset</strong>. The most specific setting wins; layers it overrides are dimmed.</span>
</div>
<div class="dt-field">
<label class="dt-label">Preset</label>
<div class="dt-radio-row" style="flex-direction:column;gap:10px">
<div class="dt-help-text" style="color:var(--warn);display:flex;align-items:center;gap:5px;margin-bottom:8px"><span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:15px;line-height:1">info</span> Overridden by <strong>Global strategy → median</strong> (set under Advanced options). Presets apply only when global is &ldquo;(use preset)&rdquo;.</div>
<div class="dt-radio-row is-overridden" style="flex-direction:column;gap:10px">
<span class="dt-radio on"><span class="dot"></span> detect-only (standardize sentinels to NaN, no fill or drop)</span>
<span class="dt-radio"><span class="dot"></span> safe-fill (numeric → median, categorical → mode)</span>
<span class="dt-radio"><span class="dot"></span> drop-incomplete (drop any row with missing)</span>
@@ -112,16 +116,16 @@
<h4>Detection</h4>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Standardize disguised nulls to NaN</div>
<div class="dt-field">
<label class="dt-label">Sentinel values (comma-separated)</label>
<label class="dt-label" title="Sentinel values">Blanks in disguise (N/A, dash, NULL) — comma-separated</label>
<div class="dt-input">N/A, n/a, NA, NULL, null, None, -, --, ?, #N/A</div>
<div class="dt-help-text">Matched case-insensitively after stripping whitespace.</div>
<div class="dt-help-text">Text that really means &ldquo;empty.&rdquo; Matched case-insensitively after stripping whitespace.</div>
</div>
</div>
<div>
<h4>Strategy override</h4>
<div class="dt-field">
<label class="dt-label">Global strategy</label>
<div class="dt-select">(use preset)</div>
<div class="dt-select">median</div>
<div class="dt-help-text">drop_row / drop_col use the thresholds below. mean / median / interpolate are numeric only — non-numeric columns fall back to the categorical strategy.</div>
</div>
<div class="dt-field">
@@ -135,11 +139,11 @@
<div class="dt-cols-2">
<div class="dt-field">
<label class="dt-label">Row drop threshold (drop rows with ≥ this fraction missing across selected cols)</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:100%"></div><div class="knob" style="left:100%"></div></div><div class="val">1.00</div></div>
<div class="dt-slider"><div class="track"><div class="fill" style="width:100%"></div><div class="knob" style="left:calc(100% - 8px)"></div></div><div class="val">1.00</div></div>
</div>
<div class="dt-field">
<label class="dt-label">Column drop threshold (drop columns with ≥ this fraction missing)</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:100%"></div><div class="knob" style="left:100%"></div></div><div class="val">1.00</div></div>
<div class="dt-slider"><div class="track"><div class="fill" style="width:100%"></div><div class="knob" style="left:calc(100% - 8px)"></div></div><div class="val">1.00</div></div>
</div>
</div>
@@ -164,13 +168,13 @@
<p class="dt-caption">Set a different strategy for specific columns. Leave any row blank to use the global strategy.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>Column</th><th>Override</th></tr></thead>
<thead><tr><th>Column</th><th>Override</th><th>Resolves to</th></tr></thead>
<tbody>
<tr><td>age</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px">median</span></td></tr>
<tr><td>region</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px">mode</span></td></tr>
<tr><td>income</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px"></span></td></tr>
<tr><td>satisfaction</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px"></span></td></tr>
<tr><td>comments</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px">constant</span></td></tr>
<tr><td>age</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">(global)</span></td><td>median <span style="color:var(--ink-tertiary);font-size:11px">· global</span></td></tr>
<tr><td>region</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">(global)</span></td><td>mode <span style="color:var(--ink-tertiary);font-size:11px">· global → categorical fallback</span></td></tr>
<tr><td>income</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">(global)</span></td><td>median <span style="color:var(--ink-tertiary);font-size:11px">· global</span></td></tr>
<tr><td>satisfaction</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">(global)</span></td><td>median <span style="color:var(--ink-tertiary);font-size:11px">· global</span></td></tr>
<tr><td>comments</td><td><span class="dt-select" style="display:inline-block;min-width:160px;padding:4px 24px 4px 10px">constant</span></td><td><strong>constant</strong> <span style="color:var(--ink-tertiary);font-size:11px">· this column</span></td></tr>
</tbody>
</table>
</div>
@@ -198,28 +202,14 @@
<p><strong>Missingness — before vs. after</strong></p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>column</th><th>before_missing</th><th>before_pct</th><th>after_missing</th><th>after_pct</th></tr></thead>
<thead><tr><th>column</th><th>before_missing</th><th>before_pct</th><th>after_missing</th><th>after_pct</th><th>strategy</th></tr></thead>
<tbody>
<tr><td>respondent_id</td><td>0</td><td>0.0</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
<tr><td>age</td><td class="dt-cell-flag">187</td><td>8.7</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
<tr><td>region</td><td class="dt-cell-flag">142</td><td>6.6</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
<tr><td>income</td><td class="dt-cell-flag">329</td><td>15.3</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
<tr><td>satisfaction</td><td class="dt-cell-flag">95</td><td>4.4</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
<tr><td>comments</td><td class="dt-cell-flag">290</td><td>13.5</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td></tr>
</tbody>
</table>
</div>
<p><strong>Strategy applied per column</strong></p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>column</th><th>strategy</th></tr></thead>
<tbody>
<tr><td>age</td><td>median</td></tr>
<tr><td>region</td><td>mode</td></tr>
<tr><td>income</td><td>median</td></tr>
<tr><td>satisfaction</td><td>median</td></tr>
<tr><td>comments</td><td>constant</td></tr>
<tr><td>respondent_id</td><td>0</td><td>0.0</td><td>0</td><td>0.0</td><td class="dt-cell-flag"></td></tr>
<tr><td>age</td><td class="dt-cell-flag">187</td><td>8.7</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td><td>median</td></tr>
<tr><td>region</td><td class="dt-cell-flag">142</td><td>6.6</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td><td>mode</td></tr>
<tr><td>income</td><td class="dt-cell-flag">329</td><td>15.3</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td><td>median</td></tr>
<tr><td>satisfaction</td><td class="dt-cell-flag">95</td><td>4.4</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td><td>median</td></tr>
<tr><td>comments</td><td class="dt-cell-flag">290</td><td>13.5</td><td class="dt-cell-add">0</td><td class="dt-cell-add">0.0</td><td>constant</td></tr>
</tbody>
</table>
</div>
@@ -262,6 +252,8 @@
<button class="dt-btn">Download config JSON</button>
</div>
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Missing values handled. Next, most files need: <a href="01_deduplicator.html">Find Duplicates →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>
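Review note on the resolution order this page describes (per-column override, then global strategy, then preset, with numeric-only strategies falling back to the categorical strategy on non-numeric columns). A minimal sketch, assuming the fallback applies to whichever layer supplied the strategy. `resolve_strategy`, `NUMERIC_ONLY`, and the parameter names are hypothetical, not taken from the live code.

```python
from __future__ import annotations

# Strategies the help text describes as numeric only.
NUMERIC_ONLY = {"mean", "median", "interpolate"}


def resolve_strategy(
    column_is_numeric: bool,
    override: str | None,
    global_strategy: str | None,
    preset_strategy: str,
    categorical_fallback: str = "mode",
) -> tuple[str, str]:
    """Return (strategy, source layer) for one column; the most specific layer wins."""
    if override:
        chosen, source = override, "this column"
    elif global_strategy:
        chosen, source = global_strategy, "global"
    else:
        chosen, source = preset_strategy, "preset"
    # Assumption: a numeric-only strategy on a non-numeric column falls back
    # to the categorical strategy regardless of which layer chose it.
    if chosen in NUMERIC_ONLY and not column_is_numeric:
        return categorical_fallback, f"{source} -> categorical fallback"
    return chosen, source


# Mirrors the "Resolves to" column in the mockup: global is median.
print(resolve_strategy(True, None, "median", "detect-only"))
print(resolve_strategy(False, None, "median", "detect-only"))
print(resolve_strategy(False, "constant", "median", "detect-only"))
```

Returning the source layer alongside the strategy is what lets the table label each row as "global" or "this column".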


@@ -19,28 +19,27 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Map Columns</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Rename columns, change their order, and set each one as text, number, or date.</p>
<div class="dt-spacer"></div>
<!-- Upload (file staged) -->
<p class="dt-caption">You can also import a file on the home screen and pick it up here.</p>
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding &amp; delimiter auto-detected</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">crm_contacts_raw.csv</span>
<span class="size">684 KB</span>
<button class="dt-btn dt-btn-tertiary" title="Remove"></button>
<!-- File pickup banner (using file from upload screen) -->
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>crm_contacts_raw.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<!-- Preview expander (collapsed after a result exists) -->
<details class="dt-expander">
@@ -75,9 +74,9 @@
<div class="dt-radio-row" style="flex-direction:column; gap:8px">
<span class="dt-radio on"><span class="dot"></span> Build interactively (start from current columns)</span>
<span class="dt-radio"><span class="dot"></span> Import schema JSON</span>
<span class="dt-radio"><span class="dot"></span> Skip (rename / coerce only — no schema)</span>
<span class="dt-radio"><span class="dot"></span> Skip (rename / convert types only — no schema)</span>
</div>
<div class="dt-help-text">An interactive build is fastest for one-off cleanup. Import a JSON when you have a fixed contract (a CRM import format, db schema). Skip when you only want to rename or coerce specific columns.</div>
<div class="dt-help-text">An interactive build is fastest for one-off cleanup. Import a JSON when you have a fixed contract (a CRM import format, a database schema). Skip when you only want to rename or convert the type of specific columns.</div>
</div>
<p class="dt-caption">Edit the table to define your target schema. Add rows for fields the input doesn't have yet (with a default), or remove rows for columns you want to drop.</p>
@@ -93,7 +92,7 @@
<tr><td>signup_date</td><td>date</td><td></td><td></td><td>Signup</td></tr>
<tr><td>amount_spent</td><td>float</td><td></td><td>0.0</td><td>Amount Spent</td></tr>
<tr><td>source</td><td>string</td><td></td><td>crm-import</td><td></td></tr>
<tr><td class="idx" style="color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">add</span> add row</td><td></td><td></td><td></td><td></td></tr>
<tr><td style="color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">add</span> add row</td><td></td><td></td><td></td><td></td></tr>
</tbody>
</table>
</div>
@@ -101,43 +100,8 @@
<hr class="dt-divider">
<!-- ===== Strategy ===== -->
<h3>Strategy</h3>
<div class="dt-field">
<label class="dt-label">Preset</label>
<div class="dt-radio-row" style="flex-direction:column; gap:8px">
<span class="dt-radio"><span class="dot"></span> rename-only (just rename, leave types alone, keep extras)</span>
<span class="dt-radio on"><span class="dot"></span> lenient-schema (rename + coerce + reorder, keep extras)</span>
<span class="dt-radio"><span class="dot"></span> strict-schema (rename + coerce + reorder, drop extras)</span>
</div>
</div>
<!-- Advanced options expander -->
<details class="dt-expander">
<summary>Advanced options</summary>
<div class="dt-expander-body">
<div class="dt-cols-2">
<div>
<div class="dt-field">
<label class="dt-label">Unmapped source columns</label>
<div class="dt-select">keep</div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Coerce types per schema</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Reorder to schema order</div>
</div>
<div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Auto-infer mapping (fuzzy match)</div>
<div class="dt-field">
<label class="dt-label">Fuzzy match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:80%"></div><div class="knob" style="left:80%"></div></div><div class="val">0.80</div></div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Enforce required fields</div>
</div>
</div>
</div>
</details>
<!-- ===== Mapping ===== -->
<!-- Mapping follows the schema directly: define the schema, then map sources onto it. -->
<h3>Mapping</h3>
<!-- schema is set → source→target selectbox editor with auto-suggested flag -->
<div class="dt-table-wrap">
@@ -153,7 +117,53 @@
</tbody>
</table>
</div>
<p class="dt-caption">Pick a target for each source column. <code>Notes</code> stays unmapped — with the lenient preset it is kept as-is. <code>source</code> is added from the schema default.</p>
<p class="dt-caption">Pick a target for each source column. <code>Notes</code> stays unmapped — with the keep-extras strategy it is kept as-is. <code>source</code> is added from the schema default.</p>
<hr class="dt-divider">
<!-- ===== Strategy ===== -->
<!-- Strategy is a modifier on the mapping above (strictness: keep/drop extras, coerce, reorder), so it comes after the user can see what it acts on. -->
<h3>Strategy</h3>
<div class="dt-field">
<label class="dt-label">Preset</label>
<div class="dt-radio-row" style="flex-direction:column; gap:8px">
<span class="dt-radio"><span class="dot"></span> rename-only (just rename, leave types alone, keep extras)</span>
<span class="dt-radio"><span class="dot"></span> lenient-schema (rename + convert types + reorder, keep extras)</span>
<span class="dt-radio"><span class="dot"></span> strict-schema (rename + convert types + reorder, drop extras) <span class="dt-count-pill info" style="margin-left:4px">base</span></span>
<span class="dt-radio on"><span class="dot"></span> Custom — based on <strong>strict-schema</strong>, 1 control changed <span class="dt-count-pill warn" style="margin-left:4px">modified</span></span>
</div>
<div class="dt-precedence" style="margin-top:10px">
<span class="dt-mi">rule</span>
<span>Individual Advanced controls win over the preset. You started from <strong>strict-schema</strong>, then changed <strong>Unmapped source columns</strong> to <strong>keep</strong> below — so the preset is now <strong>Custom</strong>. The controls' current values are what actually run.</span>
</div>
<div class="dt-help-text">Pick a strategy as the baseline. Every Advanced toggle below is still individually overridable; overriding any one switches the preset to Custom.</div>
</div>
<!-- Advanced options expander -->
<details class="dt-expander" open>
<summary>Advanced options</summary>
<div class="dt-expander-body">
<div class="dt-cols-2">
<div>
<div class="dt-field">
<label class="dt-label">Unmapped source columns <span class="dt-count-pill warn" style="margin-left:4px">changed</span></label>
<div class="dt-select">keep</div>
<div class="dt-help-text">Winning value: <strong>keep</strong>. Overrides the strict-schema base (drop) — so <code>Notes</code> survives into the output.</div>
</div>
<div class="dt-check on" title="coerce types per schema"><span class="box"><span class="dt-mi">check</span></span> Convert each column to the right type</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Reorder to schema order</div>
</div>
<div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Auto-infer mapping (fuzzy match)</div>
<div class="dt-field">
<label class="dt-label">Fuzzy match threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:80%"></div><div class="knob" style="left:80%"></div></div><div class="val">0.80</div></div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Enforce required fields</div>
</div>
</div>
</div>
</details>
</div>
</details>
@@ -176,20 +186,6 @@
<div class="dt-alert info"><span class="dt-mi">info</span><span>Added (with defaults): <code>source</code></span></div>
<div class="dt-alert warn"><span class="dt-mi">warning</span><span>Some cells could not be coerced and were left as NaN: amount_spent (3)</span></div>
<p><strong>Resolved mapping</strong></p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>source</th><th>target</th><th>auto</th></tr></thead>
<tbody>
<tr><td>Full Name</td><td>full_name</td><td>True</td></tr>
<tr><td>EmailAddr</td><td>email</td><td>True</td></tr>
<tr><td>Phone #</td><td>phone</td><td>True</td></tr>
<tr><td>Signup</td><td>signup_date</td><td>True</td></tr>
<tr><td>Amount Spent</td><td>amount_spent</td><td>True</td></tr>
</tbody>
</table>
</div>
<p><strong>Mapped preview (first 10 rows)</strong></p>
<div class="dt-table-wrap">
<table class="dt-table">
@@ -213,6 +209,9 @@
<button class="dt-btn">Download config JSON</button>
</div>
<!-- Next-step suggestion -->
<div class="dt-next-step"><span class="dt-mi">arrow_forward</span><span>Columns mapped. <a href="home.html">Run the recommended clean →</a></span><button class="dt-next-step-dismiss" title="Dismiss"></button></div>
</div>
</main>
</div>
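Review note on "Auto-infer mapping (fuzzy match)" with its 0.80 threshold. The mockup does not say which similarity measure the tool uses, so this sketch substitutes the standard-library `difflib.SequenceMatcher` ratio on normalized names purely as a stand-in. `normalize` and `suggest_mapping` are hypothetical helpers.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mapping(
    sources: list[str], targets: list[str], threshold: float = 0.80
) -> dict[str, str | None]:
    """Map each source column to its best-scoring target, or None below the threshold."""
    mapping: dict[str, str | None] = {}
    for src in sources:
        best_target, best_score = None, 0.0
        for tgt in targets:
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best_target, best_score = tgt, score
        # Columns that clear no target stay unmapped, like Notes in the mockup.
        mapping[src] = best_target if best_score >= threshold else None
    return mapping


print(suggest_mapping(["Full Name", "Phone #", "Notes"], ["full_name", "email", "phone"]))
```

Unmapped columns come back as `None`, so the keep or drop decision stays with the "Unmapped source columns" control instead of being made by the matcher.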


@@ -12,7 +12,7 @@
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Find Unusual Values</strong> — a <strong>Coming&nbsp;Soon</strong> tool. The page is a stub/teaser: an "under development" notice, a list of planned features, and disabled placeholder controls (only the file uploader is live). <a href="index.html">All pages →</a></span>
<span>Static layout preview of <strong>Find Unusual Values</strong> — a <strong>Coming&nbsp;Soon</strong> tool. The page is a stub: a "coming soon" notice, a plain-English list of what the tool will do, and a single "Notify me" action. <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
@@ -25,62 +25,26 @@
<div class="dt-spacer"></div>
<!-- st.info: under development -->
<!-- Coming-soon notice (st.info) -->
<div class="dt-alert info">
<span class="dt-mi">info</span>
<span>This tool is under development.</span>
<span>This tool is coming soon.</span>
</div>
<!-- Planned features (st.markdown) -->
<p><strong>Features:</strong></p>
<!-- What it will do (st.markdown) -->
<p><strong>What it will do:</strong></p>
<ul>
<li>Z-score detection (configurable threshold)</li>
<li>IQR (interquartile range) detection</li>
<li>MAD (median absolute deviation) detection</li>
<li>Domain-rule violations (e.g., age &lt; 0, price &gt; $1M)</li>
<li>Visual outlier highlighting in data preview</li>
<li>Handling: flag only, remove, cap/winsorize to bounds</li>
<li>Find values that are unusually high or low for a column</li>
<li>Spot values that break the rules you set (out of range, wrong type)</li>
<li>Choose how sensitive the check is</li>
<li>Flag unusual rows by adding a column, without changing your data</li>
<li>Cap extreme values at a limit you choose</li>
<li>See a summary of how many values were flagged</li>
</ul>
<hr class="dt-divider">
<!-- File upload (functional) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">CSV, TSV, XLSX, XLS · Import a file to preview. Processing is not yet available.</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<!-- Placeholder options (all disabled) -->
<h3>Detection Method</h3>
<div class="dt-field" style="max-width:420px">
<label class="dt-label">Method</label>
<div class="dt-select" style="opacity:.55;cursor:not-allowed">Z-Score</div>
</div>
<div class="dt-field" style="max-width:420px;opacity:.55">
<label class="dt-label">Z-Score threshold</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:50%"></div><div class="knob" style="left:50%"></div></div><div class="val">3.0</div></div>
</div>
<div class="dt-field" style="max-width:420px;opacity:.55">
<label class="dt-label">IQR multiplier</label>
<div class="dt-slider"><div class="track"><div class="fill" style="width:25%"></div><div class="knob" style="left:25%"></div></div><div class="val">1.5</div></div>
</div>
<h3>Handling</h3>
<div class="dt-field" style="max-width:420px">
<label class="dt-label">Action</label>
<div class="dt-select" style="opacity:.55;cursor:not-allowed">Flag only (add column)</div>
</div>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block is-disabled" disabled>Detect Outliers</button>
<button class="dt-btn dt-btn-primary"><span class="dt-mi">notifications</span> Notify me when this ships</button>
</div>
</main>


@@ -12,7 +12,7 @@
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Combine Files</strong> — a Coming-Soon tool. The page is a stub: an "under development" notice, a planned-features list, a working multi-file uploader, and disabled placeholder options. <a href="index.html">All pages →</a></span>
<span>Static layout preview of <strong>Combine Files</strong> — a <strong>Coming&nbsp;Soon</strong> tool. The page is a stub: a "coming soon" notice, a plain-English list of what the tool will do, and a single "Notify me" action. <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
@@ -23,56 +23,28 @@
</div>
<p class="dt-tool-caption">Combine several CSV or Excel files into one — even if columns differ.</p>
<!-- Under-development notice (st.info) -->
<div class="dt-spacer"></div>
<!-- Coming-soon notice (st.info) -->
<div class="dt-alert info">
<span class="dt-mi">info</span>
<span>This tool is under development.</span>
<span>This tool is coming soon.</span>
</div>
<!-- Planned features (st.markdown) -->
<p><strong>Features:</strong></p>
<ul style="font-size:14px;line-height:1.55;color:var(--ink);margin:0 0 0.6rem;padding-left:22px">
<li>Import multiple CSV/Excel files at once</li>
<li>Automatic schema alignment (matching columns by name)</li>
<li>Append mode: stack files vertically (union)</li>
<li>Join mode: merge files on shared key columns</li>
<li>Handle mismatched columns (fill missing with nulls or drop)</li>
<li>Source file tracking column</li>
<!-- What it will do (st.markdown) -->
<p><strong>What it will do:</strong></p>
<ul>
<li>Import several CSV or Excel files at once</li>
<li>Line up columns automatically by matching their names</li>
<li>Stack files on top of each other into one long file</li>
<li>Merge files side by side using shared key columns</li>
<li>Handle columns that don't match (fill the gaps with blanks or drop them)</li>
<li>Add a column showing which file each row came from</li>
</ul>
<hr class="dt-divider">
<!-- Multi-file upload (functional) -->
<label class="dt-label">Import CSV or Excel files</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop files here</span>
<span class="sub">CSV, TSV, XLSX, XLS · multiple files allowed</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-help-text">Import multiple files to preview. Processing is not yet available.</div>
<!-- Placeholder options (all disabled) -->
<h3>Merge Strategy</h3>
<div class="dt-field">
<label class="dt-label">Mode</label>
<div class="dt-select" style="color:var(--ink-tertiary);background-color:var(--surface-hover)">Append (stack vertically)</div>
</div>
<div class="dt-field">
<label class="dt-label">Mismatched columns</label>
<div class="dt-select" style="color:var(--ink-tertiary);background-color:var(--surface-hover)">Fill with null</div>
</div>
<div class="dt-check on" style="opacity:0.6">
<span class="box"><span class="dt-mi">check</span></span> Add source filename column
</div>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block is-disabled">Merge Files</button>
<button class="dt-btn dt-btn-primary"><span class="dt-mi">notifications</span> Notify me when this ships</button>
</div>
</main>
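
The planned behaviour listed in this stub (stack files, line up columns by name, fill gaps with blanks, record the source file) can be sketched as follows. This is an illustration under assumptions, not the tool's code: `stack_csv_texts` is a hypothetical name, and it takes CSV text directly rather than uploaded files.

```python
import csv
import io


def stack_csv_texts(named_texts, fill=""):
    """Stack CSV contents vertically, aligning columns by name.

    named_texts: list of (filename, csv_text) pairs.
    Hypothetical helper, not part of the codebase shown in this diff.
    """
    columns = []  # union of column names, in first-seen order
    rows = []
    for filename, text in named_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            row["source_file"] = filename  # which file each row came from
            rows.append(row)

    header = columns + ["source_file"]
    # Columns a file did not have are filled with the blank value.
    return header, [[row.get(col, fill) for col in header] for row in rows]
```

Usage: `stack_csv_texts([("a.csv", text_a), ("b.csv", text_b)])` returns the combined header and a list of aligned rows.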

@@ -12,7 +12,7 @@
<main class="dt-main">
<div class="dt-review-banner">
<span class="dt-mi">visibility</span>
<span>Static layout preview of <strong>Quality Check</strong>, a Coming-Soon tool. The page is a stub: an "under development" notice, a feature list, a working file uploader, and disabled placeholder controls. <a href="index.html">All pages →</a></span>
<span>Static layout preview of <strong>Quality Check</strong>, a <strong>Coming&nbsp;Soon</strong> tool. The page is a stub: a "coming soon" notice, a plain-English list of what the tool will do, and a single "Notify me" action. <a href="index.html">All pages →</a></span>
</div>
<div class="dt-main-inner">
@@ -25,64 +25,26 @@
<div class="dt-spacer"></div>
<!-- Under-development notice (st.info) -->
<!-- Coming-soon notice (st.info) -->
<div class="dt-alert info">
<span class="dt-mi">info</span>
<span>This tool is under development.</span>
<span>This tool is coming soon.</span>
</div>
<!-- Features list (st.markdown) -->
<p><strong>Features:</strong></p>
<!-- What it will do (st.markdown) -->
<p><strong>What it will do:</strong></p>
<ul>
<li>Column-level validation rules (not null, unique, regex pattern, range, enum)</li>
<li>Cross-column validation (e.g., start_date &lt; end_date)</li>
<li>Data quality score per column and overall</li>
<li>Generate PDF quality report</li>
<li>Generate Excel report with flagged rows highlighted</li>
<li>Summary dashboard: pass/fail counts, severity breakdown</li>
<li>Check each column against rules you set (no blanks, no duplicates, matches a pattern, within a range, from a set list)</li>
<li>Check rules across columns (for example, start date is before end date)</li>
<li>Give each column and the whole file a quality score</li>
<li>Export a PDF quality report</li>
<li>Export an Excel report with the problem rows highlighted</li>
<li>Show a summary of what passed, what failed, and how serious each issue is</li>
</ul>
<hr class="dt-divider">
<!-- File upload (functional) -->
<label class="dt-label">Import CSV or Excel file</label>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">Import a file to preview. Processing is not yet available.</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<!-- Placeholder options -->
<h3>Validation Rules</h3>
<label class="dt-label">Load rules file (JSON)</label>
<div class="dt-uploader" style="opacity:0.55">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">JSON</span>
</div>
<button class="dt-btn is-disabled" disabled>Browse files</button>
</div>
<div class="dt-field">
<label class="dt-label">Quick checks</label>
<div class="dt-multiselect" style="opacity:0.55">
<span class="dt-ms-placeholder">Choose options</span>
</div>
</div>
<h3>Report Format</h3>
<div class="dt-field" style="max-width:320px">
<label class="dt-label">Output format</label>
<div class="dt-select" style="opacity:0.55">Excel (flagged rows)</div>
</div>
<hr class="dt-divider">
<button class="dt-btn dt-btn-primary dt-btn-block is-disabled" disabled>Validate &amp; Generate Report</button>
<button class="dt-btn dt-btn-primary"><span class="dt-mi">notifications</span> Notify me when this ships</button>
</div>
</main>
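
The column rules named in this stub (no blanks, no duplicates, matches a pattern, within a range, from a set list) and the per-column quality score could look roughly like the sketch below. The function names, keyword arguments, and the scoring formula (share of rows without a problem) are all assumptions made for illustration; the diff does not define them.

```python
import re


def check_column(values, *, not_null=False, unique=False, pattern=None,
                 min_value=None, max_value=None, allowed=None):
    """Return (row_index, problem) pairs for one column of cells.

    Hypothetical helper illustrating the rule types listed on the stub page.
    """
    problems = []
    seen = set()
    for i, cell in enumerate(values):
        text = "" if cell is None else str(cell).strip()
        if text == "":
            if not_null:
                problems.append((i, "blank"))
            continue  # the remaining rules only apply to non-blank cells
        if unique:
            if text in seen:
                problems.append((i, "duplicate"))
            seen.add(text)
        if pattern is not None and not re.fullmatch(pattern, text):
            problems.append((i, "pattern"))
        if min_value is not None or max_value is not None:
            try:
                number = float(text)
            except ValueError:
                problems.append((i, "not a number"))
            else:
                if min_value is not None and number < min_value:
                    problems.append((i, "below range"))
                if max_value is not None and number > max_value:
                    problems.append((i, "above range"))
        if allowed is not None and text not in allowed:
            problems.append((i, "not in list"))
    return problems


def quality_score(values, problems):
    """Share of rows with no problems, from 0.0 to 1.0 (assumed formula)."""
    if not values:
        return 1.0
    bad_rows = {i for i, _ in problems}
    return 1 - len(bad_rows) / len(values)
```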

@@ -19,7 +19,16 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Automated Workflows</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Run several tools in a row — save the steps once, reuse them anytime.</p>
@@ -67,69 +76,192 @@
<summary>Options</summary>
<div class="dt-expander-body">
<!-- Mode radio -->
<!-- Mode radio. Editing the steps below auto-switches the mode from the
recommended default to "Build interactively" (same precedence-visibility
pattern as Fix Missing Values: the active state is made legible, and the
default it superseded is marked "· modified"). -->
<div class="dt-field">
<label class="dt-label">How would you like to define the pipeline?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:9px">
<span class="dt-radio on"><span class="dot"></span> Use the recommended default (text-clean → format → missing → dedup)</span>
<span class="dt-radio"><span class="dot"></span> Build interactively</span>
<span class="dt-radio"><span class="dot"></span> Use the recommended default (Clean Text → Standardize → Fix Missing → Find Duplicates) <span class="dt-count-pill warn" style="margin-left:4px">· modified</span></span>
<span class="dt-radio on"><span class="dot"></span> Build interactively</span>
<span class="dt-radio"><span class="dot"></span> Import a saved pipeline JSON</span>
</div>
</div>
<div class="dt-precedence">
<span class="dt-mi">edit</span>
<span>You started from the recommended default and edited a step, so the mode switched to <strong>Build interactively</strong>. The steps below are now yours to change — pick <strong>recommended default</strong> again to discard your edits and restore the suggested order.</span>
</div>
<p class="dt-caption" style="margin:10px 0">
Edit the table to add, remove, reorder (drag the row index), enable, or configure each step.
Add, remove, reorder (drag the row index), enable, or configure each step.
Open a step's <strong>Configure</strong> panel to set its options in plain language.
Tool order is recommended, not enforced — violations surface as warnings below the table.
</p>
<!-- Pipeline editor (st.data_editor: Tool selectbox · Enabled checkbox · Options JSON) -->
<!-- Pipeline editor. Each step row carries an enable toggle + a "Configure"
expander that reveals that tool's OWN controls as the editing surface
(built from .dt-* form classes). Raw per-row JSON has been removed;
JSON survives only as import/export under "Advanced" below. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr>
<th class="idx"></th>
<th>Tool</th>
<th>Enabled</th>
<th>Options (JSON)</th>
<th>Step</th>
<th style="text-align:center">Enabled</th>
<th style="text-align:right">Configure</th>
</tr>
</thead>
<tbody>
<tr>
<td class="idx">≡ 0</td>
<td>text_clean <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><div style="font-weight:500" title="text_clean">Clean Text</div><div class="dt-caption" style="margin:2px 0 0">Trim spaces, collapse repeats, leave case as-is</div></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"trim": true, "collapse_whitespace": true}</td>
</tr>
<tr>
<td class="idx">≡ 1</td>
<td>format_standardize <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"column_types": {"phone": "phone", "signup_date": "date"}}</td>
</tr>
<tr>
<td class="idx">≡ 2</td>
<td>missing <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"strategy": "flag", "sentinels": ["N/A", "—"]}</td>
</tr>
<tr>
<td class="idx">≡ 3</td>
<td>dedup <span class="dt-mi" style="font-size:14px;vertical-align:-2px;color:var(--ink-tertiary)">expand_more</span></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>{"survivor_rule": "most_complete", "merge": true}</td>
</tr>
<tr>
<td class="idx" style="color:var(--ink-tertiary)"></td>
<td colspan="3" style="color:var(--ink-tertiary);font-family:var(--font-sans)">Add row</td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">expand_more</span></td>
</tr>
</tbody>
</table>
</div>
<!-- text_clean config panel (open to show the per-step editing surface) -->
<details class="dt-expander" open style="margin:6px 0 10px">
<summary>Configure: Clean Text</summary>
<div class="dt-expander-body">
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Trim leading &amp; trailing whitespace</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Collapse repeated spaces to one</div>
<div class="dt-check"><span class="box"></span> Normalize smart quotes &amp; dashes to plain ASCII</div>
<div class="dt-field">
<label class="dt-label">Letter case</label>
<div class="dt-select">Leave as-is</div>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 1</td>
<td><div style="font-weight:500" title="format_standardize">Standardize Formats</div><div class="dt-caption" style="margin:2px 0 0">Format phone as phone, signup_date as a date</div></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- format_standardize config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: Standardize Formats</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">Choose a target format for each column. Columns left as &ldquo;Leave as-is&rdquo; are untouched.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>Column</th><th>Format as</th></tr></thead>
<tbody>
<tr><td>name</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>email</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px;color:var(--ink-tertiary)">Leave as-is</span></td></tr>
<tr><td>phone</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Phone number</span></td></tr>
<tr><td>signup_date</td><td><span class="dt-select" style="display:inline-block;min-width:150px;padding:4px 24px 4px 10px">Date</span></td></tr>
</tbody>
</table>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 2</td>
<td><div style="font-weight:500" title="missing">Fix Missing Values</div><div class="dt-caption" style="margin:2px 0 0">Flag blank cells (treat &ldquo;N/A&rdquo; and &ldquo;&mdash;&rdquo; as blank)</div></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
</tbody>
</table>
</div>
<!-- missing config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: Fix Missing Values</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">What should happen to blank cells?</label>
<div class="dt-radio-row" style="flex-direction:column;gap:8px">
<span class="dt-radio on"><span class="dot"></span> Flag them (mark blanks, change nothing)</span>
<span class="dt-radio"><span class="dot"></span> Fill them in (numbers → median, text → most common)</span>
<span class="dt-radio"><span class="dot"></span> Drop rows that have any blank</span>
</div>
</div>
<div class="dt-field">
<label class="dt-label">Treat these as blank (comma-separated)</label>
<div class="dt-input">N/A, —</div>
<div class="dt-help-text">Matched case-insensitively after stripping whitespace.</div>
</div>
</div>
</details>
<div class="dt-table-wrap">
<table class="dt-table">
<tbody>
<tr>
<td class="idx">≡ 3</td>
<td><div style="font-weight:500" title="dedup">Find Duplicates</div><div class="dt-caption" style="margin:2px 0 0">Match on email &amp; phone; keep the most complete row, merge in missing fields</div></td>
<td><span class="dt-check on" style="margin:0;justify-content:center"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td style="text-align:right;color:var(--ink-tertiary)"><span class="dt-mi" style="font-size:16px;vertical-align:-3px">tune</span> Configure <span class="dt-mi" style="font-size:14px;vertical-align:-2px">chevron_right</span></td>
</tr>
<tr>
<td class="idx" style="color:var(--ink-tertiary)"></td>
<td colspan="3" style="color:var(--ink-tertiary);font-family:var(--font-sans)">Add step</td>
</tr>
</tbody>
</table>
</div>
<!-- dedup config panel (collapsed) -->
<details class="dt-expander" style="margin:6px 0 10px">
<summary>Configure: Find Duplicates</summary>
<div class="dt-expander-body">
<div class="dt-field">
<label class="dt-label">When rows match, which one survives?</label>
<div class="dt-select">Keep the most complete row</div>
<div class="dt-help-text">Other options: keep the first seen, keep the last seen.</div>
</div>
<div class="dt-check on"><span class="box"><span class="dt-mi">check</span></span> Merge matched rows (fill each survivor's blanks from its duplicates)</div>
<div class="dt-field">
<label class="dt-label">Match on these columns</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">email <span class="x"></span></span>
<span class="dt-ms-chip">phone <span class="x"></span></span>
</div>
</div>
</div>
</details>
<!-- Validation: pipeline is in recommended order, so no warning shown (warning block omitted) -->
<!-- Advanced: JSON is import/export only, never the per-step editing surface -->
<details class="dt-expander" style="margin-top:14px">
<summary>Advanced — import / export pipeline as JSON</summary>
<div class="dt-expander-body">
<p class="dt-caption" style="margin-bottom:8px">For sharing or version control. Editing is done in the step panels above — this is just the saved form of the same settings.</p>
<div class="dt-code">{
"version": 1,
"steps": [
{"tool": "text_clean", "enabled": true, "options": {"trim": true, "collapse_whitespace": true}},
{"tool": "format_standardize", "enabled": true, "options": {"column_types": {"phone": "phone", "signup_date": "date"}}},
{"tool": "missing", "enabled": true, "options": {"strategy": "flag", "sentinels": ["N/A", "—"]}},
{"tool": "dedup", "enabled": true, "options": {"survivor_rule": "most_complete", "merge": true, "keys": ["email", "phone"]}}
]
}</div>
<div class="dt-btn-row" style="margin-top:10px">
<button class="dt-btn"><span class="dt-mi">upload</span> Import JSON</button>
<button class="dt-btn"><span class="dt-mi">download</span> Export JSON</button>
</div>
</div>
</details>
<!-- Nested explainer expander -->
<details class="dt-expander" open style="margin-top:14px">
<details class="dt-expander" style="margin-top:14px">
<summary>Recommended tool order — why each step belongs where it does</summary>
<div class="dt-expander-body">
<p><strong>text_clean</strong> before <strong>format_standardize</strong> — format parsers (phone / currency / date) fail on smart-quote-contaminated or NBSP-padded input — clean text first</p>
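
The mockup states that tool order is recommended rather than enforced, with violations surfacing as warnings. A minimal sketch of such a check over the exported pipeline JSON is shown here; the function name `order_warnings` and the warning wording are assumptions, while the tool identifiers and the `steps` / `tool` / `enabled` keys come from the JSON sample in the Advanced panel.

```python
import json

# Recommended order as shown on the default-pipeline radio option.
RECOMMENDED_ORDER = ["text_clean", "format_standardize", "missing", "dedup"]


def order_warnings(pipeline_json):
    """Return warnings for enabled steps that run out of the recommended order.

    Hypothetical check: it warns and never rejects the pipeline.
    """
    pipeline = json.loads(pipeline_json)
    enabled = [s["tool"] for s in pipeline["steps"] if s.get("enabled", True)]
    rank = {tool: i for i, tool in enumerate(RECOMMENDED_ORDER)}

    warnings = []
    for position, later in enumerate(enabled):
        for earlier in enabled[:position]:
            if earlier in rank and later in rank and rank[earlier] > rank[later]:
                warnings.append(f"{later} is recommended before {earlier}")
    return warnings
```

Disabled steps are skipped, so toggling a step off also clears any warning it caused.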
@@ -161,39 +293,49 @@
</div>
<h4>Per-step summary</h4>
<!-- Standalone error column removed: status is one pill per step. A failed step
turns the pill danger and surfaces its message in a detail row directly below
that step (shown only on failure); successful steps just show a green pill.
Summaries are plain-English phrases, not raw JSON. Demo: this run completed
cleanly (all four ok, matching the metrics above) — the format_standardize
row carries a warn pill + detail row to illustrate how a non-fatal step issue
surfaces inline without a dedicated always-empty column. -->
<div class="dt-table-wrap">
<table class="dt-table">
<thead>
<tr><th>step</th><th>status</th><th>elapsed_ms</th><th>summary</th><th>error</th></tr>
<tr><th>step</th><th>status</th><th>elapsed</th><th>summary</th></tr>
</thead>
<tbody>
<tr>
<td>text_clean</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>214</td>
<td>{"cells_changed": 1204, "columns": ["name", "city"]}</td>
<td></td>
<td>214 ms</td>
<td style="font-family:var(--font-sans)">1,204 cells changed in name &amp; city</td>
</tr>
<tr>
<td>format_standardize</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>388</td>
<td>{"phone": 18301, "signup_date": 17996}</td>
<td><span class="dt-count-pill warn"><span class="dt-mi" style="font-size:13px;margin-right:3px">warning</span> ok · 141 skipped</span></td>
<td>388 ms</td>
<td style="font-family:var(--font-sans)">18,301 phones and 17,996 dates standardized</td>
</tr>
<tr style="background:var(--warn-fill)">
<td></td>
<td colspan="3" style="font-family:var(--font-sans);color:var(--warn);white-space:normal">
<span class="dt-mi" style="font-size:15px;vertical-align:-3px;margin-right:4px">info</span>
141 phone values didn't match any known pattern and were left unchanged. The step still completed — review them in the output preview if needed.
</td>
</tr>
<tr>
<td>missing</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>121</td>
<td>{"flagged_cells": 642, "sentinels_found": ["—"]}</td>
<td></td>
<td>121 ms</td>
<td style="font-family:var(--font-sans)">642 blank cells flagged (sentinel &ldquo;&mdash;&rdquo;)</td>
</tr>
<tr>
<td>dedup</td>
<td><span class="dt-count-pill success">ok</span></td>
<td>911</td>
<td>{"input_rows": 18442, "output_rows": 18130, "duplicates_removed": 312, "groups": 147}</td>
<td></td>
<td>911 ms</td>
<td style="font-family:var(--font-sans)">312 duplicates removed across 147 groups (18,442 → 18,130 rows)</td>
</tr>
</tbody>
</table>
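
The dedup row above reports duplicates removed per group, and the step's Configure panel describes the rule as "keep the most complete row" with blanks filled from the duplicates. One way that rule might work is sketched below. It assumes exact matching on the key columns and rows held as plain dicts; `dedup_most_complete` is a hypothetical name, not the project's function.

```python
def dedup_most_complete(rows, keys, merge=True):
    """Group rows by `keys` and keep the row with the most non-blank fields.

    rows: list of dicts sharing the same columns.
    Hypothetical sketch of the "most_complete" survivor rule plus merge.
    """
    def filled(row):
        return sum(1 for value in row.values() if value not in (None, ""))

    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)

    survivors = []
    for members in groups.values():
        # max() returns the first of equally complete rows, so ties keep the earliest.
        best = dict(max(members, key=filled))
        if merge:
            # Fill the survivor's blanks from the other rows in its group.
            for other in members:
                for col, value in other.items():
                    if best.get(col) in (None, "") and value not in (None, ""):
                        best[col] = value
        survivors.append(best)
    return survivors
```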

@@ -19,7 +19,16 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>PDF to CSV</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Pull transactions out of bank-statement PDFs into a clean CSV file.</p>
@@ -74,7 +83,7 @@
<span class="dt-file-name">statement-feb-2026.pdf</span>
<span class="dt-file-size" style="margin-left:auto">147.2 KB</span>
</div>
<button class="dt-file-add">
<button class="dt-file-add" style="margin-left:-16px;margin-right:-16px;width:calc(100% + 32px)">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M12 5v14M5 12h14"/></svg> Add more files
</button>
</div>
@@ -100,84 +109,89 @@
<!-- Results -->
<h4>47 candidate transaction(s) from 2 file(s)</h4>
<p class="dt-caption">Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. The <code>raw</code> column shows the original PDF text for that row.</p>
<p class="dt-caption">Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. Hover the <span class="dt-mi" style="font-size:15px;vertical-align:-3px;color:var(--ink-tertiary)">info</span> on any row to see the original PDF text it came from.</p>
<div class="dt-table-wrap">
<!-- overflow-x:auto belt-and-suspenders: any residual width scrolls instead of clipping (app.css .dt-table-wrap is overflow:hidden) -->
<div class="dt-table-wrap" style="overflow-x:auto">
<table class="dt-table">
<thead>
<tr>
<th>Include</th>
<th></th>
<th>date</th>
<th>description</th>
<th>amount_debit</th>
<th>amount_credit</th>
<th>account_number</th>
<th>source_file</th>
<th>page</th>
<th>raw</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-01-03</td><td>OPENING BALANCE</td><td></td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">1</td><td>01/03 OPENING BALANCE 2,140.55</td>
<td class="idx" title="raw: 01/03 OPENING BALANCE 2,140.55" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-01-03</td><td>OPENING BALANCE</td><td></td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-01-05</td><td>POS PURCHASE WHOLE FOODS MKT</td><td>84.12</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">1</td><td>01/05 POS PURCHASE WHOLE FOODS MKT (84.12)</td>
<td class="idx" title="raw: 01/05 POS PURCHASE WHOLE FOODS MKT (84.12)" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-01-05</td><td>POS PURCHASE WHOLE FOODS MKT</td><td>84.12</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-01-08</td><td>ACH DEPOSIT PAYROLL ACME CORP</td><td></td><td>3,250.00</td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">1</td><td>01/08 ACH DEPOSIT PAYROLL ACME CORP 3,250.00</td>
<td class="idx" title="raw: 01/08 ACH DEPOSIT PAYROLL ACME CORP 3,250.00" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-01-08</td><td>ACH DEPOSIT PAYROLL ACME CORP</td><td></td><td>3,250.00</td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-01-11</td><td>ONLINE TRANSFER TO SAVINGS</td><td>500.00</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">2</td><td>01/11 ONLINE TRANSFER TO SAVINGS (500.00)</td>
<td class="idx" title="raw: 01/11 ONLINE TRANSFER TO SAVINGS (500.00)" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-01-11</td><td>ONLINE TRANSFER TO SAVINGS</td><td>500.00</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check" style="margin:0"><span class="box"></span></span></td>
<td class="dt-cell-flag">2026-01-12</td><td class="dt-cell-flag">INTEREST RATE 0.50% APY DETAIL</td><td></td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">2</td><td>01/12 INTEREST RATE 0.50% APY 0.00</td>
<td class="idx" title="raw: 01/12 INTEREST RATE 0.50% APY 0.00" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td class="dt-cell-flag">2026-01-12</td><td class="dt-cell-flag">INTEREST RATE 0.50% APY DETAIL <span style="font-family:var(--font-sans);font-size:11px;font-weight:500;background:var(--warn-fill);color:var(--warn);border-radius:999px;padding:1px 7px;white-space:nowrap">auto-excluded · not a transaction line</span></td><td></td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-01-14</td><td>DEBIT CARD SHELL OIL #2287</td><td>52.40</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td><td class="idx">2</td><td>01/14 DEBIT CARD SHELL OIL #2287 (52.40)</td>
<td class="idx" title="raw: 01/14 DEBIT CARD SHELL OIL #2287 (52.40)" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-01-14</td><td>DEBIT CARD SHELL OIL #2287</td><td>52.40</td><td></td><td>****4821</td><td>statement-jan-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-02-02</td><td>POS PURCHASE TRADER JOES #511</td><td>61.88</td><td></td><td>****4821</td><td>statement-feb-2026.pdf</td><td class="idx">1</td><td>02/02 POS PURCHASE TRADER JOES #511 (61.88)</td>
<td class="idx" title="raw: 02/02 POS PURCHASE TRADER JOES #511 (61.88)" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-02-02</td><td>POS PURCHASE TRADER JOES #511</td><td>61.88</td><td></td><td>****4821</td><td>statement-feb-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-02-06</td><td>ACH DEPOSIT PAYROLL ACME CORP</td><td></td><td>3,250.00</td><td>****4821</td><td>statement-feb-2026.pdf</td><td class="idx">2</td><td>02/06 ACH DEPOSIT PAYROLL ACME CORP 3,250.00</td>
<td class="idx" title="raw: 02/06 ACH DEPOSIT PAYROLL ACME CORP 3,250.00" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-02-06</td><td>ACH DEPOSIT PAYROLL ACME CORP</td><td></td><td>3,250.00</td><td>****4821</td><td>statement-feb-2026.pdf</td>
</tr>
<tr>
<td><span class="dt-check on" style="margin:0"><span class="box"><span class="dt-mi">check</span></span></span></td>
<td>2026-02-09</td><td>CHECK #1043</td><td>1,200.00</td><td></td><td>****4821</td><td>statement-feb-2026.pdf</td><td class="idx">2</td><td>02/09 CHECK #1043 (1,200.00)</td>
<td class="idx" title="raw: 02/09 CHECK #1043 (1,200.00)" style="cursor:help"><span class="dt-mi" style="font-size:16px">info</span></td>
<td>2026-02-09</td><td>CHECK #1043</td><td>1,200.00</td><td></td><td>****4821</td><td>statement-feb-2026.pdf</td>
</tr>
</tbody>
</table>
</div>
<!-- Download row: download button (left) + columns multiselect (right) -->
<div class="dt-row" style="margin-top:14px;align-items:flex-start">
<div style="flex:2">
<button class="dt-btn dt-btn-primary dt-btn-block">Download 46 rows as CSV</button>
<p class="dt-caption" style="margin-top:8px">46 of 47 rows selected.</p>
</div>
<div style="flex:3">
<div class="dt-field" style="margin:0">
<label class="dt-label">Columns to include in CSV</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">date <span class="x"></span></span>
<span class="dt-ms-chip">description <span class="x"></span></span>
<span class="dt-ms-chip">amount_debit <span class="x"></span></span>
<span class="dt-ms-chip">amount_credit <span class="x"></span></span>
<span class="dt-ms-chip">account_number <span class="x"></span></span>
<span class="dt-ms-chip">source_file <span class="x"></span></span>
</div>
<div class="dt-help-text"><code>page</code> and <code>raw</code> are kept off by default; tick them if you want them in the file.</div>
<!-- Download area: configure-then-act — column selector first, download button below -->
<div style="margin-top:14px;max-width:520px">
<div class="dt-field" style="margin:0 0 14px">
<label class="dt-label">Columns to include in CSV</label>
<div class="dt-multiselect">
<span class="dt-ms-chip">date <span class="x"></span></span>
<span class="dt-ms-chip">description <span class="x"></span></span>
<span class="dt-ms-chip">amount_debit <span class="x"></span></span>
<span class="dt-ms-chip">amount_credit <span class="x"></span></span>
<span class="dt-ms-chip">account_number <span class="x"></span></span>
<span class="dt-ms-chip">source_file <span class="x"></span></span>
</div>
<div class="dt-help-text"><code>page</code> and <code>raw</code> are kept off by default; tick them if you want them in the file.</div>
</div>
<button class="dt-btn dt-btn-primary dt-btn-block">Download 46 rows as CSV</button>
<p class="dt-caption" style="margin-top:8px">1 row excluded (INTEREST RATE detail line).</p>
</div>
</div>
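
The `raw` values in the table above suggest a sign convention: amounts in parentheses land in `amount_debit` and bare amounts in `amount_credit`. The sketch below parses one raw line on that assumption. The regular expression, the function name `parse_raw_line`, and the `year` argument are illustrative guesses rather than the scanner's actual logic, and balance or rate-detail lines (which the mockup shows with both amount cells empty) would need extra handling that is not attempted here.

```python
import re
from decimal import Decimal

# MM/DD, free-text description, then an amount that is parenthesized for debits.
LINE = re.compile(r"^(\d{2})/(\d{2})\s+(.+?)\s+(\(?)([\d,]+\.\d{2})\)?$")


def parse_raw_line(raw, year):
    """Split one raw statement line into date, description, debit, credit.

    Hypothetical parser; returns None when the line does not fit the pattern.
    """
    match = LINE.match(raw.strip())
    if match is None:
        return None
    month, day, description, paren, amount = match.groups()
    value = Decimal(amount.replace(",", ""))
    return {
        "date": f"{year:04d}-{month}-{day}",
        "description": description,
        "amount_debit": value if paren else None,
        "amount_credit": None if paren else value,
    }
```

`Decimal` is used instead of `float` so that currency amounts keep their exact two-decimal value.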

@@ -19,7 +19,16 @@
<!-- Tool header -->
<div class="dt-tool-header">
<h1>Reconcile Two Files</h1>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
<div class="dt-tool-header-actions">
<span class="dt-privacy-pill">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">
<rect x="4" y="11" width="16" height="10" rx="2"/>
<path d="M8 11V7a4 4 0 018 0v4"/>
</svg>
Runs 100% locally
</span>
<button class="dt-help-btn"><span class="dt-mi">help_outline</span> Help</button>
</div>
</div>
<p class="dt-tool-caption">Compare two lists of transactions (e.g. bank vs. ledger) and flag what doesn't match.</p>
@@ -30,18 +39,11 @@
<!-- Left side -->
<div>
<h4 style="margin-top:0">Left (e.g. bank feed)</h4>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">CSV, TSV, XLSX, XLS</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">bank_feed_may.csv</span>
<span class="size">214 KB</span>
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>bank_feed_may.csv</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<p class="dt-caption" style="margin-top:6px"><code>bank_feed_may.csv</code> — 1,204 rows, 4 columns</p>
<details class="dt-expander">
<summary>Preview left (e.g. bank feed)</summary>
@@ -63,18 +65,11 @@
<!-- Right side -->
<div>
<h4 style="margin-top:0">Right (e.g. ledger)</h4>
<div class="dt-uploader">
<div class="dt-uploader-text">
<span class="hint"><span class="dt-mi" style="vertical-align:-4px">upload_file</span> Drag and drop file here</span>
<span class="sub">CSV, TSV, XLSX, XLS</span>
</div>
<button class="dt-btn">Browse files</button>
</div>
<div class="dt-file-chip">
<span class="dt-file-icon-chip"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor"><path d="M14 2H6a2 2 0 00-2 2v16a2 2 0 002 2h12a2 2 0 002-2V8z"/><path d="M14 2v6h6"/></svg></span>
<span class="name">ledger_may.xlsx</span>
<span class="size">96 KB</span>
<div class="dt-alert info">
<span class="dt-mi">description</span>
<span>Using <strong>ledger_may.xlsx</strong> from the upload screen.</span>
</div>
<button class="dt-btn" style="margin-bottom:4px">Use a different file</button>
<p class="dt-caption" style="margin-top:6px"><code>ledger_may.xlsx</code> — 1,198 rows, 5 columns</p>
<details class="dt-expander">
<summary>Preview right (e.g. ledger)</summary>
@@ -105,7 +100,7 @@
<h4 style="margin-top:0">Left columns</h4>
<div class="dt-field"><label class="dt-label">Date column (optional)</label><div class="dt-select">posted_date</div></div>
<div class="dt-field"><label class="dt-label">Description column (optional)</label><div class="dt-select">description</div></div>
<div class="dt-field"><label class="dt-label">Amount column</label><div class="dt-select">amount</div></div>
<div class="dt-field"><label class="dt-label">Amount column <span class="req">*</span></label><div class="dt-select">amount</div></div>
<div class="dt-field"><label class="dt-label">Reference columns (optional, e.g. check / invoice no.)</label>
<div class="dt-multiselect"><span class="dt-ms-chip">ref <span class="x"></span></span></div></div>
</div>
@@ -114,9 +109,10 @@
<h4 style="margin-top:0">Right columns</h4>
<div class="dt-field"><label class="dt-label">Date column (optional)</label><div class="dt-select">txn_date</div></div>
<div class="dt-field"><label class="dt-label">Description column (optional)</label><div class="dt-select">memo</div></div>
<div class="dt-field"><label class="dt-label">Amount column</label><div class="dt-select">value</div></div>
<div class="dt-field"><label class="dt-label">Amount column <span class="req">*</span></label><div class="dt-select">value</div></div>
<div class="dt-field"><label class="dt-label">Reference columns (must match left count)</label>
<div class="dt-multiselect"><span class="dt-ms-chip">invoice_no <span class="x"></span></span></div></div>
<div class="dt-multiselect"><span class="dt-ms-chip">invoice_no <span class="x"></span></span></div>
<div class="dt-help-text" style="color:var(--success);display:flex;align-items:center;gap:5px"><span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:15px;line-height:1">check_circle</span> 1 reference each side — counts match</div></div>
</div>
</div>
@@ -132,7 +128,7 @@
<div class="dt-input">1</div>
<div class="dt-help-text">Allow N calendar days of drift between posting dates.</div></div>
<div class="dt-field"><label class="dt-label">Invert right amount sign</label>
<div class="dt-check" style="margin-top:8px"><span class="box"></span> Invert right amount sign</div>
<div class="dt-check" style="margin-top:8px"><span class="box"></span></div>
<div class="dt-help-text">Use when one side records debits as positive and the other as negative.</div></div>
</div>
<div class="dt-field"><label class="dt-label">Description similarity boost (0 disables)</label>
@@ -150,56 +146,34 @@
<!-- Results -->
<h2>Results</h2>
<div class="dt-metrics">
<div class="dt-metric"><div class="label">Matched</div><div class="value">1,173</div></div>
<div class="dt-metric"><div class="label">Review</div><div class="value">9</div></div>
<div class="dt-metric"><div class="label">Unmatched left</div><div class="value">22</div></div>
<div class="dt-metric"><div class="label">Unmatched right</div><div class="value">16</div></div>
<div class="dt-metric"><div class="label">Matched</div><div class="value">1,173</div></div>
</div>
<p class="dt-caption">Coverage: 97.4% of the larger side</p>
<!-- Tabs (st.tabs) — Matched active -->
<!-- Tabs (st.tabs) — exceptions-first; Review active by default -->
<div class="dt-tabs">
<span class="dt-tab is-active">Matched (1,173)</span>
<span class="dt-tab">Review (9)</span>
<span class="dt-tab is-active">Review (9)</span>
<span class="dt-tab">Unmatched left (22)</span>
<span class="dt-tab">Unmatched right (16)</span>
<span class="dt-tab">Matched (1,173)</span>
</div>
<!-- Matched tab content -->
<p class="dt-caption">Preview of first 25 of 1,173 rows — download the CSV below for the full set.</p>
<!-- Active tab content: Review (exceptions-first default) -->
<p class="dt-caption">Pairs flagged because the algorithm couldn't pick a single best match (e.g. multiple equally-good candidates). Use the left/right indices to disambiguate manually.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr>
<th>left_posted_date</th><th>left_description</th><th>left_amount</th>
<th>right_txn_date</th><th>right_memo</th><th>right_value</th><th>amount_diff</th>
</tr></thead>
<thead><tr><th>left_idx</th><th>left_amount</th><th>right_idx</th><th>right_value</th><th>candidates</th></tr></thead>
<tbody>
<tr><td>2026-05-01</td><td>ACME SUPPLIES</td><td>-1240.00</td><td>2026-05-01</td><td>Acme Supplies Inc</td><td>-1240.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-02</td><td>PAYROLL RUN</td><td>-8800.00</td><td>2026-05-02</td><td>Monthly payroll</td><td>-8800.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-03</td><td>CLIENT GLOBEX</td><td>5200.00</td><td>2026-05-03</td><td>Globex retainer</td><td>5200.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-04</td><td>UTILITY CO</td><td>-318.42</td><td>2026-05-04</td><td>City Utilities</td><td>-318.40</td><td class="dt-cell-flag">0.02</td></tr>
<tr><td>2026-05-06</td><td>OFFICE DEPOT</td><td>-89.15</td><td>2026-05-07</td><td>Office supplies</td><td>-89.15</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>118</td><td>-450.00</td><td>121, 209</td><td>-450.00</td><td class="dt-cell-flag">2 equal</td></tr>
<tr><td>203</td><td>1000.00</td><td>198, 244</td><td>1000.00</td><td class="dt-cell-flag">2 equal</td></tr>
</tbody>
</table>
</div>
<!-- Other tab previews shown as collapsed expanders for review context -->
<details class="dt-expander">
<summary>Review (9) — ambiguous candidates</summary>
<div class="dt-expander-body">
<p class="dt-caption">Pairs flagged because the algorithm couldn't pick a single best match (e.g. multiple equally-good candidates). Use the left/right indices to disambiguate manually.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr><th>left_idx</th><th>left_amount</th><th>right_idx</th><th>right_value</th><th>candidates</th></tr></thead>
<tbody>
<tr><td>118</td><td>-450.00</td><td>121, 209</td><td>-450.00</td><td class="dt-cell-flag">2 equal</td></tr>
<tr><td>203</td><td>1000.00</td><td>198, 244</td><td>1000.00</td><td class="dt-cell-flag">2 equal</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<details class="dt-expander">
<summary>Unmatched left (22) — only in bank_feed_may.csv</summary>
<div class="dt-expander-body">
@@ -232,14 +206,37 @@
</div>
</details>
<details class="dt-expander">
<summary>Matched (1,173) — cleanly reconciled</summary>
<div class="dt-expander-body">
<p class="dt-caption">Preview of first 25 of 1,173 rows — download the CSV below for the full set.</p>
<div class="dt-table-wrap">
<table class="dt-table">
<thead><tr>
<th>left_posted_date</th><th>left_description</th><th>left_amount</th>
<th>right_txn_date</th><th>right_memo</th><th>right_value</th><th>amount_diff</th>
</tr></thead>
<tbody>
<tr><td>2026-05-01</td><td>ACME SUPPLIES</td><td>-1240.00</td><td>2026-05-01</td><td>Acme Supplies Inc</td><td>-1240.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-02</td><td>PAYROLL RUN</td><td>-8800.00</td><td>2026-05-02</td><td>Monthly payroll</td><td>-8800.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-03</td><td>CLIENT GLOBEX</td><td>5200.00</td><td>2026-05-03</td><td>Globex retainer</td><td>5200.00</td><td class="dt-cell-add">0.00</td></tr>
<tr><td>2026-05-04</td><td>UTILITY CO</td><td>-318.42</td><td>2026-05-04</td><td>City Utilities</td><td>-318.40</td><td class="dt-cell-flag">0.02</td></tr>
<tr><td>2026-05-06</td><td>OFFICE DEPOT</td><td>-89.15</td><td>2026-05-07</td><td>Office supplies</td><td>-89.15</td><td class="dt-cell-add">0.00</td></tr>
</tbody>
</table>
</div>
</div>
</details>
<hr class="dt-divider">
<!-- Downloads (st.columns(4) of html_download_button) -->
<!-- Downloads (st.columns(4) of html_download_button) — exceptions-first,
matching the tab/metric order; four parallel exports, equal weight -->
<div class="dt-btn-row">
<button class="dt-btn dt-btn-primary">Matched CSV</button>
<button class="dt-btn">Review CSV</button>
<button class="dt-btn">Unmatched left</button>
<button class="dt-btn">Unmatched right</button>
<button class="dt-btn">Matched CSV</button>
</div>
</div>
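The sample figures in the Reconcile mockup above can be sanity-checked whenever they change. This is a minimal sketch, assuming coverage means matched pairs divided by the row count of the larger file, and that each Review item accounts for one row on each side; the mockup states neither rule explicitly, and `coverage_pct` is a hypothetical helper, not a function from the app.

```python
def coverage_pct(matched: int, left_rows: int, right_rows: int) -> float:
    """Assumed definition: matched pairs as a share of the larger side."""
    larger = max(left_rows, right_rows)
    return 100.0 * matched / larger if larger else 0.0


# Sample figures copied from the mockup.
matched, review, unmatched_left, unmatched_right = 1173, 9, 22, 16
left_rows, right_rows = 1204, 1198

# Under the stated assumption, every row lands in exactly one bucket per side.
assert matched + review + unmatched_left == left_rows
assert matched + review + unmatched_right == right_rows

print(f"Coverage: {coverage_pct(matched, left_rows, right_rows):.1f}% of the larger side")
```

With these numbers the bucket totals add up on both sides, and the computed coverage rounds to the 97.4% shown in the caption.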

@@ -122,6 +122,19 @@ code, .dt-mono { font-family: var(--font-mono); font-size: 0.92em; font-feature-
.dt-nav-link .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; color: var(--ink-secondary); line-height: 1; }
.dt-nav-link.is-active .dt-mi { color: var(--ink); }
.dt-nav-link.is-soon { opacity: 0.55; }
/* "Start here" front-door item — weightier than ordinary nav links so the
obvious entry point reads at a glance. Accent-fill ground + accent-hover ink,
slightly larger hit area, with bottom margin to part it from the groups below.
Layers on .dt-nav-link, so the .is-active treatment still overrides cleanly. */
.dt-nav-start {
background: var(--accent-fill); color: var(--accent-hover); font-weight: 600;
padding: 8px 10px; margin-bottom: 12px;
}
.dt-nav-start:hover { background: var(--accent-fill-strong); color: var(--accent-hover); }
.dt-nav-start .dt-mi { color: var(--accent); }
.dt-nav-start.is-active { background: var(--accent-fill-strong); color: var(--accent-hover); }
.dt-nav-start.is-active .dt-mi { color: var(--accent); }
.dt-nav-soon-tag {
margin-left: auto; font-size: 9px; font-weight: 600; letter-spacing: 0.06em;
text-transform: uppercase; color: var(--ink-tertiary);
@@ -199,6 +212,11 @@ code, .dt-mono { font-family: var(--font-mono); font-size: 0.92em; font-feature-
}
.dt-help-btn .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; }
.dt-tool-caption { font-size: 12.5px; color: var(--ink-tertiary); line-height: 1.5; margin: 2px 0 0; }
/* Right-side actions cluster in a tool header: the local-first privacy pill +
the Help button. One shared class so every tool page aligns identically
(replaces per-page inline flex/gap/margin drift). */
.dt-tool-header-actions { display: flex; align-items: center; gap: 12px; flex-shrink: 0; margin-top: 6px; }
.dt-tool-header-actions .dt-help-btn { margin-top: 0; }
/* ===========================================================================
Buttons
@@ -288,6 +306,24 @@ code, .dt-mono { font-family: var(--font-mono); font-size: 0.92em; font-feature-
.dt-alert.error { background: var(--danger-fill); color: var(--danger); }
.dt-alert code { background: rgba(0,0,0,0.05); padding: 1px 5px; border-radius: 4px; }
/* Next-step strip — slim single-line "what to do next" suggestion shown at the
end of a tool's results. Subtle accent ground + left accent rule so it nudges
without competing with alerts; the trailing dismiss control is unobtrusive. */
.dt-next-step {
display: flex; align-items: center; gap: 10px;
background: var(--accent-fill); border-left: 3px solid var(--accent);
border-radius: var(--r-md); padding: 10px 14px; margin: 16px 0;
font-size: 13.5px; line-height: 1.4; color: var(--ink);
}
.dt-next-step .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; color: var(--accent); flex-shrink: 0; }
.dt-next-step a { color: var(--accent); font-weight: 500; }
.dt-next-step a:hover { color: var(--accent-hover); }
.dt-next-step-dismiss {
margin-left: auto; background: transparent; border: none; cursor: pointer;
color: var(--ink-tertiary); font-size: 13px; line-height: 1; padding: 2px 4px;
}
.dt-next-step-dismiss:hover { color: var(--ink-secondary); }
/* ===========================================================================
Inputs (static representations of Streamlit widgets)
=========================================================================== */
@@ -330,6 +366,20 @@ code, .dt-mono { font-family: var(--font-mono); font-size: 0.92em; font-feature-
.dt-radio .dot { width: 16px; height: 16px; border-radius: 50%; border: 1px solid var(--border-strong); display: inline-block; flex-shrink: 0; }
.dt-radio.on .dot { border: 5px solid var(--ink); }
/* Strategy precedence legend + overridden state (Fix Missing Values).
Makes the preset -> global -> per-column resolution order legible and
visibly dims a layer when a more specific layer wins. */
.dt-precedence {
display: flex; align-items: center; gap: 8px;
background: var(--surface-hover); border: 1px solid var(--border);
border-radius: var(--r-md); padding: 9px 13px; margin: 0 0 14px;
font-size: 12.5px; color: var(--ink-secondary); line-height: 1.4;
}
.dt-precedence .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; color: var(--ink-tertiary); flex-shrink: 0; }
.dt-precedence strong { color: var(--ink); font-weight: 600; }
.dt-radio-row.is-overridden { opacity: 0.5; }
.dt-radio-row.is-overridden .dt-radio { text-decoration: line-through; text-decoration-color: var(--ink-tertiary); }
/* Slider */
.dt-slider { margin: 14px 0 6px; }
.dt-slider .track { position: relative; height: 4px; background: var(--border-strong); border-radius: 2px; }
@@ -445,6 +495,25 @@ table.dt-table td.idx { color: var(--ink-tertiary); background: var(--surface-ho
.dt-finding-title strong { font-weight: 500; }
.dt-finding-meta { font-family: var(--font-mono); font-size: 12px; color: var(--ink-tertiary); line-height: 1.4; margin: 0; font-feature-settings: "ss02"; }
/* Overflow control — sits at the foot of a findings card when rows are hidden.
Bleeds to the card edges (cancels the .dt-card 16px padding) like .dt-file-add. */
.dt-finding-more {
display: flex; align-items: center; justify-content: center; gap: 6px;
width: calc(100% + 32px); margin: 4px -16px -16px;
padding: 11px 16px; background: var(--surface-hover);
border: none; border-top: 1px solid var(--border);
border-radius: 0 0 var(--r-lg) var(--r-lg); cursor: pointer;
font-family: var(--font-sans); font-size: 12.5px; font-weight: 500; color: var(--ink-secondary);
}
.dt-finding-more:hover { background: var(--accent-fill); color: var(--accent); }
.dt-finding-more .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; }
/* Collapsed findings panel — the group head fills the whole card (head only,
no body). Proper state variant so the two states don't drift; replaces the
per-instance inline margin-bottom:-16px hack. */
.dt-card.is-collapsed { padding: 0; }
.dt-finding-group-head.is-collapsed { margin: 0; border-bottom: none; border-radius: var(--r-lg); }
/* Match-group review card (dedup) */
.dt-match-card { background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-lg); box-shadow: 0 1px 2px rgba(28,25,23,0.03); margin: 12px 0; overflow: hidden; }
.dt-match-head { background: var(--surface-hover); border-bottom: 1px solid var(--border); padding: 12px 16px; display: flex; align-items: center; gap: 12px; }
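The `.dt-precedence` legend and the `.is-overridden` state in this stylesheet describe a preset -> global -> per-column resolution order in which the more specific layer wins. A rough Python sketch of that rule follows; `resolve_strategy` and the strategy names are hypothetical, chosen only for illustration, and the real Fix Missing Values logic may differ.

```python
from typing import Optional


def resolve_strategy(
    column: str,
    preset: str,
    global_choice: Optional[str] = None,
    per_column: Optional[dict] = None,
) -> tuple:
    """Return (strategy, winning_layer); the most specific layer wins."""
    per_column = per_column or {}
    if column in per_column:
        return per_column[column], "per-column"
    if global_choice is not None:
        return global_choice, "global"
    return preset, "preset"


# Hypothetical strategy names, for illustration only.
print(resolve_strategy("amount", preset="leave", global_choice="median"))
print(resolve_strategy("amount", "leave", "median", {"amount": "zero"}))
```

Any layer that is not the winner for a given column is the one a UI would render dimmed, which is what `.dt-radio-row.is-overridden` styles.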

@@ -69,9 +69,9 @@
</div>
<!-- Action bar -->
<div class="dt-btn-row" style="margin-top:16px;max-width:340px">
<button class="dt-btn dt-btn-primary">Run analysis</button>
<button class="dt-btn">Clear results</button>
<div class="dt-btn-row" style="margin-top:16px">
<button class="dt-btn dt-btn-primary" style="flex:0 0 auto">Run analysis</button>
<button class="dt-btn" style="flex:0 0 auto">Clear results</button>
</div>
<hr class="dt-divider">
@@ -79,8 +79,8 @@
<!-- Stats overview -->
<div class="dt-stats">
<div class="dt-stat">
<div class="dt-stat-label">Files analyzed</div>
<div class="dt-stat-value">3</div>
<div class="dt-stat-label">Rows scanned</div>
<div class="dt-stat-value">48,210 <span class="dt-stat-unit">rows</span></div>
</div>
<div class="dt-stat">
<div class="dt-stat-label">Total findings</div>
@@ -96,6 +96,44 @@
</div>
</div>
<!-- ======================================================================
FRONT DOOR — primary path. The orchestrator (09_pipeline_runner)
wearing a friendly face: maps the analyzer's findings to the
recommended pipeline (Clean Text → Standardize → Fix Missing →
Find Duplicates) and runs them in order, returning a downloadable
result. This is the hero of the page; the per-file findings below
remain as the manual "fix one thing at a time" path.
====================================================================== -->
<div class="dt-card" style="border-color:var(--accent);background:var(--accent-fill);box-shadow:0 1px 2px rgba(28,25,23,0.03),0 0 0 1px var(--accent)">
<div style="display:flex;align-items:flex-start;gap:14px;flex-wrap:wrap">
<span class="dt-file-icon-chip" style="width:36px;height:36px;border-radius:var(--r-md)">
<span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:20px">auto_awesome</span>
</span>
<div style="flex:1;min-width:240px">
<h3 style="margin:0 0 4px;color:var(--ink)">Recommended</h3>
<p style="margin:0;color:var(--ink-secondary)">Runs the recommended clean — fix text, standardize formats, fill blanks, remove duplicates — in the right order, then hands you the cleaned file.</p>
</div>
<button class="dt-btn dt-btn-primary" style="flex:0 0 auto;align-self:center">
<span class="dt-mi">auto_fix_high</span> Clean these files for me
</button>
</div>
<!-- Pipeline-step affordance: the order the findings will be resolved in -->
<div style="display:flex;align-items:center;gap:6px;flex-wrap:wrap;margin-top:14px;padding-top:12px;border-top:1px solid var(--accent-fill-strong)">
<span class="dt-count-pill" style="background:var(--surface);color:var(--ink-secondary)">1 · Clean Text</span>
<span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:16px;color:var(--accent)">arrow_forward</span>
<span class="dt-count-pill" style="background:var(--surface);color:var(--ink-secondary)">2 · Standardize</span>
<span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:16px;color:var(--accent)">arrow_forward</span>
<span class="dt-count-pill" style="background:var(--surface);color:var(--ink-secondary)">3 · Fix Missing</span>
<span class="dt-mi" style="font-family:'Material Symbols Outlined';font-size:16px;color:var(--accent)">arrow_forward</span>
<span class="dt-count-pill" style="background:var(--surface);color:var(--ink-secondary)">4 · Find Duplicates</span>
<span class="dt-caption" style="margin-left:auto">Result downloads when finished</span>
</div>
</div>
<!-- Secondary / manual path — keep full control over each fix -->
<h3 style="margin-top:24px">Or fix issues one at a time</h3>
<p class="dt-caption" style="margin:-2px 0 4px">Prefer to handle things yourself? Open any finding to jump straight to the right tool.</p>
<!-- Per-file findings panel #1 -->
<div class="dt-card">
<div class="dt-finding-group-head">
@@ -129,11 +167,15 @@
<p class="dt-finding-meta">3 formats detected · Standardize Formats →</p>
</div>
</div>
<button class="dt-finding-more">
<span class="dt-mi">expand_more</span> Show all 8 findings · 5 more
</button>
</div>
<!-- Per-file findings panel #2 (collapsed) -->
<div class="dt-card" style="padding-bottom:16px">
<div class="dt-finding-group-head" style="margin-bottom:-16px;border-radius:var(--r-lg);border-bottom:none">
<div class="dt-card is-collapsed">
<div class="dt-finding-group-head is-collapsed">
<span class="dt-finding-group-chevron">chevron_right</span>
<span class="dt-severity-dot warn"></span>
<span class="dt-group-filename">q3_transactions.xlsx</span>
@@ -145,8 +187,8 @@
</div>
<!-- Per-file findings panel #3 (clean) -->
<div class="dt-card" style="padding-bottom:16px">
<div class="dt-finding-group-head" style="margin-bottom:-16px;border-radius:var(--r-lg);border-bottom:none">
<div class="dt-card is-collapsed">
<div class="dt-finding-group-head is-collapsed">
<span class="dt-severity-dot success"></span>
<span class="dt-group-filename">vendor_list.csv</span>
<div class="dt-group-counts">

@@ -3,28 +3,32 @@
src/gui/components/_legacy.py:render_sticky_footer(). Each page sets
<body data-page="<tool_id|home>"> to mark the active nav item. */
(function () {
// Sections + entries in the same order app.py registers them.
// Front-door entry — rendered standalone above the section groups.
var START = { id: "home", icon: "insert_chart_outlined", name: "Start here", href: "home.html" };
// Sections + entries in pipeline / job order.
var NAV = [
{ label: "Analysis", items: [
{ id: "home", icon: "insert_chart_outlined", name: "File Analysis", href: "home.html" },
{ id: "11_reconciler", icon: "compare_arrows", name: "Reconcile Two Files", href: "11_reconciler.html" },
]},
{ label: "Data Cleaners", items: [
{ id: "04_missing_handler", icon: "help_outline", name: "Fix Missing Values", href: "04_missing_handler.html" },
{ id: "06_outlier_detector", icon: "insights", name: "Find Unusual Values", href: "06_outlier_detector.html", soon: true },
{ id: "02_text_cleaner", icon: "text_format", name: "Clean Text", href: "02_text_cleaner.html" },
{ id: "03_format_standardizer", icon: "format_list_bulleted", name: "Standardize Formats", href: "03_format_standardizer.html" },
{ id: "04_missing_handler", icon: "help_outline", name: "Fix Missing Values", href: "04_missing_handler.html" },
{ id: "01_deduplicator", icon: "search", name: "Find Duplicates", href: "01_deduplicator.html" },
{ id: "08_validator_reporter", icon: "check_circle", name: "Quality Check", href: "08_validator_reporter.html", soon: true },
]},
{ label: "Transformations", items: [
{ id: "05_column_mapper", icon: "view_column", name: "Map Columns", href: "05_column_mapper.html" },
{ id: "07_multi_file_merger", icon: "account_tree", name: "Combine Files", href: "07_multi_file_merger.html", soon: true },
{ id: "10_pdf_extractor", icon: "picture_as_pdf", name: "PDF to CSV", href: "10_pdf_extractor.html" },
]},
{ label: "Automations", items: [
{ id: "09_pipeline_runner", icon: "auto_awesome", name: "Automated Workflows", href: "09_pipeline_runner.html" },
]},
{ label: "Finance", items: [
{ id: "11_reconciler", icon: "compare_arrows", name: "Reconcile Two Files", href: "11_reconciler.html" },
{ id: "10_pdf_extractor", icon: "picture_as_pdf", name: "PDF to CSV", href: "10_pdf_extractor.html" },
]},
{ label: "Coming soon", items: [
{ id: "06_outlier_detector", icon: "insights", name: "Find Unusual Values", href: "06_outlier_detector.html", soon: true },
{ id: "08_validator_reporter", icon: "check_circle", name: "Quality Check", href: "08_validator_reporter.html", soon: true },
{ id: "07_multi_file_merger", icon: "account_tree", name: "Combine Files", href: "07_multi_file_merger.html", soon: true },
]},
];
var active = document.body.getAttribute("data-page") || "";
@@ -41,8 +45,13 @@
'</span>' +
'</a>' +
'<nav class="dt-nav">';
var startCls = "dt-nav-link dt-nav-start" + (START.id === active ? " is-active" : "");
html += '<a class="' + startCls + '" href="' + START.href + '">' +
'<span class="dt-mi">' + START.icon + '</span>' +
'<span>' + START.name + '</span>' +
'</a>';
NAV.forEach(function (sec) {
var indicator = sec.label === "Analysis" ? "" : "";
var indicator = "";
html += '<div class="dt-nav-section">' + sec.label +
'<span class="dt-nav-indicator">' + indicator + '</span></div>';
sec.items.forEach(function (it) {

@@ -146,6 +146,101 @@ def _sync_uploader_to_home_uploads() -> None:
st.session_state["home_findings_by_file"] = findings
def _read_upload_df(name: str, data: bytes):
    """Bytes -> DataFrame. Mirrors the Automated Workflows page reader:
    Excel by extension, else CSV with encoding fallbacks. Kept in step
    with ``9_Pipeline_Runner._read_uploaded`` so the one-click clean
    reads files exactly as the standalone orchestrator would."""
    import io as _io
    from pathlib import Path as _Path
    import pandas as pd
    suffix = _Path(name).suffix.lower()
    bio = _io.BytesIO(data)
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(bio)
    for enc in ("utf-8", "utf-8-sig", "latin-1"):
        try:
            bio.seek(0)
            sep = "\t" if suffix == ".tsv" else ","
            return pd.read_csv(bio, encoding=enc, sep=sep, on_bad_lines="warn")
        except UnicodeDecodeError:
            continue
    bio.seek(0)
    return pd.read_csv(bio, encoding="latin-1")
def _run_recommended_clean(home_uploads: dict) -> None:
    """Front-door action: run the recommended pipeline (Clean Text ->
    Standardize -> Fix Missing -> Find Duplicates, in that order) on
    every imported file and stash a cleaned CSV per file in
    ``session_state`` for download. This is the orchestrator wearing a
    friendly face — it consumes the same ``recommended_pipeline`` the
    Automated Workflows page builds. Per-file errors are captured so one
    bad file doesn't kill the batch."""
    from src.core.pipeline import recommended_pipeline, run_pipeline
    from src.core.errors import format_for_user
    from src.audit import log_event
    pipeline = recommended_pipeline()
    names = list(home_uploads.keys())
    results: dict = {}
    progress = st.progress(0.0, text="Cleaning…")
    for i, name in enumerate(names, start=1):
        progress.progress((i - 1) / max(len(names), 1), text=name)
        try:
            df = _read_upload_df(name, home_uploads[name]["bytes"])
            res = run_pipeline(df, pipeline, stop_on_error=False)
            results[name] = {
                "csv": res.final_df.to_csv(index=False).encode("utf-8"),
                "initial_rows": res.initial_rows,
                "final_rows": res.final_rows,
                "error": None,
            }
        except Exception as e:  # noqa: BLE001 — surface per file, keep the batch alive
            results[name] = {"csv": None, "error": format_for_user(e)}
    progress.empty()
    log_event("tool_run", "Home one-click recommended clean", files=names)
    st.session_state["home_clean_results"] = results
    st.rerun()
def _render_clean_results() -> None:
    """Render per-file cleaned-CSV download buttons + a short summary from
    the stash produced by :func:`_run_recommended_clean`. Only files
    still present in ``home_uploads`` are shown, so removing a file
    drops its stale result."""
    import hashlib as _hashlib
    results: dict = st.session_state.get("home_clean_results", {})
    if not results:
        return
    current = st.session_state.get("home_uploads", {})
    for name, r in results.items():
        if name not in current:
            continue
        digest = _hashlib.sha1(
            name.encode("utf-8"), usedforsecurity=False,
        ).hexdigest()[:10]
        if r.get("error"):
            st.error(f"**Could not clean `{name}`**\n\n```\n{r['error']}\n```")
            continue
        stem = name.rsplit(".", 1)[0]
        st.download_button(
            f"⬇ Download cleaned {name}",
            data=r["csv"],
            file_name=f"{stem}_cleaned.csv",
            mime="text/csv",
            key=f"home_clean_dl_{digest}",
            width="stretch",
        )
        removed = r["initial_rows"] - r["final_rows"]
        st.caption(
            f"{r['final_rows']:,} rows kept"
            + (f" · {removed:,} removed" if removed else " · nothing to remove")
        )
def _home_page() -> None:
"""Render the home page — multi-file upload + per-file analysis.
@@ -443,6 +538,7 @@ def _home_page() -> None:
if clear_clicked:
st.session_state["home_findings_by_file"] = {}
st.session_state["home_clean_results"] = {}
st.rerun()
if run_clicked:
@@ -458,6 +554,8 @@ def _home_page() -> None:
findings_by_file[name] = _run_analysis_on_upload(stashed)
progress.progress(i / len(pending), text=name)
st.session_state["home_findings_by_file"] = findings_by_file
# A fresh analysis invalidates any prior one-click clean outputs.
st.session_state["home_clean_results"] = {}
progress.empty()
st.rerun()
@@ -468,6 +566,30 @@ def _home_page() -> None:
# 4-card summary above the findings panels so the user can
# eyeball the run before expanding any one file.
_render_stats_overview(findings_by_file)
# ---- Front door: one-click recommended clean (primary path) ----
# The analyzer has the findings; the majority case is "just fix
# it." This primary button runs the recommended pipeline in the
# correct order and hands back a cleaned file per upload, so the
# user never has to decide which tool or what order. The per-file
# findings below remain the "fix one thing at a time" path.
if st.button(
"✨ Clean these files for me",
type="primary",
key="home_clean_all",
width="stretch",
):
_run_recommended_clean(home_uploads)
st.caption(
"Recommended: cleans text, standardizes formats, fills blanks, "
"and removes duplicates — in the right order — then gives you the "
"cleaned file."
)
_render_clean_results()
# ---- Manual path: per-file findings, fix one thing at a time ----
st.markdown("###### Or fix issues one at a time")
st.caption("Open any finding below to jump straight to the right tool.")
# Preserve the upload-stash order so the user sees results in
# the same order they appear in the file list above.
for name in home_uploads:
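The reader added in this file tries several encodings in turn before parsing. The same idea can be sketched with the standard library alone. This is a simplified stand-in, not the app's `_read_upload_df`: it returns plain rows rather than a DataFrame, skips the Excel branch, and tries `utf-8-sig` first (a deliberate change from the order in the diff) so a leading BOM is stripped. `read_rows` is a hypothetical name.

```python
import csv
import io
from pathlib import Path

# latin-1 maps every byte to a character, so it never raises on decode
# and therefore has to be the last entry in the fallback chain.
ENCODINGS = ("utf-8-sig", "latin-1")


def read_rows(name: str, data: bytes) -> list:
    """Decode uploaded bytes with encoding fallbacks, then parse CSV/TSV."""
    sep = "\t" if Path(name).suffix.lower() == ".tsv" else ","
    for enc in ENCODINGS:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # fall through to the next, more permissive encoding
        return list(csv.reader(io.StringIO(text, newline=""), delimiter=sep))
    # Not reachable while latin-1 is in ENCODINGS; kept as a guard.
    raise ValueError(f"could not decode {name!r}")
```

Because the last encoding in the chain accepts any byte sequence, a trailing fallback after the loop only runs if that entry is removed; the separator is also worth computing once, outside the loop, so every attempt uses the same one.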

@@ -78,10 +78,11 @@ def _page_for(tool_id: str, *, page_slug: str, icon: str, title: str) -> "st.Pag
def _build_navigation() -> dict[str, list]:
by_section: dict[str, list] = {
"analysis": [],
"cleaners": [],
"transformations": [],
"automations": [],
"finance": [],
"coming_soon": [],
}
# Resolve the tool name through ``tool_name`` (i18n lookup) instead
# of using the registry's English ``tool.name`` field, otherwise the
@@ -96,16 +97,16 @@ def _build_navigation() -> dict[str, list]:
)
)
# Home is now surfaced under the new "Analysis" section as
# "File Analysis" — the home page's content (importing files,
# running the analyzer, browsing findings) is itself a data-analysis
# workflow, so grouping it next to Reconcile keeps the sidebar's
# mental model coherent. ``default=True`` still points at this
# page so first-visit lands here regardless of section placement.
# Home is the product's front door: "Start here". It's surfaced as a
# standalone, unlabeled top entry (in the "" section, ahead of the
# hidden Activate/Logs/Close pages) so it reads as the obvious
# starting point above the tool groups rather than one item among
# equals. The companion CSS in ``hide_streamlit_chrome`` gives its
# nav link accent emphasis. ``default=True`` lands first-visit here.
home = st.Page(
_home_page,
title=_t("nav.file_analysis_title") or "File Analysis",
icon=":material/insert_chart_outlined:",
title=_t("nav.start_here_title") or "Start here",
icon=":material/play_circle:",
default=True,
url_path="home",
)
@@ -136,17 +137,20 @@ def _build_navigation() -> dict[str, list]:
url_path="close",
)
# Activate / Logs / Close stay in the unlabeled section (key ``""``)
# so the CSS in ``hide_streamlit_chrome`` keeps hiding them by
# ``href``. Home moved out of that bucket into "Analysis" — the
# unlabeled section now contains ONLY hidden pages, so no orphan
# entry appears above the "Analysis" header in the sidebar.
# Home leads the unlabeled section (key ``""``) so "Start here" sits
# at the very top with no section header above it. Activate / Logs /
# Close follow in the same unlabeled bucket and stay hidden by their
# ``href`` via the CSS in ``hide_streamlit_chrome``. Section order
# below is the journey order: cleaners (pipeline order) →
# transformations → automations → finance → coming soon (last, so
# not-yet-shipped tools never interleave with working ones).
return {
"": [activate, logs, close],
section_label("analysis"): [home, *by_section["analysis"]],
"": [home, activate, logs, close],
section_label("cleaners"): by_section["cleaners"],
section_label("transformations"): by_section["transformations"],
section_label("automations"): by_section["automations"],
section_label("finance"): by_section["finance"],
section_label("coming_soon"): by_section["coming_soon"],
}
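
A minimal sketch of the ordering rule described in the comment above, assuming that dict insertion order drives the sidebar order. The `Tool` stand-in, `SECTION_ORDER`, and `build_navigation` are hypothetical names for illustration; the real `_build_navigation` builds `st.Page` objects, which this sketch replaces with plain strings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    # Hypothetical, trimmed-down stand-in for the registry's Tool.
    tool_id: str
    section: str


# Journey order of the labelled groups, taken from the comment in the diff.
SECTION_ORDER = ["cleaners", "transformations", "automations", "finance", "coming_soon"]

SECTION_LABELS = {
    "cleaners": "Data Cleaners",
    "transformations": "Transformations",
    "automations": "Automations",
    "finance": "Finance",
    "coming_soon": "Coming soon",
}


def build_navigation(tools, top_entries):
    """Return {group label: [entries]} with the unlabeled group first."""
    by_section = defaultdict(list)
    for tool in tools:
        # Registry order is preserved inside each section.
        by_section[tool.section].append(tool.tool_id)

    # Key "" is the unlabeled group: Home first, then the hidden pages.
    nav = {"": list(top_entries)}
    for section in SECTION_ORDER:
        nav[SECTION_LABELS[section]] = by_section[section]
    return nav


demo_tools = [
    Tool("02_text_cleaner", "cleaners"),
    Tool("11_reconciler", "finance"),
    Tool("10_pdf_extractor", "finance"),
    Tool("06_outlier_detector", "coming_soon"),
]
nav = build_navigation(demo_tools, ["home", "activate", "logs", "close"])
```

Because Python dicts preserve insertion order, listing the sections explicitly keeps "Coming soon" last no matter where its tools appear in the registry.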


@@ -95,6 +95,18 @@ footer {
[data-testid="stSidebarNav"] a[href$="/close/"] {
display: none !important;
}
/* "Start here" front-door nav item — accent emphasis so the obvious
entry point reads at a glance above the tool groups. Targets the Home
link by href; accent values mirror theme.py (§3 color scale). */
[data-testid="stSidebarNav"] a[href$="/home"],
[data-testid="stSidebarNav"] a[href$="/home/"] {
background: #fef4ed !important;
font-weight: 600 !important;
}
[data-testid="stSidebarNav"] a[href$="/home"]:hover,
[data-testid="stSidebarNav"] a[href$="/home/"]:hover {
background: #fde4d3 !important;
}
/* Reclaim top padding lost from hidden header. Streamlit's default
block-container padding-top is ~6rem (room for the header it ships).
We hide the header, so we reclaim that space: the page title should sit
@@ -2168,14 +2180,37 @@ def render_tool_header(tool_id: str) -> None:
button as a defense-in-depth so the label can never wrap, no
matter how the column ends up sized.
"""
col_title, col_help = st.columns([8, 2])
col_title, col_help = st.columns([7, 3])
with col_title:
st.title(_t(f"tools.{tool_id}.page_title"))
with col_help:
# Spacer pushes the popover button down so it sits closer to
# the title's baseline than to its top — without the spacer the
# button floats above the big title text.
st.write("")
# Local-first reassurance + Help, right-aligned opposite the
# title. The "Runs 100% locally" privacy pill is shown on every
# working tool page (where the user is actively feeding in a
# customer list) and omitted on not-yet-shipped "Coming Soon"
# tools, which process nothing. When the pill is shown it also
# serves as the spacer that nudges the popover down toward the
# title baseline; without it we keep the explicit spacer.
from src.gui.tools_registry import tool_by_id as _tool_by_id
_tool = _tool_by_id(tool_id)
if _tool is None or _tool.status == "Ready":
import html as _html
st.markdown(
'<div style="display:flex;justify-content:flex-end">'
'<span class="dt-privacy-pill">'
'<svg viewBox="0 0 24 24" fill="none" stroke="currentColor">'
'<rect x="4" y="11" width="16" height="10" rx="2"/>'
'<path d="M8 11V7a4 4 0 018 0v4"/>'
'</svg>'
f'{_html.escape(_t("home.privacy_pill"))}'
'</span>'
'</div>',
unsafe_allow_html=True,
)
else:
# Spacer pushes the popover button down so it sits closer to
# the title's baseline than to its top.
st.write("")
body = _t(f"tools.{tool_id}.help_md")
# ``src.i18n.t`` falls back to returning the lookup key itself
# on miss (see ``_resolve`` → key-as-string fallback). That's
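
A hedged sketch of the key-as-string fallback mentioned in the comment above. If a translate function returns the lookup key on a miss, then `translate(key) or "literal"` never reaches the literal, because the key itself is a truthy string. Whether the `_t` helper used elsewhere in this diff behaves that way is an assumption the diff does not confirm. `translate_or_default` and `fake_t` are hypothetical names, not part of `src.i18n`.

```python
def translate_or_default(translate, key, default):
    """Return the translation, or `default` when the lookup missed.

    A miss is detected as either a falsy value or the key echoed back.
    """
    value = translate(key)
    if not value or value == key:
        return default
    return value


def fake_t(key):
    # Hypothetical translator that echoes the key on a miss.
    table = {"nav.start_here_title": "Empezar aquí"}
    return table.get(key, key)


hit = translate_or_default(fake_t, "nav.start_here_title", "Start here")
miss = translate_or_default(fake_t, "nav.missing", "Start here")
naive = fake_t("nav.missing") or "Start here"  # the echoed key wins here
```

In this sketch `hit` is the translated string, `miss` is the English default, and `naive` is the raw key, which is the case the explicit comparison guards against.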


@@ -24,7 +24,10 @@ Tier = Literal["core", "pro", "enterprise"]
Status = Literal["Ready", "Coming Soon"]
# Sidebar grouping. Tools are bucketed by what the user is trying to
# accomplish rather than by implementation detail.
Section = Literal["analysis", "cleaners", "transformations", "automations"]
Section = Literal[
"cleaners", "transformations", "automations",
"finance", "coming_soon",
]
@dataclass(frozen=True)
@@ -42,35 +45,14 @@ class Tool:
# Order in this list IS the order shown in each sidebar section, so
# arranging it carefully matters: within "cleaners" we lead with the
# operations a non-technical user is most likely to need (filling
# blanks, flagging outliers) before progressing to format cleanup,
# dedup, and the final quality report.
# arranging it carefully matters. Within "cleaners" the order is the
# recommended PIPELINE order (Clean Text → Standardize → Fix Missing
# → Find Duplicates) so a user running tools by hand follows the sequence
# the orchestrator would. "Coming Soon" tools are grouped at the end in
# their own section so they never interleave with working tools, and the
# finance-oriented tools (Reconcile, PDF to CSV) live in their own group
# (see DECISIONS.md 2026-06-08).
TOOLS: list[Tool] = [
Tool(
tool_id="04_missing_handler",
icon=":material/help_outline:",
name="Fix Missing Values",
description=(
"Find blank cells (even ones written as 'N/A' or '?') and fill "
"them in or remove them."
),
page_slug="4_Missing_Values",
status="Ready",
section="cleaners",
),
Tool(
tool_id="06_outlier_detector",
icon=":material/insights:",
name="Find Unusual Values",
description=(
"Spot values that look wrong — way too high, way too low, or "
"breaking your rules."
),
page_slug="6_Outlier_Detector",
status="Coming Soon",
section="cleaners",
),
Tool(
tool_id="02_text_cleaner",
icon=":material/text_format:",
@@ -95,6 +77,18 @@ TOOLS: list[Tool] = [
status="Ready",
section="cleaners",
),
Tool(
tool_id="04_missing_handler",
icon=":material/help_outline:",
name="Fix Missing Values",
description=(
"Find blank cells (even ones written as 'N/A' or '?') and fill "
"them in or remove them."
),
page_slug="4_Missing_Values",
status="Ready",
section="cleaners",
),
Tool(
tool_id="01_deduplicator",
icon=":material/search:",
@@ -106,18 +100,6 @@ TOOLS: list[Tool] = [
status="Ready",
section="cleaners",
),
Tool(
tool_id="08_validator_reporter",
icon=":material/check_circle:",
name="Quality Check",
description=(
"Check your file against rules you set, and export a PDF or "
"Excel report."
),
page_slug="8_Validator_Reporter",
status="Coming Soon",
section="cleaners",
),
Tool(
tool_id="05_column_mapper",
icon=":material/view_column:",
@@ -130,18 +112,6 @@ TOOLS: list[Tool] = [
status="Ready",
section="transformations",
),
Tool(
tool_id="07_multi_file_merger",
icon=":material/account_tree:",
name="Combine Files",
description=(
"Combine several CSV or Excel files into one — even if their "
"columns don't match."
),
page_slug="7_Multi_File_Merger",
status="Coming Soon",
section="transformations",
),
Tool(
tool_id="09_pipeline_runner",
icon=":material/auto_awesome:",
@@ -154,17 +124,6 @@ TOOLS: list[Tool] = [
status="Ready",
section="automations",
),
Tool(
tool_id="10_pdf_extractor",
icon=":material/picture_as_pdf:",
name="PDF to CSV",
description=(
"Pull transactions out of bank-statement PDFs into a clean CSV file."
),
page_slug="10_PDF_Extractor",
status="Ready",
section="transformations",
),
Tool(
tool_id="11_reconciler",
icon=":material/compare_arrows:",
@@ -175,7 +134,54 @@ TOOLS: list[Tool] = [
),
page_slug="11_Reconciler",
status="Ready",
section="analysis",
section="finance",
),
Tool(
tool_id="10_pdf_extractor",
icon=":material/picture_as_pdf:",
name="PDF to CSV",
description=(
"Pull transactions out of bank-statement PDFs into a clean CSV file."
),
page_slug="10_PDF_Extractor",
status="Ready",
section="finance",
),
Tool(
tool_id="06_outlier_detector",
icon=":material/insights:",
name="Find Unusual Values",
description=(
"Spot values that look wrong — way too high, way too low, or "
"breaking your rules."
),
page_slug="6_Outlier_Detector",
status="Coming Soon",
section="coming_soon",
),
Tool(
tool_id="08_validator_reporter",
icon=":material/check_circle:",
name="Quality Check",
description=(
"Check your file against rules you set, and export a PDF or "
"Excel report."
),
page_slug="8_Validator_Reporter",
status="Coming Soon",
section="coming_soon",
),
Tool(
tool_id="07_multi_file_merger",
icon=":material/account_tree:",
name="Combine Files",
description=(
"Combine several CSV or Excel files into one — even if their "
"columns don't match."
),
page_slug="7_Multi_File_Merger",
status="Coming Soon",
section="coming_soon",
),
]
@@ -183,10 +189,11 @@ TOOLS: list[Tool] = [
# Display labels for each sidebar section. Kept here so i18n falls back
# to a sensible English string if a translation pack is missing the key.
SECTION_LABELS: dict[Section, str] = {
"analysis": "Analysis",
"cleaners": "Data Cleaners",
"transformations": "Transformations",
"automations": "Automations",
"finance": "Finance",
"coming_soon": "Coming soon",
}
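
The section invariants named in the commit message (every section non-empty, every section labelled, no orphan labels) can be sketched as a single check. This is an illustration under stated assumptions, not the project's test code: `registry_problems` is a hypothetical helper, `Tool` is a trimmed stand-in, and the rule that "Coming Soon" tools must sit in `coming_soon` is inferred from the registry comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    # Hypothetical, trimmed-down stand-in for the registry's Tool.
    tool_id: str
    status: str
    section: str


TOOLS = [
    Tool("02_text_cleaner", "Ready", "cleaners"),
    Tool("05_column_mapper", "Ready", "transformations"),
    Tool("09_pipeline_runner", "Ready", "automations"),
    Tool("11_reconciler", "Ready", "finance"),
    Tool("06_outlier_detector", "Coming Soon", "coming_soon"),
]

SECTION_LABELS = {
    "cleaners": "Data Cleaners",
    "transformations": "Transformations",
    "automations": "Automations",
    "finance": "Finance",
    "coming_soon": "Coming soon",
}


def registry_problems(tools, labels):
    """Return human-readable invariant violations (empty list if none)."""
    problems = []
    used = {tool.section for tool in tools}
    labelled = set(labels)

    # A section used by a tool but missing from the label table.
    for section in sorted(used - labelled):
        problems.append(f"section {section!r} has no label")
    # A label whose section holds no tools (an empty sidebar group).
    for section in sorted(labelled - used):
        problems.append(f"label {section!r} has no tools")
    # Assumed rule: unshipped tools never interleave with working ones.
    for tool in tools:
        if tool.status == "Coming Soon" and tool.section != "coming_soon":
            problems.append(f"{tool.tool_id} is Coming Soon outside 'coming_soon'")
    return problems


stale_labels = {**SECTION_LABELS, "analysis": "Analysis"}
```

With the stale `"analysis"` label left in place, the check reports an empty group, which matches the failure the commit describes before the section was retired.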


@@ -193,9 +193,12 @@
"section_cleaners": "Data Cleaners",
"section_transformations": "Transformations",
"section_automations": "Automations",
"section_finance": "Finance",
"section_coming_soon": "Coming soon",
"review_page_title": "Review",
"home_page_title": "Home",
"file_analysis_title": "File Analysis",
"start_here_title": "Start here",
"section_account": "Account",
"activate_title": "Activate",
"close_title": "Close",


@@ -193,9 +193,12 @@
"section_cleaners": "Limpiadores de datos",
"section_transformations": "Transformaciones",
"section_automations": "Automatizaciones",
"section_finance": "Finanzas",
"section_coming_soon": "Próximamente",
"review_page_title": "Revisión",
"home_page_title": "Inicio",
"file_analysis_title": "Análisis de archivo",
"start_here_title": "Empezar aquí",
"section_account": "Cuenta",
"activate_title": "Activar",
"close_title": "Cerrar",


@@ -239,12 +239,15 @@ class TestReconcilerAndPdfArePresent:
assert tool is not None
assert tool.page_slug == "10_PDF_Extractor"
assert tool.status == "Ready"
# PDF to CSV + Reconcile live in the "Finance" group (outside the
# cleaning flow) per DECISIONS.md 2026-06-08.
assert tool.section == "finance"
def test_reconciler_present(self):
tool = tool_by_id("11_reconciler")
assert tool is not None
assert tool.page_slug == "11_Reconciler"
assert tool.status == "Ready"
# The new "analysis" section was introduced with this tool;
# if the section disappears, the sidebar group goes empty.
assert tool.section == "analysis"
# Reconcile sits in the "Finance" group (see DECISIONS.md
# 2026-06-08); if that section disappears the sidebar goes empty.
assert tool.section == "finance"