diff --git a/layout-review/01_deduplicator.html b/layout-review/01_deduplicator.html
index b00c79f..e396728 100644
--- a/layout-review/01_deduplicator.html
+++ b/layout-review/01_deduplicator.html
@@ -41,7 +41,8 @@
-
+
Comma (,)
@@ -67,32 +68,33 @@
- + +
+
+
85
+
Higher means rows must look more alike to count as a duplicate.
+
+
the most-complete row
+
Which row survives in each group of duplicates.
+
+ +
- Options + Advanced options
-
- Advanced Options -
-
-
-
-
Leave empty to auto-detect
-
-
email
-
-
name
-
-
-
jaro_winkler
-
-
85
-
most-complete
-
-
-
check Merge mode — fill missing fields in the surviving row
+

Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match exactly and the ones that only need to match approximately — together these are the columns used to find duplicates.

+
+
+
+
email
+
+
name
-
+
+
jaro_winkler
+
+
+
check Merge mode — fill missing fields in the surviving row
@@ -109,8 +111,9 @@
Match groups
147
Rows kept
18,130
+

Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.

- +
@@ -123,6 +126,7 @@ +

Differing columns are highlighted. The survivor row is kept; uncheck a row to split it out of the group.

@@ -140,7 +144,6 @@
-

Differing columns highlighted. The survivor row is kept; uncheck rows to split the group.

@@ -163,8 +166,8 @@ -

Decisions: 1 merged, 1 pending

- +

Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.

+
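The deduplicator copy above describes three behaviours: a similarity threshold (85), a survivor rule ("the most-complete row"), and a merge mode that fills missing fields in the survivor. A minimal Python sketch of how these might fit together follows. Everything in it is an assumption made for illustration: the function names are hypothetical, the threshold is treated as a 0-100 score, and `difflib.SequenceMatcher` from the standard library stands in for the `jaro_winkler` metric named in the mockup.

```python
from difflib import SequenceMatcher

EMPTY = (None, "")


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score. SequenceMatcher is a stand-in, not jaro_winkler."""
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_duplicate(row_a, row_b, exact_cols, fuzzy_cols, threshold=85.0):
    """Exact columns must be equal; fuzzy columns must score at or above threshold."""
    if any(row_a.get(c) != row_b.get(c) for c in exact_cols):
        return False
    return all(
        similarity(str(row_a.get(c, "")), str(row_b.get(c, ""))) >= threshold
        for c in fuzzy_cols
    )


def most_complete(rows):
    """Survivor rule: the row with the most non-empty values (first one wins ties)."""
    return max(rows, key=lambda r: sum(1 for v in r.values() if v not in EMPTY))


def merge_group(rows):
    """Merge mode: copy the survivor, then fill its blanks from the other rows."""
    survivor = dict(most_complete(rows))
    for row in rows:
        for key, value in row.items():
            if survivor.get(key) in EMPTY and value not in EMPTY:
                survivor[key] = value
    return survivor
```

Raising `threshold` makes `is_duplicate` stricter, which matches the help text "Higher means rows must look more alike to count as a duplicate."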
diff --git a/layout-review/05_column_mapper.html b/layout-review/05_column_mapper.html
index c0c2a02..89bbcbe 100644
--- a/layout-review/05_column_mapper.html
+++ b/layout-review/05_column_mapper.html
@@ -25,22 +25,12 @@
- -

You can also import a file on the home screen and pick it up here.

- -
-
- upload_file Drag and drop file here - Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding & delimiter auto-detected -
- -
-
- - crm_contacts_raw.csv - 684 KB - + +
+ description + Using crm_contacts_raw.csv from the upload screen.
+
@@ -93,7 +83,7 @@
signup_date | date | ✗ | | Signup
amount_spent | float | ✗ | 0.0 | Amount Spent
source | string | ✗ | crm-import |
- add add row
+ add add row
@@ -101,43 +91,8 @@
- -

Strategy

-
- -
- rename-only (just rename, leave types alone, keep extras)
- lenient-schema (rename + coerce + reorder, keep extras)
- strict-schema (rename + coerce + reorder, drop extras)
-
-
- - -
- Advanced options -
-
-
-
- -
keep
-
-
check Coerce types per schema
-
check Reorder to schema order
-
-
-
check Auto-infer mapping (fuzzy match)
-
- -
0.80
-
-
check Enforce required fields
-
-
-
-
- +

Mapping

@@ -153,7 +108,53 @@
-

Pick a target for each source column. Notes stays unmapped — with the lenient preset it is kept as-is. source is added from the schema default.

+

Pick a target for each source column. Notes stays unmapped — with the keep-extras strategy it is kept as-is. source is added from the schema default.

+ +
+ + + +

Strategy

+
+ +
+ rename-only (just rename, leave types alone, keep extras)
+ lenient-schema (rename + coerce + reorder, keep extras)
+ strict-schema (rename + coerce + reorder, drop extras) base
+ Custom — based on strict-schema, 1 control changed modified
+
+
+ rule + Individual Advanced controls win over the preset. You started from strict-schema, then changed Unmapped source columns to keep below — so the preset is now Custom. The controls' current values are what actually run. +
+
Pick a strategy as the baseline. Every Advanced toggle below is still individually overridable; overriding any one switches the preset to Custom.
+
+ + +
+ Advanced options +
+
+
+
+ +
keep
+
Winning value: keep. Overrides the strict-schema base (drop) — so Notes survives into the output.
+
+
check Coerce types per schema
+
check Reorder to schema order
+
+
+
check Auto-infer mapping (fuzzy match)
+
+ +
0.80
+
+
check Enforce required fields
+
+
+
+
@@ -176,20 +177,6 @@
info Added (with defaults): source
warning Some cells could not be coerced and were left as NaN: amount_spent (3)
-

Resolved mapping

-
- - - - - - - - - -
source | target | auto
Full Name | full_name | True
EmailAddr | email | True
Phone # | phone | True
Signup | signup_date | True
Amount Spent | amount_spent | True
-
-

Mapped preview (first 10 rows)
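The Strategy copy in this file's hunks says that individual Advanced controls win over the preset, and that overriding any one of them switches the preset to Custom. A small Python sketch of that resolution rule follows. The preset contents are read off the option labels in the diff; the key names (`coerce_types`, `reorder`, `unmapped`) and the `resolve` function are hypothetical and only illustrate the idea.

```python
# Preset contents inferred from the option labels; key names are assumptions.
PRESETS = {
    "rename-only": {"coerce_types": False, "reorder": False, "unmapped": "keep"},
    "lenient-schema": {"coerce_types": True, "reorder": True, "unmapped": "keep"},
    "strict-schema": {"coerce_types": True, "reorder": True, "unmapped": "drop"},
}


def resolve(base: str, overrides: dict):
    """Return (effective settings, preset label) for a base preset plus overrides."""
    preset = PRESETS[base]
    unknown = set(overrides) - set(preset)
    if unknown:
        raise KeyError(f"unknown controls: {sorted(unknown)}")

    # Individual controls win over the preset.
    effective = {**preset, **overrides}

    # Only controls whose value actually differs from the base count as changed.
    changed = sorted(k for k in preset if effective[k] != preset[k])
    if not changed:
        return effective, base
    plural = "control" if len(changed) == 1 else "controls"
    return effective, f"Custom (based on {base}, {len(changed)} {plural} changed)"
```

With `resolve("strict-schema", {"unmapped": "keep"})`, the effective `unmapped` value is `keep`, which is the case the mockup describes where Notes survives into the output.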

diff --git a/layout-review/07_multi_file_merger.html b/layout-review/07_multi_file_merger.html
index ede9b11..c25a344 100644
--- a/layout-review/07_multi_file_merger.html
+++ b/layout-review/07_multi_file_merger.html
@@ -72,7 +72,7 @@
-
+
diff --git a/layout-review/08_validator_reporter.html b/layout-review/08_validator_reporter.html
index d255430..ab70f04 100644
--- a/layout-review/08_validator_reporter.html
+++ b/layout-review/08_validator_reporter.html
@@ -57,15 +57,6 @@

Validation Rules

- -
-
- upload_file Drag and drop file here - JSON -
- -
-
diff --git a/layout-review/09_pipeline_runner.html b/layout-review/09_pipeline_runner.html
index 63426bd..f7aedf1 100644
--- a/layout-review/09_pipeline_runner.html
+++ b/layout-review/09_pipeline_runner.html
@@ -67,69 +67,192 @@ Options
- +
- Use the recommended default (text-clean → format → missing → dedup)
- Build interactively
+ Use the recommended default (text-clean → format → missing → dedup) · modified
+ Build interactively
Import a saved pipeline JSON
+
+ edit + You started from the recommended default and edited a step, so the mode switched to Build interactively. The steps below are now yours to change — pick recommended default again to discard your edits and restore the suggested order. +
+

- Edit the table to add, remove, reorder (drag the row index), enable, or configure each step.
+ Add, remove, reorder (drag the row index), enable, or configure each step.
+ Open a step's Configure panel to set its options in plain language.
Tool order is recommended, not enforced — violations surface as warnings below the table.

- +
- - - + + + - + - - - - - - - - - - - - - - - - - - - - - - - +
Tool | Enabled | Options (JSON)
Step | Enabled | Configure
≡ 0 | text_clean expand_more | text_clean | check | {"trim": true, "collapse_whitespace": true}
≡ 1 | format_standardize expand_more | check | {"column_types": {"phone": "phone", "signup_date": "date"}}
≡ 2 | missing expand_more | check | {"strategy": "flag", "sentinels": ["N/A", "—"]}
≡ 3 | dedup expand_more | check | {"survivor_rule": "most_complete", "merge": true}
Add row
tune Configure expand_more
+ +
+ Configure: text_clean +
+
check Trim leading & trailing whitespace
+
check Collapse repeated spaces to one
+
Normalize smart quotes & dashes to plain ASCII
+
+ +
Leave as-is
+
+
+
+ +
+ + + + + + + + + +
≡ 1 | format_standardize | check | tune Configure chevron_right
+
+ +
+ Configure: format_standardize +
+

Choose a target format for each column. Columns left as “Leave as-is” are untouched.

+
+ + + + + + + + +
Column | Format as
name | Leave as-is
email | Leave as-is
phone | Phone number
signup_date | Date
+
+
+ +
+ + + + + + + + + +
≡ 2 | missing | check | tune Configure chevron_right
+
+ +
+ Configure: missing +
+
+ +
+ Flag them (mark blanks, change nothing)
+ Fill them in (numbers → median, text → most common)
+ Drop rows that have any blank
+
+
+
+ +
N/A, —
+
Matched case-insensitively after stripping whitespace.
+
+
+
+ +
+ + + + + + + + + + + + + +
≡ 3 | dedup | check | tune Configure chevron_right
Add step
+
+ +
+ Configure: dedup +
+
+ +
Keep the most complete row
+
Other options: keep the first seen, keep the last seen.
+
+
check Merge matched rows (fill each survivor's blanks from its duplicates)
+
+ +
+ email + phone +
+
+
+
+ +
+ Advanced — import / export pipeline as JSON +
+

For sharing or version control. Editing is done in the step panels above — this is just the saved form of the same settings.

+
{
+  "version": 1,
+  "steps": [
+    {"tool": "text_clean", "enabled": true, "options": {"trim": true, "collapse_whitespace": true}},
+    {"tool": "format_standardize", "enabled": true, "options": {"column_types": {"phone": "phone", "signup_date": "date"}}},
+    {"tool": "missing", "enabled": true, "options": {"strategy": "flag", "sentinels": ["N/A", "—"]}},
+    {"tool": "dedup", "enabled": true, "options": {"survivor_rule": "most_complete", "merge": true, "keys": ["email", "phone"]}}
+  ]
+}
+
+ + +
+
+
+ -
+
Recommended tool order — why each step belongs where it does

text_clean before format_standardize — format parsers (phone / currency / date) fail on smart-quote-contaminated or NBSP-padded input — clean text first

@@ -161,39 +284,49 @@

Per-step summary

+
- + - - - + + - - - + + + + + + - - - + + - - - + +
step | status | elapsed_ms | summary | error
step | status | elapsed | summary
text_clean | ok | 214 | {"cells_changed": 1204, "columns": ["name", "city"]} | 214 ms | 1,204 cells changed in name & city
format_standardize | ok | 388 | {"phone": 18301, "signup_date": 17996} | warning ok · 141 skipped | 388 ms | 18,301 phones and 17,996 dates standardized
+ info + 141 phone values didn't match any known pattern and were left unchanged. The step still completed — review them in the output preview if needed. +
missing | ok | 121 | {"flagged_cells": 642, "sentinels_found": ["—"]} | 121 ms | 642 blank cells flagged (sentinel “—”)
dedup | ok | 911 | {"input_rows": 18442, "output_rows": 18130, "duplicates_removed": 312, "groups": 147} | 911 ms | 312 duplicates removed across 147 groups (18,442 → 18,130 rows)
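Two rules stated in the pipeline runner hunks can be made concrete: tool order is recommended but only produces warnings, and missing-value sentinels are matched case-insensitively after stripping whitespace. The Python sketch below works on a pipeline in the saved JSON form shown in the Advanced panel (options abridged). The function names, the warning wording, and the choice to skip disabled steps are assumptions, not taken from the source.

```python
import json

# Order taken from the "recommended default" option text.
RECOMMENDED_ORDER = ["text_clean", "format_standardize", "missing", "dedup"]

# Same shape as the saved pipeline in the diff, with options abridged.
SAVED_PIPELINE = """
{
  "version": 1,
  "steps": [
    {"tool": "text_clean", "enabled": true, "options": {"trim": true}},
    {"tool": "format_standardize", "enabled": true, "options": {}},
    {"tool": "missing", "enabled": true,
     "options": {"strategy": "flag", "sentinels": ["N/A", "\\u2014"]}},
    {"tool": "dedup", "enabled": true,
     "options": {"survivor_rule": "most_complete", "merge": true}}
  ]
}
"""


def order_warnings(pipeline: dict) -> list:
    """Warn (never fail) when enabled steps run against the recommended order."""
    rank = {tool: i for i, tool in enumerate(RECOMMENDED_ORDER)}
    enabled = [s["tool"] for s in pipeline["steps"] if s.get("enabled", True)]
    warnings = []
    for i, earlier in enumerate(enabled):
        for later in enabled[i + 1:]:
            if earlier in rank and later in rank and rank[earlier] > rank[later]:
                warnings.append(f"{later} is recommended before {earlier}")
    return warnings


def is_sentinel(value: str, sentinels: list) -> bool:
    """Match case-insensitively after stripping surrounding whitespace."""
    normalized = {s.strip().casefold() for s in sentinels}
    return value.strip().casefold() in normalized


pipeline = json.loads(SAVED_PIPELINE)
```

Calling `order_warnings(pipeline)` on the saved form returns an empty list; swapping the first two steps yields one warning about `text_clean` and `format_standardize`, which is the ordering the "Recommended tool order" note explains.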
diff --git a/layout-review/10_pdf_extractor.html b/layout-review/10_pdf_extractor.html
index 3d457b1..1dbf5ae 100644
--- a/layout-review/10_pdf_extractor.html
+++ b/layout-review/10_pdf_extractor.html
@@ -74,7 +74,7 @@ statement-feb-2026.pdf 147.2 KB
-
@@ -100,84 +100,89 @@

47 candidate transaction(s) from 2 file(s)

-

Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. The raw column shows the original PDF text for that row.

+

Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. Hover the info on any row to see the original PDF text it came from.

-
+ +
+ - - - + + - + + - + + - + + - + + - + + - + + - + + - + +
Include | date | description | amount_debit | amount_credit | account_number | source_file | page | raw
check | 2026-01-03 | OPENING BALANCE | | | ****4821 | statement-jan-2026.pdf | 1 | 01/03 OPENING BALANCE 2,140.55
info | 2026-01-03 | OPENING BALANCE | | | ****4821 | statement-jan-2026.pdf
check | 2026-01-05 | POS PURCHASE WHOLE FOODS MKT | 84.12 | | ****4821 | statement-jan-2026.pdf | 1 | 01/05 POS PURCHASE WHOLE FOODS MKT (84.12)
info | 2026-01-05 | POS PURCHASE WHOLE FOODS MKT | 84.12 | | ****4821 | statement-jan-2026.pdf
check | 2026-01-08 | ACH DEPOSIT PAYROLL ACME CORP | | 3,250.00 | ****4821 | statement-jan-2026.pdf | 1 | 01/08 ACH DEPOSIT PAYROLL ACME CORP 3,250.00
info | 2026-01-08 | ACH DEPOSIT PAYROLL ACME CORP | | 3,250.00 | ****4821 | statement-jan-2026.pdf
check | 2026-01-11 | ONLINE TRANSFER TO SAVINGS | 500.00 | | ****4821 | statement-jan-2026.pdf | 2 | 01/11 ONLINE TRANSFER TO SAVINGS (500.00)
info | 2026-01-11 | ONLINE TRANSFER TO SAVINGS | 500.00 | | ****4821 | statement-jan-2026.pdf
| 2026-01-12 | INTEREST RATE 0.50% APY DETAIL | | | ****4821 | statement-jan-2026.pdf | 2 | 01/12 INTEREST RATE 0.50% APY 0.00
info | 2026-01-12 | INTEREST RATE 0.50% APY DETAIL auto-excluded · not a transaction line | | | ****4821 | statement-jan-2026.pdf
check | 2026-01-14 | DEBIT CARD SHELL OIL #2287 | 52.40 | | ****4821 | statement-jan-2026.pdf | 2 | 01/14 DEBIT CARD SHELL OIL #2287 (52.40)
info | 2026-01-14 | DEBIT CARD SHELL OIL #2287 | 52.40 | | ****4821 | statement-jan-2026.pdf
check | 2026-02-02 | POS PURCHASE TRADER JOES #511 | 61.88 | | ****4821 | statement-feb-2026.pdf | 1 | 02/02 POS PURCHASE TRADER JOES #511 (61.88)
info | 2026-02-02 | POS PURCHASE TRADER JOES #511 | 61.88 | | ****4821 | statement-feb-2026.pdf
check | 2026-02-06 | ACH DEPOSIT PAYROLL ACME CORP | | 3,250.00 | ****4821 | statement-feb-2026.pdf | 2 | 02/06 ACH DEPOSIT PAYROLL ACME CORP 3,250.00
info | 2026-02-06 | ACH DEPOSIT PAYROLL ACME CORP | | 3,250.00 | ****4821 | statement-feb-2026.pdf
check | 2026-02-09 | CHECK #1043 | 1,200.00 | | ****4821 | statement-feb-2026.pdf | 2 | 02/09 CHECK #1043 (1,200.00)
info | 2026-02-09 | CHECK #1043 | 1,200.00 | | ****4821 | statement-feb-2026.pdf
- -
-
- -

46 of 47 rows selected.

-
-
-
- -
- date - description - amount_debit - amount_credit - account_number - source_file -
-
page and raw are kept off by default; tick them if you want them in the file.
+ +
+
+ +
+ date + description + amount_debit + amount_credit + account_number + source_file
+
page and raw are kept off by default; tick them if you want them in the file.
+ +

1 row excluded (INTEREST RATE detail line).