diff --git a/layout-review/01_deduplicator.html b/layout-review/01_deduplicator.html
index b00c79f..e396728 100644
--- a/layout-review/01_deduplicator.html
+++ b/layout-review/01_deduplicator.html
@@ -41,7 +41,8 @@
-
+Leave these empty to auto-detect which columns to compare. Otherwise, list the columns that must match exactly and the columns that only need to match approximately; together, these are the columns used to find duplicates.
+Preview of an auto-resolved run: each group keeps its auto-picked survivor. Review the groups below to override any pending picks before the final download.
Differing columns are highlighted. The survivor row is kept; uncheck a row to split it out of the group.
Differing columns highlighted. The survivor row is kept; uncheck rows to split the group.
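The matching rule above (rows must agree exactly on some columns and approximately on others) and a "keep the most complete row" survivor can be sketched in Python. This is a minimal, hypothetical sketch, not the tool's implementation: `similar`, `find_groups`, and `most_complete` are illustrative names, and the greedy grouping and the `difflib.SequenceMatcher` ratio with a 0.85 threshold are assumptions. `most_complete` only mirrors the name of the `survivor_rule` value that appears in the pipeline options.

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate match (assumed rule): case-insensitive SequenceMatcher ratio >= threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def find_groups(rows, exact_cols, approx_cols, threshold=0.85):
    """Greedily group rows: a row joins the first group whose first row matches it
    exactly on every exact column and approximately on every approximate column."""
    groups = []
    for row in rows:
        for group in groups:
            anchor = group[0]
            exact_ok = all(row.get(c) == anchor.get(c) for c in exact_cols)
            approx_ok = all(
                similar(str(row.get(c, "")), str(anchor.get(c, "")), threshold)
                for c in approx_cols
            )
            if exact_ok and approx_ok:
                group.append(row)
                break
        else:  # no existing group matched, so this row starts a new one
            groups.append([row])
    return groups

def most_complete(group):
    """Survivor: the row with the most non-empty values (max() keeps the earliest on ties)."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v not in (None, "")))

# Hypothetical sample data for illustration.
rows = [
    {"email": "ana@example.com", "name": "Ana Lopez", "phone": ""},
    {"email": "ana@example.com", "name": "Ana  Lopez", "phone": "555-0100"},
    {"email": "bo@example.com", "name": "Bo Chen", "phone": ""},
]
groups = find_groups(rows, exact_cols=["email"], approx_cols=["name"])
for group in groups:
    print(len(group), most_complete(group)["email"], most_complete(group)["phone"])
```

Comparing each row only against a group's first row keeps the sketch short; a real matcher would likely need a smarter strategy than this quadratic scan for large files.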
@@ -163,8 +166,8 @@
-Decisions: 1 merged, 1 pending
-
+Decisions: 1 merged, 1 pending · Pending groups keep their auto-picked survivor unless you review them.
+You can also import a file on the home screen and pick it up here.
-
-Pick a target for each source column. Notes stays unmapped — with the lenient preset it is kept as-is. source is added from the schema default.
Pick a target for each source column. Notes stays unmapped; with the keep-extras strategy, it is kept as-is. The source column is added with its schema default.
Notes survives into the output.
source
Resolved mapping
-| source | target | auto |
|---|---|---|
| Full Name | full_name | True |
| EmailAddr |  | True |
| Phone # | phone | True |
| Signup | signup_date | True |
| Amount Spent | amount_spent | True |
Mapped preview (first 10 rows)
| - | Tool | -Enabled | -Options (JSON) | +Step | +Enabled | +Configure |
|---|---|---|---|---|---|---|
| ≡ 0 | -text_clean expand_more | +text_clean | check | -{"trim": true, "collapse_whitespace": true} | -||
| ≡ 1 | -format_standardize expand_more | -check | -{"column_types": {"phone": "phone", "signup_date": "date"}} | -|||
| ≡ 2 | -missing expand_more | -check | -{"strategy": "flag", "sentinels": ["N/A", "—"]} | -|||
| ≡ 3 | -dedup expand_more | -check | -{"survivor_rule": "most_complete", "merge": true} | -|||
| + | -Add row | +tune Configure expand_more | ||||
| ≡ 1 | +format_standardize | +check | +tune Configure chevron_right | +
Choose a target format for each column. Columns left as “Leave as-is” are untouched.
+| Column | Format as |
|---|---|
| name | Leave as-is |
|  | Leave as-is |
| phone | Phone number |
| signup_date | Date |
| ≡ 2 | +missing | +check | +tune Configure chevron_right | +
| ≡ 3 | +dedup | +check | +tune Configure chevron_right | +
| + | +Add step | +||
For sharing or version control. Editing happens in the step panels above; this is only the saved form of the same settings.
+text_clean runs before format_standardize: the phone, currency, and date parsers fail on input contaminated with smart quotes or padded with non-breaking spaces (NBSP), so text is cleaned first.
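A minimal sketch of why this ordering matters, assuming Python. `clean_text` and `parse_phone` are hypothetical helpers, the strict phone pattern is an assumption, and the NBSP and smart-quote normalization go beyond the `trim` and `collapse_whitespace` options the step actually shows. A phone value padded with non-breaking spaces fails the strict pattern until it has been cleaned.

```python
import re

# Hypothetical cleaner loosely modeled on text_clean {"trim": true, "collapse_whitespace": true};
# the NBSP and smart-quote handling are assumptions added for illustration.
SMART_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})

def clean_text(value: str) -> str:
    value = value.replace("\u00a0", " ").translate(SMART_QUOTES)  # NBSP -> space, curly -> straight quotes
    value = re.sub(r"\s+", " ", value)  # collapse runs of whitespace to one space
    return value.strip()                # trim leading/trailing whitespace

# Deliberately strict, hypothetical phone pattern: only an ASCII space or hyphen may separate parts.
PHONE = re.compile(r"\(?(\d{3})\)?[ -]?(\d{3})-(\d{4})")

def parse_phone(value: str):
    """Return 'AAA-BBB-CCCC' if the whole value matches PHONE, else None."""
    m = PHONE.fullmatch(value)
    return "-".join(m.groups()) if m else None

raw = "\u00a0(555)\u00a0123-4567 "
print(parse_phone(raw))              # None: the NBSP padding breaks the match
print(parse_phone(clean_text(raw)))  # 555-123-4567
```

Running the cleaner first means the format parsers only see ASCII spacing and quotes, which is the ordering rule stated above.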
@@ -161,39 +284,49 @@
| step | status | elapsed_ms | summary | error | ||
|---|---|---|---|---|---|---|
| step | status | elapsed | summary | |||
| text_clean | ok | -214 | -{"cells_changed": 1204, "columns": ["name", "city"]} | -+ | 214 ms | +1,204 cells changed in name & city |
| format_standardize | -ok | -388 | -{"phone": 18301, "signup_date": 17996} | +warning ok · 141 skipped | +388 ms | +18,301 phones and 17,996 dates standardized | +
| + | + info + 141 phone values didn't match any known pattern and were left unchanged. The step still completed — review them in the output preview if needed. + | |||||
| missing | ok | -121 | -{"flagged_cells": 642, "sentinels_found": ["—"]} | -+ | 121 ms | +642 blank cells flagged (sentinel “—”) |
| dedup | ok | -911 | -{"input_rows": 18442, "output_rows": 18130, "duplicates_removed": 312, "groups": 147} | -+ | 911 ms | +312 duplicates removed across 147 groups (18,442 → 18,130 rows) |
Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. The raw column shows the original PDF text for that row.
Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. Hover over the info icon on any row to see the original PDF text it came from.
-| Include | +date | description | amount_debit | amount_credit | account_number | source_file | -page | -raw | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| check | -2026-01-03 | OPENING BALANCE | ****4821 | statement-jan-2026.pdf | 1 | 01/03 OPENING BALANCE 2,140.55 | +info | +2026-01-03 | OPENING BALANCE | ****4821 | statement-jan-2026.pdf | ||||
| check | -2026-01-05 | POS PURCHASE WHOLE FOODS MKT | 84.12 | ****4821 | statement-jan-2026.pdf | 1 | 01/05 POS PURCHASE WHOLE FOODS MKT (84.12) | +info | +2026-01-05 | POS PURCHASE WHOLE FOODS MKT | 84.12 | ****4821 | statement-jan-2026.pdf | ||
| check | -2026-01-08 | ACH DEPOSIT PAYROLL ACME CORP | 3,250.00 | ****4821 | statement-jan-2026.pdf | 1 | 01/08 ACH DEPOSIT PAYROLL ACME CORP 3,250.00 | +info | +2026-01-08 | ACH DEPOSIT PAYROLL ACME CORP | 3,250.00 | ****4821 | statement-jan-2026.pdf | ||
| check | -2026-01-11 | ONLINE TRANSFER TO SAVINGS | 500.00 | ****4821 | statement-jan-2026.pdf | 2 | 01/11 ONLINE TRANSFER TO SAVINGS (500.00) | +info | +2026-01-11 | ONLINE TRANSFER TO SAVINGS | 500.00 | ****4821 | statement-jan-2026.pdf | ||
| - | 2026-01-12 | INTEREST RATE 0.50% APY DETAIL | ****4821 | statement-jan-2026.pdf | 2 | 01/12 INTEREST RATE 0.50% APY 0.00 | +info | +2026-01-12 | INTEREST RATE 0.50% APY DETAIL auto-excluded · not a transaction line | ****4821 | statement-jan-2026.pdf | ||||
| check | -2026-01-14 | DEBIT CARD SHELL OIL #2287 | 52.40 | ****4821 | statement-jan-2026.pdf | 2 | 01/14 DEBIT CARD SHELL OIL #2287 (52.40) | +info | +2026-01-14 | DEBIT CARD SHELL OIL #2287 | 52.40 | ****4821 | statement-jan-2026.pdf | ||
| check | -2026-02-02 | POS PURCHASE TRADER JOES #511 | 61.88 | ****4821 | statement-feb-2026.pdf | 1 | 02/02 POS PURCHASE TRADER JOES #511 (61.88) | +info | +2026-02-02 | POS PURCHASE TRADER JOES #511 | 61.88 | ****4821 | statement-feb-2026.pdf | ||
| check | -2026-02-06 | ACH DEPOSIT PAYROLL ACME CORP | 3,250.00 | ****4821 | statement-feb-2026.pdf | 2 | 02/06 ACH DEPOSIT PAYROLL ACME CORP 3,250.00 | +info | +2026-02-06 | ACH DEPOSIT PAYROLL ACME CORP | 3,250.00 | ****4821 | statement-feb-2026.pdf | ||
| check | -2026-02-09 | CHECK #1043 | 1,200.00 | ****4821 | statement-feb-2026.pdf | 2 | 02/09 CHECK #1043 (1,200.00) | +info | +2026-02-09 | CHECK #1043 | 1,200.00 | ****4821 | statement-feb-2026.pdf |
46 of 47 rows selected.
-page and raw are kept off by default; tick them if you want them in the file.
1 row excluded (INTEREST RATE detail line).