diff --git a/layout-review/01_deduplicator.html b/layout-review/01_deduplicator.html new file mode 100644 index 0000000..5b9f55e --- /dev/null +++ b/layout-review/01_deduplicator.html @@ -0,0 +1,187 @@ + + + + + +Layout review — Find Duplicates + + + +
+ +
+
+ visibility + Static layout preview of Find Duplicates, shown with a file imported and a completed run (results + match-group review). All pages → +
+
+ + +
+

Find Duplicates

+ +
+

Find rows that repeat, then keep one and remove the extras.

+ +
+ + + +
+
+ upload_file Drag and drop file here + Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding & delimiter auto-detected +
+ +
+
+ + customers_export.csv + 2.1 MB + +
+ + +
+ +
Comma (,)
+
Auto-detected on upload. Change if the preview looks wrong.
+
+ + +
+ Preview: customers_export.csv +
+

18,442 rows, 6 columns

+
+ + + + + + + + +
nameemailcityphonesignup_date
0Jane Doejane@acme.ioAustin512-555-01902024-01-04
1jane doeJANE@ACME.IOaustin(512) 555-019001/04/2024
2Bob Smithbob@globex.comDenver720-555-77812024-02-11
3R. Smithbob@globex.comDenver720-555-77812024-02-11
+
+
+
+ + +
+ Options +
+
+ Advanced Options +
+
+
+
+
Leave empty to auto-detect
+
+
email
+
+
name
+
+
+
jaro_winkler
+
+
85
+
most-complete
+
+
+
check Merge mode — fill missing fields in the surviving row
+
+
+
+
+ +
+ + +
+ + +

Results

+
+
Original rows
18,442
+
Duplicate rows
312
−312 removed
+
Match groups
147
+
Rows kept
18,130
+
+
+ + +
+ +
+ + +

Match Groups

+
+ + + +
+ + +
+
+ Group 1 · 2 rows + 98% match +
+
+
+ + + + + + +
keepnameemailcityphonesignup_date
keepJane Doejane@acme.ioAustin512-555-01902024-01-04
removejane doeJANE@ACME.IOaustin(512) 555-019001/04/2024
+
+

Differing columns highlighted. The survivor row is kept; uncheck rows to split the group.

+
+
+ + +
+
+ Group 2 · 2 rows + 87% match +
+
+
+ + + + + + +
keepnameemailcityphonesignup_date
keepBob Smithbob@globex.comDenver720-555-77812024-02-11
removeR. Smithbob@globex.comDenver720-555-77812024-02-11
+
+
+
+ +

Decisions: 1 merged, 1 pending

+ + + +
+ Processing Log +
+
[00:00.01] Loaded 18,442 rows from customers_export.csv +[00:00.04] Strategy: exact(email) + fuzzy(name, jaro_winkler ≥ 85) +[00:00.91] Compared 18,442 rows → 147 match groups +[00:01.02] Survivor rule: most-complete · merge=on +[00:01.05] 312 rows flagged for removal
+
+
+ +
+
+
+ + + + diff --git a/layout-review/02_text_cleaner.html b/layout-review/02_text_cleaner.html new file mode 100644 index 0000000..e423d76 --- /dev/null +++ b/layout-review/02_text_cleaner.html @@ -0,0 +1,208 @@ + + + + + +Layout review — Clean Text + + + + +
+ +
+
+ visibility + Static layout preview of Clean Text, shown with a file imported and a completed run (results metrics, changes-by-column, before/after examples, cleaned preview, downloads). All pages → +
+
+ + +
+

Clean Text

+ +
+

Trim extra spaces and strip out odd characters.

+ +
+ + + +
+
+ upload_file Drag and drop file here + Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding auto-detected +
+ +
+
+ + contacts_messy.csv + 684 KB + +
+ + +
+ Preview: contacts_messy.csv +
+

4,120 rows, 4 columns

+
check Show hidden characters in preview
+
+ + + + + + + + +
nameemailcompanynotes
0·Jane Doe·jane@acme.ioAcme·Inc.VIP
1Bob  Smithbob@globex.comGlobex
2Ana Lópezana@initech.comInitech·follow up
3Wei ChenWEI@umbrella.coUmbrella“keyaccount”
+
+
+
+ +
+ + +
+ Options +
+
+ +
+ excel-hygiene (recommended) + minimal + paranoid +
+
excel-hygiene: trim, collapse whitespace, fold smart quotes, strip invisible chars, normalize line endings, NFC.
+
+ +
+ Advanced options +
+
+
+
check Trim leading/trailing whitespace
+
check Collapse internal whitespace
+
check Normalize line endings (\r\n → \n)
+
check Strip control characters
+
check Strip BOM
+
+
+
check Fold smart characters (curly quotes, em-dash, NBSP)
+
check Strip zero-width / invisible characters
+
check Unicode NFC normalization
+
Unicode NFKC compat fold (lossy: ① → 1, ﬁ → fi)
+
+
+ +

Scope

+
+ +
+ name + email + company + notes +
+
+
+ +
Choose columns to leave untouched
+
+ +

Case conversion

+
+ +
None
+
+
+
+
+
+ +
+ + +
+ + +

Results

+
+
Cells scanned
16,480
+
Cells changed
3,947
+
% changed
24.0%
+
Columns processed
4
+
+ +
check Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)
+ +

Changes by column

+
+ + + + + + + + +
cells_changed
company1,604
name1,210
notes982
email151
+
+ +

Examples (first 25 changes)

+
+ + + + + + + + + + + + + +
RowColumnBeforeAfterOps applied
1name·Jane Doe·Jane Doetrim
1companyAcme·Inc.Acme Inc.fold_smart
1notesVIPVIP"fold_smart
2nameBob··SmithBob Smithcollapse_ws
2emailbob@globex.combob@globex.comstrip_zero_width
2notesstrip_control
3companyInitech·Initechtrim
4nameWei ChenWei Chentrim
4notes“keyaccount”"key-account"fold_smart, nfc
+
+ +

Cleaned preview (first 10 rows)

+
+ + + + + + + + +
nameemailcompanynotes
0Jane Doejane@acme.ioAcme Inc.VIP"
1Bob Smithbob@globex.comGlobex
2Ana Lópezana@initech.comInitechfollow up
3Wei ChenWEI@umbrella.coUmbrella"key-account"
+
+

Changed cells highlighted. Toggle “Show hidden characters” to inspect the invisibles being removed.

+ +
+ + +
+ + + +
+ +
+
+
+ + + + diff --git a/layout-review/03_format_standardizer.html b/layout-review/03_format_standardizer.html new file mode 100644 index 0000000..5e7fefa --- /dev/null +++ b/layout-review/03_format_standardizer.html @@ -0,0 +1,224 @@ + + + + + +Layout review — Standardize Formats + + + +
+ +
+
+ visibility + Static layout preview of Standardize Formats, shown with a file imported from the upload screen and a completed run (results + changes audit + standardized preview). All pages → +
+
+ + +
+

Standardize Formats

+ +
+

Make dates, phones, currency, and names look the same throughout.

+ +
+ + +
+ description + Using customers_export.csv from the upload screen. +
+ + + +
+ Preview: customers_export.csv +
+

18,442 rows, 6 columns

+
+ + + + + + + + +
full_namephoneamountsignup_dateactive
0jane DOE(512) 555-0190$1,234.501/04/2024Y
1bob smith720.555.7781$992024-2-11yes
2ALICIA REYES+1 415 555 2233$45,000Mar 3, 2024n
3m. okafor2125550148$7.9992024/04/22true
+
+
+
+ +
+ + +
+ Options +
+ +

Column types

+

Assign each column to a field type. Auto-detected suggestions are pre-filled; pick (skip) to leave a column untouched.

+ + +
+
Name
+
Phone
+
Currency
+
+
+
Date
+
Boolean
+
(skip)
+
+ +
+

Format options

+ + +
+ +
+ US (default) — ISO 8601 dates · E.164 phones · USD + European — DMY input · INTL phones · EUR comma decimal + UK — DD/MM/YYYY · GB phones · Yes/No booleans + ISO Strict — ISO 8601 · bare-number currency · true/false + Legacy US — MM/DD/YYYY · National phones · Yes/No + Custom — keep current settings +
+
Pick a published standard or regional convention as the baseline. Every option below is still individually overridable.
+
+ + +
+ +
+

Dates

+
YYYY-MM-DD (ISO)
+
+ +
+ MDY (US) + DMY (EU) +
+
+ +

Phones

+
E.164 (+15551234567)
+
+ +
US
+
Region assumed when the input has no country code (e.g. US, GB, DE).
+
+
+ + +
+

Currency

+
+ +
+ dot (1,234.56) + comma (1.234,56) +
+
+
2
+
Preserve original precision (don't round)
+
Preserve currency code (emit USD 1234.56, EUR 99.00, etc.)
+ +

Names

+
Title Case
+ +

Booleans

+
True/False
+
+
+ +
+
+ +
+ + +
+ + +

Results

+
+
Cells scanned
92,210
+
Cells changed
61,838
+
% changed
67.1%
+
Unparseable
47
+
+ +
+ info + 47 cell(s) in typed columns didn't match a recognizable shape and were left as-is. Check the changes audit below to find them, or re-classify the column to (skip). +
+ + +

Changes by column

+
+ + + + + + + + + +
columnfield_typecells_changed
amountcurrency17,902
full_namename16,041
phonephone14,388
signup_datedate11,205
activeboolean2,302
+
+ + +

Examples (first 25 changes)

+
+ + + + + + + + + + + + + + +
rowcolumnfield_typebeforeafter
1full_namenamejane DOEJane Doe
1phonephone(512) 555-0190+15125550190
1amountcurrency$1,234.51234.50
1signup_datedate01/04/20242024-01-04
1activebooleanYTrue
2full_namenamebob smithBob Smith
2phonephone720.555.7781+17205557781
2signup_datedate2024-2-112024-02-11
3signup_datedateMar 3, 20242024-03-03
4amountcurrency$7.9998.00
+
+ + +

Standardized preview (first 10 rows)

+
+ + + + + + + + +
full_namephoneamountsignup_dateactive
0Jane Doe+151255501901234.502024-01-04True
1Bob Smith+1720555778199.002024-02-11True
2Alicia Reyes+1415555223345000.002024-03-03False
3M. Okafor+121255501488.002024-04-22True
+
+ +
+ + +
+ + + +
+ +
+
+
+ + + + diff --git a/layout-review/04_missing_handler.html b/layout-review/04_missing_handler.html new file mode 100644 index 0000000..f42508e --- /dev/null +++ b/layout-review/04_missing_handler.html @@ -0,0 +1,271 @@ + + + + + +Layout review — Fix Missing Values + + + +
+ +
+
+ visibility + Static layout preview of Fix Missing Values, shown with a file imported and a completed run (per-column missingness profile + before/after results). All pages → +
+
+ + +
+

Fix Missing Values

+ +
+

Find blank cells (even hidden ones) and fill them in or remove them.

+ +
+ + +

Tip: files imported on the Home screen are picked up here automatically.

+ +
+
+ upload_file Drag and drop file here + Up to 1.5 GB · CSV, TSV, XLSX, XLS +
+ +
+
+ + survey_responses.csv + 684 KB + +
+ + +
+ Preview: survey_responses.csv +
+

2,150 rows, 6 columns

+
+ + + + + + + + +
respondent_idageregionincomesatisfactioncomments
0R-100134West520004great service
1R-1002N/AEast3?
2R-100341-61000NULLnone
3R-100429SouthN/A5quick
+
+
+
+ +
+ + +
+ Options +
+ +

Missingness profile

+
+
Rows
2,150
+
Cells missing
1,043
+
% cells missing
8.1%
+
Complete rows
1,388
+
+ +
+ + + + + + + + + + +
columndtypemissingmissing_pctdisguisedhas_missing
respondent_idobject00.0%0False
agefloat641878.7%61True
regionobject1426.6%142True
incomefloat6432915.3%118True
satisfactionfloat64954.4%40True
commentsobject29013.5%290True
+
+ +
+ +

Strategy

+
+ +
+ detect-only (standardize sentinels to NaN, no fill or drop) + safe-fill (numeric → median, categorical → mode) + drop-incomplete (drop any row with missing) +
+
detect-only: replace 'N/A', '-', 'NULL', etc. with real NaN, then stop. safe-fill: detect, then fill numeric columns with the median and all other columns with the mode. drop-incomplete: detect, then drop every row that has any missing cell.
+
+ + +
+ Advanced options +
+
+
+

Detection

+
check Standardize disguised nulls to NaN
+
+ +
N/A, n/a, NA, NULL, null, None, -, --, ?, #N/A
+
Matched case-insensitively after stripping whitespace.
+
+
+
+

Strategy override

+
+ +
(use preset)
+
drop_row / drop_col use the thresholds below. mean / median / interpolate are numeric only — non-numeric columns fall back to the categorical strategy.
+
+
+ +
mode
+
+
+
+ +

Drop thresholds

+
+
+ +
1.00
+
+
+ +
1.00
+
+
+ +

Scope

+
+ +
+ respondent_id + age + region + income + satisfaction + comments +
+
+
+ +
Choose columns
+
+ +

Per-column strategy overrides (optional)

+

Set a different strategy for specific columns. Leave any row blank to use the global strategy.

+
+ + + + + + + + + +
ColumnOverride
agemedian
regionmode
income
satisfaction
commentsconstant
+
+
+
+ +
+
+ +
+ + +
+ + +
+

Results

+
+
Sentinels → NaN
651
+
Cells filled
1,043
+
Rows dropped
0
+
Columns dropped
0
+
+ +

Missingness — before vs. after

+
+ + + + + + + + + + +
columnbefore_missingbefore_pctafter_missingafter_pct
respondent_id00.000.0
age1878.700.0
region1426.600.0
income32915.300.0
satisfaction954.400.0
comments29013.500.0
+
+ +

Strategy applied per column

+
+ + + + + + + + + +
columnstrategy
agemedian
regionmode
incomemedian
satisfactionmedian
commentsconstant
+
+ +

Audit (first 50 changes)

+
+ + + + + + + + + + +
rowcolumnold_valuenew_valuereason
2ageN/A37.0fill: median
2income(blank)54000.0fill: median
2comments?(no comment)fill: constant
3region-Westfill: mode
3satisfactionNULL4.0fill: median
4incomeN/A54000.0fill: median
+
+

… and 1,037 more (download the full audit below).

+ +

Handled preview (first 10 rows)

+
+ + + + + + + + +
respondent_idageregionincomesatisfactioncomments
0R-100134.0West52000.04.0great service
1R-100237.0East54000.03.0(no comment)
2R-100341.0West61000.04.0none
3R-100429.0South54000.05.0quick
+
+ +
+ + +
+ + + +
+ +
+
+
+ + + + diff --git a/layout-review/05_column_mapper.html b/layout-review/05_column_mapper.html new file mode 100644 index 0000000..f92aafc --- /dev/null +++ b/layout-review/05_column_mapper.html @@ -0,0 +1,222 @@ + + + + + +Layout review — Map Columns + + + +
+ +
+
+ visibility + Static layout preview of Map Columns, shown with a file imported, an interactive target schema + mapping configured, and a completed run (results + mapped preview). All pages → +
+
+ + +
+

Map Columns

+ +
+

Rename columns, change their order, and set each one as text, number, or date.

+ +
+ + +

You can also import a file on the Home screen and pick it up here.

+ +
+
+ upload_file Drag and drop file here + Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding & delimiter auto-detected +
+ +
+
+ + crm_contacts_raw.csv + 684 KB + +
+ + +
+ Preview: crm_contacts_raw.csv +
+

4,210 rows, 6 columns

+
+ + + + + + + + +
Full NameEmailAddrPhone #SignupAmount SpentNotes
0Jane Doejane@acme.io512-555-019001/04/2024$1,204.50VIP
1Bob Smithbob@globex.com720-555-778102/11/2024$88.00
2Carla Reyescarla@initech.net415-555-332203/02/2024$612.10renewal
3Dev Pateldev@umbrella.co206-555-904303/19/2024$0.00
+
+
+
+ +
+ + +
+ Options +
+ + +

Target schema

+
+ +
+ Build interactively (start from current columns) + Import schema JSON + Skip (rename / coerce only — no schema) +
+
An interactive build is fastest for one-off cleanup. Import a schema JSON when you have a fixed contract (such as a CRM import format or a database schema). Skip when you only want to rename or coerce specific columns.
+
+ +

Edit the table to define your target schema. Add rows for fields the input doesn't have yet (with a default), or remove rows for columns you want to drop.

+ + +
+ + + + + + + + + + + +
Target nameTypeRequiredDefault (for added cols)Aliases (comma-sep, helps fuzzy-match)
full_namestringFull Name, name
emailstringEmailAddr, email_address
phonestringPhone #, tel
signup_datedateSignup
amount_spentfloat0.0Amount Spent
sourcestringcrm-import
add add row
+
+

6 target fields · 1 added field (source) not present in the input.

+ +
+ + +

Strategy

+
+ +
+ rename-only (just rename, leave types alone, keep extras) + lenient-schema (rename + coerce + reorder, keep extras) + strict-schema (rename + coerce + reorder, drop extras) +
+
+ + +
+ Advanced options +
+
+
+
+ +
keep
+
+
check Coerce types per schema
+
check Reorder to schema order
+
+
+
check Auto-infer mapping (fuzzy match)
+
+ +
0.80
+
+
check Enforce required fields
+
+
+
+
+ + +

Mapping

+ +
+ + + + + + + + + + +
SourceTargetAuto-suggested
Full Namefull_name
EmailAddremail
Phone #phone
Signupsignup_date
Amount Spentamount_spent
Notes(unmapped)
+
+

Pick a target for each source column. Notes stays unmapped — with the lenient preset it is kept as-is. source is added from the schema default.

+ +
+
+ +
+ + +
+ + +
+

Results

+
+
Renamed
5
+
Dropped
0
+
Added
1
+
Coerce fails
3
+
+ +
infoAdded (with defaults): source
+
warningSome cells could not be coerced and were set to NaN: amount_spent (3)
+ +

Resolved mapping

+
+ + + + + + + + + +
sourcetargetauto
Full Namefull_nameTrue
EmailAddremailTrue
Phone #phoneTrue
Signupsignup_dateTrue
Amount Spentamount_spentTrue
+
+ +

Mapped preview (first 10 rows)

+
+ + + + + + + + + +
full_nameemailphonesignup_dateamount_spentsourceNotes
0Jane Doejane@acme.io512-555-01902024-01-041204.5crm-importVIP
1Bob Smithbob@globex.com720-555-77812024-02-1188.0crm-import
2Carla Reyescarla@initech.net415-555-33222024-03-02612.1crm-importrenewal
3Dev Pateldev@umbrella.co206-555-90432024-03-190.0crm-import
4Mei Linmei@hooli.com503-555-11882024-04-07NaNcrm-importtrial
+
+ +
+ + +
+ + + +
+ +
+
+
+ + + + diff --git a/layout-review/06_outlier_detector.html b/layout-review/06_outlier_detector.html new file mode 100644 index 0000000..81154e8 --- /dev/null +++ b/layout-review/06_outlier_detector.html @@ -0,0 +1,91 @@ + + + + + +Layout review — Find Unusual Values + + + +
+ +
+
+ visibility + Static layout preview of Find Unusual Values — a Coming Soon tool. The page is a stub/teaser: an "under development" notice, a list of planned features, and disabled placeholder controls (only the file uploader is live). All pages → +
+
+ + +
+

Find Unusual Values

+ +
+

Spot values that look wrong — way too high, too low, or breaking your rules.

+ +
+ + +
+ info + This tool is under development. +
+ + +

Features:

+
    +
  • Z-score detection (configurable threshold)
  • +
  • IQR (interquartile range) detection
  • +
  • MAD (median absolute deviation) detection
  • +
  • Domain-rule violations (e.g., age < 0, price > $1M)
  • +
  • Visual outlier highlighting in data preview
  • +
  • Handling: flag only, remove, cap/winsorize to bounds
  • +
+ +
+ + + +
+
+ upload_file Drag and drop file here + CSV, TSV, XLSX, XLS · Import a file to preview. Processing is not yet available. +
+ +
+ + +

Detection Method

+ +
+ +
Z-Score
+
+ +
+ +
3.0
+
+ +
+ +
1.5
+
+ +

Handling

+ +
+ +
Flag only (add column)
+
+ +
+ + +
+
+
+ + + + diff --git a/layout-review/07_multi_file_merger.html b/layout-review/07_multi_file_merger.html new file mode 100644 index 0000000..7debd16 --- /dev/null +++ b/layout-review/07_multi_file_merger.html @@ -0,0 +1,83 @@ + + + + + +Layout review — Combine Files + + + +
+ +
+
+ visibility + Static layout preview of Combine Files — a Coming-Soon tool. The page is a stub: an "under development" notice, a planned-features list, a working multi-file uploader, and disabled placeholder options. All pages → +
+
+ + +
+

Combine Files

+ +
+

Combine several CSV or Excel files into one — even if columns differ.

+ + +
+ info + This tool is under development. +
+ + +

Features:

+
    +
  • Import multiple CSV/Excel files at once
  • +
  • Automatic schema alignment (matching columns by name)
  • +
  • Append mode: stack files vertically (union)
  • +
  • Join mode: merge files on shared key columns
  • +
  • Handle mismatched columns (fill missing with nulls or drop)
  • +
  • Source file tracking column
  • +
+ +
+ + + +
+
+ upload_file Drag and drop files here + CSV, TSV, XLSX, XLS · multiple files allowed +
+ +
+
Import multiple files to preview. Processing is not yet available.
+ + +

Merge Strategy

+ +
+ +
Append (stack vertically)
+
+ +
+ +
Fill with null
+
+ +
+ check Add source filename column +
+ +
+ + + +
+
+
+ + + + diff --git a/layout-review/08_validator_reporter.html b/layout-review/08_validator_reporter.html new file mode 100644 index 0000000..895ff5f --- /dev/null +++ b/layout-review/08_validator_reporter.html @@ -0,0 +1,93 @@ + + + + + +Layout review — Quality Check + + + +
+ +
+
+ visibility + Static layout preview of Quality Check, a Coming-Soon tool. The page is a stub: an "under development" notice, a feature list, a working file uploader, and disabled placeholder controls. All pages → +
+
+ + +
+

Quality Check

+ +
+

Check your file against rules you set, and export a PDF or Excel report.

+ +
+ + +
+ info + This tool is under development. +
+ + +

Features:

+
    +
  • Column-level validation rules (not null, unique, regex pattern, range, enum)
  • +
  • Cross-column validation (e.g., start_date < end_date)
  • +
  • Data quality score per column and overall
  • +
  • Generate PDF quality report
  • +
  • Generate Excel report with flagged rows highlighted
  • +
  • Summary dashboard: pass/fail counts, severity breakdown
  • +
+ +
+ + + +
+
+ upload_file Drag and drop file here + Import a file to preview. Processing is not yet available. +
+ +
+ + +

Validation Rules

+ + +
+
+ upload_file Drag and drop file here + JSON +
+ +
+ +
+ +
+ Choose options +
+
+ +

Report Format

+ +
+ +
Excel (flagged rows)
+
+ +
+ + + +
+
+
+ + + + diff --git a/layout-review/09_pipeline_runner.html b/layout-review/09_pipeline_runner.html new file mode 100644 index 0000000..8022be4 --- /dev/null +++ b/layout-review/09_pipeline_runner.html @@ -0,0 +1,231 @@ + + + + + +Layout review — Automated Workflows + + + +
+ +
+
+ visibility + Static layout preview of Automated Workflows (Pipeline Runner), shown with a file imported, a four-step pipeline configured, and a completed run (results + per-step summary). All pages → +
+
+ + +
+

Automated Workflows

+ +
+

Run several tools in a row — save the steps once, reuse them anytime.

+ +
+ + + +
+
+ upload_file Drag and drop file here + Up to 1.5 GB · CSV, TSV, XLSX, XLS · encoding & delimiter auto-detected +
+ +
+
+ + customers_export.csv + 2.1 MB + +
+ + +
+ Preview: customers_export.csv +
+

18,442 rows, 6 columns

+
+ + + + + + + + +
nameemailcityphonesignup_date
0 Jane Doe jane@acme.ioAustin512-555-01902024-01-04
1jane doeJANE@ACME.IOaustin(512) 555-019001/04/2024
2Bob Smithbob@globex.comDenver720.555.77812024-02-11
3R. Smithbob@globex.com720-555-7781Feb 11 2024
+
+
+
+ +
+ + +
+ Options +
+ + +
+ +
+ Use the recommended default (text-clean → format → missing → dedup) + Build interactively + Import a saved pipeline JSON +
+
+ +

+ Edit the table to add, remove, reorder (drag the row index), enable, or configure each step. + Tool order is recommended, not enforced — violations surface as warnings below the table. +

+ + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ToolEnabledOptions (JSON)
≡ 0text_clean expand_morecheck{"trim": true, "collapse_whitespace": true}
≡ 1format_standardize expand_morecheck{"column_types": {"phone": "phone", "signup_date": "date"}}
≡ 2missing expand_morecheck{"strategy": "flag", "sentinels": ["N/A", "—"]}
≡ 3dedup expand_morecheck{"survivor_rule": "most_complete", "merge": true}
Add row
+
+ + + + +
+ Recommended tool order — why each step belongs where it does +
+

text_clean before format_standardize — format parsers (phone / currency / date) fail on smart-quote-contaminated or NBSP-padded input — clean text first

+

text_clean before missing — sentinel detection misses cells padded with NBSP / zero-width characters — clean text first

+

text_clean before dedup — fuzzy matching treats NBSP-padded values as different — clean text first

+

format_standardize before missing — numeric imputation needs numeric dtypes; canonical phones / currencies improve sentinel detection

+

format_standardize before dedup — canonical phones / lowercase emails enable cross-format duplicate matching

+

missing before dedup — deduping rows with mixed NaN sentinels produces brittle merges — resolve missing values first

+
+
+ +
+
+ +
+ + + + +
+ + +

Results

+
+
Initial rows
18,442
+
Final rows
18,130
+
Steps run
4
+
Elapsed
1.84 s
+
+ +

Per-step summary

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
stepstatuselapsed_mssummaryerror
text_cleanok214{"cells_changed": 1204, "columns": ["name", "city"]}
format_standardizeok388{"phone": 18301, "signup_date": 17996}
missingok121{"flagged_cells": 642, "sentinels_found": ["—"]}
dedupok911{"input_rows": 18442, "output_rows": 18130, "duplicates_removed": 312, "groups": 147}
+
+ +

Output preview (first 10 rows)

+
+ + + + + + + + + +
nameemailcityphonesignup_date
0Jane Doejane@acme.ioAustin+1 512-555-01902024-01-04
1Bob Smithbob@globex.comDenver+1 720-555-77812024-02-11
2Carla Reyescarla@initech.coPhoenix+1 480-555-33202024-03-02
3Dan Okafordan@umbrella.net⚑ missing+1 206-555-77452024-03-18
4Emily Tranemily@hooli.comSeattle+1 206-555-11822024-04-05
+
+ +
+ + +
+ + + +
+ +
+
+
+ + + + diff --git a/layout-review/10_pdf_extractor.html b/layout-review/10_pdf_extractor.html new file mode 100644 index 0000000..8eb11fc --- /dev/null +++ b/layout-review/10_pdf_extractor.html @@ -0,0 +1,189 @@ + + + + + +Layout review — PDF to CSV + + + +
+ +
+
+ visibility + Static layout preview of PDF to CSV, shown with two bank-statement PDFs imported and a completed scan (candidate transactions in the editable preview table). All pages → +
+
+ + +
+

PDF to CSV

+ +
+

Pull transactions out of bank-statement PDFs into a clean CSV file.

+ +
+ + +
+ Scan options +
+
+
+ check + Treat (4.50) as negative +
+
+ check + Use OCR for scanned pages +
+
+

OCR status: ready (bundled Tesseract). Most modern bank PDFs are text-based and don't need OCR — only enable for image-based scans.

+
+
+ +
YYYY-MM-DD (2026-01-13)
+
+
+ + +
Leave blank for automatic (statement period → filename year → this override).
+
+
+
+
+ + +
+

Files

+ +
+ + +
+
+ + + statement-jan-2026.pdf + 171.2 KB +
+
+ + + statement-feb-2026.pdf + 147.2 KB +
+ +
+ + +
+ + +
+ +
+ + +
+ Warnings (1) +
+
+ warning + [statement-feb-2026.pdf] 2 lines matched a date but no amount — skipped (likely a wrapped description). Check the source if a transaction looks missing. +
+
+
+ + +

47 candidate transaction(s) from 2 file(s)

+

Uncheck rows to exclude. Edit any cell to fix a value the scanner got wrong. The raw column shows the original PDF text for that row.

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Includedatedescriptionamount_debitamount_creditaccount_numbersource_filepageraw
check2026-01-03OPENING BALANCE****4821statement-jan-2026.pdf101/03 OPENING BALANCE 2,140.55
check2026-01-05POS PURCHASE WHOLE FOODS MKT84.12****4821statement-jan-2026.pdf101/05 POS PURCHASE WHOLE FOODS MKT (84.12)
check2026-01-08ACH DEPOSIT PAYROLL ACME CORP3,250.00****4821statement-jan-2026.pdf101/08 ACH DEPOSIT PAYROLL ACME CORP 3,250.00
check2026-01-11ONLINE TRANSFER TO SAVINGS500.00****4821statement-jan-2026.pdf201/11 ONLINE TRANSFER TO SAVINGS (500.00)
2026-01-12INTEREST RATE 0.50% APY DETAIL****4821statement-jan-2026.pdf201/12 INTEREST RATE 0.50% APY 0.00
check2026-01-14DEBIT CARD SHELL OIL #228752.40****4821statement-jan-2026.pdf201/14 DEBIT CARD SHELL OIL #2287 (52.40)
check2026-02-02POS PURCHASE TRADER JOES #51161.88****4821statement-feb-2026.pdf102/02 POS PURCHASE TRADER JOES #511 (61.88)
check2026-02-06ACH DEPOSIT PAYROLL ACME CORP3,250.00****4821statement-feb-2026.pdf202/06 ACH DEPOSIT PAYROLL ACME CORP 3,250.00
check2026-02-09CHECK #10431,200.00****4821statement-feb-2026.pdf202/09 CHECK #1043 (1,200.00)
+
+ + +
+
+ +

46 of 47 rows selected.

+
+
+
+ +
+ date + description + amount_debit + amount_credit + account_number + source_file +
+
page and raw are kept off by default; tick them if you want them in the file.
+
+
+
+ +
+
+
+ + + + diff --git a/layout-review/11_reconciler.html b/layout-review/11_reconciler.html new file mode 100644 index 0000000..f0fb49e --- /dev/null +++ b/layout-review/11_reconciler.html @@ -0,0 +1,251 @@ + + + + + +Layout review — Reconcile Two Files + + + +
+ +
+
+ visibility + Static layout preview of Reconcile Two Files, shown with both files imported, key columns mapped, and a completed reconciliation (matched / review / unmatched results). All pages → +
+
+ + +
+

Reconcile Two Files

+ +
+

Compare two lists of transactions (e.g. bank vs. ledger) and flag what doesn't match.

+ +
+ + +
+ +
+

Left (e.g. bank feed)

+
+
+ upload_file Drag and drop file here + CSV, TSV, XLSX, XLS +
+ +
+
+ + bank_feed_may.csv + 214 KB +
+

bank_feed_may.csv — 1,204 rows, 4 columns

+
+ Preview left (e.g. bank feed) +
+
+ + + + + + + + +
posted_datedescriptionamountref
2026-05-01ACME SUPPLIES-1240.00CHK1041
2026-05-02PAYROLL RUN-8800.00ACH5520
2026-05-03CLIENT GLOBEX5200.00DEP0090
2026-05-04UTILITY CO-318.42CHK1042
+
+
+
+
+ +
+

Right (e.g. ledger)

+
+
+ upload_file Drag and drop file here + CSV, TSV, XLSX, XLS +
+ +
+
+ + ledger_may.xlsx + 96 KB +
+

ledger_may.xlsx — 1,198 rows, 5 columns

+
+ Preview right (e.g. ledger) +
+
+ + + + + + + + +
txn_datememovalueinvoice_noaccount
2026-05-01Acme Supplies Inc-1240.00INV-10415000
2026-05-02Monthly payroll-8800.00INV-55206000
2026-05-03Globex retainer5200.00INV-00904000
2026-05-04City Utilities-318.40INV-10426100
+
+
+
+
+
+ +
+ + +

Match settings

+
+ +
+

Left columns

+
posted_date
+
description
+
amount
+
+
ref
+
+ +
+

Right columns

+
txn_date
+
memo
+
value
+
+
invoice_no
+
+
+ + +
+ Tolerances & options +
+
+
+
0.0200
+
Absolute tolerance on amount (e.g. 0.01 to absorb cent rounding).
+
+
1
+
Allow N calendar days of drift between posting dates.
+
+
Invert right amount sign
+
Use when one side records debits as positive and the other as negative.
+
+
+
80
+
When both sides have a description column set, accept matches with this minimum fuzzy similarity even if amount/date are merely within tolerance. Lower = more permissive.
+
+
+ +
+ + + +
+ + +

Results

+
+
Matched
1,173
+
Review
9
+
Unmatched left
22
+
Unmatched right
16
+
+

Coverage: 97.4% of the larger side

+ + +
+ Matched (1,173) + Review (9) + Unmatched left (22) + Unmatched right (16) +
+ + +

Preview of first 25 of 1,173 rows — download the CSV below for the full set.

+
+ + + + + + + + + + + + +
left_posted_dateleft_descriptionleft_amountright_txn_dateright_memoright_valueamount_diff
2026-05-01ACME SUPPLIES-1240.002026-05-01Acme Supplies Inc-1240.000.00
2026-05-02PAYROLL RUN-8800.002026-05-02Monthly payroll-8800.000.00
2026-05-03CLIENT GLOBEX5200.002026-05-03Globex retainer5200.000.00
2026-05-04UTILITY CO-318.422026-05-04City Utilities-318.400.02
2026-05-06OFFICE DEPOT-89.152026-05-07Office supplies-89.150.00
+
+ + +
+ Review (9) — ambiguous candidates +
+

Pairs flagged because the algorithm couldn't pick a single best match (e.g. multiple equally good candidates). Use the left/right indices to disambiguate manually.

+
+ + + + + + +
left_idxleft_amountright_idxright_valuecandidates
118-450.00121, 209-450.002 equal
2031000.00198, 2441000.002 equal
+
+
+
+ +
+ Unmatched left (22) — only in bank_feed_may.csv +
+

Preview of all 22 rows.

+
+ + + + + + +
posted_datedescriptionamountref
2026-05-09BANK FEE-12.00FEE0001
2026-05-14ATM WITHDRAWAL-200.00ATM7781
+
+
+
+ +
+ Unmatched right (16) — only in ledger_may.xlsx +
+

Preview of all 16 rows.

+
+ + + + + + +
txn_datememovalueinvoice_noaccount
2026-05-11Accrued interest37.50INV-90017000
2026-05-22Depreciation-410.00INV-90448000
+
+
+
+ +
+ + +
+ + + + +
+ +
+
+
+ + + + diff --git a/layout-review/assets/app.css b/layout-review/assets/app.css new file mode 100644 index 0000000..b363eaa --- /dev/null +++ b/layout-review/assets/app.css @@ -0,0 +1,473 @@ +/* =========================================================================== + DataTools — static layout-review stylesheet + --------------------------------------------------------------------------- + Faithful reproduction of the live Streamlit app's design system for human + review of page layouts. Tokens are copied verbatim from src/gui/theme.py + (§3 color + type scale) and the component values from + src/gui/components/_legacy.py:_DESIGN_TOKENS_CSS. + + The live app applies these styles to Streamlit's data-testid DOM; here we + re-express the same look against clean semantic classes so the static HTML + stays readable. Where the app uses real .dt-* classes (page header, files + card, findings, stats) the class names are kept identical. + =========================================================================== */ + +@import url("https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono:wght@400;500&display=swap"); +@import url("https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:opsz,wght,FILL,GRAD@20..48,400,0,0&display=block"); + +:root { + --font-sans: "Geist", -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; + --font-mono: "Geist Mono", ui-monospace, "SF Mono", Menlo, monospace; + + --ink: #1c1917; + --ink-secondary: #57534e; + --ink-tertiary: #a8a29e; + --bg: #fafaf7; + --surface: #ffffff; + --surface-hover: #f8f7f3; + --border: #e7e5dc; + --border-strong: #d6d3c7; + --accent: #c2410c; + --accent-hover: #9a3412; + --accent-fill: #fef4ed; + --accent-fill-strong: #fde4d3; + + --warn: #b45309; + --warn-fill: #fef3c7; + --info: #0369a1; + --info-fill: #e0f2fe; + --success: #15803d; + --success-fill: #dcfce7; + --danger: #b91c1c; + --danger-fill: #fee2e2; + + --r-sm: 6px; + --r-md: 10px; + --r-lg: 14px; + + 
--sidebar-w: 264px; +} + +* { box-sizing: border-box; } + +html, body { + margin: 0; + padding: 0; + background: var(--bg); + color: var(--ink); + font-family: var(--font-sans); + font-feature-settings: "ss01", "cv01", "cv11"; + -webkit-font-smoothing: antialiased; +} + +/* ---------- Type scale (theme.py §4) ---------- */ +h1 { font-size: 32px; font-weight: 600; letter-spacing: -0.035em; line-height: 1.1; margin: 0 0 4px; } +h2 { font-size: 22px; font-weight: 600; letter-spacing: -0.025em; line-height: 1.2; margin: 1.5rem 0 0.75rem; } +h3 { font-size: 18px; font-weight: 500; letter-spacing: -0.018em; line-height: 1.25; margin: 1.25rem 0 0.5rem; } +h4 { font-size: 15px; font-weight: 500; letter-spacing: -0.012em; line-height: 1.35; margin: 1rem 0 0.5rem; } +p { font-size: 14px; font-weight: 400; line-height: 1.55; color: var(--ink); margin: 0 0 0.6rem; } +strong { font-weight: 500; color: var(--ink); } +a { color: var(--accent); text-decoration: none; } +a:hover { color: var(--accent-hover); text-decoration: underline; } +code, .dt-mono { font-family: var(--font-mono); font-size: 0.92em; font-feature-settings: "ss02"; } + +/* =========================================================================== + App frame — sidebar + main + sticky footer + =========================================================================== */ +.dt-app { display: flex; min-height: 100vh; } + +/* ---------- Sidebar (cream paper) ---------- */ +.dt-sidebar { + width: var(--sidebar-w); + flex-shrink: 0; + background: #f5f4ef; + border-right: 1px solid var(--border); + padding: 18px 14px 90px; + position: sticky; + top: 0; + align-self: flex-start; + height: 100vh; + overflow-y: auto; +} +.dt-brand { display: flex; align-items: center; gap: 10px; padding: 0 4px 18px; } +.dt-brand-mark { + width: 28px; height: 28px; border-radius: 7px; + background: var(--ink); color: var(--accent-fill); + display: inline-flex; align-items: center; justify-content: center; + font-weight: 700; font-size: 
16px; letter-spacing: -0.04em; line-height: 1; flex-shrink: 0; +} +.dt-brand-name { display: flex; flex-direction: column; gap: 1px; line-height: 1.05; } +.dt-brand-eyebrow { + font-size: 9.5px; font-weight: 600; letter-spacing: 0.14em; + text-transform: uppercase; color: var(--ink-tertiary); line-height: 1; +} +.dt-brand-word { font-weight: 600; font-size: 15px; letter-spacing: -0.02em; color: var(--ink); } + +.dt-nav { display: flex; flex-direction: column; } +.dt-nav-section { + font-size: 11.5px; text-transform: uppercase; letter-spacing: 0.08em; + color: var(--ink-tertiary); font-weight: 500; + padding: 14px 10px 4px; margin: 0; + display: flex; align-items: center; justify-content: space-between; +} +.dt-nav-section .dt-nav-indicator { font-size: 16px; color: var(--ink-tertiary); } +.dt-nav-link { + display: flex; align-items: center; gap: 8px; + color: var(--ink-secondary); font-size: 13px; font-weight: 500; line-height: 1.3; + padding: 5px 10px; border-radius: var(--r-sm); margin-bottom: 1px; + text-decoration: none; transition: background 0.12s ease, color 0.12s ease; +} +.dt-nav-link:hover { background: rgba(0,0,0,0.04); color: var(--ink); text-decoration: none; } +.dt-nav-link.is-active { background: rgba(0,0,0,0.04); color: var(--ink); font-weight: 600; } +.dt-nav-link .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; color: var(--ink-secondary); line-height: 1; } +.dt-nav-link.is-active .dt-mi { color: var(--ink); } +.dt-nav-link.is-soon { opacity: 0.55; } +.dt-nav-soon-tag { + margin-left: auto; font-size: 9px; font-weight: 600; letter-spacing: 0.06em; + text-transform: uppercase; color: var(--ink-tertiary); + border: 1px solid var(--border-strong); border-radius: 999px; padding: 1px 6px; +} + +.dt-sidebar-foot { margin-top: 22px; padding-top: 16px; border-top: 1px solid var(--border); display: flex; flex-direction: column; gap: 10px; } +.dt-sidebar-label { font-size: 11.5px; font-weight: 500; text-transform: uppercase; 
letter-spacing: 0.08em; color: var(--ink-tertiary); margin-bottom: 4px; } +.dt-license-badge { font-size: 12.5px; color: var(--ink-secondary); } + +/* ---------- Main column ---------- */ +.dt-main { flex: 1; min-width: 0; padding: 40px 56px 96px; } +.dt-main-inner { max-width: 920px; margin: 0 auto; } + +/* Review banner above every mockup */ +.dt-review-banner { + max-width: 920px; margin: 0 auto 20px; display: flex; gap: 10px; align-items: center; + background: var(--info-fill); color: var(--info); + border: 1px solid transparent; border-radius: var(--r-md); + padding: 8px 14px; font-size: 12.5px; line-height: 1.4; +} +.dt-review-banner a { color: var(--info); text-decoration: underline; } +.dt-review-banner .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; } + +/* ---------- Sticky footer ---------- */ +.dt-footer { + position: fixed; bottom: 0; left: var(--sidebar-w); right: 0; + background: rgba(255,255,255,0.97); backdrop-filter: blur(8px); + border-top: 1px solid var(--border-strong); + padding: 8px 20px; z-index: 50; + display: flex; align-items: center; gap: 8px; +} +.dt-footer-btn { + display: inline-flex; align-items: center; gap: 8px; + color: var(--ink-secondary); font-size: 13px; font-weight: 500; line-height: 1.3; + padding: 5px 10px; border-radius: var(--r-sm); + background: transparent; border: none; cursor: pointer; text-decoration: none; +} +.dt-footer-btn:hover { background: rgba(0,0,0,0.04); color: var(--ink); text-decoration: none; } +.dt-footer-btn .dt-mi { font-family: "Material Symbols Outlined"; font-size: 16px; } + +/* =========================================================================== + Page header (brand + privacy pill) — .dt-page-* mirror the live app + =========================================================================== */ +.dt-page-header { + display: flex; align-items: center; justify-content: space-between; gap: 24px; + margin: 0 0 24px; padding-bottom: 22px; border-bottom: 1px solid var(--border); 
+} +.dt-page-brand { display: flex; flex-direction: column; gap: 8px; } +.dt-page-brand-row { display: flex; align-items: center; gap: 18px; } +.dt-page-brand-mark { + width: 56px; height: 56px; border-radius: 14px; background: var(--ink); + color: var(--accent-fill); display: inline-flex; align-items: center; justify-content: center; + font-weight: 700; font-size: 32px; letter-spacing: -0.04em; line-height: 1; flex-shrink: 0; +} +.dt-page-brand-words { display: flex; flex-direction: column; gap: 2px; line-height: 1; } +.dt-page-eyebrow { font-size: 11.5px; font-weight: 600; letter-spacing: 0.14em; text-transform: uppercase; color: var(--ink-tertiary); line-height: 1.2; } +.dt-page-wordmark { margin: 0; font-weight: 600; font-size: 32px; letter-spacing: -0.035em; line-height: 1.1; color: var(--ink); } +.dt-page-subtitle { margin: 4px 0 0; color: var(--ink-secondary); font-size: 14px; line-height: 1.5; } +.dt-privacy-pill { + display: inline-flex; align-items: center; gap: 6px; padding: 6px 11px; + background: var(--success-fill); color: var(--success); border-radius: 999px; + font-size: 12px; font-weight: 500; white-space: nowrap; flex-shrink: 0; +} +.dt-privacy-pill svg { width: 13px; height: 13px; stroke-width: 2; } + +/* ---------- Tool header (title + Help popover) ---------- */ +.dt-tool-header { display: flex; align-items: flex-start; justify-content: space-between; gap: 16px; } +.dt-tool-header h1 { margin: 0; } +.dt-help-btn { + display: inline-flex; align-items: center; gap: 6px; white-space: nowrap; + background: var(--surface); color: var(--ink); border: 1px solid var(--border-strong); + border-radius: var(--r-md); padding: 9px 16px; font-size: 13.5px; font-weight: 500; + cursor: pointer; flex-shrink: 0; margin-top: 6px; +} +.dt-help-btn .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; } +.dt-tool-caption { font-size: 12.5px; color: var(--ink-tertiary); line-height: 1.5; margin: 2px 0 0; } + +/* 
=========================================================================== + Buttons + =========================================================================== */ +.dt-btn { + border-radius: var(--r-md); font-family: var(--font-sans); font-weight: 500; + font-size: 13.5px; letter-spacing: -0.005em; line-height: 1; padding: 9px 16px; + border: 1px solid var(--border-strong); background: var(--surface); color: var(--ink); + cursor: pointer; transition: background 0.12s ease, border-color 0.12s ease, color 0.12s ease; + display: inline-flex; align-items: center; justify-content: center; gap: 8px; +} +.dt-btn:hover { background: var(--surface-hover); border-color: var(--ink-tertiary); } +.dt-btn-primary { background: var(--ink); color: var(--bg); border-color: var(--ink); } +.dt-btn-primary:hover { background: #292524; border-color: #292524; color: var(--bg); } +.dt-btn-tertiary { background: transparent; border: none; color: var(--ink-tertiary); padding: 4px 8px; } +.dt-btn-tertiary:hover { background: var(--danger-fill); color: var(--danger); } +.dt-btn:disabled, .dt-btn.is-disabled { + background: var(--surface-hover); color: var(--ink-tertiary); + border: 1px solid var(--border); cursor: not-allowed; +} +.dt-btn-block { width: 100%; } +.dt-btn .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; } + +.dt-btn-row { display: flex; gap: 10px; flex-wrap: wrap; } +.dt-btn-row > .dt-btn { flex: 1; } + +/* =========================================================================== + File uploader (cream dropzone) + =========================================================================== */ +.dt-uploader { + background: var(--surface-hover); border: 1px dashed var(--border-strong); + border-radius: var(--r-md); padding: 22px 20px; + display: flex; align-items: center; justify-content: space-between; gap: 16px; +} +.dt-uploader-text { display: flex; flex-direction: column; gap: 2px; } +.dt-uploader-text .hint { font-size: 14px; color: var(--ink); } 
+.dt-uploader-text .sub { font-size: 12.5px; color: var(--ink-tertiary); } +.dt-uploader .dt-mi { font-family: "Material Symbols Outlined"; font-size: 24px; color: var(--ink-tertiary); } + +/* Staged-file chip */ +.dt-file-chip { + display: flex; align-items: center; gap: 12px; + background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-sm); + padding: 10px 14px; margin-top: 10px; +} +.dt-file-chip .name { font-family: var(--font-mono); font-size: 13px; color: var(--ink); font-feature-settings: "ss02"; } +.dt-file-chip .size { font-family: var(--font-mono); font-size: 12px; color: var(--ink-tertiary); margin-left: auto; } + +/* =========================================================================== + Expanders / bordered cards + =========================================================================== */ +.dt-expander { + background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-lg); + overflow: hidden; box-shadow: 0 1px 2px rgba(28,25,23,0.03); margin: 10px 0; +} +.dt-expander > summary, .dt-expander-head { + background: var(--surface-hover); border-bottom: 1px solid var(--border); + padding: 12px 16px; font-weight: 500; color: var(--ink); font-size: 14px; + cursor: pointer; list-style: none; display: flex; align-items: center; gap: 8px; +} +.dt-expander > summary::-webkit-details-marker { display: none; } +.dt-expander > summary::before { + content: "expand_more"; font-family: "Material Symbols Outlined"; font-size: 20px; + color: var(--ink-tertiary); transition: transform 0.15s ease; +} +.dt-expander[open] > summary::before { transform: rotate(180deg); } +.dt-expander-body, .dt-expander > .dt-expander-body { padding: 14px 16px; } +.dt-expander:not([open]) > summary { border-bottom: none; } + +.dt-card { + background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-lg); + box-shadow: 0 1px 2px rgba(28,25,23,0.03); padding: 16px; margin: 10px 0; +} + +/* 
=========================================================================== + Alerts + =========================================================================== */ +.dt-alert { + border-radius: var(--r-md); border: 1px solid transparent; + padding: 10px 14px; font-size: 13.5px; line-height: 1.45; margin: 10px 0; + display: flex; gap: 10px; align-items: flex-start; +} +.dt-alert .dt-mi { font-family: "Material Symbols Outlined"; font-size: 18px; flex-shrink: 0; margin-top: 1px; } +.dt-alert.info { background: var(--info-fill); color: var(--info); } +.dt-alert.success { background: var(--success-fill); color: var(--success); } +.dt-alert.warn { background: var(--warn-fill); color: var(--warn); } +.dt-alert.error { background: var(--danger-fill); color: var(--danger); } +.dt-alert code { background: rgba(0,0,0,0.05); padding: 1px 5px; border-radius: 4px; } + +/* =========================================================================== + Inputs (static representations of Streamlit widgets) + =========================================================================== */ +.dt-field { margin: 10px 0; } +.dt-label { font-size: 13px; font-weight: 500; color: var(--ink); margin-bottom: 5px; display: block; } +.dt-label .req { color: var(--accent); } +.dt-input, .dt-select, .dt-textarea { + width: 100%; background: var(--surface); border: 1px solid var(--border-strong); + border-radius: var(--r-sm); padding: 8px 11px; font-family: var(--font-sans); + font-size: 13.5px; color: var(--ink); +} +.dt-select { appearance: none; background-image: linear-gradient(45deg, transparent 50%, var(--ink-tertiary) 50%), linear-gradient(135deg, var(--ink-tertiary) 50%, transparent 50%); background-position: calc(100% - 16px) 14px, calc(100% - 11px) 14px; background-size: 5px 5px, 5px 5px; background-repeat: no-repeat; } +.dt-textarea { min-height: 76px; resize: vertical; font-family: var(--font-mono); font-size: 13px; } +.dt-help-text { font-size: 12px; color: var(--ink-tertiary); 
margin-top: 4px; } + +/* Multiselect — chips inside a box */ +.dt-multiselect { + width: 100%; background: var(--surface); border: 1px solid var(--border-strong); + border-radius: var(--r-sm); padding: 6px 8px; min-height: 38px; + display: flex; flex-wrap: wrap; gap: 6px; align-items: center; +} +.dt-ms-chip { + display: inline-flex; align-items: center; gap: 5px; background: var(--accent-fill); + color: var(--accent-hover); border-radius: var(--r-sm); padding: 3px 8px; + font-size: 12.5px; font-weight: 500; +} +.dt-ms-chip .x { color: var(--accent); font-size: 13px; } +.dt-ms-placeholder { color: var(--ink-tertiary); font-size: 13px; padding: 2px 4px; } + +/* Checkbox / radio */ +.dt-check { display: flex; align-items: center; gap: 9px; margin: 8px 0; font-size: 13.5px; color: var(--ink); } +.dt-check .box { + width: 18px; height: 18px; border-radius: 5px; border: 1px solid var(--border-strong); + background: var(--surface); display: inline-flex; align-items: center; justify-content: center; flex-shrink: 0; +} +.dt-check.on .box { background: var(--ink); border-color: var(--ink); color: var(--bg); } +.dt-check.on .box .dt-mi { font-family: "Material Symbols Outlined"; font-size: 14px; } +.dt-radio-row { display: flex; gap: 18px; flex-wrap: wrap; margin: 8px 0; } +.dt-radio { display: inline-flex; align-items: center; gap: 7px; font-size: 13.5px; } +.dt-radio .dot { width: 16px; height: 16px; border-radius: 50%; border: 1px solid var(--border-strong); display: inline-block; flex-shrink: 0; } +.dt-radio.on .dot { border: 5px solid var(--ink); } + +/* Slider */ +.dt-slider { margin: 14px 0 6px; } +.dt-slider .track { position: relative; height: 4px; background: var(--border-strong); border-radius: 2px; } +.dt-slider .fill { position: absolute; left: 0; top: 0; height: 4px; background: var(--ink); border-radius: 2px; } +.dt-slider .knob { position: absolute; top: 50%; width: 16px; height: 16px; border-radius: 50%; background: var(--ink); transform: translate(-50%, 
-50%); } +.dt-slider .val { font-family: var(--font-mono); font-size: 12px; color: var(--ink-secondary); margin-top: 8px; } + +/* =========================================================================== + Layout helpers + =========================================================================== */ +.dt-row { display: flex; gap: 16px; } +.dt-row > * { flex: 1; min-width: 0; } +.dt-cols-2 { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; } +.dt-cols-3 { display: grid; grid-template-columns: repeat(3, 1fr); gap: 16px; } +.dt-divider { border: none; border-top: 1px solid var(--border); margin: 22px 0; } +.dt-caption { font-size: 12.5px; color: var(--ink-tertiary); line-height: 1.5; } +.dt-spacer { height: 12px; } + +/* =========================================================================== + DataFrame / preview table + =========================================================================== */ +.dt-table-wrap { border: 1px solid var(--border); border-radius: var(--r-md); overflow: hidden; margin: 8px 0; } +table.dt-table { width: 100%; border-collapse: collapse; font-size: 13px; } +table.dt-table th { + background: var(--surface-hover); color: var(--ink-secondary); font-weight: 500; + text-align: left; padding: 8px 12px; border-bottom: 1px solid var(--border); + font-size: 12px; text-transform: none; white-space: nowrap; +} +table.dt-table td { + padding: 7px 12px; border-bottom: 1px solid var(--border); + font-family: var(--font-mono); font-size: 12.5px; color: var(--ink); font-feature-settings: "ss02"; white-space: nowrap; +} +table.dt-table tr:last-child td { border-bottom: none; } +table.dt-table tr:nth-child(even) td { background: #fcfbf8; } +table.dt-table td.idx { color: var(--ink-tertiary); background: var(--surface-hover); } +.dt-cell-flag { color: var(--warn); } +.dt-cell-del { color: var(--danger); text-decoration: line-through; } +.dt-cell-add { color: var(--success); } + +/* 
=========================================================================== + Stats overview (home) — copied from _legacy.py + =========================================================================== */ +.dt-stats { display: grid; grid-template-columns: repeat(4, 1fr); gap: 12px; margin: 8px 0 20px; } +.dt-stat { background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-lg); padding: 16px 18px; box-shadow: 0 1px 2px rgba(28,25,23,0.03); } +.dt-stat-label { font-size: 11.5px; text-transform: uppercase; letter-spacing: 0.08em; color: var(--ink-tertiary); font-weight: 500; margin-bottom: 6px; line-height: 1.4; } +.dt-stat-value { font-size: 28px; font-weight: 600; letter-spacing: -0.03em; line-height: 1; color: var(--ink); display: flex; align-items: baseline; gap: 6px; } +.dt-stat-unit { font-size: 12px; font-weight: 400; color: var(--ink-tertiary); letter-spacing: 0; } +.dt-stat.is-warn .dt-stat-value { color: var(--warn); } +.dt-stat.is-info .dt-stat-value { color: var(--info); } +.dt-stat.is-success .dt-stat-value { color: var(--success); } +@media (max-width: 900px) { .dt-stats { grid-template-columns: repeat(2, 1fr); } } + +/* Metric (st.metric) */ +.dt-metrics { display: flex; gap: 28px; flex-wrap: wrap; margin: 6px 0 14px; } +.dt-metric .label { font-size: 12.5px; color: var(--ink-tertiary); margin-bottom: 4px; } +.dt-metric .value { font-size: 26px; font-weight: 600; letter-spacing: -0.03em; color: var(--ink); line-height: 1; } +.dt-metric .delta { font-size: 12.5px; margin-top: 3px; } +.dt-metric .delta.up { color: var(--success); } +.dt-metric .delta.down { color: var(--danger); } + +/* =========================================================================== + Files card (home) — copied from _legacy.py + =========================================================================== */ +.dt-files-section-head { display: flex; align-items: baseline; justify-content: space-between; margin: 4px 0 10px; gap: 12px; } 
+.dt-files-section-head h2 { margin: 0; } +.dt-section-meta { font-size: 12.5px; color: var(--ink-tertiary); } +.dt-file-row { display: flex; align-items: center; gap: 12px; } +.dt-file-icon-chip { width: 28px; height: 28px; border-radius: var(--r-sm); background: var(--accent-fill); color: var(--accent); display: inline-flex; align-items: center; justify-content: center; flex-shrink: 0; } +.dt-file-icon-chip svg { width: 14px; height: 14px; stroke-width: 1.8; } +.dt-file-name { font-family: var(--font-mono); font-size: 13px; color: var(--ink); font-feature-settings: "ss02"; } +.dt-file-size { font-family: var(--font-mono); font-size: 12px; color: var(--ink-tertiary); font-feature-settings: "ss02"; } +.dt-file-add { + display: flex; align-items: center; justify-content: center; gap: 8px; + width: 100%; padding: 12px 16px; background: var(--surface-hover); + border: none; border-top: 1px dashed var(--border-strong); + border-radius: 0 0 var(--r-lg) var(--r-lg); cursor: pointer; + font-size: 13px; font-weight: 500; color: var(--ink-secondary); margin-top: 14px; +} +.dt-file-add:hover { background: var(--accent-fill); color: var(--accent); } +.dt-file-add svg { width: 14px; height: 14px; stroke-width: 2; } + +/* =========================================================================== + Findings panel — copied from _legacy.py + =========================================================================== */ +.dt-finding-group-head { + display: flex; align-items: center; gap: 12px; padding: 16px 22px; + border-bottom: 1px solid var(--border); background: var(--surface-hover); + margin: -16px -16px 1.2rem; border-radius: var(--r-lg) var(--r-lg) 0 0; + cursor: pointer; user-select: none; +} +.dt-finding-group-chevron { color: var(--ink-tertiary); font-family: "Material Symbols Outlined"; font-size: 20px; line-height: 1; flex-shrink: 0; } +.dt-severity-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; display: inline-block; } +.dt-severity-dot.warn { 
background: var(--warn); } +.dt-severity-dot.info { background: var(--info); } +.dt-severity-dot.error { background: var(--danger); } +.dt-severity-dot.success { background: var(--success); } +.dt-group-filename { font-family: var(--font-mono); font-size: 13.5px; font-weight: 500; color: var(--ink); font-feature-settings: "ss02"; } +.dt-group-counts { margin-left: auto; display: flex; align-items: center; gap: 8px; } +.dt-count-pill { display: inline-flex; align-items: center; padding: 3px 9px; border-radius: 999px; font-size: 11.5px; font-weight: 500; line-height: 1.4; white-space: nowrap; } +.dt-count-pill.warn { background: var(--warn-fill); color: var(--warn); } +.dt-count-pill.info { background: var(--info-fill); color: var(--info); } +.dt-count-pill.error { background: var(--danger-fill); color: var(--danger); } +.dt-count-pill.success { background: var(--success-fill); color: var(--success); } +.dt-finding-row { display: flex; align-items: flex-start; gap: 12px; padding: 12px 0; border-top: 1px solid var(--border); } +.dt-finding-row:first-of-type { border-top: none; } +.dt-finding-icon { width: 24px; height: 24px; border-radius: var(--r-sm); display: inline-flex; align-items: center; justify-content: center; flex-shrink: 0; } +.dt-finding-icon.warn { background: var(--warn-fill); color: var(--warn); } +.dt-finding-icon.info { background: var(--info-fill); color: var(--info); } +.dt-finding-icon.error { background: var(--danger-fill); color: var(--danger); } +.dt-finding-icon .dt-mi { font-family: "Material Symbols Outlined"; font-size: 16px; line-height: 1; } +.dt-finding-body { flex: 1; min-width: 0; } +.dt-finding-title { font-size: 14px; color: var(--ink); margin: 0 0 2px; line-height: 1.4; letter-spacing: -0.005em; } +.dt-finding-title strong { font-weight: 500; } +.dt-finding-meta { font-family: var(--font-mono); font-size: 12px; color: var(--ink-tertiary); line-height: 1.4; margin: 0; font-feature-settings: "ss02"; } + +/* Match-group review card 
(dedup) */ +.dt-match-card { background: var(--surface); border: 1px solid var(--border); border-radius: var(--r-lg); box-shadow: 0 1px 2px rgba(28,25,23,0.03); margin: 12px 0; overflow: hidden; } +.dt-match-head { background: var(--surface-hover); border-bottom: 1px solid var(--border); padding: 12px 16px; display: flex; align-items: center; gap: 12px; } +.dt-match-head .title { font-weight: 500; font-size: 14px; } +.dt-match-head .conf { margin-left: auto; } +.dt-match-body { padding: 14px 16px; } +.dt-keep-row { background: var(--success-fill); } +.dt-keep-tag { display: inline-flex; align-items: center; gap: 4px; background: var(--success-fill); color: var(--success); border-radius: 999px; padding: 2px 8px; font-size: 11px; font-weight: 500; } + +/* Progress bar */ +.dt-progress { height: 6px; background: var(--border); border-radius: 3px; overflow: hidden; margin: 10px 0; } +.dt-progress .bar { height: 100%; background: var(--ink); border-radius: 3px; } + +/* Tabs */ +.dt-tabs { display: flex; gap: 18px; border-bottom: 1px solid var(--border); margin: 10px 0 16px; } +.dt-tab { font-size: 13.5px; color: var(--ink-secondary); padding: 8px 2px; border-bottom: 2px solid transparent; cursor: pointer; } +.dt-tab.is-active { color: var(--ink); font-weight: 500; border-bottom-color: var(--accent); } + +/* Code block */ +.dt-code { background: var(--surface-hover); border: 1px solid var(--border); border-radius: var(--r-md); padding: 12px 14px; font-family: var(--font-mono); font-size: 12.5px; color: var(--ink); white-space: pre; overflow-x: auto; font-feature-settings: "ss02"; } + +@media (max-width: 1100px) { + .dt-footer { left: 0; } + .dt-sidebar { display: none; } + .dt-main { padding: 28px 24px 96px; } +} diff --git a/layout-review/assets/shell.js b/layout-review/assets/shell.js new file mode 100644 index 0000000..68aecc9 --- /dev/null +++ b/layout-review/assets/shell.js @@ -0,0 +1,74 @@ +/* Shared app chrome (sidebar nav + sticky footer) for the static layout + 
review pages. Mirrors src/gui/app.py:_build_navigation() ordering and + src/gui/components/_legacy.py:render_sticky_footer(). Each page sets + a data-page attribute on its body element to mark the active nav item. */ +(function () { + // Sections + entries in the same order app.py registers them. + var NAV = [ + { label: "Analysis", items: [ + { id: "home", icon: "insert_chart_outlined", name: "File Analysis", href: "home.html" }, + { id: "11_reconciler", icon: "compare_arrows", name: "Reconcile Two Files", href: "11_reconciler.html" }, + ]}, + { label: "Data Cleaners", items: [ + { id: "04_missing_handler", icon: "help_outline", name: "Fix Missing Values", href: "04_missing_handler.html" }, + { id: "06_outlier_detector", icon: "insights", name: "Find Unusual Values", href: "06_outlier_detector.html", soon: true }, + { id: "02_text_cleaner", icon: "text_format", name: "Clean Text", href: "02_text_cleaner.html" }, + { id: "03_format_standardizer", icon: "format_list_bulleted", name: "Standardize Formats", href: "03_format_standardizer.html" }, + { id: "01_deduplicator", icon: "search", name: "Find Duplicates", href: "01_deduplicator.html" }, + { id: "08_validator_reporter", icon: "check_circle", name: "Quality Check", href: "08_validator_reporter.html", soon: true }, + ]}, + { label: "Transformations", items: [ + { id: "05_column_mapper", icon: "view_column", name: "Map Columns", href: "05_column_mapper.html" }, + { id: "07_multi_file_merger", icon: "account_tree", name: "Combine Files", href: "07_multi_file_merger.html", soon: true }, + { id: "10_pdf_extractor", icon: "picture_as_pdf", name: "PDF to CSV", href: "10_pdf_extractor.html" }, + ]}, + { label: "Automations", items: [ + { id: "09_pipeline_runner", icon: "auto_awesome", name: "Automated Workflows", href: "09_pipeline_runner.html" }, + ]}, + ]; + + var active = document.body.getAttribute("data-page") || ""; + + // ---- Sidebar ----------------------------------------------------------- + var sb = document.getElementById("dt-sidebar"); + if (sb) { +
var html = '' + + '' + + 'D' + + '' + + 'UNALOGIX' + + 'DataTools' + + '' + + '' + + '' + + '
' + + '
Language
' + + '
English
' + + '
Core · 1,820 days left
' + + '
'; + sb.innerHTML = html; + } + + // ---- Sticky footer ----------------------------------------------------- + var ft = document.getElementById("dt-footer"); + if (ft) { + ft.innerHTML = + 'closeClose' + + '' + + 'DataTools · local-first · static layout preview'; + } +})(); diff --git a/layout-review/home.html b/layout-review/home.html new file mode 100644 index 0000000..252ebad --- /dev/null +++ b/layout-review/home.html @@ -0,0 +1,164 @@ + + + + + +Layout review — File Analysis (Home) + + + +
+ +
+
+ visibility + Static layout preview of the Home / File Analysis page, shown with three imported files in the post-analysis state. All pages → +
+
+ + +
+
+
+
D
+
+ UNALOGIX +

DataTools

+
+
+

Clean. Normalize. Transform.

+
+ + + + + + Runs 100% locally + +
+ + +
+

Files

+ +
+ + +
+
+ + + customers_export.csv + 2.1 MB +
+
+ + + q3_transactions.xlsx + 1.8 MB +
+
+ + + vendor_list.csv + 0.8 MB +
+ +
+ + +
+ + +
+ +
+ + +
+
+
Files analyzed
+
3
+
+
+
Total findings
+
14
+
+
+
Warnings
+
9 to review
+
+
+
Info
+
5 suggestions
+
+
+ + +
+
+ chevron_right + + customers_export.csv +
+ 6 warnings + 2 info +
+
+ +
+ priority_high +
+

312 duplicate rows across exact + near matches

+

column: email · Find Duplicates →

+
+
+
+ format_color_text +
+

1,204 cells with leading / trailing whitespace

+

columns: name, city · Clean Text →

+
+
+
+ event +
+

Mixed date formats in signup_date

+

3 formats detected · Standardize Formats →

+
+
+
+ + +
+
+ chevron_right + + q3_transactions.xlsx +
+ 3 warnings + 3 info +
+
+
+ + +
+
+ + vendor_list.csv +
+ no issues +
+
+
+ +
+
+
+ + + + diff --git a/layout-review/index.html b/layout-review/index.html new file mode 100644 index 0000000..880e1be --- /dev/null +++ b/layout-review/index.html @@ -0,0 +1,71 @@ + + + + + +DataTools — Layout Review + + + + +
+
+
+
+
D
+
+ UNALOGIX · LAYOUT REVIEW +

DataTools

+
+
+

Static HTML reproductions of every tool page, built from the live app's design tokens for human review of layouts.

+
+
+ +
+ info + These are faithful static mockups — not the running Streamlit app. Colors, type scale, spacing, and components are copied verbatim from theme.py and components/_legacy.py. Each page is shown in a representative populated state so the layout can be reviewed end-to-end. Fonts load from Google Fonts (needs network); the chrome (sidebar + footer) is shared across every page. +
+ +
Analysis
+
+ insert_chart_outlined File Analysis (Home): Import files, run the analyzer, browse per-file findings. + compare_arrows Reconcile Two Files: Compare two lists of transactions and flag what doesn't match. +
+ +
Data Cleaners
+
+ help_outline Fix Missing Values: Find blank cells (even hidden ones) and fill them in or remove them. + insights Find Unusual Values (Soon): Spot values that look wrong: too high, too low, or rule-breaking. + text_format Clean Text: Trim extra spaces and strip out odd characters. + format_list_bulleted Standardize Formats: Make dates, phones, currency, and names look the same throughout. + search Find Duplicates: Find rows that repeat, then keep one and remove the extras. + check_circle Quality Check (Soon): Check your file against rules and export a PDF or Excel report. +
+ +
Transformations
+
+ view_column Map Columns: Rename columns, reorder, and set each one as text, number, or date. + account_tree Combine Files (Soon): Combine several CSV or Excel files into one, even if columns differ. + picture_as_pdf PDF to CSV: Pull transactions out of bank-statement PDFs into a clean CSV file. +
+ +
Automations
+
+ auto_awesome Automated Workflows: Run several tools in a row, then save the steps and reuse them anytime. +
+
+ +