A user reported that the column-visual approach is too brittle for
real bank statements: column x-positions saved against a sample
page don't survive layout drift between months (statement A has
columns at x=300, statement B drifted to x=320), so a saved
template realistically works only for the one render it was
built from. The fundamental fix is to stop depending on saved
coordinates at all.

**Row-heuristic mode** finds transaction rows by pattern: any
line with a date token plus one or more amount tokens is treated
as a transaction. Date patterns (US slash, EU slash, ISO,
"Jan 15, 2026", etc.) and amount patterns (currency symbol,
parenthesized negatives, thousands grouping) are matched against
word text, with no x-positions involved.
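
A minimal sketch of the token test described above, for illustration
only. The names ``DATE_RE``, ``AMOUNT_RE``, ``is_date``, ``is_amount``
and ``is_transaction_line`` are hypothetical, and the patterns are a
simplified subset: month-name dates such as "Jan 15, 2026" span
several words and are left out here.

```python
import re

# Hypothetical, simplified patterns; the real module likely accepts more.
DATE_RE = re.compile(
    r"^(?:\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})"  # US or EU slash, 2- or 4-digit year
    r"|\d{4}-\d{2}-\d{2})$"                 # ISO
)
# Requiring two decimals is what rejects a bare year such as "2026".
AMOUNT_RE = re.compile(r"^-?[$€£]?(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}$")


def is_date(word):
    return bool(DATE_RE.match(word))


def is_amount(word):
    # Accounting-style negatives wrap the amount in parentheses: (45.00)
    if word.startswith("(") and word.endswith(")"):
        word = word[1:-1]
    return bool(AMOUNT_RE.match(word))


def is_transaction_line(words):
    """A line counts as a transaction if it has a date and at least one amount."""
    has_date = any(is_date(w) for w in words)
    n_amounts = sum(is_amount(w) for w in words)
    return has_date and n_amounts >= 1
```

Because the check looks only at word text, the same line is classified
identically wherever it sits on the page.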

The full pipeline:

1. ``find_transaction_rows`` clusters words into rows and scans
   each line for date and amount tokens.
2. Multi-line descriptions still attach to the previous row: a
   line with neither a date nor an amount is treated as a
   continuation of that row's description.
3. The amount shape drives interpretation: ``single`` /
   ``txn_balance`` / ``debit_credit`` / ``debit_credit_balance``.
4. ``_infer_amount_column_centers`` clusters amount x-midpoints
   across all detected rows to find natural column groupings,
   so debit-vs-credit assignment for single-amount lines works
   without the user marking anything on screen.
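
The column-center step (item 4) can be pictured as one-dimensional gap
clustering. The sketch below is an assumption about the general
technique, not the actual body of ``_infer_amount_column_centers``;
the function names and the ``gap`` threshold are hypothetical.

```python
def infer_column_centers(midpoints, gap=15.0):
    """Group sorted x-midpoints; a jump larger than `gap` starts a new column.

    Returns the mean x of each group, left to right.
    """
    if not midpoints:
        return []
    xs = sorted(midpoints)
    groups = [[xs[0]]]
    for x in xs[1:]:
        if x - groups[-1][-1] > gap:
            groups.append([x])      # far from the previous point: new column
        else:
            groups[-1].append(x)    # close enough: same column
    return [sum(g) / len(g) for g in groups]


def nearest_column(x, centers):
    """Index of the inferred column whose center is closest to x."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - x))
```

Since the centers are derived from the page being parsed rather than
stored in the template, a statement whose columns moved by 20 points
simply yields centers that moved by 20 points.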

``apply_template`` is now a dispatch over ``template["mode"]``:

- ``mode="row_heuristic"`` (default for new templates): the new
  pipeline.
- ``mode="column_visual"``: the existing pipeline, kept under
  ``_apply_template_column_visual`` for v1 templates and the
  Advanced fallback.
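
Roughly, the dispatch has the shape sketched below. The
``(template, pages)`` signature, the stub return values, and the name
``_apply_template_row_heuristic`` are hypothetical; treating a missing
``mode`` key as a v1 template is also an assumption, not something
this change confirms.

```python
def _apply_template_row_heuristic(template, pages):
    # Stub standing in for the new pipeline (hypothetical name).
    return {"mode": "row_heuristic", "rows": []}


def _apply_template_column_visual(template, pages):
    # Stub standing in for the existing coordinate-based pipeline.
    return {"mode": "column_visual", "rows": []}


_PIPELINES = {
    "row_heuristic": _apply_template_row_heuristic,
    "column_visual": _apply_template_column_visual,
}


def apply_template(template, pages):
    # Assumption: v1 templates predate the "mode" key, so a missing key
    # falls back to the column-visual pipeline.
    mode = template.get("mode", "column_visual")
    try:
        pipeline = _PIPELINES[mode]
    except KeyError:
        raise ValueError(f"unknown template mode: {mode!r}") from None
    return pipeline(template, pages)
```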

18 new tests cover: date detection (US slash, two-digit year,
ISO, month-name, missing); amount-token finding (currency,
parens, pure text, bare-year rejection); column-center inference
(clear two-column case, empty input); end-to-end runs on synthetic
Page objects with all four amount shapes; and the critical
layout-drift test, which checks that the same template works on
pages of different sizes and different absolute x-positions.
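
The idea behind the layout-drift test can be sketched with a
self-contained toy extractor. Everything here (``extract_rows``,
``make_page``, the ``(text, x, y)`` word tuples) is hypothetical and
much simpler than the real Page objects and pipeline; it only shows
why pattern-based row detection is unaffected by shifted or scaled
coordinates.

```python
import re

DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")
AMOUNT_RE = re.compile(r"^-?\$?\d[\d,]*\.\d{2}$")


def extract_rows(words, y_tol=2.0):
    """words: list of (text, x, y). Returns [(date, description, amounts)]."""
    lines = []
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(lines[-1][0] - y) <= y_tol:
            lines[-1][1].append((x, text))   # same visual line
        else:
            lines.append((y, [(x, text)]))   # start a new line
    rows = []
    for _, line in lines:
        texts = [t for _, t in sorted(line)]  # left-to-right reading order
        dates = [t for t in texts if DATE_RE.match(t)]
        amounts = [t for t in texts if AMOUNT_RE.match(t)]
        if dates and amounts:
            desc = " ".join(t for t in texts if t not in dates and t not in amounts)
            rows.append((dates[0], desc, amounts))
    return rows


def make_page(x_shift=0.0, scale=1.0):
    """Same statement content, rendered at a different offset and size."""
    base = [
        ("01/15/2026", 40, 100), ("COFFEE", 120, 100), ("SHOP", 170, 100),
        ("4.50", 300, 100),
        ("01/16/2026", 40, 115), ("PAYROLL", 120, 115), ("2,500.00", 300, 115),
        ("Page", 40, 700), ("1", 70, 700),   # footer: no date, no amount
    ]
    return [(t, x * scale + x_shift, y * scale) for t, x, y in base]


statement_a = extract_rows(make_page())                   # amounts at x=300
statement_b = extract_rows(make_page(x_shift=20))         # drifted to x=320
statement_c = extract_rows(make_page(x_shift=20, scale=1.25))
```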

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>