fix(pdf): drop statement_period_start/end columns from output
User asked to remove them. The two columns repeated the same value on
every row from a given statement, took up screen space in the editor,
and offered limited value once the date column already carries the
inferred full date.

What's kept:
- ``account_number``: still stamped onto every row so multi-statement
  CSVs are self-attributing
- ``extract_statement_metadata``: still runs every scan because
  ``period_end`` is the source of the year inference that binds
  Chase-style short ``01/13`` dates to ``20250113``
- ``_extract_statement_period`` and its tests: period detection itself
  isn't going anywhere, just its appearance in the output rows

What's removed:
- ``record["statement_period_start"]`` /
  ``record["statement_period_end"]`` assignments in
  ``scan_pdf_for_transactions``
- The two columns from the page's column-ordering setup
- Tests pinning their presence; replaced with assertions that they're
  explicitly absent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -782,15 +782,15 @@ def scan_pdf_for_transactions(
         "page": 1,
         "raw": "01/15/2026 Coffee $4.50",
         "account_number": "****1234",   # from header
-        "statement_period_start": "20260101",
-        "statement_period_end": "20260131",
     }

-    Header metadata (``account_number`` /
-    ``statement_period_start`` / ``statement_period_end``) is
-    extracted once per PDF and stamped onto every detected row.
-    That way a multi-statement CSV remains attributable per row
-    when it's reshaped or imported elsewhere.
+    Account number is extracted from the statement header once
+    per PDF and stamped onto every detected row so the CSV is
+    self-attributing when statements are combined. The statement
+    period IS detected (used internally for year inference on
+    short dates like "01/13") but isn't surfaced as a per-row
+    column — the inferred year already lives in the ``date``
+    field.

     Short dates without a year (``01/13``, ``Jan 13``) are bound
     to the year of the statement period's end before formatting.
@@ -915,15 +915,14 @@ def scan_pdf_for_transactions(
             if not _has_real_transaction_amount(record):
                 continue

-            # Stamp the header metadata onto every kept row so the
-            # CSV is self-attributing.
+            # Stamp the account number onto every kept row so the
+            # CSV is self-attributing when statements are combined.
+            # The period start/end aren't surfaced per row — they're
+            # used only for the year-inference fallback above
+            # (binding short dates like "01/13" to the statement's
+            # year) but downstream the date column already carries
+            # the inferred full date.
             record["account_number"] = metadata["account_number"] or ""
-            record["statement_period_start"] = format_date(
-                metadata["period_start"], output_date_format,
-            )
-            record["statement_period_end"] = format_date(
-                metadata["period_end"], output_date_format,
-            )

             out_rows.append(record)
             prev = record
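
The year inference that the commit message and docstring describe (binding a short ``01/13`` date to the year of the statement period's end) can be illustrated with a minimal sketch. The helper name ``bind_short_date`` and the regex are hypothetical and are not taken from the repository; the real code path also accepts forms like ``Jan 13`` and goes through ``format_date``, neither of which is reproduced here. The sketch also naively uses the period-end year for every date, so it says nothing about how the project treats statements that cross a year boundary.

```python
import re
from datetime import date

# Hypothetical pattern for an MM/DD short date with no year.
SHORT_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})$")


def bind_short_date(short: str, period_end: date) -> str:
    """Attach period_end's year to an MM/DD string and return YYYYMMDD."""
    match = SHORT_DATE.match(short.strip())
    if match is None:
        raise ValueError(f"not an MM/DD short date: {short!r}")
    month, day = int(match.group(1)), int(match.group(2))
    # date() validates the month/day combination and raises ValueError
    # for impossible values such as 02/30.
    return date(period_end.year, month, day).strftime("%Y%m%d")


# A statement ending 2025-01-31 binds "01/13" to 2025.
print(bind_short_date("01/13", date(2025, 1, 31)))  # 20250113
```

Because the full date already carries the inferred year, the per-row period columns add no information that the ``date`` field lacks, which is the rationale the commit gives for dropping them.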
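
The commit message mentions that tests pinning the two columns were replaced with assertions that they are explicitly absent. Those tests are not shown in this diff, so the following is only a sketch of what such a check might look like. The helper ``assert_period_columns_absent`` and the sample row are hypothetical; the row simply mirrors the docstring example above.

```python
# Column names dropped from the output rows by this commit.
REMOVED_COLUMNS = ("statement_period_start", "statement_period_end")


def assert_period_columns_absent(rows):
    """Fail if any row still carries a removed period column."""
    for row in rows:
        leaked = [name for name in REMOVED_COLUMNS if name in row]
        assert not leaked, f"unexpected period columns in row: {leaked}"
        # account_number is kept, so it should still be stamped on the row.
        assert "account_number" in row, "account_number missing from row"


# Hypothetical row shaped like the docstring example after this change.
sample_rows = [
    {
        "page": 1,
        "raw": "01/15/2026 Coffee $4.50",
        "account_number": "****1234",
    }
]
assert_period_columns_absent(sample_rows)
```

Asserting absence rather than just deleting the old tests guards against the columns being reintroduced by accident.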