fix(pdf): drop statement_period_start/end columns from output

User asked to remove them: the two columns repeated the same value on
every row from a given statement, took up screen space in the editor,
and offered limited value once the date column already carries the
inferred full date.

What's kept:
- ``account_number``: still stamped onto every row so multi-statement
  CSVs are self-attributing
- ``extract_statement_metadata``: still runs every scan because
  ``period_end`` is the source of the year inference that binds
  Chase-style short ``01/13`` dates to ``20250113``
- ``_extract_statement_period`` and its tests: period detection itself
  isn't going anywhere, just its appearance in the output rows

What's removed:
- ``record["statement_period_start"]`` / ``record["statement_period_end"]``
  assignments in ``scan_pdf_for_transactions``
- The two columns from the page's column-ordering setup
- Tests pinning their presence; replaced with assertions that they're
  explicitly absent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
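
For readers unfamiliar with the year inference the message refers to, here is a minimal sketch of how a short ``MM/DD`` date might be bound to a full ``YYYYMMDD`` value using the statement's ``period_end``. The function name ``infer_full_date`` and the year-rollover rule (a month later than the period-end month is assumed to belong to the previous year) are illustrative assumptions, not the repository's actual implementation.

```python
from datetime import date


def infer_full_date(short_date: str, period_end: date) -> str:
    """Bind a short MM/DD date to a year taken from the statement period end.

    Hypothetical helper: the real project may use a different rule.
    """
    month, day = (int(part) for part in short_date.split("/"))
    year = period_end.year
    # Assumption: a statement ending in January can still list December
    # transactions, which belong to the previous calendar year.
    if month > period_end.month:
        year -= 1
    return date(year, month, day).strftime("%Y%m%d")


print(infer_full_date("01/13", date(2025, 1, 31)))
print(infer_full_date("12/28", date(2025, 1, 15)))
```

Under these assumptions, dropping the period columns from the output does not affect the inference, since only ``period_end`` is consulted at scan time.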
@@ -427,7 +427,7 @@ else:
 
     # Order columns so the user-facing fields are leftmost; raw +
     # internals are last and easy to scroll past or unselect at
-    # download time. Statement metadata sits with the transaction
+    # download time. ``account_number`` sits with the transaction
     # detail since it's per-row context an accountant typically
     # wants alongside the amounts.
     front = [
@@ -435,11 +435,7 @@ else:
         "description",
     ]
     amount_cols = sorted(c for c in df.columns if c.startswith("amount_"))
-    metadata_cols = [
-        "account_number",
-        "statement_period_start",
-        "statement_period_end",
-    ]
+    metadata_cols = ["account_number"]
     tail = ["source_file", "page", "raw"]
     ordered = [
         c for c in front + amount_cols + metadata_cols + tail
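
The second hunk is cut off before the end of the ``ordered`` comprehension, so the filter it applies is not visible here. As a rough sketch of the ordering the diff sets up, the following standalone function works on a plain list of column names. The function name ``order_columns``, the ``"date"`` entry in ``front``, the ``if c in columns`` filter, and the handling of leftover columns are all assumptions made for illustration.

```python
def order_columns(columns):
    """Return column names with user-facing fields first, internals last.

    Hypothetical stand-in for the page's column-ordering setup.
    """
    front = ["date", "description"]  # assumed; the diff shows only "description"
    amount_cols = sorted(c for c in columns if c.startswith("amount_"))
    metadata_cols = ["account_number"]
    tail = ["source_file", "page", "raw"]
    preferred = front + amount_cols + metadata_cols + tail
    # Keep only the preferred names that are actually present (assumed filter).
    ordered = [c for c in preferred if c in columns]
    # Assumption: any column not named above is appended in its original order.
    ordered += [c for c in columns if c not in ordered]
    return ordered


cols = ["raw", "amount_debit", "description", "account_number",
        "date", "page", "amount_credit", "source_file"]
result = order_columns(cols)
print(result)
# The dropped period columns should no longer appear in the output.
assert "statement_period_start" not in result
assert "statement_period_end" not in result
```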