fix(pdf): consistent 2-decimal amount precision in display and CSV

User reported amounts losing trailing zeros (4.50 rendering as
4.5, 1000.00 as 1000) on the same statement. Classic float
display issue: a ``float`` carries no display precision, so
Python's ``repr(4.50)`` is ``'4.5'``, and pandas / Streamlit
happily show that inconsistency cell-by-cell.
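
A minimal illustration of the symptom, using only built-in formatting. The variable names are illustrative; the ``g`` format is included only as one way a whole-number float can end up rendered as ``1000``, not as a claim about what pandas or Streamlit do internally.

```python
# A float stores a value, not a display precision, so trailing zeros
# do not survive the default string conversions.
amount = 4.50

default_text = str(amount)      # shortest text that round-trips the value
fixed_text = f"{amount:.2f}"    # explicit two-place formatting

whole_default = str(1000.00)    # keeps a single ".0"
whole_general = f"{1000.00:g}"  # general format drops the fraction entirely
whole_fixed = f"{1000.00:.2f}"  # explicit two-place formatting

print(default_text, fixed_text, whole_default, whole_general, whole_fixed)
```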

Two layers of fix; the internal type stays ``float`` for arithmetic:

**Display.** ``st.column_config.NumberColumn(format="%.2f")``
applied programmatically to every ``amount_*`` column on the
data_editor. Every numeric amount now shows with exactly two
decimal places regardless of trailing zeros.
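
The display layer can be sketched as a small pure function. ``amount_column_config`` and its ``make_column`` parameter are hypothetical names introduced here so the selection logic runs without Streamlit; in the app the factory would presumably be ``st.column_config.NumberColumn``, as in the diff further down.

```python
def amount_column_config(columns, make_column):
    """Map every ``amount_*`` column name to a two-decimal column config.

    ``make_column`` is any callable accepting ``(label, format=...)``;
    it is injected so this sketch does not need Streamlit to run.
    """
    return {
        name: make_column(name, format="%.2f")
        for name in columns
        if name.startswith("amount_")
    }


# The printf-style pattern pads to exactly two places.
example = "%.2f" % 4.5  # "4.50"
```

With Streamlit available, ``column_config.update(amount_column_config(df.columns, st.column_config.NumberColumn))`` would give the same mapping as the loop in the diff, minus the ``help`` text.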

**CSV export.** Pandas' default float-to-CSV writer also drops
trailing zeros (the same issue an accountant would see when
opening the file in Excel). Before serialising, each amount
column is mapped through the new ``format_amount`` helper. It
returns ``f"{v:.2f}"`` for numerics, an empty string for
None/NaN/inf, and ``str(value)`` for booleans (guards the
``True → "1.00"`` foot-gun, since ``bool`` is an ``int``
subclass). Any string the scanner kept because parsing failed
passes through unchanged (e.g. ``(4.50)`` when parens-negative
is off), so the user can correct it in the editor before
re-exporting.
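
The helper's body is not part of the diff shown on this page, so the following is a reconstruction from the description above, not the committed code. The name ``format_amount_sketch`` is hypothetical; the ``places`` parameter follows the mention in the test list.

```python
import math


def format_amount_sketch(value, places=2):
    """Render an amount as a fixed-precision string for CSV export."""
    if value is None:
        return ""
    # bool is an int subclass, so check it before any numeric handling;
    # otherwise True would be exported as "1.00".
    if isinstance(value, bool):
        return str(value)
    # Strings the scanner could not parse pass through untouched.
    if isinstance(value, str):
        return value
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Assumption: unknown types fall back to their string form.
        return str(value)
    if math.isnan(number) or math.isinf(number):
        return ""
    return f"{number:.{places}f}"


print(format_amount_sketch(4.5))       # 4.50
print(format_amount_sketch("(4.50)"))  # (4.50)
```

Checking ``bool`` and ``str`` before the ``float`` conversion is the design point: both would otherwise be coerced (``float(True)`` is ``1.0``, ``float("4.5")`` is ``4.5``) and the pass-through guarantees would be lost.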

``format_amount`` lives in ``src/pdf_extract.py`` so it's
testable in isolation (the page module can't easily be unit
tested because of its Streamlit import chain). 8 new tests
cover the trailing-zeros case, negatives, None/empty,
string-passthrough, bool guard, NaN/inf, and the ``places``
parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:27:16 +00:00
parent 6f2ad57490
commit ad7c22d7fb
3 changed files with 91 additions and 1 deletions


@@ -25,6 +25,7 @@ from src.gui.components import hide_streamlit_chrome, render_sticky_footer
 from src.pdf_extract import (
 PdfDependencyMissing,
 diagnose_pdf_lines,
+format_amount,
 ocr_available,
 scan_pdf_for_transactions,
 )
@@ -480,6 +481,18 @@ else:
 column_config["source_file"] = st.column_config.TextColumn(
 "source_file", disabled=True,
 )
+# Force 2-decimal display on every amount column. Without this,
+# Streamlit / Pandas show floats with their raw repr ("4.5",
+# "12.0", "1000") and the precision looks inconsistent across
+# rows that all came from the same statement. Internal dtype
+# stays float for arithmetic accuracy; only the rendering and
+# CSV-export formatting force two-place precision.
+for amt_col in (c for c in df.columns if c.startswith("amount_")):
+column_config[amt_col] = st.column_config.NumberColumn(
+amt_col,
+format="%.2f",
+help="Two-decimal currency amount.",
+)
 edited = st.data_editor(
 df,
@@ -511,7 +524,16 @@ else:
 help="``page`` and ``raw`` are kept off by default; "
 "tick them if you want them in the file.",
 )
-export = selected[keep] if keep else selected
+export = (selected[keep] if keep else selected).copy()
+# Coerce every amount column to a fixed 2-decimal string
+# before serialising. Pandas' default float-to-CSV
+# writer drops trailing zeros (4.50 → 4.5) which an
+# accountant immediately notices in Excel; preserving
+# the precision is the whole point of this commit.
+for amt_col in (
+c for c in export.columns if c.startswith("amount_")
+):
+export[amt_col] = export[amt_col].map(format_amount)
 csv_bytes = export.to_csv(index=False).encode("utf-8")
 st.download_button(
 f"Download {len(export):,} rows as CSV",