fix(pdf): consistent 2-decimal amount precision in display and CSV

User reported amounts losing trailing zeros (4.50 rendering as
4.5, 1000.00 as 1000) on the same statement. Classic float
display issue: Python's native float repr keeps no trailing
zeros (``repr(4.5)`` is ``'4.5'``), and pandas / Streamlit
happily show that inconsistency cell-by-cell.

Two layers of fix; the internal type stays ``float`` for arithmetic:

**Display.** ``st.column_config.NumberColumn(format="%.2f")``
applied programmatically to every ``amount_*`` column on the
data_editor. Every numeric amount now shows with exactly two
decimal places regardless of trailing zeros.
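
A minimal sketch of how that per-column config could be built. The
helper names and the example column names are hypothetical and not
taken from the page module; only
``st.column_config.NumberColumn(format="%.2f")`` comes from this commit.

```python
def amount_column_names(columns):
    """Return the column names that follow the amount_* naming convention."""
    return [str(c) for c in columns if str(c).startswith("amount_")]


def build_amount_column_config(columns, fmt="%.2f"):
    """Map each amount_* column to a NumberColumn with a fixed printf-style format."""
    # Imported lazily so the name filter above stays usable without Streamlit.
    import streamlit as st

    return {
        name: st.column_config.NumberColumn(format=fmt)
        for name in amount_column_names(columns)
    }


# Assumed usage inside a Streamlit page, where df is the statement DataFrame:
#   st.data_editor(df, column_config=build_amount_column_config(df.columns))
```

Only the display changes here; the underlying values stay ``float``.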

**CSV export.** Pandas' default float-to-CSV writer also drops
trailing zeros (the same issue an accountant would see when
opening the file in Excel). Before serialising, each amount
column is mapped through the new ``format_amount`` helper —
returns ``f"{v:.2f}"`` for numerics, empty string for
None/NaN/inf, ``str(value)`` for booleans (guards the
``True → "1.00"`` foot-gun since ``bool`` is an ``int``
subclass), and passes through any string the scanner kept
because parsing failed (e.g. ``(4.50)`` when parens-negative is
off — user can correct in the editor before re-exporting).
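
A rough sketch of the export step described above, assuming the
statement is held in a pandas DataFrame. ``amounts_to_csv`` is a
hypothetical name, and ``format_amount`` is condensed here from the
helper in the diff so the snippet stands alone.

```python
import math


def format_amount(value, places=2):
    """Condensed copy of the helper added in this commit."""
    if value is None or value == "":
        return ""
    if isinstance(value, bool):
        return str(value)  # bool is an int subclass; keep "True"/"False"
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            return ""  # NaN and +/-inf become empty cells
        return f"{value:.{places}f}"
    return str(value)  # unparsed scanner tokens pass through


def amounts_to_csv(df):
    """Return CSV text with every amount_* column pre-formatted as strings."""
    out = df.copy()  # leave the caller's float columns untouched
    for col in out.columns:
        if str(col).startswith("amount_"):
            out[col] = out[col].map(format_amount)
    return out.to_csv(index=False)
```

Formatting a copy keeps the in-memory columns numeric, so totals and
other arithmetic elsewhere are unaffected by the export.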

``format_amount`` lives in ``src/pdf_extract.py`` so it's
testable in isolation (the page module can't easily be unit
tested because of its Streamlit import chain). Eight new tests
cover the trailing-zeros case, negatives, None/empty,
string-passthrough, bool guard, NaN/inf, and the ``places``
parameter.
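
The bool and NaN/inf cases in that list are easiest to motivate by
looking at what a bare format spec does without any guards. This is
plain Python behaviour shown for illustration; ``bare_format`` is a
hypothetical name and not part of the test suite.

```python
def bare_format(value, places=2):
    """Apply the fixed-point format spec with no guards at all."""
    return f"{value:.{places}f}"


# The cases the formatter is meant to fix.
trailing_zero = bare_format(4.5)        # "4.50"
whole_number = bare_format(1000)        # "1000.00"
negative = bare_format(-4.5)            # "-4.50"
three_places = bare_format(4.5, 3)      # "4.500"

# The cases that need explicit guards in a currency column.
bool_as_number = bare_format(True)      # "1.00", since bool is an int subclass
nan_text = bare_format(float("nan"))    # "nan"
inf_text = bare_format(float("inf"))    # "inf"
```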

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 01:27:16 +00:00
parent 6f2ad57490
commit ad7c22d7fb
3 changed files with 91 additions and 1 deletions

@@ -520,6 +520,35 @@ def _find_amount_tokens(
    return out


def format_amount(value, places: int = 2) -> str:
    """Render an amount value as a fixed-precision string.

    Floats lose trailing zeros in their native repr (``4.5`` is
    not ``4.50``), and pandas / Streamlit happily show that
    inconsistency cell-by-cell, which is confusing on a statement
    where every number is currency. This formatter forces *places*
    decimals so 4.5, 12.0 and 1000 all render with the same
    precision.

    Numeric → ``{value:.{places}f}``. None / empty / non-finite →
    empty string. Strings (typically the raw token preserved when
    ``parse_amount`` couldn't decode the original) pass through
    untouched so the user sees the source text in the editor.
    Booleans pass through as ``str(value)``, which guards against
    ``True`` rendering as ``"1.00"`` because Python treats ``bool``
    as ``int``.
    """
    if value is None or value == "":
        return ""
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, (int, float)):
        import math
        if isinstance(value, float) and not math.isfinite(value):
            return ""
        return f"{value:.{places}f}"
    return str(value)


def format_date(iso_str: str | None, fmt: str = "%Y%m%d") -> str:
    """Convert an ISO ``YYYY-MM-DD`` date string to *fmt*.
@@ -973,6 +1002,7 @@ __all__ = [
    "extract_pages",
    "extract_pages_auto",
    "extract_statement_metadata",
    "format_amount",
    "format_date",
    "ocr_available",
    "parse_amount",