feat(pdf): visual region picker on rendered sample page
Phase 5/6. Adds a "Visual picker" tab as the first stop in the template-build flow. The sample PDF page is rasterized with ``pypdfium2`` (capped at ~900px wide for sensible display), and ``streamlit-drawable-canvas`` overlays drawing tools on top.

UX:

- **Line mode**: drag short (roughly vertical) strokes where you want columns to split. Each stroke's x-midpoint becomes one boundary in PDF point coordinates.
- **Rect mode**: drag a rectangle around the transactions table; bbox is preserved on the template as ``visual.table_bbox`` for round-trip, future use as a hard crop region.
- **Transform mode**: move/resize already-drawn shapes after the fact.

Round-trip: re-entering Build mode with an existing template seeds the canvas with full-height vertical lines for every boundary already on the template, plus the saved bbox if any, so editing-after-save matches the user's mental model.

Coordinate translation: the canvas reports pixel positions; we divide by the renderer's pixels-per-PDF-point scale to get back to PDF coordinates that ``apply_template`` already expects. No template-schema change required: the boundaries the picker writes are the same list the text-input editor wrote in commit 3, just sourced visually.

New helper in the extraction module:

- ``render_page_image(pdf_bytes, page_no, target_width=900)``: rasterize a single 1-indexed page to a PIL image; returns ``(image, scale)`` for coordinate translation.

The text-input boundary editor in the Columns tab remains as a fallback for power users / keyboard-only workflows and for copy-paste from spreadsheet-derived x-positions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
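The coordinate translation described in the message can be sketched roughly as follows. This is an illustration only, not the code from this commit: ``Stroke``, ``strokes_to_boundaries`` and ``boundaries_to_pixels`` are hypothetical names, the real canvas objects from ``streamlit-drawable-canvas`` have a different shape, and the sorting and two-decimal rounding are assumptions added for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    """Hypothetical stand-in for one line drawn on the canvas (pixel units)."""
    x1: float
    x2: float


def strokes_to_boundaries(strokes: list[Stroke], scale: float) -> list[float]:
    """Map each stroke's x-midpoint from pixels to PDF points.

    ``scale`` is the pixels-per-PDF-point factor returned alongside the
    rendered image. Sorting and rounding are assumptions of this sketch.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    midpoints_px = [(s.x1 + s.x2) / 2.0 for s in strokes]
    return sorted(round(x / scale, 2) for x in midpoints_px)


def boundaries_to_pixels(boundaries: list[float], scale: float) -> list[float]:
    """Inverse mapping: PDF points to pixels, e.g. to seed the canvas."""
    return [b * scale for b in boundaries]


# A 600 pt wide page rendered at 900 px gives 1.5 pixels per point.
scale = 900 / 600
strokes = [Stroke(448, 452), Stroke(150, 150)]
boundaries = strokes_to_boundaries(strokes, scale)
seeded_px = boundaries_to_pixels(boundaries, scale)
```

Dividing by ``scale`` goes from canvas pixels to PDF points, and multiplying by it goes back, so the same factor covers both picking boundaries and re-seeding the canvas from a saved template.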
@@ -512,6 +512,39 @@ def ocr_available() -> tuple[bool, str]:
     return True, ""
 
 
+def render_page_image(
+    pdf_bytes: bytes,
+    page_no: int,
+    *,
+    target_width: int = 900,
+) -> tuple["Any", float]:
+    """Rasterize one page of *pdf_bytes* (1-indexed) to a PIL image.
+
+    Returns ``(pil_image, scale)`` where ``scale`` is the
+    pixels-per-PDF-point factor. The caller uses ``scale`` to map
+    canvas coordinates (pixels) back to PDF coordinates (points).
+
+    ``target_width`` caps the rendered width so the image is a
+    sensible size for the visual picker; bank statements at 100%
+    can be 800–1200 pts wide; we want ~900px on screen.
+    """
+    import pypdfium2 as pdfium
+
+    pdf = pdfium.PdfDocument(pdf_bytes)
+    try:
+        idx = max(0, min(page_no - 1, len(pdf) - 1))
+        page = pdf[idx]
+        # Width in PDF points → pixels-per-point scale.
+        pdf_width = page.get_width()
+        scale = target_width / pdf_width if pdf_width else 2.0
+        # Cap scale so big A3-style scans don't blow up.
+        scale = min(scale, 3.0)
+        bitmap = page.render(scale=scale)
+        return bitmap.to_pil(), scale
+    finally:
+        pdf.close()
+
+
 def ocr_pdf_to_pages(pdf_bytes: bytes, dpi: int = 200) -> list[Page]:
     """Run Tesseract over each page of *pdf_bytes* and return a
     word-position-rich ``Page`` list, parallel to ``extract_pages``.