docs: design notes for future PDF→CSV tool
New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a consistent shape — What / Why / Can we ship now / Approach / GUI sketch / Effort / Risks / Ship criteria. Resting place for things the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that keep coming up.

First entry: **#10 PDF → CSV extractor** (bank statements et al.). Key facts captured:

- **Current state**: no PDF infrastructure exists. Zero PDF dependencies in requirements.txt; zero PDF-touching code under ``src/``. The only "PDF" string in the codebase is the planned-output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3, no native compiler, gives coordinate-aware text), Tesseract via pytesseract as the OCR fallback for scanned PDFs, streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a rendered page; the tool applies that template across N pages, saves the template by layout fingerprint for next month's statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10 weeks for a polished version with multi-page template recall; +2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the combination (region selection that persists across pages and across documents with similar layouts) is where the engineering goes.
- **Ship criteria**: ≥1 paying customer + ≥3 paid or ≥5 demo emails asking for PDF extraction + the bookkeeper niche converting at least one customer first. None have fired.

Cross-references added:

- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools in FUTURE-TOOLS.md and explicitly names #10 as the current highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new row for the PDF tool tied to the same ship-trigger language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
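
As a rough illustration of the "row template applied to coordinate-aware text" idea from the GUI sketch, the following is a minimal sketch and not the planned implementation. It assumes each word arrives as a dict with ``text``, ``x0`` and ``top`` keys (chosen to resemble what pdfplumber's ``extract_words()`` returns), and that the user-drawn template reduces to a sorted list of column x-boundaries. The names ``assign_column``, ``words_to_rows``, ``rows_to_csv`` and the ``row_tolerance`` parameter are hypothetical.

```python
import csv
import io


def assign_column(x0, column_edges):
    """Return the index of the column whose [left, right) span contains x0."""
    for i in range(len(column_edges) - 1):
        if column_edges[i] <= x0 < column_edges[i + 1]:
            return i
    return None  # word lies outside the template


def words_to_rows(words, column_edges, row_tolerance=3.0):
    """Group positioned words into table rows using a column template.

    Assumption: words whose `top` lies within `row_tolerance` points of the
    first word of a line belong to the same row.
    """
    n_cols = len(column_edges) - 1

    # Pass 1: split words into text lines by vertical position.
    lines = []
    current_top = None
    for w in sorted(words, key=lambda w: w["top"]):
        if current_top is None or w["top"] - current_top > row_tolerance:
            lines.append([])
            current_top = w["top"]
        lines[-1].append(w)

    # Pass 2: within each line, read left to right and drop words into columns.
    rows = []
    for line in lines:
        cells = [[] for _ in range(n_cols)]
        for w in sorted(line, key=lambda w: w["x0"]):
            col = assign_column(w["x0"], column_edges)
            if col is not None:
                cells[col].append(w["text"])
        if any(cells):  # skip lines with no words inside the template
            rows.append([" ".join(cell) for cell in cells])
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the standard library writer."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up sample data standing in for one statement page.
sample_words = [
    {"text": "2024-01-03", "x0": 40.0, "top": 100.0},
    {"text": "Coffee", "x0": 120.0, "top": 100.4},
    {"text": "Shop", "x0": 160.0, "top": 100.2},
    {"text": "-4.50", "x0": 400.0, "top": 100.1},
    {"text": "2024-01-04", "x0": 40.0, "top": 115.0},
    {"text": "Salary", "x0": 120.0, "top": 115.3},
    {"text": "2500.00", "x0": 400.0, "top": 115.0},
]
sample_edges = [30.0, 110.0, 380.0, 480.0]  # date | description | amount
sample_csv = rows_to_csv(words_to_rows(sample_words, sample_edges))
```

Keeping the template as plain column boundaries is one possible design choice; it would make a saved template easy to serialize and reapply to next month's statement, though the layout fingerprinting described above is not covered by this sketch.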
@@ -58,6 +58,14 @@ buy" into "an automatable workflow you depend on." That conversion is
 what produces retention and word-of-mouth — the only marketing channel
 that scales under the no-network/no-touch constraint.
 
+**Parked behind the freeze**: post-launch tool ideas are captured in
+`docs/FUTURE-TOOLS.md` with feasibility, GUI sketch, effort estimate,
+and ship criteria for each. Currently parked: **#10 PDF → CSV
+extractor** (bank statements et al.) — gated on a paying customer +
+≥3 paying customers or ≥5 demo emails explicitly asking for PDF
+extraction, with the bookkeeper niche converting at least one customer
+first. None of those triggers have fired yet.
+
 ### 2.2 The demo *is* the product. Make it embarrassingly good.
 
 - Three persona-tagged sample datasets, not one generic CSV: Shopify