docs: design notes for future PDF→CSV tool

New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a
consistent shape — What / Why / Can we ship now / Approach / GUI
sketch / Effort / Risks / Ship criteria. Resting place for things
the new-tool freeze in ``PLAN.md`` §2.1 refuses to build but that
keep coming up.

First entry: **#10 PDF → CSV extractor** (bank statements et al.).

Key facts captured:

- **Current state**: no PDF infrastructure exists. Zero PDF
  dependencies in requirements.txt; zero PDF-touching code under
  ``src/``. The only "PDF" string in the codebase is the planned-
  output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3,
  no native compiler, gives coordinate-aware text), Tesseract via
  pytesseract as the OCR fallback for scanned PDFs,
  streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a
  rendered page; the tool applies that template across N pages,
  saves the template by layout fingerprint for next month's
  statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10
  weeks for a polished version with multi-page template recall;
  +2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the
  combination (region selection that persists across pages and
  across documents with similar layouts) is where the engineering
  goes.
- **Ship criteria**: ≥1 paying customer + ≥3 paid or ≥5 demo
  emails asking for PDF extraction + the bookkeeper niche
  converting at least one customer first. None have fired.

Cross-references added:

- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for
  parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools
  in FUTURE-TOOLS.md and explicitly names #10 as the current
  highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new
  row for the PDF tool tied to the same ship-trigger language.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-17 01:52:42 +00:00
parent c73d716d06
commit ee0b1f6f6b
4 changed files with 264 additions and 0 deletions

View File

@@ -269,6 +269,7 @@ moves until $5k/mo MRR:
| | Why locked |
|---|---|
| ❌ More tools (0608) | `PLAN.md` §2.1 distribution-gate. Tool 09 was the exception; no others until first paid customer + one external review. |
| ❌ Tool #10 PDF → CSV (the most-asked-for adjacency) | Parked in `docs/FUTURE-TOOLS.md` with full design + 34 wk MVP / 610 wk polished estimate. Ship trigger: paying customer + ≥3 paid or ≥5 demo emails asking for PDF + the bookkeeper niche converting first. None have fired yet. |
| ❌ SaaS pivot | `DECISIONS.md` §4 — recurring infra conflicts with the lifestyle constraint. |
| ❌ Live chat / sales calls | `DECISIONS.md` §1 #8 — no-touch is locked until $5k/mo. |
| ❌ Custom integrations / one-off consulting | Breaks "build once, sell many." |