New ``docs/FUTURE-TOOLS.md`` captures post-launch tool ideas with a
consistent shape — What / Why / Can we ship now / Approach / GUI
sketch / Effort / Risks / Ship criteria. It is the resting place for
ideas that the new-tool freeze in ``PLAN.md`` §2.1 refuses to build
but that keep coming up.
First entry: **#10 PDF → CSV extractor** (bank statements et al.).
Key facts captured:
- **Current state**: no PDF infrastructure exists. Zero PDF
dependencies in requirements.txt; zero PDF-touching code under
``src/``. The only "PDF" string in the codebase is the planned-
output copy for the Quality Check tool, unrelated to extraction.
- **Library picks**: pdfplumber as the extraction core (BSD-3,
no native compiler, gives coordinate-aware text), Tesseract via
pytesseract as the OCR fallback for scanned PDFs,
streamlit-drawable-canvas as the region-picker component.
- **GUI sketch**: user draws a header strip + a row template on a
rendered page; the tool applies that template across N pages,
saves the template by layout fingerprint for next month's
statement, emits CSV.
- **Effort phased A–E**: 3–4 weeks for a text-only MVP; 6–10
weeks for a polished version with multi-page template recall;
+2–3 weeks if scanned-PDF OCR is required.
- **Difficulty**: medium-hard. The pieces are well-trodden; the
combination (region selection that persists across pages and
across documents with similar layouts) is where the engineering
goes.
- **Ship criteria**: all of (a) ≥1 paying customer, (b) ≥3 emails
from paid customers or ≥5 from demo users asking for PDF
extraction, and (c) the bookkeeper niche converting at least one
customer first. None has been met so far.
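The row-template idea in the GUI sketch can be illustrated without any PDF dependency. The sketch below is a minimal illustration, not the planned implementation: it assumes words arrive as dicts with ``text``, ``x0`` and ``top`` keys (the shape pdfplumber's ``extract_words()`` returns), and the names ``TEMPLATE``, ``layout_fingerprint``, ``words_to_rows`` and ``rows_to_csv`` are hypothetical. The fingerprint scheme (header labels plus x positions snapped to a coarse grid) is likewise an assumption.

```python
import csv
import hashlib
import io

# Hypothetical row template: column name -> (x_min, x_max) band in page points.
TEMPLATE = {"date": (40, 110), "description": (110, 400), "amount": (400, 560)}


def layout_fingerprint(header_words, grid=5):
    """Hash header labels plus coarse x positions (assumed scheme) so that
    next month's statement with the same layout maps to the same key."""
    tokens = sorted((round(w["x0"] / grid) * grid, w["text"].lower()) for w in header_words)
    canonical = "|".join(f"{x}:{text}" for x, text in tokens)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def words_to_rows(words, template, row_tolerance=3.0):
    """Group words into lines by vertical position, then into columns by x band."""
    lines = []  # each entry: [anchor_top, [word, ...]]
    for word in sorted(words, key=lambda w: w["top"]):
        if not lines or word["top"] - lines[-1][0] > row_tolerance:
            lines.append([word["top"], []])
        lines[-1][1].append(word)

    rows = []
    for _, line_words in lines:
        cells = {name: [] for name in template}
        for word in sorted(line_words, key=lambda w: w["x0"]):
            for name, (lo, hi) in template.items():
                if lo <= word["x0"] < hi:
                    cells[name].append(word["text"])
                    break  # a word belongs to at most one column
        if any(cells.values()):
            rows.append({name: " ".join(parts) for name, parts in cells.items()})
    return rows


def rows_to_csv(rows, template):
    """Serialise extracted rows with the template's column order as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(template), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE_WORDS = [
    {"text": "2024-01-03", "x0": 42.0, "top": 100.2},
    {"text": "Coffee", "x0": 112.0, "top": 100.0},
    {"text": "Shop", "x0": 150.0, "top": 100.1},
    {"text": "-4.50", "x0": 510.0, "top": 100.3},
    {"text": "2024-01-04", "x0": 42.0, "top": 114.0},
    {"text": "Salary", "x0": 112.0, "top": 114.1},
    {"text": "2500.00", "x0": 500.0, "top": 114.0},
]

print(rows_to_csv(words_to_rows(SAMPLE_WORDS, TEMPLATE), TEMPLATE))
```

Snapping header positions to a grid is one way to tolerate small rendering jitter between statements; a word that lands near a grid boundary can still flip the fingerprint, which is part of why template recall is budgeted as its own phase.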
Cross-references added:
- ``docs/REQUIREMENTS.md`` §11: pointer to FUTURE-TOOLS.md for
parked tool ideas, with a one-paragraph summary of #10.
- ``docs/PLAN.md`` §2.1: notes that the freeze parks future tools
in FUTURE-TOOLS.md and explicitly names #10 as the current
highest-pressure entry.
- ``docs/NEXT-STEPS.md`` Phase 5 "what NOT to build" table: a new
row for the PDF tool tied to the same ship-trigger language.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled hardening upgrades.
1. Asymmetric signatures (HMAC → Ed25519)
The previous HMAC scheme used a symmetric secret that any motivated
reverse engineer could pull out of the shipped binary and use to
mint blobs for any tier / name / email. With Ed25519, the binary
ships only the public verification key; the signing key never
leaves the seller's environment, so binary compromise no longer
yields forgery.
- src/license/crypto.py rewritten around
cryptography.hazmat.primitives.asymmetric.ed25519. Same public
API surface (sign/verify/encode_blob/decode_blob), same canonical
JSON encoding — drop-in for the manager / cli / GUI layers.
- DATATOOLS_LICENSE_PRIVKEY (seller-side) and
DATATOOLS_LICENSE_PUBKEY (build-time) env vars supply the keys;
the in-source dev keypair (src/license/_dev_keypair.py)
deterministically derives from a seed phrase for repro builds and
tests.
- Blob prefix bumped DTLIC1: → DTLIC2:. Decoding a DTLIC1 blob
surfaces a clear "old format" error rather than a confusing
signature mismatch.
- scripts/generate_keypair.py mints fresh production keypairs for
the seller (run once, stash the private key offline).
- requirements.txt now declares cryptography>=41,<46 (was an
undeclared transitive dep).
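The sign/verify split described above can be sketched with the `cryptography` package. This is an illustration under stated assumptions, not the contents of src/license/crypto.py: the wire layout (base64url payload and signature joined by a dot), the SHA-256 seed derivation in `dev_private_key`, and the error wording are all hypothetical. Only the `DTLIC1:` / `DTLIC2:` prefixes and the four function names come from the notes above.

```python
import base64
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

PREFIX_V1 = "DTLIC1:"
PREFIX_V2 = "DTLIC2:"


def dev_private_key(seed_phrase):
    """Assumed derivation: SHA-256 of the phrase is used as the 32-byte Ed25519 seed."""
    seed = hashlib.sha256(seed_phrase.encode("utf-8")).digest()
    return Ed25519PrivateKey.from_private_bytes(seed)


def canonical_json(payload):
    """Sorted keys and fixed separators so the signed bytes are reproducible."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def encode_blob(payload, private_key):
    """Seller side: sign the canonical payload and wrap it in a DTLIC2 blob."""
    body = canonical_json(payload)
    signature = private_key.sign(body)
    parts = [base64.urlsafe_b64encode(part).decode("ascii") for part in (body, signature)]
    return PREFIX_V2 + ".".join(parts)  # hypothetical layout: <payload>.<signature>


def decode_blob(blob, public_key):
    """Buyer side: needs only the public key; raises ValueError on any failure."""
    if blob.startswith(PREFIX_V1):
        raise ValueError("old license format (DTLIC1); ask for a re-issued key")
    if not blob.startswith(PREFIX_V2):
        raise ValueError("not a license blob")
    body_b64, signature_b64 = blob[len(PREFIX_V2):].split(".")
    body = base64.urlsafe_b64decode(body_b64)
    signature = base64.urlsafe_b64decode(signature_b64)
    try:
        public_key.verify(signature, body)
    except InvalidSignature:
        raise ValueError("signature does not match the verification key") from None
    return json.loads(body)


signing_key = dev_private_key("example seed phrase")
blob = encode_blob({"tier": "lite", "email": "buyer@example.com"}, signing_key)
print(decode_blob(blob, signing_key.public_key()))
```

Because `decode_blob` takes only a public key, nothing in the verification path can be repurposed to mint a new blob, which is the property the HMAC scheme lacked.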
2. Production-safe tripwire
assert_production_safe() refuses to boot a frozen / shipped build
when either:
- DATATOOLS_DEV_MODE=1 is set (would unconditionally bypass every
license check — fine in source/test but catastrophic in a buyer
install).
- The active verification key is still the embedded dev key (the
build pipeline forgot to set DATATOOLS_LICENSE_PUBKEY).
No-op in source / pytest runs (sys.frozen is unset) so test
fixtures and dev workflows keep working without ceremony. Called
from src/cli_license_guard.guard() and from hide_streamlit_chrome
— so it fires on every CLI invocation and every GUI page load.
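A minimal sketch of the tripwire logic, assuming a simplified signature: the real assert_production_safe() presumably reads the active key itself, whereas here `active_pubkey_hex`, `env` and `frozen` are hypothetical parameters added so the four scenarios are easy to exercise. `DEV_PUBKEY_HEX` and `UnsafeBuildError` are placeholders as well.

```python
import os
import sys

# Placeholder standing in for the embedded dev verification key.
DEV_PUBKEY_HEX = "00" * 32


class UnsafeBuildError(RuntimeError):
    """Raised when a frozen build is about to run in an unsafe configuration."""


def assert_production_safe(active_pubkey_hex, env=None, frozen=None):
    env = os.environ if env is None else env
    # PyInstaller-style bundles set sys.frozen; plain source runs do not.
    frozen = bool(getattr(sys, "frozen", False)) if frozen is None else frozen

    if not frozen:
        return  # source checkout or pytest run: nothing to enforce

    if env.get("DATATOOLS_DEV_MODE") == "1":
        raise UnsafeBuildError("DATATOOLS_DEV_MODE=1 is set in a shipped build")

    if active_pubkey_hex == DEV_PUBKEY_HEX:
        raise UnsafeBuildError("shipped build still uses the embedded dev key")


# In a source run this returns silently, even with the dev key active.
assert_production_safe(DEV_PUBKEY_HEX, env={"DATATOOLS_DEV_MODE": "1"}, frozen=False)
```

Failing at boot rather than at the first license check keeps a misbuilt binary from appearing to work until a buyer reaches a gated tool.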
Tests: 49 license-layer unit tests (was 40); added Ed25519
wrong-key rejection, dev-keypair seed pin, blob v2 prefix, v1
rejection with clear message, and four production-safe scenarios
(no-op in source, fires on DEV_MODE in frozen, fires on dev key in
frozen, passes in frozen with prod pubkey). Total: 2024 → 2033.
Docs (REQUIREMENTS §17a, DEVELOPER licensing recipe, DECISIONS
§9b + decision log) updated with the new threat-model write-up,
key-storage workflow, and tripwire behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes:
1. Lite tier
- New Tier.LITE in src/license/schema.py.
- FEATURES_BY_TIER[Tier.LITE] = {Deduplicator, Text Cleaner,
Format Standardizer}: the three universally useful tools that
cover the most common bookkeeping / RevOps / Klaviyo prep
workflows. The other six tools require Core.
- i18n: license.tier_lite, license.feature_locked_title,
license.feature_locked_body, license.upgrade_link,
license.status_locked (en + es).
- Per-tool feature gate at every GUI tool page
(require_feature_or_render_upgrade) and every tool CLI
(guard(feature=...)). A locked tool renders an upgrade
prompt + Manage-license button (GUI) or exits with code 2
(CLI).
- Home grid: tool cards the user's tier doesn't unlock get a
red 🔒 Locked badge in place of green Ready.
2. Trial removed
- Activation form's "Start 1-year trial" button removed.
- license_cli's `trial` subcommand removed.
- activation.trial_button / activation.trial_help i18n keys
dropped (pack parity test stays green).
- Tier.TRIAL stays in the enum (back-compat with any field-
tested trial licenses); LicenseManager._mint stays internal
for tests and the seller's key generator.
- Decision logged in DECISIONS §9b: a 1-year all-features
trial undercuts paid Lite; paid-only keeps tier economics
clean.
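The per-tool gate from the Lite-tier change can be sketched as a table lookup plus a CLI guard. This is a hedged illustration: `Tier.CORE`, `has_feature`, the `guard(tier, feature)` signature and the `Core tool N` names are assumptions (the six Core-only tools are not named here); the Lite tool set, the kept `Tier.TRIAL` member and exit code 2 come from the notes above.

```python
import enum
import sys


class Tier(enum.Enum):
    TRIAL = "trial"  # kept for back-compat; no longer offered
    LITE = "lite"
    CORE = "core"    # assumed member name for the full tier


LITE_TOOLS = frozenset({"Deduplicator", "Text Cleaner", "Format Standardizer"})
# Hypothetical stand-ins for the six Core-only tools.
CORE_ONLY_TOOLS = frozenset(f"Core tool {n}" for n in range(1, 7))

FEATURES_BY_TIER = {
    Tier.LITE: LITE_TOOLS,
    Tier.CORE: LITE_TOOLS | CORE_ONLY_TOOLS,
    Tier.TRIAL: LITE_TOOLS | CORE_ONLY_TOOLS,  # legacy trials were all-features
}


def has_feature(tier, feature):
    """True when the tier's feature set includes the named tool."""
    return feature in FEATURES_BY_TIER.get(tier, frozenset())


def guard(tier, feature):
    """CLI-side gate: print an upgrade hint and exit with code 2 when locked."""
    if not has_feature(tier, feature):
        print(
            f"'{feature}' is locked on the {tier.value} tier; upgrade to unlock it.",
            file=sys.stderr,
        )
        raise SystemExit(2)


guard(Tier.LITE, "Deduplicator")  # unlocked on Lite, so this returns normally
```

Keeping the tier-to-feature mapping in one table means the GUI badge, the upgrade prompt and the CLI exit path can all consult the same source of truth.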
Tests (+29 net): +17 Lite-tier unit/guard tests + 13 Lite-tier
GUI tests + 1 trial-absent assertion - 2 trial CLI tests - 1
trial GUI button test. Total: 1995 → 2024.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- USER-GUIDE EN + ES gain a §0 "First launch — activation" section
covering paid blob activation, 1-year trial, renewal, file
location, and device-swap.
- REQUIREMENTS §17a "Licensing" — storage path, activation model,
lifetime, tier list, dev bypass env var. Test count: 1995.
- DEVELOPER gains a "Licensing" recipe in the Extension recipes
section: public API, feature-flag add, tier add, minting via the
creator-only script.
- DECISIONS §9b — log the offline-HMAC choice with the threat-model
trade-off (motivated piracy not stopped; honor-system + 30-day
refund covers casual sharing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REQUIREMENTS §16 updates the test count (1777 → 1916) and breaks out
the GUI subset. DEVELOPER's Tests section gains the 'gui' marker
recipes and the new tests/gui/ tree under test layout, plus a short
'GUI test layer' explainer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REQUIREMENTS §10 carries the new measured numbers and the dedup
blocking trade-off note. DEVELOPER known-limitations is rewritten to
reflect that exact-only dedup is now O(n), fuzzy-blocking is opt-in,
and column-parallelism is scaffolding for free-threaded Python.
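The dedup blocking trade-off can be shown with a small stdlib sketch. It is not the project's matcher: `exact_dedup`, `fuzzy_pairs_blocked`, the `block_key` callback and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def exact_dedup(values):
    """O(n) exact dedup: one set lookup per value, first occurrence wins."""
    seen = set()
    kept = []
    for value in values:
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return kept


def fuzzy_pairs_blocked(values, block_key, threshold=0.85):
    """Compare only values that share a block key instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for index, value in enumerate(values):
        blocks[block_key(value)].append(index)

    pairs = []
    for indices in blocks.values():
        for position, i in enumerate(indices):
            for j in indices[position + 1:]:
                if SequenceMatcher(None, values[i], values[j]).ratio() >= threshold:
                    pairs.append((i, j))
    return pairs


names = ["Jon Smith", "John Smith", "Jane Doe", "Smith, John"]

# Blocking on the first letter keeps the two near-duplicates in the same block.
print(fuzzy_pairs_blocked(names, block_key=lambda s: s[0].lower()))

# Blocking on length separates them, so the match is silently missed.
print(fuzzy_pairs_blocked(names, block_key=len))
```

The second call illustrates why blocking stays opt-in: a poorly chosen key trades recall for speed, and pairs that fall into different blocks are never compared.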
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REQUIREMENTS §10 reflects the post-optimisation numbers and the
known O(n²) dedup match step (flagged for a future blocking pass).
en/es upload-limit copy and uploader help now say 1.5 GB.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README + USER-GUIDE describe the sidebar picker and current coverage
(home + shared chrome, per-tool bodies pending). DEVELOPER gains a
how-to for adding packs and keys with the parity-test guarantee.
TECHNICAL §10b records the in-house-JSON architecture and locks in the
no-gettext decision (also logged in DECISIONS). REQUIREMENTS reflects
the new interface surface and updated test count. COPY.md adds a
"Language claim" slot so landing/email work can pick it up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New docs/REQUIREMENTS.md catalogs every shipped capability in 17 numbered
categories — file handling, input/output encodings, delimiters, line
endings, detectors, finding schema, confidence tiers, decisions,
performance targets (1 GB), tools, gate behavior, interfaces, platforms,
deps, test coverage, privacy. Linked from README and USER-GUIDE so a
buyer / integrator can scan compliance in under a minute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>