Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:
- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
and gui/components/_legacy.py, core module headers, every tool
page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs
Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.
All 2033 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decisions
Creator-only. Locked criteria, scoring rubric, decision log. Version: 1.6 · Updated: 2026-05-01
1. Locked operating criteria
Constraints
- Cash budget ≤ $1,200/mo recurring. No external funding.
- Time ≤ 10 hr/wk. Build-once assets preferred.
- Skill set: database design, data pipelines, programming. Every opportunity must leverage these.
- Network: none. Zero reliance on personal connections.
Targets
- First revenue: 15 days preferred, 90 days hard stop.
- Revenue ceiling: tiered (BUSINESS §6). Realistic 12-mo: $5k/mo.
- Lifestyle cashflow goal. No saleable-asset exit required.
- Distribution: fully async, no-touch. Revisit at $5k/mo.
- Work pattern: deep work + recovery. No real-time on-call.
Goals
- Escape 9-5 W2 employment without stability concerns. (Primary)
- Free up time for retirement lifestyle, optional enjoyable work. (Secondary)
Internal contradictions
"Fully async + 15-day-to-revenue + no network" is tight but workable. Caveat in BUSINESS §8: revisit async at $5k/mo.
2. Scoring rubric
Each candidate scored 1-5 on 6 dimensions. Total /30 → verdict.
| Dimension | What it measures |
|---|---|
| Fit to locked criteria | Direct match to constraints 1-4 + targets 5-9. Any 1 = hard kill. |
| Demand durability | Structural shift vs. trend peak. Pays in 3 yr? |
| Defensibility | What stops the next entrant. |
| Unit economics realism | CAC, payback, gross margin, working capital. |
| Operator fit | Skills, capital, time, stomach. |
| Exit / cash-flow optionality | Multiple revenue paths. |
Verdict: PURSUE / INVESTIGATE / PASS / KILL.
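A minimal sketch of how the rubric could be applied mechanically. The dimension keys mirror the table above; the numeric cut-offs between PURSUE, INVESTIGATE, and PASS are placeholders (this document does not state them), and reading "Any 1 = hard kill" as "a fit score of 1 forces KILL" is an interpretation.

```python
DIMENSIONS = (
    "fit_to_locked_criteria",
    "demand_durability",
    "defensibility",
    "unit_economics_realism",
    "operator_fit",
    "exit_cashflow_optionality",
)

# Placeholder cut-offs on the /30 total: NOT taken from this document.
PURSUE_MIN = 24
INVESTIGATE_MIN = 18


def verdict(scores: dict[str, int]) -> tuple[int, str]:
    """Return (total out of 30, verdict) for one candidate."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be scored 1-5, got {scores[dim]}")

    total = sum(scores[dim] for dim in DIMENSIONS)
    # Hard kill: a fit score of 1 overrides the total.
    if scores["fit_to_locked_criteria"] == 1:
        return total, "KILL"
    if total >= PURSUE_MIN:
        return total, "PURSUE"
    if total >= INVESTIGATE_MIN:
        return total, "INVESTIGATE"
    return total, "PASS"
```

Keeping the hard-kill check ahead of the total means a candidate that violates the locked criteria cannot be rescued by high scores elsewhere.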
v1.1 calibration: original scoring inflated unit economics by treating ~100% gross margin as 5/5 without accounting for CAC under "no network." Honest score: 7.0-7.5/10 (was 8.7). Strategy still sound; optimism deflated.
3. Candidate evaluation
| Rank | Candidate | Score | Verdict |
|---|---|---|---|
| 1 | Niche Python Automation Script Bundles | 8.7/10 → 7.5/10 (calibrated) | PURSUE |
| 2 | Curated Datasets | 8.7/10 | PURSUE (deferred) |
| 3 | Hosted Data Pipeline Micro-Tool | 8.3/10 | INVESTIGATE |
Why #1 over #2: faster path to first revenue (digital download vs. ongoing curation pipeline). Lower ongoing maintenance. Direct programming leverage. Better fit for "build once, sell many."
Rejected: Notion Templates (weak skill leverage), Query Optimizer SaaS (recurring infra conflicts with lifestyle/maintenance constraint).
4. Platform model
| Model | Verdict |
|---|---|
| Standalone tools, dual CLI + GUI (chosen) | CHOSEN (revised v1.2). Build once, no hosting, no SaaS support. GUI captures non-tech buyer; CLI captures power users. |
| SaaS web app | Rejected. Recurring hosting + support conflicts with minimal-maintenance constraint. |
| CLI-only | Rejected (revised v1.2). Wrong fit for non-tech buyer; produces refunds. |
| Browser extension | Rejected. Sandbox limits, wrong tool for files. |
| Notion / Airtable templates | Rejected. Doesn't leverage programming. |
v1.2 rationale:
- Buyer persona ("hates Excel work but can't code") won't learn a CLI. Refunds at this price.
- Find Duplicates needs interactive review — not viable in pure CLI.
- Dual interface keeps CLI for automation without sacrificing primary buyer surface.
4a. Functional scope principle (v1.2)
Decision: each script ships complete coverage of the workflow it names, including features Excel does free.
Why: one-stop shopping is the value. Forcing buyers to bounce between this product and Excel/OpenRefine for parts of one task defeats the value prop.
Anti-rule: not license to scope-creep. Boundary = the named workflow. Dedup includes normalization + survivor + audit. NOT format conversion or charting (those belong to other scripts).
4b. UX standards for GUI (v1.2 — load-bearing)
| Standard | What it means |
|---|---|
| Works out of the box | Drop file → useful result, zero config. |
| Sensible defaults visible | Every option has a default that works for the common case. |
| Progressive disclosure | Default view = file uploader + go button + results. Advanced in expander panes. |
| Plain-English labels | "Find duplicates" not "Apply Levenshtein at 0.85". Tooltips carry technical detail. |
| Visible safety | Dry-run / preview by default. Original input never modified. |
| No multi-step setup | Single window for the basic task. |
| Errors name problem + fix | "Column 'email' not found. Available: name, phone. Did you mean 'phone'?" not KeyError. |
| Identical core to CLI | No drift. Anything CLI does, GUI does (minus interactive review = GUI-natural). |
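The "Errors name problem + fix" standard can be illustrated with a small sketch. `ColumnNotFoundError` and `require_column` are hypothetical names, not the contents of src/core/errors.py; the suggestion step uses the standard library's `difflib.get_close_matches`, which only proposes a name when it is reasonably similar to what was typed.

```python
import difflib


class ColumnNotFoundError(LookupError):
    """Hypothetical error type whose message names the problem and a fix."""


def require_column(name: str, available: list[str]) -> str:
    """Return `name` if the column exists, else raise with a helpful message."""
    if name in available:
        return name
    message = f"Column '{name}' not found. Available: {', '.join(available)}."
    # Suggest at most one close match (difflib's default similarity cutoff is 0.6).
    close = difflib.get_close_matches(name, available, n=1)
    if close:
        message += f" Did you mean '{close[0]}'?"
    raise ColumnNotFoundError(message)
```

Because the same function can be called from both the CLI and the GUI, the two surfaces would show the same wording, which is in the spirit of "Identical core to CLI".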
"Intuitive enough" test: a non-technical user who's never seen the tool can complete the lead use case on first launch with no docs read.
4c. GUI framework: Streamlit (v1.3)
| Framework | Verdict |
|---|---|
| Streamlit | CHOSEN |
| Tkinter + CustomTkinter | Rejected — maintainer absent (last release Jan 2024, ~28 mo). Snyk: Inactive. |
| Plain Tkinter | Rejected — UX gap unacceptable at $49-79 in 2026 without heavy hand-styling. |
| Flet | Rejected — ecosystem too young for build-once-maintain-for-years. |
| PySide6 / Qt | Rejected — overkill, steepest learning curve, biggest bundles. |
| NiceGUI | Rejected — same browser tradeoff as Streamlit, smaller community + ecosystem. |
Scored matrix (1-5, 5 = best for this product)
| Dimension | Tk | Tk+CTk | Streamlit | Flet | PySide6 | NiceGUI |
|---|---|---|---|---|---|---|
| Non-tech UX | 1 | 3 | 4 | 4 | 5 | 4 |
| Native window (no browser) | 5 | 5 | 1 | 5 | 5 | 1 |
| Build speed v1 | 3 | 3 | 5 | 4 | 2 | 4 |
| Build speed per feature | 3 | 3 | 5 | 4 | 2 | 4 |
| PyInstaller compat | 5 | 4 | 2 | 3 | 3 | 2 |
| Bundle size (smaller better) | 5 | 4 | 1 | 3 | 2 | 1 |
| Maintenance burden | 4 | 3 | 4 | 3 | 4 | 3 |
| Ecosystem maturity | 5 | 3 | 4 | 2 | 5 | 3 |
| Solo-dev learning curve | 4 | 4 | 5 | 4 | 2 | 4 |
| Drop-file-see-result fit | 3 | 3 | 5 | 4 | 4 | 5 |
| Total /50 | 38 | 35 | 36 | 36 | 34 | 31 |
Sums lie. Tk tops the raw total (38 vs. 36 for Streamlit) but loses on look-and-feel + data-app fit (the dimensions that matter). Verdict is per-dimension, not total.
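A short sketch that recomputes each column total from the cell scores and compares two frameworks on chosen dimensions only, so the totals row can be checked against the matrix. The cell values are copied from the matrix above; treating "Non-tech UX" and "Drop-file-see-result fit" as the decisive dimensions is an assumption about what "look-and-feel + data-app fit" refers to.

```python
FRAMEWORKS = ("Tk", "Tk+CTk", "Streamlit", "Flet", "PySide6", "NiceGUI")

# One tuple per dimension, in FRAMEWORKS order (values copied from the matrix).
MATRIX = {
    "Non-tech UX": (1, 3, 4, 4, 5, 4),
    "Native window (no browser)": (5, 5, 1, 5, 5, 1),
    "Build speed v1": (3, 3, 5, 4, 2, 4),
    "Build speed per feature": (3, 3, 5, 4, 2, 4),
    "PyInstaller compat": (5, 4, 2, 3, 3, 2),
    "Bundle size (smaller better)": (5, 4, 1, 3, 2, 1),
    "Maintenance burden": (4, 3, 4, 3, 4, 3),
    "Ecosystem maturity": (5, 3, 4, 2, 5, 3),
    "Solo-dev learning curve": (4, 4, 5, 4, 2, 4),
    "Drop-file-see-result fit": (3, 3, 5, 4, 4, 5),
}

# Assumed mapping of "look-and-feel + data-app fit" onto matrix rows.
DECISIVE = ("Non-tech UX", "Drop-file-see-result fit")


def totals() -> dict[str, int]:
    """Sum every dimension per framework (the /50 row)."""
    return {fw: sum(row[i] for row in MATRIX.values())
            for i, fw in enumerate(FRAMEWORKS)}


def head_to_head(a: str, b: str, dims=DECISIVE) -> dict[str, int]:
    """Score difference (a minus b) on the selected dimensions only."""
    ia, ib = FRAMEWORKS.index(a), FRAMEWORKS.index(b)
    return {d: MATRIX[d][ia] - MATRIX[d][ib] for d in dims}


if __name__ == "__main__":
    print(totals())
    print(head_to_head("Streamlit", "Tk"))
```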
Why Streamlit won
- Fastest build velocity — "drop CSV, see results" is native. Tables, file uploads, dataframes are first-class. Compounds across 9-script lead + 5 future bundles.
- Lowest maintenance burden — active, large community, mature ecosystem. Bugs fixed upstream.
- Hosted demo as marketing asset — Streamlit Community Cloud (free) lets the landing page offer "Try free in browser" with sample data. Tk-family options can't.
- Future SaaS optionality — same code runs unchanged on a hosted server (modulo auth + per-user isolation). Tk would require rewrite. Zero implementation now, meaningful flexibility later.
Tradeoffs accepted
- Browser-launch UX — buyer double-click → default browser opens to localhost. Mitigated: install email + welcome dialog + persistent in-app message. Pywebview wrap is the v1.1 fallback if confusing.
- Bundle size — ~300-500 MB vs. ~50 MB for Tk. Acceptable in 2026.
- PyInstaller fiddly first time — budget 1-3 days. Reusable across all bundles after.
- Streamlit's session re-run model is unusual but manageable.
5. Distribution
Primary: Marketplaces (Gumroad, Lemon Squeezy). Built-in traffic, async payments/delivery/refunds, listing in days.
Own-domain SEO: long-term compounding asset (6-18 mo), not early-stage channel.
v1.3 addition: hosted browser demo as secondary distribution + primary conversion lever.
6. Pricing
$49-79/bundle · $149 full suite (when 3+ exist).
- < $99 → no procurement friction for solo operators.
- > $99 → triggers SaaS-support expectations conflicting with no-touch.
- $49-79 → right unit economics + impulse-purchase territory.
7. Decision log
| Date | Decision | Rationale |
|---|---|---|
| Apr 2026 | Lock operating criteria | Project kickoff |
| Apr 2026 | Python Bundles selected | Highest score |
| Apr 2026 | Excel/CSV Cleaning as lead bundle | Highest pain, broadest demand |
| Apr 2026 (v1.1) | PyInstaller cross-platform pipeline | Eliminates "install Python" friction |
| Apr 2026 (v1.1) | Apple Developer Program ($99/yr) | Required for clean macOS install |
| Apr 2026 (v1.1) | Tiered revenue targets ($5k @ 12mo, $10k @ 24mo) | Original $50k unsupported by evidence |
| Apr 2026 (v1.1) | Tag "no-touch" for revisit at $5k/mo | Strict adherence pre-PMF may cost more revenue than it saves |
| Apr 28 (v1.2) | Functional scope: include workflow features even if free elsewhere | One-stop shopping is the value prop. See §4a. |
| Apr 28 (v1.2) | Promote GUI to required at v1; ship dual CLI + GUI | Buyer persona won't use CLI. See §4. |
| Apr 28 (v1.2) | Lock UX standards (works OOTB, sensible defaults, progressive disclosure, dry-run) | Load-bearing for non-tech buyer. See §4b. |
| Apr 28 (v1.3) | Lock GUI framework as Streamlit | Fastest velocity, lowest maintenance, hosted demo, SaaS optionality. See §4c. |
| Apr 28 (v1.3) | Add hosted browser demo as conversion lever | Direct consequence of Streamlit choice. See §5. |
| Apr 28 (v1.4) | Re-apply 04/06 boundary work (silent-drift recovery) | Stream B v1.2 content overwritten in parallel v1.3 work. Restored per no-silent-drift rule. |
| Apr 28 (v1.5) | Add 02_text_cleaner.py; renumber 02-08 → 03-09 | Character-level hygiene had no clear owner. See TECHNICAL §10. |
| Apr 29 (v1.7) | Adopt Clean Text Tier 1/2/3 spec; lock excel-hygiene default | Promotes from stub to buildable v1 target. Full spec in TECHNICAL §11.2. |
| Apr 28 (v1.6) | Fold conversation-history content into docs (deduplicator spec, lead bundle use cases, full GUI matrix, 04/06 examples, Streamlit-to-SaaS reasoning) | No new decisions; promote at-risk analysis from chat history per no-silent-drift rule. |
| May 1 (v1.6) | Mark Standardize Formats Ready | 199-row buyer corpus passing; Tier 1 + most Tier 2 built. |
| May 1 (v1.6) | Add src/core/errors.py structured hierarchy | Uniform helpful messages across CLI + GUI. See TECHNICAL §7. |
| May 13 (v1.6) | Ship in-house JSON i18n + EN/ES packs | Expand addressable market (Spanish-first buyers, LatAm bookkeepers) without a gettext build step. JSON packs editable by non-devs; parity test prevents drift. See TECHNICAL §10b. |
| May 13 (v1.6) | Ship licensing: 1-year HMAC-signed blobs, name+email registration, offline verification, tier-scaffolded for future SKUs | Unlock the lifetime-update business model without recurring infra. Honor-system DRM (HMAC + 30-day refund) — sufficient at $49. See §9b below. |
| May 13 (v1.6) | Add Lite SKU (Find Duplicates + Clean Text + Standardize Formats) | Lower-priced entry point for buyers who only need the three universal tools. Per-tool feature gating + lock badges on the home grid surface the upgrade path. See §9b. |
| May 13 (v1.6) | Remove user-facing free trial | A 1-year all-features trial undercut the paid Lite SKU. Paid-only keeps tier economics clean. Internal _mint API still exists for tests and the seller's key generator. See §9b. |
| May 13 (v1.6) | Upgrade license crypto: HMAC → Ed25519 (asymmetric) | HMAC's symmetric secret was extractable from the shipped binary — anyone with the binary could mint blobs. Ed25519 splits sign (seller) from verify (binary), so binary compromise doesn't let an attacker forge licenses. Blob prefix bumped DTLIC1 → DTLIC2. See §9b. |
| May 13 (v1.6) | Add assert_production_safe tripwire | A shipped build with DATATOOLS_DEV_MODE=1 or the in-source dev pubkey would silently defeat licensing. The tripwire refuses to boot such a build. No-op in source / pytest runs. See §9b. |
9b. Licensing model
Decision (v1.6): offline signed license blobs (HMAC at first, upgraded to Ed25519 per the decision log), 1-year lifetime, name + email registration required. Tier-scaffolded so future SKUs (PRO, ENTERPRISE) can carve per-tool feature sets without code changes.
| Option | Verdict |
|---|---|
| Offline signed blob (HMAC at first, now Ed25519) | CHOSEN. No server, no internet, fits the no-touch constraint. Honor-system at this price point. |
| Online activation check | Rejected. Conflicts with the "your data never leaves your computer" promise; introduces support load (server downtime, network issues). |
| No license at all | Rejected. The lifetime-update value prop requires some gating to make renewal meaningful. |
| Time-bombed binary (PyInstaller --no-license) | Rejected. Can't deliver renewals without re-shipping the installer. |
| Hardware-locked license | Rejected. Friction on legitimate device-swaps; doesn't match the buyer persona's tolerance. |
Threat model (v1.6 — Ed25519): the binary ships only the public key. A motivated reverse engineer who pulls everything out of the binary has the verification key but not the signing key — they can't mint new licenses. The earlier HMAC scheme had this hole; the asymmetric upgrade closes it. The remaining attack surface is:
- Re-signing with a forked binary that ships an attacker-controlled pubkey + auto-grants licenses. Costs more effort than the price of a legitimate copy and the result is per-fork, not shareable.
- Hooking the verification call to always return True. Defeats DRM entirely but only on the attacker's own machine — they could just write down "I unlocked DataTools" and skip the work.
- Setting DATATOOLS_DEV_MODE=1 to bypass checks. Refused in shipped builds by assert_production_safe; works in source/test runs only.
The 30-day refund window covers casual blob sharing from a different angle (anyone who shares their blob is implicitly authorizing the buyer to issue them a refund-on-demand).
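The tripwire behaviour described in the DATATOOLS_DEV_MODE bullet can be sketched as follows. Only the function name and the environment variable come from this document; the signature, the `UnsafeBuildError` type, and the placeholder dev key are assumptions. The sketch relies on PyInstaller setting `sys.frozen` in bundled builds, which is how it tells a shipped build from a source or pytest run.

```python
import os
import sys

# Placeholder for the in-source development public key (not a real key).
DEV_PUBKEY_HEX = "00" * 32


class UnsafeBuildError(RuntimeError):
    """Hypothetical error raised when a shipped build would bypass licensing."""


def assert_production_safe(pubkey_hex: str, *, frozen=None, environ=None) -> None:
    """Refuse to start a bundled build that has dev-mode shortcuts enabled."""
    if frozen is None:
        # PyInstaller sets sys.frozen on bundled executables.
        frozen = bool(getattr(sys, "frozen", False))
    if environ is None:
        environ = os.environ

    if not frozen:
        return  # source checkout or test run: no-op

    if environ.get("DATATOOLS_DEV_MODE") == "1":
        raise UnsafeBuildError("DATATOOLS_DEV_MODE=1 is not allowed in a shipped build")
    if pubkey_hex == DEV_PUBKEY_HEX:
        raise UnsafeBuildError("shipped build embeds the development public key")
```

Passing `frozen` and `environ` explicitly keeps the check testable without building an installer.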
What's enforced:
- License blob signature must verify (Ed25519, checked against the public key shipped in the binary; HMAC-SHA256 with the build secret before the crypto upgrade).
- Buyer-entered name + email must match the values embedded in the blob.
- Expiry date must be in the future.
- Tier must include the requested feature.
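The four checks above could be combined along these lines. This is a sketch under assumptions: `LicenseBlob` and its field names are hypothetical, email comparison is made case-insensitive for illustration, and signature verification is injected as a callable so that the example stays free of third-party dependencies (the real check would wrap an Ed25519 verify call).

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Mapping


@dataclass(frozen=True)
class LicenseBlob:
    """Hypothetical decoded license; field names are assumptions."""
    name: str
    email: str
    expires: date
    tier: str
    payload: bytes
    signature: bytes


def license_problems(
    blob: LicenseBlob,
    *,
    entered_name: str,
    entered_email: str,
    feature: str,
    today: date,
    tier_features: Mapping[str, frozenset],
    verify_signature: Callable[[bytes, bytes], bool],
) -> list[str]:
    """Return every failed check; an empty list means the feature is unlocked."""
    problems = []
    if not verify_signature(blob.payload, blob.signature):
        problems.append("signature does not verify")
    if (entered_name.strip() != blob.name
            or entered_email.strip().lower() != blob.email.lower()):
        problems.append("name or email does not match the license")
    if blob.expires <= today:  # expiry must be strictly in the future
        problems.append(f"license expired on {blob.expires.isoformat()}")
    if feature not in tier_features.get(blob.tier, frozenset()):
        problems.append(f"tier {blob.tier} does not include {feature}")
    return problems
```

Returning all failures at once, rather than stopping at the first, lets the UI name the problem and the fix in a single message.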
What's NOT enforced:
- Number of devices the same blob is used on (no concurrent-use detection).
- Reverse-engineered re-signing of expired blobs (would require RSA / online check).
Future SKUs: the FEATURES_BY_TIER table in src/license/features.py is the single source of truth for "which tools each tier unlocks". Adding a PRO SKU that excludes Automated Workflows is a 1-line edit there + a 1-line edit at the gate site. No consumer-code churn.
v1.6 SKU lineup:
| Tier | Tools unlocked | Notes |
|---|---|---|
| LITE | Find Duplicates, Clean Text, Standardize Formats | Entry SKU. Three universal tools that handle the most common bookkeeping / RevOps / Klaviyo prep workflows. |
| CORE | All 9 tools | Full v1 suite. |
| PRO | All 9 tools (scaffolded) | Reserved for future per-feature carve-outs (e.g., scheduled pipelines, API access). |
| ENTERPRISE | All 9 tools (scaffolded) | Reserved for future bulk / multi-seat SKUs. |
| TRIAL | Same as LITE | Deprecated — no longer issuable. Mapping kept for any legacy on-disk trial licenses to load without error. |
Trial removed (v1.6): a 1-year free trial that unlocked every tool would undercut the paid Lite SKU (why pay for Lite when trial gives more for longer?). Paid-only keeps the funnel clean. The internal LicenseManager._mint API still exists for tests and for the seller's scripts/generate_license.py key generator; there's no user-facing way to self-issue a license.
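For illustration, a tier table in the spirit of FEATURES_BY_TIER might look like the sketch below. It is not the contents of src/license/features.py: the tool identifiers are made-up slugs for the four tools named in this document, the remaining tools of the 9-tool suite are omitted, and `ISSUABLE_TIERS` and `is_unlocked` are hypothetical names.

```python
from enum import Enum


class Tier(str, Enum):
    LITE = "LITE"
    CORE = "CORE"
    PRO = "PRO"
    ENTERPRISE = "ENTERPRISE"
    TRIAL = "TRIAL"  # deprecated: kept so legacy on-disk licenses still load


LITE_TOOLS = frozenset({"find_duplicates", "clean_text", "standardize_formats"})
# Only tools named in this document are listed; the rest of the suite is omitted.
ALL_TOOLS = LITE_TOOLS | frozenset({"automated_workflows"})

FEATURES_BY_TIER = {
    Tier.LITE: LITE_TOOLS,
    Tier.CORE: ALL_TOOLS,
    Tier.PRO: ALL_TOOLS,         # scaffolded: same as CORE for now
    Tier.ENTERPRISE: ALL_TOOLS,  # scaffolded: same as CORE for now
    Tier.TRIAL: LITE_TOOLS,
}

# TRIAL can still be read from disk but is never issued to new buyers.
ISSUABLE_TIERS = frozenset(Tier) - {Tier.TRIAL}


def is_unlocked(tier: Tier, tool: str) -> bool:
    """True if the given tier includes the tool."""
    return tool in FEATURES_BY_TIER[tier]
```

With this shape, carving Automated Workflows out of PRO would be a change to one entry, for example `Tier.PRO: ALL_TOOLS - {"automated_workflows"}`.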
8. Re-lock triggers
These criteria are load-bearing. Triggers for explicit re-evaluation:
- $5k/mo MRR (revisit async constraint).
- $10k/mo MRR (revisit time-budget allocation).
- Marketplace shutdown (Gumroad / Lemon Squeezy policy).
- New skill that opens a higher-leverage product category.
- Burnout signal — time/recovery balance broken.
- Streamlit hard direction change breaking desktop packaging (low probability).
Any re-lock writes new criteria here with date + rationale. No silent drift.