datatools-dev/docs/DECISIONS.md

# Decisions

> Creator-only. Locked criteria, scoring rubric, decision log.
> **Version**: 1.6 · **Updated**: 2026-05-01

## 1. Locked operating criteria

### Constraints
1. Cash budget ≤ $1,200/mo recurring. No external funding.
2. Time ≤ 10 hr/wk. Build-once assets preferred.
3. Skill set: database design, data pipelines, programming. Every opportunity must leverage these.
4. Network: none. Zero reliance on personal connections.

### Targets
5. First revenue: 15 days preferred, 90 days hard stop.
6. Revenue ceiling: tiered (BUSINESS §6). Realistic 12-mo: $5k/mo.
7. Lifestyle cashflow goal. No saleable-asset exit required.
8. Distribution: fully async, no-touch. Revisit at $5k/mo.
9. Work pattern: deep work + recovery. No real-time on-call.

### Goals
10. Escape 9-5 W2 employment without stability concerns. (Primary)
11. Free up time for retirement lifestyle, optional enjoyable work. (Secondary)

### Internal contradictions

"Fully async + 15-day-to-revenue + no network" is tight but workable. Caveat in BUSINESS §8: revisit async at $5k/mo.

## 2. Scoring rubric

Each candidate scored 1-5 on 6 dimensions. Total /30 → verdict.

| Dimension | What it measures |
|-----------|------------------|
| Fit to locked criteria | Direct match to constraints 1-4 + targets 5-9. **Any 1 = hard kill.** |
| Demand durability | Structural shift vs. trend peak. Pays in 3 yr? |
| Defensibility | What stops the next entrant. |
| Unit economics realism | CAC, payback, gross margin, working capital. |
| Operator fit | Skills, capital, time, stomach. |
| Exit / cash-flow optionality | Multiple revenue paths. |

**Verdict**: PURSUE / INVESTIGATE / PASS / KILL.

**v1.1 calibration**: original scoring inflated unit economics by treating ~100% gross margin as 5/5 without accounting for CAC under "no network." Honest score: 7.0-7.5/10 (was 8.7). Strategy still sound; optimism deflated.

## 3. Candidate evaluation

| Rank | Candidate | Score | Verdict |
|------|-----------|-------|---------|
| 1 | Niche Python Automation Script Bundles | 8.7/10 → 7.5/10 (calibrated) | **PURSUE** |
| 2 | Curated Datasets | 8.7/10 | PURSUE (deferred) |
| 3 | Hosted Data Pipeline Micro-Tool | 8.3/10 | INVESTIGATE |

**Why #1 over #2**: faster path to first revenue (digital download vs. ongoing curation pipeline). Lower ongoing maintenance. Direct programming leverage. Better fit for "build once, sell many."

**Rejected**: Notion Templates (weak skill leverage), Query Optimizer SaaS (recurring infra conflicts with lifestyle/maintenance constraint).

## 4. Platform model

| Model | Verdict |
|-------|---------|
| **Standalone tools, dual CLI + GUI (chosen)** | **CHOSEN** (revised v1.2). Build once, no hosting, no SaaS support. GUI captures non-tech buyer; CLI captures power users. |
| SaaS web app | Rejected. Recurring hosting + support conflicts with minimal-maintenance constraint. |
| CLI-only | Rejected (revised v1.2). Wrong fit for non-tech buyer; produces refunds. |
| Browser extension | Rejected. Sandbox limits, wrong tool for files. |
| Notion / Airtable templates | Rejected. Doesn't leverage programming. |

**v1.2 rationale**:
- Buyer persona ("hates Excel work but can't code") won't learn a CLI. Refunds at this price.
- Deduplicator needs interactive review — not viable in pure CLI.
- Dual interface keeps CLI for automation without sacrificing primary buyer surface.

## 4a. Functional scope principle (v1.2)

**Decision**: each script ships **complete coverage of the workflow it names**, including features Excel does free.

**Why**: one-stop shopping is the value. Forcing buyers to bounce between this product and Excel/OpenRefine for parts of one task defeats the value prop.

**Anti-rule**: not license to scope-creep. Boundary = the named workflow. Dedup includes normalization + survivor + audit. NOT format conversion or charting (those belong to other scripts).

## 4b. UX standards for GUI (v1.2 — load-bearing)

| Standard | What it means |
|----------|---------------|
| Works out of the box | Drop file → useful result, zero config. |
| Sensible defaults visible | Every option has a default that works for the common case. |
| Progressive disclosure | Default view = file uploader + go button + results. Advanced in expander panes. |
| Plain-English labels | "Find duplicates" not "Apply Levenshtein at 0.85". Tooltips carry technical detail. |
| Visible safety | Dry-run / preview by default. Original input never modified. |
| No multi-step setup | Single window for the basic task. |
| Errors name problem + fix | "Column 'email' not found. Available: name, phone. Did you mean 'phone'?" not `KeyError`. |
| Identical core to CLI | No drift. Anything CLI does, GUI does (minus interactive review = GUI-natural). |

**"Intuitive enough" test**: a non-technical user who's never seen the tool can complete the lead use case on first launch with no docs read.

## 4c. GUI framework: Streamlit (v1.3)

| Framework | Verdict |
|-----------|---------|
| **Streamlit** | **CHOSEN** |
| Tkinter + CustomTkinter | Rejected — maintainer absent (last release Jan 2024, ~28 mo). Snyk: Inactive. |
| Plain Tkinter | Rejected — UX gap unacceptable at $49-79 in 2026 without heavy hand-styling. |
| Flet | Rejected — ecosystem too young for build-once-maintain-for-years. |
| PySide6 / Qt | Rejected — overkill, steepest learning curve, biggest bundles. |
| NiceGUI | Rejected — same browser tradeoff as Streamlit, smaller community + ecosystem. |

### Scored matrix (1-5, 5 = best for this product)

| Dimension | Tk | Tk+CTk | Streamlit | Flet | PySide6 | NiceGUI |
|-----------|----|----|-----------|------|---------|---------|
| Non-tech UX | 1 | 3 | 4 | 4 | 5 | 4 |
| Native window (no browser) | 5 | 5 | 1 | 5 | 5 | 1 |
| Build speed v1 | 3 | 3 | 5 | 4 | 2 | 4 |
| Build speed per feature | 3 | 3 | 5 | 4 | 2 | 4 |
| PyInstaller compat | 5 | 4 | 2 | 3 | 3 | 2 |
| Bundle size (smaller better) | 5 | 4 | 1 | 3 | 2 | 1 |
| Maintenance burden | 4 | 3 | 4 | 3 | 4 | 3 |
| Ecosystem maturity | 5 | 3 | 4 | 2 | 5 | 3 |
| Solo-dev learning curve | 4 | 4 | 5 | 4 | 2 | 4 |
| Drop-file-see-result fit | 3 | 3 | 5 | 4 | 4 | 5 |
| **Total /50** | **38** | **37** | **38** | **36** | **34** | **35** |

**Sums lie.** Tk ties Streamlit but loses on look-and-feel + data-app fit (the dimensions that matter). Verdict is per-dimension, not total.

### Why Streamlit won

1. **Fastest build velocity** — "drop CSV, see results" is native. Tables, file uploads, dataframes are first-class. Compounds across 9-script lead + 5 future bundles.
2. **Lowest maintenance burden** — active, large community, mature ecosystem. Bugs fixed upstream.
3. **Hosted demo as marketing asset** — Streamlit Community Cloud (free) lets the landing page offer "Try free in browser" with sample data. Tk-family options can't.
4. **Future SaaS optionality** — same code runs unchanged on a hosted server (modulo auth + per-user isolation). Tk would require rewrite. Zero implementation now, meaningful flexibility later.

### Tradeoffs accepted

1. **Browser-launch UX** — buyer double-click → default browser opens to localhost. Mitigated: install email + welcome dialog + persistent in-app message. Pywebview wrap is the v1.1 fallback if confusing.
2. **Bundle size** — ~300-500 MB vs. ~50 MB for Tk. Acceptable in 2026.
3. **PyInstaller fiddly first time** — budget 1-3 days. Reusable across all bundles after.
4. **Streamlit's session re-run model** is unusual but manageable.

## 5. Distribution

**Primary**: Marketplaces (Gumroad, Lemon Squeezy). Built-in traffic, async payments/delivery/refunds, listing in days.

Own-domain SEO: long-term compounding asset (6-18 mo), not early-stage channel.

**v1.3 addition**: hosted browser demo as secondary distribution + primary conversion lever.

## 6. Pricing

$49-79/bundle · $149 full suite (when 3+ exist).

- < $99 → no procurement friction for solo operators.
- > $99 → triggers SaaS-support expectations conflicting with no-touch.
- $49-79 → right unit economics + impulse-purchase territory.

## 7. Decision log

| Date | Decision | Rationale |
|------|----------|-----------|
| Apr 2026 | Lock operating criteria | Project kickoff |
| Apr 2026 | Python Bundles selected | Highest score |
| Apr 2026 | Excel/CSV Cleaning as lead bundle | Highest pain, broadest demand |
| Apr 2026 (v1.1) | PyInstaller cross-platform pipeline | Eliminates "install Python" friction |
| Apr 2026 (v1.1) | Apple Developer Program ($99/yr) | Required for clean macOS install |
| Apr 2026 (v1.1) | Tiered revenue targets ($5k @ 12mo, $10k @ 24mo) | Original $50k unsupported by evidence |
| Apr 2026 (v1.1) | Tag "no-touch" for revisit at $5k/mo | Strict adherence pre-PMF may cost more revenue than it saves |
| Apr 28 (v1.2) | Functional scope: include workflow features even if free elsewhere | One-stop shopping is the value prop. See §4a. |
| Apr 28 (v1.2) | Promote GUI to required at v1; ship dual CLI + GUI | Buyer persona won't use CLI. See §4. |
| Apr 28 (v1.2) | Lock UX standards (works OOTB, sensible defaults, progressive disclosure, dry-run) | Load-bearing for non-tech buyer. See §4b. |
| Apr 28 (v1.3) | Lock GUI framework as Streamlit | Fastest velocity, lowest maintenance, hosted demo, SaaS optionality. See §4c. |
| Apr 28 (v1.3) | Add hosted browser demo as conversion lever | Direct consequence of Streamlit choice. See §5. |
| Apr 28 (v1.4) | Re-apply 04/06 boundary work (silent-drift recovery) | Stream B v1.2 content overwritten in parallel v1.3 work. Restored per no-silent-drift rule. |
| Apr 28 (v1.5) | Add `02_text_cleaner.py`; renumber 02-08 → 03-09 | Character-level hygiene had no clear owner. See TECHNICAL §10. |
| Apr 29 (v1.7) | Adopt Text Cleaner Tier 1/2/3 spec; lock `excel-hygiene` default | Promotes from stub to buildable v1 target. Full spec in TECHNICAL §11.2. |
| Apr 28 (v1.6) | Fold conversation-history content into docs (deduplicator spec, lead bundle use cases, full GUI matrix, 04/06 examples, Streamlit-to-SaaS reasoning) | No new decisions; promote at-risk analysis from chat history per no-silent-drift rule. |
| May 1 (v1.6) | Mark Format Standardizer **Ready** | 199-row buyer corpus passing; Tier 1 + most Tier 2 built. |
| May 1 (v1.6) | Add `src/core/errors.py` structured hierarchy | Uniform helpful messages across CLI + GUI. See TECHNICAL §7. |
| May 13 (v1.6) | Ship in-house JSON i18n + EN/ES packs | Expand addressable market (Spanish-first buyers, LatAm bookkeepers) without a `gettext` build step. JSON packs editable by non-devs; parity test prevents drift. See TECHNICAL §10b. |
| May 13 (v1.6) | Ship licensing: 1-year HMAC-signed blobs, name+email registration, offline verification, tier-scaffolded for future SKUs | Unlock the lifetime-update business model without recurring infra. Honor-system DRM (HMAC + 30-day refund) — sufficient at $49. See §9b below. |
| May 13 (v1.6) | Add Lite SKU (Dedup + Text Cleaner + Format Standardizer) | Lower-priced entry point for buyers who only need the three universal tools. Per-tool feature gating + lock badges on the home grid surface the upgrade path. See §9b. |
| May 13 (v1.6) | Remove user-facing free trial | A 1-year all-features trial undercut the paid Lite SKU. Paid-only keeps tier economics clean. Internal ``_mint`` API still exists for tests and the seller's key generator. See §9b. |

## 9b. Licensing model

**Decision (v1.6)**: offline HMAC-signed license blobs, 1-year lifetime, name + email registration required. Tier-scaffolded so future SKUs (PRO, ENTERPRISE) can carve per-tool feature sets without code changes.

| Option | Verdict |
|---|---|
| **Offline HMAC blob (chosen)** | **CHOSEN.** No server, no internet, fits the no-touch constraint. Honor-system at this price point. |
| Online activation check | Rejected. Conflicts with the "your data never leaves your computer" promise; introduces support load (server downtime, network issues). |
| No license at all | Rejected. The lifetime-update value prop requires *some* gating to make renewal meaningful. |
| Time-bombed binary (PyInstaller --no-license) | Rejected. Can't deliver renewals without re-shipping the installer. |
| Hardware-locked license | Rejected. Friction on legitimate device-swaps; doesn't match the buyer persona's tolerance. |

**Threat model**: a motivated reverse engineer can pull the HMAC secret out of the binary, mint their own licenses, and bypass the check. That's acceptable — the goal is to discourage casual blob-sharing among non-technical buyers, not stop targeted piracy. The 30-day refund window covers the same gap from a different angle (anyone who shares their blob is implicitly authorizing the buyer to issue them a refund-on-demand).

**What's enforced**:
- License blob signature must match (HMAC-SHA256 with the build secret).
- Buyer-entered name + email must match the values embedded in the blob.
- Expiry date must be in the future.
- Tier must include the requested feature.

**What's NOT enforced**:
- Number of devices the same blob is used on (no concurrent-use detection).
- Reverse-engineered re-signing of expired blobs (would require RSA / online check).

**Future SKUs**: the ``FEATURES_BY_TIER`` table in ``src/license/features.py`` is the single source of truth for "which tools each tier unlocks". Adding a PRO SKU that excludes the pipeline runner is a 1-line edit there + a 1-line edit at the gate site. No consumer-code churn.

**v1.6 SKU lineup**:

| Tier | Tools unlocked | Notes |
|---|---|---|
| LITE | Deduplicator, Text Cleaner, Format Standardizer | Entry SKU. Three universal tools that handle the most common bookkeeping / RevOps / Klaviyo prep workflows. |
| CORE | All 9 tools | Full v1 suite. |
| PRO | All 9 tools (scaffolded) | Reserved for future per-feature carve-outs (e.g., scheduled pipelines, API access). |
| ENTERPRISE | All 9 tools (scaffolded) | Reserved for future bulk / multi-seat SKUs. |
| TRIAL | Same as LITE | Deprecated — no longer issuable. Mapping kept for any legacy on-disk trial licenses to load without error. |

**Trial removed (v1.6)**: a 1-year free trial that unlocked every tool would undercut the paid Lite SKU (why pay for Lite when trial gives more for longer?). Paid-only keeps the funnel clean. The internal ``LicenseManager._mint`` API still exists for tests and for the seller's ``scripts/generate_license.py`` key generator; there's no user-facing way to self-issue a license.

## 8. Re-lock triggers

These criteria are load-bearing. Triggers for explicit re-evaluation:

- $5k/mo MRR (revisit async constraint).
- $10k/mo MRR (revisit time-budget allocation).
- Marketplace shutdown (Gumroad / Lemon Squeezy policy).
- New skill that opens a higher-leverage product category.
- Burnout signal — time/recovery balance broken.
- Streamlit hard direction change breaking desktop packaging (low probability).

Any re-lock writes new criteria here with date + rationale. **No silent drift.**