# Strategic Plan — DataTools
> Creator-only. Locks the "what next" in light of the locked criteria
> (DECISIONS.md §1) and the v1.6 honest status (BUSINESS.md §13).
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael
This document is the active plan, derived from the strategic review of
2026-05-01. It compresses the eight strategic moves and a 90-day
execution sequence onto one page so the next decision (build vs.
ship vs. market) has a single reference.
It is **not** a re-lock of operating criteria — those still live in
DECISIONS.md and have not changed. This plan is downstream of those
criteria; if a move below conflicts with §1 of Decisions, the criteria
win.
## 1. Frame
**Locked context** (BUSINESS.md, DECISIONS.md):
- Niche Python automation tools, $49–79 single / $149 suite.
- Cash budget ≤ $1,200/mo recurring · Time ≤ 10 hr/wk · No external funding.
- Async + no-touch sales (revisit at $5k/mo MRR).
- Marketplace-first distribution (Gumroad / Lemon Squeezy).
- Streamlit GUI + CLI dual interface, runs locally.
- Lifestyle cashflow goal (no exit needed).
**Honest current state** (2026-05-01):
| Asset | State |
|---|---|
| Tools 1–5 (Find Duplicates, Clean Text, Standardize Formats, Fix Missing Values, Map Columns) | Ready · 1,691 tests passing · 0 xfailed |
| Tools 6–9 (Find Unusual Values, Combine Files, Quality Check, Automated Workflows) | Coming Soon |
| PyInstaller installer pipeline | Not started |
| macOS code signing (Apple Dev Program) | Not started |
| Hosted browser demo (Streamlit Cloud) | Not deployed |
| Landing page | Not live |
| Marketplace listing (Gumroad) | Not listed |
| Paying customers | 0 |
**Diagnosis**: the bottleneck is not feature count — it's distribution.
The next $1 of value comes from closing the gap between "code-complete"
and "buyer-pulls-out-card", not from tool 6.
## 2. The eight strategic moves
Numbered moves. Each is consistent with locked criteria.
### 2.1 Freeze new-tool development (one exception). Ship what exists.
Tools 6–8 are blocked behind a **distribution gate**: no work on them
until the existing 5 tools have a paying customer + one external review
(BUSINESS.md §4 sequence rule, applied recursively inside the bundle).
**Exception granted 2026-05-01**: Tool 09 Automated Workflows is built
*now*. Rationale: the pipeline transforms the bundle from "5 tools you
buy" into "an automatable workflow you depend on." That conversion is
what produces retention and word-of-mouth — the only marketing channel
that scales under the no-network/no-touch constraint.
### 2.2 The demo *is* the product. Make it embarrassingly good.
- Three persona-tagged sample datasets, not one generic CSV: Shopify
customers / bookkeeper bank export / agency lead list.
- Run the *full pipeline* on the sample (Review → Dedup → Text Clean →
Format → Missing → Column Map). Free version caps **output rows**,
not the experience.
- Embed the demo as an **iframe on the landing page** (not "click to
open"). Friction kills conversion.
- Persistent CTA after demo: *"Run this on your own 50 k-row file →
buy for $49 →"* directly above the Gumroad button.
### 2.3 Niche down. Stop selling "data cleaning."
One engine, three landing pages:
| Persona | Landing-page lead | Demo dataset |
|---|---|---|
| Shopify operator (priority: pet supplies) | "Clean your customer / vendor / subscriber exports" | uc01_shopify_customer_list |
| Bookkeeper / freelance accountant | "Reconcile bank exports + vendor lists. Auditable changes." | uc06_bank_export_overlap |
| Marketing / RevOps agency | "Dedupe lead lists. Standardize phones across vendors." | uc13_combined_lead_sources |
Generic copy competes with `pip install pandas`. Vertical copy
competes with nothing.
### 2.3a Top pain points per niche
The "what does this actually fix?" question. Each pain point below is
sourced from operator-domain knowledge of these markets and the
buyer-use-case research already captured in `BUSINESS.md §4a`. Pain
points are ranked by **frequency × dollar impact** for that persona —
high-frequency / high-cost pains lead the landing-page copy and the
demo dataset.
> **Validation gap (honest disclaimer)**: these pains are derived from
> operator knowledge of the categories, not from a sample of buyer
> interviews. Per `BUSINESS.md §8` (no-touch constraint review at $5k/mo
> MRR), validate the top-3 per persona via 5 buyer interviews before the
> first $200 of paid acquisition spend. If any pain ranks below the
> assumed level, swap it for the next-highest in this list.
#### Shopify operator (priority: pet supplies)
| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| S1 | **Klaviyo / Mailchimp / Omnisend per-contact billing.** Subscriber list with 10–18 % duplicate rate (case drift, plus signs in Gmail addresses, multiple devices) → recurring overpay forever. | $30–300/mo per percent of dupes on a 50 k list — recurring | Dedup + Format Standardize (email canonicalization) + Pipeline (re-run weekly) |
| S2 | **Product feed rejected by Google Merchant Center / Meta Catalog.** Smart quotes in titles, NBSP in SKU, inconsistent attributes; campaign launch delayed 24–72 h while feed gets fixed. | 1–3 days delayed launch × campaign value | Clean Text + Standardize Formats |
| S3 | **Multi-channel order consolidation.** Shopify + Etsy + Amazon + Faire + wholesale spreadsheet, each with a different column for "customer email" / "order total" / "ship country". | 4–8 hr / month manually merging | Map Columns + Find Duplicates + Automated Workflows |
| S4 | **Subscription identity fragmentation.** Pet-box subscribers cancel and re-sub under a different email; cohort analysis says churn is 20 % when it's actually 12 % — pricing decisions wrong. | Mis-priced LTV → over- or under-paid acquisition | Dedup with `merge=true` survivor |
| S5 | **International tax / VAT MOSS compliance.** Country column is `UK` / `U.K.` / `United Kingdom` / `GB` in the same export; VAT report breaks. Phone formats per region break call-center routing. | Compliance penalty risk + ops friction | Standardize Formats (per-row country) + Map Columns |
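
Pain S1 turns on what counts as "the same subscriber". As a minimal sketch (not the actual Format Standardize logic), the following shows how case drift, stray whitespace, and Gmail plus-tags could collapse onto one comparison key, and how a duplicate rate could be estimated from it. The function names and the Gmail-only plus-tag rule are assumptions made for illustration.

```python
def canonicalize_email(address: str) -> str:
    """Return a comparison key for an email address (illustrative rules only)."""
    cleaned = address.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:
        return cleaned  # no "@": not an email, leave it for a validity check
    # Assumption: "+tag" suffixes are treated as aliases for Gmail domains only.
    if domain in {"gmail.com", "googlemail.com"}:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def duplicate_rate(addresses: list[str]) -> float:
    """Share of rows that collapse onto an already-seen canonical address."""
    if not addresses:
        return 0.0
    unique = {canonicalize_email(a) for a in addresses}
    return 1 - len(unique) / len(addresses)


subscribers = [
    "Jane.Doe@Gmail.com",
    "jane.doe+promo@gmail.com",
    "jane.doe@gmail.com ",
    "sam@example.com",
]
print(f"{duplicate_rate(subscribers):.0%}")
```

On a per-contact billing plan, the duplicate rate multiplied by the list size gives the number of contacts being paid for twice.
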
#### Bookkeeper / freelance accountant
| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| B1 | **Bank-export month-overlap re-import.** Same transaction posts twice when Jan and Feb exports overlap at the boundary; client's books understate cash by 1–4 %. | 2–4 hr / month / client + reconciliation errors | Dedup with explicit Date+Amount+fuzzy Vendor strategy |
| B2 | **QBO / Xero vendor consolidation for 1099 reports.** "Amazon" / "amazon.com" / "AMAZON.COM*4F2X9" become 3 vendors; 1099 reports break, P&L by vendor unusable. | 1–2 hr / 1099 cycle + IRS-paper-trail risk | Format Standardize (name canonicalization) + Dedup |
| B3 | **Liability / professional indemnity.** Cannot use AI tools that don't show their work; client audit response window is 24–48 h. | Per-firm liability premium ≈ $500–2,500 / yr | Audit log built into every tool — every change row-logged |
| B4 | **Per-license-not-per-client economics.** Most cleanup tools are per-seat / per-client SaaS; bookkeepers managing 10–30 clients hit price walls fast. | $30/mo × N clients vs. $49 once | Desktop license, no per-client constraint |
| B5 | **Multi-currency books.** US-domiciled clients with EU customers; comma-decimal amounts (`€1.234,56`) crash standard parsers; parens-negative (`($89.50)`) treated as positive. | 30–60 min per multi-currency client per month | Format Standardize (`currency_decimal=auto`, parens-negative) |
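
To make pain B5 concrete, here is a minimal sketch of a parser that accepts both comma-decimal and parens-negative amounts. It is a standalone heuristic written for illustration, not the code behind `currency_decimal=auto`; the function name `parse_amount` and the tie-breaking rules in the comments are assumptions.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse a money string, guessing the decimal separator (illustrative heuristic)."""
    s = text.strip()
    # Accounting-style "(89.50)" and a leading minus both mean negative.
    negative = (s.startswith("(") and s.endswith(")")) or s.startswith("-")
    digits = re.sub(r"[^\d.,]", "", s)  # drop currency symbols, spaces, parens, sign
    last_dot, last_comma = digits.rfind("."), digits.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        # Both separators present: whichever appears last is the decimal mark.
        if last_comma > last_dot:
            digits = digits.replace(".", "").replace(",", ".")
        else:
            digits = digits.replace(",", "")
    elif last_comma >= 0:
        # Assumption: a single comma not followed by exactly three digits is a
        # decimal comma; anything else is treated as a thousands separator.
        if digits.count(",") == 1 and len(digits) - last_comma - 1 != 3:
            digits = digits.replace(",", ".")
        else:
            digits = digits.replace(",", "")
    elif digits.count(".") > 1:
        digits = digits.replace(".", "")  # "1.234.567" style thousands dots
    value = Decimal(digits)  # raises InvalidOperation if no digits remain
    return -value if negative else value


print(parse_amount("€1.234,56"), parse_amount("($89.50)"))
```

Inputs such as `1,234` are inherently ambiguous; this sketch reads them as thousands, which is one reason a production tool would expose the choice as a setting rather than guess silently.
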
#### Marketing / RevOps agency
| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| R1 | **HubSpot / Marketo / Iterable per-contact tier pricing.** 10 k contacts → enterprise tier at $4–8 k/mo. Every duplicate is a recurring tax. | $200–800 / month per 1 k duplicate contacts — recurring | Dedup with cross-source merge + Pipeline |
| R2 | **Email-deliverability / sender reputation.** Sending to invalid or duplicate addresses tanks reputation; recovery takes weeks. | Catastrophic — entire email programme degraded | Format Standardize (email canonicalization) + Missing (sentinel detection) |
| R3 | **GDPR / contact-data privacy.** Uploading lead data to a third-party cleaning SaaS is itself a GDPR concern; legal review blocks adoption. | Compliance risk + 4–8 wk legal-review delay | Local-only desktop app, zero outbound calls |
| R4 | **Multi-vendor lead-source unification.** Apollo, ZoomInfo, LinkedIn Sales Nav, manual scrapes — each export has different headers, scoring, country format. | 1–3 days per campaign of manual unification | Map Columns (alias matching) + Standardize Formats (per-row country) + Find Duplicates |
| R5 | **Suppression-list management across 5+ platforms.** Each platform has its own format; un-deduped suppression lists let opt-outs slip through, triggering CAN-SPAM / GDPR exposure. | Compliance risk + churn-back cost | Pipeline saved as JSON, re-run on each new suppression batch |
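
The "alias matching" named for pain R4 can be pictured as a lookup from normalized header text to a canonical column name. The sketch below is an assumption about how such matching might work, not the Map Columns implementation; the `ALIASES` table and both function names are hypothetical.

```python
import re

# Hypothetical alias table: canonical column name -> header spellings seen in exports.
ALIASES = {
    "email": {"email", "email address", "customer email", "work email"},
    "country": {"country", "ship country", "country code", "hq country"},
    "company": {"company", "company name", "organization", "account name"},
}


def normalize_header(header: str) -> str:
    """Lowercase and collapse punctuation, underscores, and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_columns(headers: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return ({original header: canonical name}, [headers with no known alias])."""
    lookup = {alias: canon for canon, aliases in ALIASES.items() for alias in aliases}
    mapping: dict[str, str] = {}
    unmatched: list[str] = []
    for header in headers:
        canon = lookup.get(normalize_header(header))
        if canon is None:
            unmatched.append(header)  # surfaced to the user instead of guessed
        else:
            mapping[header] = canon
    return mapping, unmatched


print(map_columns(["Email_Address", "HQ Country", "Lead Score"]))
```

Returning the unmatched headers separately keeps the mapping reviewable, which fits the audit-trail positioning better than silent fuzzy guesses.
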
### 2.4 Operationalize the moat the docs already name.
Three durable advantages, each promoted from buried feature to
landing-page H1:
- **Quality**: 1 GB international standardization in ~2.5 minutes,
locally. Excel can't do this; OpenRefine fights you for an hour.
- **Privacy**: "Your data never leaves this computer." Already in the
GUI footer — promote to landing-page lead, screenshot the empty
network tab.
- **Update cadence**: ship a v1.1 patch within 30 days of v1.0 launch.
Not features — *evidence* the product is alive. "Added Czech Republic
phone format support" beats "no updates in 6 months" every time.
### 2.5 Surface the audit-trail feature in sales copy.
Every tool has a structured audit log. Most cleaning tools do not.
Bookkeepers and consultants get fired if they can't show what changed
to a client. The audit feature is currently invisible on every
proposed landing page and should be the **second-largest callout**
right after "runs locally."
Copy seed: *"Every change auditable. Hand the audit CSV to your client
with the cleaned file."*
### 2.6 Automated Workflows is the retention multiplier.
A buyer with a saved pipeline isn't a one-off purchase — they're a
recurring user who recommends the product. This is exactly the
behavioural lever the no-touch constraint needs (DECISIONS.md §8
trigger). Build it now (see §2.1 exception).
### 2.7 Add a $199 "priority support" tier post-launch.
Same code, async-email SLA (24 h response). Targets the bookkeeper /
consultant persona whose own time is $300/hr. Zero new product work,
~3× ARPU on 5–10 % of buyers. Lock the SLA to **async only** so the
no-touch constraint isn't violated. Defer until $5 k/mo MRR (the same
trigger DECISIONS.md §8 already names).
### 2.8 Dependency-aware pipeline UX.
Tools have soft execution-order preferences (Text Clean before Format
Standardize, Format before Dedup, Missing before Dedup). Automated
Workflows *recommends* the order, *warns* on reversals, and **never
forces** — the user owns their workflow. Implementation: see
`src/core/pipeline.py` `SOFT_DEPENDENCIES`.
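
A minimal sketch of the recommend/warn/never-force behaviour, under stated assumptions: the pair-list shape of `SOFT_DEPENDENCIES` and the step identifiers below are invented for illustration, and the real constant in `src/core/pipeline.py` may be structured differently.

```python
# Assumed shape: (earlier_step, later_step) pairs mirroring the three preferences above.
SOFT_DEPENDENCIES = [
    ("clean_text", "standardize_formats"),
    ("standardize_formats", "find_duplicates"),
    ("fix_missing_values", "find_duplicates"),
]


def order_warnings(steps: list[str]) -> list[str]:
    """Return one warning per reversed soft dependency; never reorder or reject."""
    # If a step appears more than once, its last position is used.
    position = {name: index for index, name in enumerate(steps)}
    warnings = []
    for earlier, later in SOFT_DEPENDENCIES:
        if earlier in position and later in position and position[earlier] > position[later]:
            warnings.append(f"'{earlier}' usually runs before '{later}'")
    return warnings


user_pipeline = ["find_duplicates", "clean_text", "standardize_formats"]
for message in order_warnings(user_pipeline):
    print("warning:", message)
```

The function only reports; the caller still runs `user_pipeline` exactly as the user arranged it, which is the "never forces" half of the rule.
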
## 3. 90-day execution sequence
| Week | Action | Done when |
|---|---|---|
| 1 | PyInstaller pipeline · Mac/Win unsigned installers · Apple Dev Program enrollment (1–2 wk lead) | `dist/datatools-mac.dmg` and `dist/datatools-win.exe` install on a clean machine |
| 2 | Demo deployed to Streamlit Cloud · landing page v1 with embedded demo · 3 persona datasets in the demo | Public URL serves a working pipeline run on a sample dataset in < 30 s |
| 3 | Gumroad listing live · share value-first in 3 niche communities (no pitch) · 1 long-tail SEO post for the lead persona | First listing impression captured · post not removed for self-promotion |
| 4 | Automated Workflows v1.0 shipped (this week, 2026-05-01 — exception per §2.1) · v1.1 patch announced with Tool 09 + intl improvements | Pipeline saves/loads JSON · 3 demo pipelines preloaded |
| 5–8 | Bookkeeper landing page · agency landing page · second tool's promo cycle · priority-support tier added (defer purchase until §2.7 trigger) | Three live landing pages with distinct H1, demo dataset, conversion target |
| 9–13 | Tool 06–08 only **if** revenue trajectory supports continued investment · otherwise more market work on the existing 5 + 09 | Decision made on 13 Aug 2026 with revenue data, not feature ambition |
## 4. Decision triggers (re-evaluation prompts)
These flip the plan, not the underlying criteria:
| Trigger | Reaction |
|---|---|
| First paying customer in week 4–13 | Continue. Plan is working. |
| **Zero** paid in 90 days | Audit the funnel. Demo conversion? Niche fit? Price? Don't add features. |
| $5 k/mo MRR | DECISIONS.md §8 trigger fires: revisit async + priority-support tier. |
| Marketplace policy / shutdown | Switch to own-domain Stripe immediately; landing pages are already self-hosted. |
| Streamlit hard direction change | Low-probability re-lock per DECISIONS.md §8. Tk fallback is documented. |
## 5. Anti-temptations (things the plan refuses)
- **More tools before more buyers.** Locked. Exception only for Automated Workflows per §2.1.
- **SaaS pivot.** Recurring infra conflicts with the lifestyle constraint (DECISIONS.md §4).
- **Live chat / sales calls.** Conflicts with no-touch (DECISIONS.md §1 #8).
- **Custom integrations / one-off consulting.** $300/hr looks tempting; breaks the "build once, sell many" model that justifies the entire strategy.
- **Going broad on personas.** "All small businesses" is a generic landing page that converts at 1 %; "Shopify pet-supply operators with 1k–50k customers" converts at 5–15 % in the right communities.
## 6. What this plan deliberately leaves open
- Whether tools 06–08 ever ship. Decided on revenue, not roadmap.
- Whether to add a fourth niche landing page. Decided on which of the
three is producing.
- Whether to invest in own-domain SEO. Compounding 6–18 mo asset; not
the early-stage channel. Revisit when marketplace + community
produces baseline traffic to optimise.
- Whether to add a Notion / Slack support community. If support volume
per 100 sales > 10 (BUSINESS.md §12 target), revisit; else leave async-email only.