Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:
- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
and gui/components/_legacy.py, core module headers, every tool
page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs
Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.
All 2033 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Strategic Plan — DataTools

> Creator-only. Locks the "what next" in light of the locked criteria
> (DECISIONS.md §1) and the v1.6 honest status (BUSINESS.md §13).
> **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael

This document is the active plan, derived from the strategic review of
2026-05-01. It compresses the eight strategic moves and a 90-day
execution sequence onto one page so the next decision (build vs.
ship vs. market) has a single reference.

It is **not** a re-lock of operating criteria — those still live in
DECISIONS.md and have not changed. This plan is downstream of those
criteria; if a move below conflicts with DECISIONS.md §1, the criteria
win.

## 1. Frame

**Locked context** (BUSINESS.md, DECISIONS.md):

- Niche Python automation tools, $49–79 single / $149 suite.
- Cash budget ≤ $1,200/mo recurring · Time ≤ 10 hr/wk · No external funding.
- Async + no-touch sales (revisit at $5k/mo MRR).
- Marketplace-first distribution (Gumroad / Lemon Squeezy).
- Streamlit GUI + CLI dual interface, runs locally.
- Lifestyle cashflow goal (no exit needed).

**Honest current state** (2026-05-01):

| Asset | State |
|---|---|
| Tools 1–5 (Find Duplicates, Clean Text, Standardize Formats, Fix Missing Values, Map Columns) | Ready · 1,691 tests passing · 0 xfailed |
| Tools 6–9 (Find Unusual Values, Combine Files, Quality Check, Automated Workflows) | Coming Soon |
| PyInstaller installer pipeline | Not started |
| macOS code signing (Apple Dev Program) | Not started |
| Hosted browser demo (Streamlit Cloud) | Not deployed |
| Landing page | Not live |
| Marketplace listing (Gumroad) | Not listed |
| Paying customers | 0 |

**Diagnosis**: the bottleneck is not feature count — it's distribution.
The next $1 of value comes from closing the gap between "code-complete"
and "buyer-pulls-out-card", not from tool 6.

## 2. The eight strategic moves

Numbered moves. Each is consistent with locked criteria.

### 2.1 Freeze new-tool development (one exception). Ship what exists.

Tools 6–8 are blocked behind a **distribution gate**: no work on them
until the existing 5 tools have a paying customer + one external review
(BUSINESS.md §4 sequence rule, applied recursively inside the bundle).

**Exception granted 2026-05-01**: Tool 09 Automated Workflows is built
*now*. Rationale: the pipeline transforms the bundle from "5 tools you
buy" into "an automatable workflow you depend on." That conversion is
what produces retention and word-of-mouth — the only marketing channel
that scales under the no-network/no-touch constraint.

### 2.2 The demo *is* the product. Make it embarrassingly good.

- Three persona-tagged sample datasets, not one generic CSV: Shopify
  customers / bookkeeper bank export / agency lead list.
- Run the *full pipeline* on the sample (Review → Dedup → Text Clean →
  Format → Missing → Column Map). Free version caps **output rows**,
  not the experience.
- Embed the demo as an **iframe on the landing page** (not "click to
  open"). Friction kills conversion.
- Persistent CTA after demo: *"Run this on your own 50 k-row file →
  buy for $49 →"* directly above the Gumroad button.

### 2.3 Niche down. Stop selling "data cleaning."

One engine, three landing pages:

| Persona | Landing-page lead | Demo dataset |
|---|---|---|
| Shopify operator (priority: pet supplies) | "Clean your customer / vendor / subscriber exports" | uc01_shopify_customer_list |
| Bookkeeper / freelance accountant | "Reconcile bank exports + vendor lists. Auditable changes." | uc06_bank_export_overlap |
| Marketing / RevOps agency | "Dedupe lead lists. Standardize phones across vendors." | uc13_combined_lead_sources |

Generic copy competes with `pip install pandas`. Vertical copy
competes with nothing.

### 2.3a Top pain points per niche

The "what does this actually fix?" question. Each pain point below is
sourced from operator-domain knowledge of these markets and the
buyer-use-case research already captured in `BUSINESS.md §4a`. Pain
points are ranked by **frequency × dollar impact** for that persona —
high-frequency / high-cost pains lead the landing-page copy and the
demo dataset.

> **Validation gap (honest disclaimer)**: these pains are derived from
> operator knowledge of the categories, not from a sample of buyer
> interviews. Per `BUSINESS.md §8` (no-touch constraint review at $5k/mo
> MRR), validate the top-3 per persona via 5 buyer interviews before the
> first $200 of paid acquisition spend. If any pain ranks below the
> assumed level, swap it for the next-highest in this list.

#### Shopify operator (priority: pet supplies)

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| S1 | **Klaviyo / Mailchimp / Omnisend per-contact billing.** Subscriber list with 10–18 % duplicate rate (case drift, plus signs in Gmail addresses, multiple devices) → recurring overpay forever. | $30–300/mo per percent of dupes on a 50 k list — recurring | Dedup + Format Standardize (email canonicalization) + Pipeline (re-run weekly) |
| S2 | **Product feed rejected by Google Merchant Center / Meta Catalog.** Smart quotes in titles, NBSP in SKU, inconsistent attributes; campaign launch delayed 24–72 h while feed gets fixed. | 1–3 days delayed launch × campaign value | Clean Text + Standardize Formats |
| S3 | **Multi-channel order consolidation.** Shopify + Etsy + Amazon + Faire + wholesale spreadsheet, each with a different column for "customer email" / "order total" / "ship country". | 4–8 hr / month manually merging | Map Columns + Find Duplicates + Automated Workflows |
| S4 | **Subscription identity fragmentation.** Pet-box subscribers cancel and re-sub under a different email; cohort analysis says churn is 20 % when it's actually 12 % — pricing decisions wrong. | Mis-priced LTV → over- or under-paid acquisition | Dedup with `merge=true` survivor |
| S5 | **International tax / VAT MOSS compliance.** Country column is `UK` / `U.K.` / `United Kingdom` / `GB` in the same export; VAT report breaks. Phone formats per region break call-center routing. | Compliance penalty risk + ops friction | Standardize Formats (per-row country) + Map Columns |

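
To make S1 concrete, here is a minimal sketch of email canonicalization as a dedup key. It is an illustration only, not the Format Standardize implementation: the function names and the `GMAIL_DOMAINS` set are hypothetical, and it covers just the two causes named in the table (case drift and Gmail plus-tags). Lowercasing the local part is a practical assumption, not something every mail server guarantees.

```python
# Hypothetical helper names; not taken from the DataTools codebase.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonicalize_email(raw: str) -> str:
    """Return a comparison key for an email address.

    Handles case drift everywhere and "+tag" suffixes on Gmail addresses.
    """
    address = raw.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        # Not a usable address; return it unchanged for a validity check elsewhere.
        return address
    if domain in GMAIL_DOMAINS:
        base = local.split("+", 1)[0]
        if base:  # keep the original local part if stripping would empty it
            local = base
    return f"{local}@{domain}"


def count_duplicates(emails) -> int:
    """Count rows whose canonical key was already seen earlier in the list."""
    seen = set()
    duplicates = 0
    for email in emails:
        key = canonicalize_email(email)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates
```

Plus-tags on non-Gmail domains are left alone here, since other providers may treat them as distinct mailboxes.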
#### Bookkeeper / freelance accountant

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| B1 | **Bank-export month-overlap re-import.** Same transaction posts twice when Jan and Feb exports overlap at the boundary; client's books understate cash by 1–4 %. | 2–4 hr / month / client + reconciliation errors | Dedup with explicit Date+Amount+fuzzy Vendor strategy |
| B2 | **QBO / Xero vendor consolidation for 1099 reports.** "Amazon" / "amazon.com" / "AMAZON.COM*4F2X9" become 3 vendors; 1099 reports break, P&L by vendor unusable. | 1–2 hr / 1099 cycle + IRS-paper-trail risk | Format Standardize (name canonicalization) + Dedup |
| B3 | **Liability / professional indemnity.** Cannot use AI tools that don't show their work; client audit response window is 24–48 h. | Per-firm liability premium ≈ $500–2,500 / yr | Audit log built into every tool — every change row-logged |
| B4 | **Per-license-not-per-client economics.** Most cleanup tools are per-seat / per-client SaaS; bookkeepers managing 10–30 clients hit price walls fast. | $30/mo × N clients vs. $49 once | Desktop license, no per-client constraint |
| B5 | **Multi-currency books.** US-domiciled clients with EU customers; comma-decimal amounts (`€1.234,56`) crash standard parsers; parens-negative (`($89.50)`) treated as positive. | 30–60 min per multi-currency client per month | Format Standardize (`currency_decimal=auto`, parens-negative) |

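
The B5 row can be illustrated with a small parser for the two formats it names. This is a sketch under stated assumptions, not the behaviour of `currency_decimal=auto`: `parse_amount` is a hypothetical name, and the separator heuristic (right-most separator wins when both appear; a lone comma followed by exactly three digits is read as a thousands separator) is one plausible rule among several.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse amounts such as '€1.234,56' or '($89.50)' into a Decimal.

    Hypothetical heuristic, for illustration only.
    """
    s = text.strip()
    # Accounting-style parentheses or a minus sign both mean negative.
    negative = (s.startswith("(") and s.endswith(")")) or "-" in s
    digits = re.sub(r"[^\d.,]", "", s)  # drop currency symbols, spaces, parens
    if not re.search(r"\d", digits):
        raise ValueError(f"no digits in amount: {text!r}")

    last_dot, last_comma = digits.rfind("."), digits.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both present: the right-most one is the decimal separator.
        decimal_sep = "." if last_dot > last_comma else ","
    elif last_comma != -1:
        tail = len(digits) - last_comma - 1
        # A single comma not followed by exactly 3 digits is read as decimal.
        decimal_sep = "," if digits.count(",") == 1 and tail != 3 else None
    elif last_dot != -1:
        # A single dot is read as decimal; several dots are thousands marks.
        decimal_sep = "." if digits.count(".") == 1 else None
    else:
        decimal_sep = None

    if decimal_sep is None:
        whole, frac = digits, ""
    else:
        whole, _, frac = digits.rpartition(decimal_sep)
    whole = whole.replace(".", "").replace(",", "")
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if negative else value
```

Using `Decimal` rather than `float` keeps cent values exact, which matters when the parsed amounts feed a reconciliation.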
#### Marketing / RevOps agency

| # | Pain | $ / time impact | Tools that fix it |
|---|------|-----------------|---|
| R1 | **HubSpot / Marketo / Iterable per-contact tier pricing.** 10 k contacts → enterprise tier at $4–8 k/mo. Every duplicate is a recurring tax. | $200–800 / month per 1 k duplicate contacts — recurring | Dedup with cross-source merge + Pipeline |
| R2 | **Email-deliverability / sender reputation.** Sending to invalid or duplicate addresses tanks reputation; recovery takes weeks. | Catastrophic — entire email programme degraded | Format Standardize (email canonicalization) + Missing (sentinel detection) |
| R3 | **GDPR / contact-data privacy.** Uploading lead data to a third-party cleaning SaaS is itself a GDPR concern; legal review blocks adoption. | Compliance risk + 4–8 wk legal-review delay | Local-only desktop app, zero outbound calls |
| R4 | **Multi-vendor lead-source unification.** Apollo, ZoomInfo, LinkedIn Sales Nav, manual scrapes — each export has different headers, scoring, country format. | 1–3 days per campaign of manual unification | Map Columns (alias matching) + Standardize Formats (per-row country) + Find Duplicates |
| R5 | **Suppression-list management across 5+ platforms.** Each platform has its own format; un-deduped suppression lists let opt-outs slip through, triggering CAN-SPAM / GDPR exposure. | Compliance risk + churn-back cost | Pipeline saved as JSON, re-run on each new suppression batch |

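
For R4, the "alias matching" idea can be sketched as a lookup from normalized header text to a canonical column name. Everything here is assumed for illustration: the `ALIASES` table, the canonical names, and `map_columns` are hypothetical and do not describe how Map Columns is actually built.

```python
import re

# Hypothetical alias table: canonical column name -> known header variants.
ALIASES = {
    "email": {"email", "email address", "customer email", "work email"},
    "country": {"country", "ship country", "country code"},
    "phone": {"phone", "phone number", "mobile", "tel"},
}


def _normalize(header: str) -> str:
    """Lowercase and collapse punctuation/underscores/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_columns(headers):
    """Return (mapping, unmapped): source header -> canonical name, plus leftovers."""
    lookup = {
        alias: canonical
        for canonical, aliases in ALIASES.items()
        for alias in aliases
    }
    mapping, unmapped = {}, []
    for header in headers:
        canonical = lookup.get(_normalize(header))
        if canonical is None:
            unmapped.append(header)  # surfaced to the user, never silently dropped
        else:
            mapping[header] = canonical
    return mapping, unmapped
```

Returning the unmapped headers separately keeps the step reviewable: a column the table does not recognize is reported rather than guessed.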
### 2.4 Operationalize the moat the docs already name.

Three durable advantages, each promoted from buried feature to
landing-page H1:

- **Quality**: 1 GB international standardization in ~2.5 minutes,
  locally. Excel can't do this; OpenRefine fights you for an hour.
- **Privacy**: "Your data never leaves this computer." Already in the
  GUI footer — promote to landing-page lead, screenshot the empty
  network tab.
- **Update cadence**: ship a v1.1 patch within 30 days of v1.0 launch.
  Not features — *evidence* the product is alive. "Added Czech Republic
  phone format support" beats "no updates in 6 months" every time.

### 2.5 Surface the audit-trail feature in sales copy.

Every tool has a structured audit log. Most cleaning tools do not.
Bookkeepers and consultants get fired if they can't show a client what
changed. The audit feature is currently invisible on every
proposed landing page and should be the **second-largest callout** —
right after "runs locally."

Copy seed: *"Every change auditable. Hand the audit CSV to your client
with the cleaned file."*

### 2.6 Automated Workflows is the retention multiplier.

A buyer with a saved pipeline isn't a one-off purchase — they're a
recurring user who recommends the product. This is exactly the
behavioural lever the no-touch constraint needs (DECISIONS.md §8
trigger). Build it now (see §2.1 exception).

### 2.7 Add a $199 "priority support" tier post-launch.

Same code, async-email SLA (24 h response). Targets the bookkeeper /
consultant persona whose own time is $300/hr. Zero new product work,
~3× ARPU on 5–10 % of buyers. Lock the SLA to **async only** so the
no-touch constraint isn't violated. Defer until $5 k/mo MRR (the same
trigger DECISIONS.md §8 already names).

### 2.8 Dependency-aware pipeline UX.

Tools have soft execution-order preferences (Text Clean before Format
Standardize, Format before Dedup, Missing before Dedup). Automated
Workflows *recommends* the order, *warns* on reversals, and **never
forces** — the user owns their workflow. Implementation: see
`src/core/pipeline.py` `SOFT_DEPENDENCIES`.

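
The recommend / warn / never-force behaviour can be sketched as follows. The shape of `SOFT_DEPENDENCIES` below is an assumption made for illustration (a list of "runs before" pairs); the real structure in `src/core/pipeline.py` may differ, and the step identifiers and function names are hypothetical.

```python
# Assumed shape: (earlier_step, later_step) pairs. Not copied from src/core/pipeline.py.
SOFT_DEPENDENCIES = [
    ("text_clean", "format_standardize"),
    ("format_standardize", "dedup"),
    ("missing", "dedup"),
]


def order_warnings(steps):
    """Return one warning per reversed soft preference; never raises or reorders."""
    position = {name: index for index, name in enumerate(steps)}
    warnings = []
    for before, after in SOFT_DEPENDENCIES:
        if before in position and after in position and position[before] > position[after]:
            warnings.append(f"'{before}' usually runs before '{after}'")
    return warnings


def recommended_order(steps):
    """Suggest an order that satisfies the soft preferences, keeping ties stable."""
    remaining = list(steps)
    ordered = []
    while remaining:
        for name in remaining:
            # A step waits while any of its preferred predecessors is still pending.
            blocked = any(
                after == name and before in remaining
                for before, after in SOFT_DEPENDENCIES
            )
            if not blocked:
                ordered.append(name)
                remaining.remove(name)
                break
        else:
            # Only reachable if the preferences form a cycle; keep the user's order.
            ordered.extend(remaining)
            break
    return ordered
```

The user's own list is never mutated: `order_warnings` only reports, and `recommended_order` returns a separate suggestion the UI can offer.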
## 3. 90-day execution sequence

| Week | Action | Done when |
|---|---|---|
| 1 | PyInstaller pipeline · Mac/Win unsigned installers · Apple Dev Program enrollment (1–2 wk lead) | `dist/datatools-mac.dmg` and `dist/datatools-win.exe` install on a clean machine |
| 2 | Demo deployed to Streamlit Cloud · landing page v1 with embedded demo · 3 persona datasets in the demo | Public URL serves a working pipeline run on a sample dataset in < 30 s |
| 3 | Gumroad listing live · share value-first in 3 niche communities (no pitch) · 1 long-tail SEO post for the lead persona | First listing impression captured · post not removed for self-promotion |
| 4 | Automated Workflows v1.0 shipped (this week, 2026-05-01 — exception per §2.1) · v1.1 patch announced with Tool 09 + intl improvements | Pipeline saves/loads JSON · 3 demo pipelines preloaded |
| 5–8 | Bookkeeper landing page · agency landing page · second tool's promo cycle · priority-support tier added (defer purchase until §2.7 trigger) | Three live landing pages with distinct H1, demo dataset, conversion target |
| 9–13 | Tool 06–08 only **if** revenue trajectory supports continued investment · otherwise more market work on the existing 5 + 09 | Decision made on 13 Aug 2026 with revenue data, not feature ambition |

## 4. Decision triggers (re-evaluation prompts)

These flip the plan, not the underlying criteria:

| Trigger | Reaction |
|---|---|
| First paying customer in week 4–13 | Continue. Plan is working. |
| **Zero** paid in 90 days | Audit the funnel. Demo conversion? Niche fit? Price? Don't add features. |
| $5 k/mo MRR | DECISIONS.md §8 trigger fires: revisit async + priority-support tier. |
| Marketplace policy / shutdown | Switch to own-domain Stripe immediately; landing pages are already self-hosted. |
| Streamlit hard direction change | Low-probability re-lock per DECISIONS.md §8. Tk fallback is documented. |

## 5. Anti-temptations (things the plan refuses)

- **More tools before more buyers.** Locked. Exception only for Automated Workflows per §2.1.
- **SaaS pivot.** Recurring infra conflicts with the lifestyle constraint (DECISIONS.md §4).
- **Live chat / sales calls.** Conflicts with no-touch (DECISIONS.md §1 #8).
- **Custom integrations / one-off consulting.** $300/hr looks tempting; breaks the "build once, sell many" model that justifies the entire strategy.
- **Going broad on personas.** "All small businesses" is a generic landing page that converts at 1 %; "Shopify pet-supply operators with 1k–50k customers" converts at 5–15 % in the right communities.

## 6. What this plan deliberately leaves open

- Whether tools 06–08 ever ship. Decided on revenue, not roadmap.
- Whether to add a fourth niche landing page. Decided on which of the
  three is producing.
- Whether to invest in own-domain SEO. Compounding 6–18 mo asset; not
  the early-stage channel. Revisit when marketplace + community
  produce baseline traffic to optimise.
- Whether to add a Notion / Slack support community. If support volume
  exceeds 10 per 100 sales (BUSINESS.md §12 target), revisit; otherwise
  stay async-email only.