# Strategic Plan — DataTools > Creator-only. Locks the "what next" in light of the locked criteria > (DECISIONS.md §1) and the v1.6 honest status (BUSINESS.md §13). > **Version**: 1.0 · **Adopted**: 2026-05-01 · **Owner**: Michael This document is the active plan, derived from the strategic review of 2026-05-01. It compresses the eight strategic moves and a 90-day execution sequence onto one page so the next decision (build vs. ship vs. market) has a single reference. It is **not** a re-lock of operating criteria — those still live in DECISIONS.md and have not changed. This plan is downstream of those criteria; if a move below conflicts with §1 of Decisions, the criteria win. ## 1. Frame **Locked context** (BUSINESS.md, DECISIONS.md): - Niche Python automation tools, $49–79 single / $149 suite. - Cash budget ≤ $1,200/mo recurring · Time ≤ 10 hr/wk · No external funding. - Async + no-touch sales (revisit at $5k/mo MRR). - Marketplace-first distribution (Gumroad / Lemon Squeezy). - Streamlit GUI + CLI dual interface, runs locally. - Lifestyle cashflow goal (no exit needed). **Honest current state** (2026-05-01): | Asset | State | |---|---| | Tools 1–5 (Dedup, Text Clean, Format Standardize, Missing, Column Mapper) | Ready · 1,691 tests passing · 0 xfailed | | Tools 6–9 (Outlier, Multi-File Merge, Validator, Pipeline) | Coming Soon | | PyInstaller installer pipeline | Not started | | macOS code signing (Apple Dev Program) | Not started | | Hosted browser demo (Streamlit Cloud) | Not deployed | | Landing page | Not live | | Marketplace listing (Gumroad) | Not listed | | Paying customers | 0 | **Diagnosis**: the bottleneck is not feature count — it's distribution. The next $1 of value comes from closing the gap between "code-complete" and "buyer-pulls-out-card", not from tool 6. ## 2. The eight strategic moves Numbered moves. Each is consistent with locked criteria. ### 2.1 Freeze new-tool development (one exception). Ship what exists. Tools 6–8 are blocked behind a **distribution gate**: no work on them until the existing 5 tools have a paying customer + one external review (BUSINESS.md §4 sequence rule, applied recursively inside the bundle). **Exception granted 2026-05-01**: Tool 09 Pipeline Runner is built *now*. Rationale: the pipeline transforms the bundle from "5 tools you buy" into "an automatable workflow you depend on." That conversion is what produces retention and word-of-mouth — the only marketing channel that scales under the no-network/no-touch constraint. ### 2.2 The demo *is* the product. Make it embarrassingly good. - Three persona-tagged sample datasets, not one generic CSV: Shopify customers / bookkeeper bank export / agency lead list. - Run the *full pipeline* on the sample (Review → Dedup → Text Clean → Format → Missing → Column Map). Free version caps **output rows**, not the experience. - Embed the demo as an **iframe on the landing page** (not "click to open"). Friction kills conversion. - Persistent CTA after demo: *"Run this on your own 50 k-row file → buy for $49 →"* directly above the Gumroad button. ### 2.3 Niche down. Stop selling "data cleaning." One engine, three landing pages: | Persona | Landing-page lead | Demo dataset | |---|---|---| | Shopify operator (priority: pet supplies) | "Clean your customer / vendor / subscriber exports" | uc01_shopify_customer_list | | Bookkeeper / freelance accountant | "Reconcile bank exports + vendor lists. Auditable changes." | uc06_bank_export_overlap | | Marketing / RevOps agency | "Dedupe lead lists. Standardize phones across vendors." | uc13_combined_lead_sources | Generic copy competes with `pip install pandas`. Vertical copy competes with nothing. ### 2.3a Top pain points per niche The "what does this actually fix?" question. Each pain point below is sourced from operator-domain knowledge of these markets and the buyer-use-case research already captured in `BUSINESS.md §4a`. Pain points are ranked by **frequency × dollar impact** for that persona — high-frequency / high-cost pains lead the landing-page copy and the demo dataset. > **Validation gap (honest disclaimer)**: these pains are derived from > operator knowledge of the categories, not from a sample of buyer > interviews. Per `BUSINESS.md §8` (no-touch constraint review at $5k/mo > MRR), validate the top-3 per persona via 5 buyer interviews before the > first $200 of paid acquisition spend. If any pain ranks below the > assumed level, swap it for the next-highest in this list. #### Shopify operator (priority: pet supplies) | # | Pain | $ / time impact | Tools that fix it | |---|------|-----------------|---| | S1 | **Klaviyo / Mailchimp / Omnisend per-contact billing.** Subscriber list with 10–18 % duplicate rate (case drift, plus signs in Gmail addresses, multiple devices) → recurring overpay forever. | $30–300/mo per percent of dupes on a 50 k list — recurring | Dedup + Format Standardize (email canonicalization) + Pipeline (re-run weekly) | | S2 | **Product feed rejected by Google Merchant Center / Meta Catalog.** Smart quotes in titles, NBSP in SKU, inconsistent attributes; campaign launch delayed 24–72 h while feed gets fixed. | 1–3 days delayed launch × campaign value | Text Cleaner + Format Standardize | | S3 | **Multi-channel order consolidation.** Shopify + Etsy + Amazon + Faire + wholesale spreadsheet, each with a different column for "customer email" / "order total" / "ship country". | 4–8 hr / month manually merging | Column Mapper + Dedup + Pipeline | | S4 | **Subscription identity fragmentation.** Pet-box subscribers cancel and re-sub under a different email; cohort analysis says churn is 20 % when it's actually 12 % — pricing decisions wrong. | Mis-priced LTV → over- or under-paid acquisition | Dedup with `merge=true` survivor | | S5 | **International tax / VAT MOSS compliance.** Country column is `UK` / `U.K.` / `United Kingdom` / `GB` in the same export; VAT report breaks. Phone formats per region break call-center routing. | Compliance penalty risk + ops friction | Format Standardize (per-row country) + Column Mapper | #### Bookkeeper / freelance accountant | # | Pain | $ / time impact | Tools that fix it | |---|------|-----------------|---| | B1 | **Bank-export month-overlap re-import.** Same transaction posts twice when Jan and Feb exports overlap at the boundary; client's books understate cash by 1–4 %. | 2–4 hr / month / client + reconciliation errors | Dedup with explicit Date+Amount+fuzzy Vendor strategy | | B2 | **QBO / Xero vendor consolidation for 1099 reports.** "Amazon" / "amazon.com" / "AMAZON.COM*4F2X9" become 3 vendors; 1099 reports break, P&L by vendor unusable. | 1–2 hr / 1099 cycle + IRS-paper-trail risk | Format Standardize (name canonicalization) + Dedup | | B3 | **Liability / professional indemnity.** Cannot use AI tools that don't show their work; client audit response window is 24–48 h. | Per-firm liability premium ≈ $500–2,500 / yr | Audit log built into every tool — every change row-logged | | B4 | **Per-license-not-per-client economics.** Most cleanup tools are per-seat / per-client SaaS; bookkeepers managing 10–30 clients hit price walls fast. | $30/mo × N clients vs. $49 once | Desktop license, no per-client constraint | | B5 | **Multi-currency books.** US-domiciled clients with EU customers; comma-decimal amounts (`€1.234,56`) crash standard parsers; parens-negative (`($89.50)`) treated as positive. | 30–60 min per multi-currency client per month | Format Standardize (`currency_decimal=auto`, parens-negative) | #### Marketing / RevOps agency | # | Pain | $ / time impact | Tools that fix it | |---|------|-----------------|---| | R1 | **HubSpot / Marketo / Iterable per-contact tier pricing.** 10 k contacts → enterprise tier at $4–8 k/mo. Every duplicate is a recurring tax. | $200–800 / month per 1 k duplicate contacts — recurring | Dedup with cross-source merge + Pipeline | | R2 | **Email-deliverability / sender reputation.** Sending to invalid or duplicate addresses tanks reputation; recovery takes weeks. | Catastrophic — entire email programme degraded | Format Standardize (email canonicalization) + Missing (sentinel detection) | | R3 | **GDPR / contact-data privacy.** Uploading lead data to a third-party cleaning SaaS is itself a GDPR concern; legal review blocks adoption. | Compliance risk + 4–8 wk legal-review delay | Local-only desktop app, zero outbound calls | | R4 | **Multi-vendor lead-source unification.** Apollo, ZoomInfo, LinkedIn Sales Nav, manual scrapes — each export has different headers, scoring, country format. | 1–3 days per campaign of manual unification | Column Mapper (alias matching) + Format Standardize (per-row country) + Dedup | | R5 | **Suppression-list management across 5+ platforms.** Each platform has its own format; un-deduped suppression lists let opt-outs slip through, triggering CAN-SPAM / GDPR exposure. | Compliance risk + churn-back cost | Pipeline saved as JSON, re-run on each new suppression batch | ### 2.4 Operationalize the moat the docs already name. Three durable advantages, each promoted from buried feature to landing-page H1: - **Quality**: 1 GB international standardization in ~2.5 minutes, locally. Excel can't do this; OpenRefine fights you for an hour. - **Privacy**: "Your data never leaves this computer." Already in the GUI footer — promote to landing-page lead, screenshot the empty network tab. - **Update cadence**: ship a v1.1 patch within 30 days of v1.0 launch. Not features — *evidence* the product is alive. "Added Czech Republic phone format support" beats "no updates in 6 months" every time. ### 2.5 Surface the audit-trail feature in sales copy. Every tool has a structured audit log. Most cleaning tools do not. Bookkeepers and consultants get fired if they can't show what changed to a client. The audit feature is currently invisible on every proposed landing page and should be the **second-largest callout** — right after "runs locally." Copy seed: *"Every change auditable. Hand the audit CSV to your client with the cleaned file."* ### 2.6 The Pipeline Runner is the retention multiplier. A buyer with a saved pipeline isn't a one-off purchase — they're a recurring user who recommends the product. This is exactly the behavioural lever the no-touch constraint needs (DECISIONS.md §8 trigger). Build it now (see §2.1 exception). ### 2.7 Add a $199 "priority support" tier post-launch. Same code, async-email SLA (24 h response). Targets the bookkeeper / consultant persona whose own time is $300/hr. Zero new product work, ~3× ARPU on 5–10 % of buyers. Lock the SLA to **async only** so the no-touch constraint isn't violated. Defer until $5 k/mo MRR (the same trigger DECISIONS.md §8 already names). ### 2.8 Dependency-aware pipeline UX. Tools have soft execution-order preferences (Text Clean before Format Standardize, Format before Dedup, Missing before Dedup). The Pipeline Runner *recommends* the order, *warns* on reversals, and **never forces** — the user owns their workflow. Implementation: see `src/core/pipeline.py` `SOFT_DEPENDENCIES`. ## 3. 90-day execution sequence | Week | Action | Done when | |---|---|---| | 1 | PyInstaller pipeline · Mac/Win unsigned installers · Apple Dev Program enrollment (1–2 wk lead) | `dist/datatools-mac.dmg` and `dist/datatools-win.exe` install on a clean machine | | 2 | Demo deployed to Streamlit Cloud · landing page v1 with embedded demo · 3 persona datasets in the demo | Public URL serves a working pipeline run on a sample dataset in < 30 s | | 3 | Gumroad listing live · share value-first in 3 niche communities (no pitch) · 1 long-tail SEO post for the lead persona | First listing impression captured · post not removed for self-promotion | | 4 | Pipeline Runner v1.0 shipped (this week, 2026-05-01 — exception per §2.1) · v1.1 patch announced with Tool 09 + intl improvements | Pipeline saves/loads JSON · 3 demo pipelines preloaded | | 5–8 | Bookkeeper landing page · agency landing page · second tool's promo cycle · priority-support tier added (defer purchase until §2.7 trigger) | Three live landing pages with distinct H1, demo dataset, conversion target | | 9–13 | Tool 06–08 only **if** revenue trajectory supports continued investment · otherwise more market work on the existing 5 + 09 | Decision made on 13 Aug 2026 with revenue data, not feature ambition | ## 4. Decision triggers (re-evaluation prompts) These flip the plan, not the underlying criteria: | Trigger | Reaction | |---|---| | First paying customer in week 4–13 | Continue. Plan is working. | | **Zero** paid in 90 days | Audit the funnel. Demo conversion? Niche fit? Price? Don't add features. | | $5 k/mo MRR | DECISIONS.md §8 trigger fires: revisit async + priority-support tier. | | Marketplace policy / shutdown | Switch to own-domain Stripe immediately; landing pages are already self-hosted. | | Streamlit hard direction change | Low-probability re-lock per DECISIONS.md §8. Tk fallback is documented. | ## 5. Anti-temptations (things the plan refuses) - **More tools before more buyers.** Locked. Exception only for Pipeline Runner per §2.1. - **SaaS pivot.** Recurring infra conflicts with the lifestyle constraint (DECISIONS.md §4). - **Live chat / sales calls.** Conflicts with no-touch (DECISIONS.md §1 #8). - **Custom integrations / one-off consulting.** $300/hr looks tempting; breaks the "build once, sell many" model that justifies the entire strategy. - **Going broad on personas.** "All small businesses" is a generic landing page that converts at 1 %; "Shopify pet-supply operators with 1k–50k customers" converts at 5–15 % in the right communities. ## 6. What this plan deliberately leaves open - Whether tools 06–08 ever ship. Decided on revenue, not roadmap. - Whether to add a fourth niche landing page. Decided on which of the three is producing. - Whether to invest in own-domain SEO. Compounding 6–18 mo asset; not the early-stage channel. Revisit when marketplace + community produces baseline traffic to optimise. - Whether to add a Notion / Slack support community. If support volume per 100 sales > 10 (BUSINESS.md §12 target), revisit; else leave async-email only.