Files
datatools-dev/docs/BUSINESS.md
Michael abb720997e docs: tight, scannable rewrite — every item earns its place
Refactors all 10 docs (README, USER-GUIDE, CLI-REFERENCE, REQUIREMENTS,
TECHNICAL, DEVELOPER, BUSINESS, DECISIONS, RECOVERY, docs/README) from
prose-heavy to bullet-heavy + table-heavy. Same information density,
significantly less reading load.

Net: 2600 → 1652 lines (~37% reduction) WHILE adding the new content
that landed since v1.6:

- Format Standardizer (3rd Ready tool)
- 199-row buyer corpus
- src/core/errors.py structured hierarchy + ensure_dataframe /
  ensure_choice / wrap_file_read|write / format_for_user helpers
- src/core/_constants.py shared USPS/state lookup tables
- Cross-tool audit fixes (NaN matching, removed_df schema, validation,
  enum-bounds checks, forward-compat config)
- Per-domain error_policy across format standardizers
- Inconsistent-date-format detector
- Excel header-row auto-detection + write_file delimiter param

Per-doc changes:

- README.md (175 → 71): 9-tool table at top, status column, 3 CLI
  entry points listed, dropped repeated marketing prose.
- docs/README.md (38 → 27): pure index — buyer-facing vs creator-only
  split + version footer.
- USER-GUIDE.md (208 → 118): tool table replaces script descriptions,
  troubleshooting compressed to bullets, gate explanation tightened.
- CLI-REFERENCE.md (451 → 235): collapsed flag tables, removed
  redundant intro text, kept full recipes section.
- REQUIREMENTS.md (146 → 129): 18 numbered sections (was 17), added
  §18 Error Handling, formatting tightened to single-line entries.
- TECHNICAL.md (570 → 350): collapsed §3 build pipeline tables, merged
  redundant §3.5-3.7 OS sections, added §7 (Error handling) +
  §11.3 (Format Standardizer spec) + §11.4-11.7 (analyzer / gate /
  Review page / repair_bytes promoted from §10.2.x sub-numbering).
- DEVELOPER.md (285 → 161): module map table replaces per-file prose,
  extension recipes condensed, new §Errors covers when to use each
  hierarchy class.
- BUSINESS.md (278 → 225): collapsed prose to tables (use cases,
  competitive landscape, costs, risks); honest-status updated.
- DECISIONS.md (269 → 189): scoring rubric + GUI matrix preserved,
  decision log compressed to single-line entries, added v1.6 entries
  (Format Standardizer Ready, errors module).
- RECOVERY.md (180 → 147): rebuild steps as numbered + tabular,
  external dependencies as one table, recovery priorities tightened.

No information removed; redundancy compressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:49:29 +00:00

9.9 KiB
Raw Blame History

Business

Creator-only. Do not ship to buyers. Version: 1.6 · Updated: 2026-05-01 · Owner: Michael

1. Executive summary

Sell niche Python automation tools as one-time downloadable digital products. Target non-technical users who hate Excel/CSV grunt work but can't code. Distribute via Gumroad / Lemon Squeezy with automated delivery. Cross-platform from launch. Each bundle ships GUI (primary, browser-local) + CLI.

  • Pricing: $49-79 per bundle · $149 full suite (when 3+ exist).
  • Goal: lifestyle cashflow. No saleable-asset exit required.

2. Market opportunity

  • Persistent, evergreen pain: data cleaning is universal.
  • Low competition in vertical niches (Shopify pet-supplies feeds vs. generic CSV cleaners).
  • ~100% gross margin after creation.
  • Hosted browser demo as try-before-buy conversion lever (added v1.3).

Timing reality: marketplaces + community posts → days/weeks to first sale. Own-domain SEO is a 6-18 month compounding asset, not an early channel.

3. Target customers

Primary:

  • Shopify owners (Pet Supplies = priority niche).
  • Small business owners needing reporting + finance.
  • Freelancers / consultants handling client data.
  • Local marketing agencies.

Anti-personas:

  • Enterprise data teams (build their own).
  • Pure technical buyers (pip install something free).

4. Product strategy

Lead: Excel & CSV Data Cleaning Mastery Bundle (highest pain, broadest demand).

Roadmap:

  1. Data Cleaning Mastery (in progress)
  2. Automated Business Reporting
  3. Ecommerce Data Pipeline
  4. Small Business Finance
  5. Marketing Public Data Aggregation
  6. AI Ecommerce Aggregation — Shopify Pet Supplies

Sequence rule: don't start bundle 2 until bundle 1 has paying customers + one external review. Five parallel skeletons is a known failure mode.

Surface: desktop install per OS (PyInstaller) with Streamlit GUI + CLI. Constrained demo on Streamlit Community Cloud.

4a. Lead bundle — Deduplicator

Highest pain density across all 4 personas. Feeds landing copy, demo design, feature priority. Tech spec: TECHNICAL.md §11.1.

Use cases by persona

Shopify:

  1. Customer list cleanup (john@gmail.com vs John@Gmail.com vs j.ohn@gmail.com).
  2. Product catalog dedup (SKU whitespace, near-identical names).
  3. Abandoned-cart cleanup before re-engagement.
  4. Order export consolidation across channels.
  5. Subscriber-list hygiene before Klaviyo / Mailchimp import (per-contact pricing).

Bookkeeper: 6. Bank export reconciliation across overlapping date ranges. 7. Vendor list consolidation across QB + spreadsheets + email. 8. Customer master cleanup pre-invoicing migration. 9. Expense report dedup (same receipt twice).

Freelancer: 10. Pre-analysis cleanup of client dumps. 11. Survey response dedup (same respondent, multiple devices). 12. Lead list cleanup before client handoff.

Marketing agency: 13. Email-list dedup across lead sources. 14. Multi-platform audience reconciliation. 15. Suppression-list management.

Highest pain × frequency: 1, 5, 6, 13. Build feature set + demo dataset + landing copy around these.

Competitive landscape

Tool Price Strength Weakness
Excel Remove Duplicates Free Universal, zero install Exact only. No fuzzy. No audit.
Pandas drop_duplicates Free Powerful Requires Python.
OpenRefine Free Powerful clustering Steep curve, dated GUI.
Dedupe.io $30+/mo ML fuzzy Recurring + cloud upload.
WinPure / Data Ladder $300-2000+ Enterprise Wrong tier.
Power Query Free Integrated Exact only without M-code.

Market gap

Fuzzy match quality of OpenRefine, with the zero-learning UX of Excel, sold once for under $100, runs locally.

Defensible only if fuzzy matching works without docs. Mediocre fuzzy → loses to free Excel. Learning required → loses to free OpenRefine. Tier 1 spec (TECHNICAL.md §11.1) is the minimum viable feature set to occupy this gap.

5. Pricing

Tier Price Notes
Single bundle $49-79 Impulse-purchase sweet spot for solo operators
Full suite (3+ bundles) $149 Anchor; drives bundle attach

Why: < $99 avoids procurement friction. > $99 triggers SaaS-support expectations that conflict with no-touch. < $30 competes with free, signals "toy".

6. Revenue targets

Horizon Target Notes
90 days First paying customer Validates funnel, not business
6 months $1k-3k/mo Lead bundle + marketplace + demo
12 months $5k/mo Triggers "fully async" revisit
24 months $10k/mo Stretch. Needs hit product or 3+ bundles compounding

$20k+/mo achievable but requires audience/brand asset that operator constraints exclude.

7. Marketing

Channels (priority order, early stage)

  1. Hosted browser demo — free Streamlit Community Cloud, linked from every listing. Direct conversion lever for digital downloads where buyers can't evaluate quality otherwise.
  2. Marketplace listings — Gumroad search, Lemon Squeezy directory, GitHub.
  3. Niche communities — value-first posts in subreddits, Indie Hackers, niche Slack/Discord. Demo doubles as the shareable asset.
  4. Programmatic SEO — long-tail landing pages (compounds over months).
  5. Strong GitHub README as discovery surface.

Demo design

  • Same core engine as paid product, GUI-only.
  • Constraints: row limit (100), output watermark, sample dataset preloaded + small upload (capped).
  • Persistent CTA: "Like what you see? Get the full version for $49 →".
  • No login. Friction kills conversion.
  • Streamlit Community Cloud (free) at launch. $5/mo VPS if rate-limited.

Target keywords

python csv cleaner bundle · excel data cleaning scripts · automated data deduplicator python · csv duplicate removal tool · shopify product feed cleaner.

Funnel

Discovery → Demo (try-before-buy) → Landing page → Gumroad → Stripe → automated email delivery → upsell sequence to next bundle.

Support

Self-serve docs in every download. Email only. No live chat, no calls.

8. The "fully async, no-touch" constraint

Locked criteria require automated, no-touch marketing + sales. Long-term steady state.

Revisit trigger: $5k/mo MRR.

Why: pre-PMF, the no-touch rule excludes the channels most likely to produce first traction (founder outreach to 50 Shopify pet operators, community participation, customer interviews). Strict adherence may cost more revenue than it saves time.

Action at trigger: re-evaluate selective non-async (e.g., 2 hr/wk community participation) vs. additional bundle dev. Decision lives with the operator; this just flags the trigger.

9. Cost structure

Recurring monthly cap: $1,200.

Item Cost
Gumroad / Lemon Squeezy fees ~10% of revenue
Domain ~$15/yr
Landing-page hosting $0-20/mo (static via Cloudflare/Netlify/GH Pages)
Demo hosting $0 at launch (Streamlit Community Cloud); plan $5-10/mo VPS migration
Email service $0-30/mo
Apple Developer Program $99/yr (~$8/mo)
Inno Setup, PyInstaller, Python Free
Total fixed monthly ~$30-70/mo

Headroom enables optional ad spend ($100-200/mo) once a bundle has proven conversion data.

10. macOS code signing

Cost: $99/yr to Apple Developer Program. Decision: pay it.

Why required: macOS Gatekeeper hard-blocks unsigned apps with "This app cannot be opened because the developer cannot be verified" — the only obvious button is "Move to Trash." The bypass (right-click → Open) exists but the target buyer won't perform it.

What $99 buys: code-signing certificate (removes hard block) + notarization service (removes "downloaded from internet" warning). Result: clean double-click experience.

Setup: Apple ID + government ID (individuals) or D-U-N-S number (orgs). First approval takes 1-2 weeks. Once approved, sign + notarize is automated in CI.

11. Risks & mitigation

Risk Mitigation
Free GitHub scripts commoditize Niche verticals + polished GUI + cross-platform installers + hosted demo
Slow early traction Lead with demo + marketplaces + communities, not own-domain SEO
Refund chargebacks Clear scope on landing, demo lets buyers validate, working samples included
macOS install friction Apple Dev Program ($99/yr), code sign + notarize
Browser-launch UX confusion One sentence in installer + email; persistent in-app "runs locally" message; pywebview wrap as v1.1 if needed
Support burden Robust installers, idiot-proof docs, sample data included
IP theft / resale License file. Accept partial protection; focus on staying ahead via updates
Marketplace policy change Multi-marketplace day 1; own domain as fallback
Streamlit direction change Low probability; flagged as criteria-relock trigger in DECISIONS §8

12. Success metrics (monthly)

  • Units sold per bundle.
  • Conversion rate (landing → purchase).
  • Demo-to-purchase rate (added v1.3): demo visits → Gumroad clicks → purchases.
  • Refund rate (target < 5%).
  • Support tickets / 100 sales (target < 10).
  • Organic traffic to product pages.
  • Per-platform install success.

13. Honest status (2026-05-01)

  • 3 of 9 tools shipped (Dedup, Text Cleaner, Format Standardizer).
  • Cross-platform build pipeline designed, not yet built.
  • macOS code signing not yet set up.
  • Streamlit GUI shipped for the 3 ready tools.
  • Hosted demo not yet deployed.
  • No paying customers.
  • No live landing page.

Next concrete steps before marketing spend:

  1. Stand up the PyInstaller pipeline with Streamlit launcher (1-3 days first time).
  2. Deploy constrained demo to Streamlit Community Cloud.
  3. Enroll in Apple Developer Program (start in parallel — 1-2 wk lead time).
  4. Single landing page for the lead bundle, demo prominently linked.
  5. Finish 2 more tools to Ready state (CLI + GUI).
  6. List on Gumroad with sample output proof, per-platform installers, demo link.