feat: 3 new tools, format streaming, distribution-ready demo + landing pages

Tools shipped this batch (3 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.
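A minimal sketch of what a "soft" dependency check plus JSON save/load could look like, assuming each step is a {"tool", "params"} dict. The tool names, the RECOMMENDED_AFTER table and the Pipeline class are hypothetical illustrations, not the API in src/core/pipeline.py:

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical soft-dependency table: tool -> tools it is recommended to run after.
RECOMMENDED_AFTER = {
    "deduplicate": ["format_standardize", "missing_values"],
    "format_standardize": ["text_clean"],
}


@dataclass
class Pipeline:
    # Each step is {"tool": <name>, "params": {...}}; names are illustrative only.
    steps: list = field(default_factory=list)

    def advisories(self) -> list[str]:
        """Soft check: report recommended predecessors that have not run yet.

        Nothing is reordered or rejected; the caller decides what to do.
        """
        seen = set()
        notes = []
        for step in self.steps:
            tool = step["tool"]
            for dep in RECOMMENDED_AFTER.get(tool, []):
                if dep not in seen:
                    notes.append(f"{tool}: consider running {dep} first")
            seen.add(tool)
        return notes

    def to_json(self) -> str:
        # Serialise the whole pipeline so next week's run can reload it unchanged.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Pipeline":
        return cls(**json.loads(text))
```

The check only returns advisory strings, so a user can still run tools in any order; the JSON round trip is what makes a weekly re-run repeatable without re-configuring anything.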

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py, auto-streaming inputs above 100 MB
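One way currency_decimal="auto" and the LRU cache could fit together, sketched in plain Python. The last-separator rule, the three-digit thousands tie-break and the prefix list are assumptions for illustration, not the Format Standardizer's actual logic:

```python
import re
from functools import lru_cache
from typing import Optional

# Hypothetical prefix list; multi-char symbols come first so "R$" wins over "$".
_PREFIXES = ("R$", "zł", "kr", "€", "£", "$")
_NUMERIC = re.compile(r"-?[\d.,]+")


def _normalise(num: str) -> str:
    """Rewrite a number using '.' / ',' separators into float()-friendly form."""
    last_dot, last_comma = num.rfind("."), num.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        # Both present: whichever comes last is the decimal mark.
        dec = "." if last_dot > last_comma else ","
    elif last_dot >= 0 or last_comma >= 0:
        sep = "." if last_dot >= 0 else ","
        tail = num.rsplit(sep, 1)[1]
        # Repeated, or a single separator followed by exactly 3 digits:
        # treat it as a thousands mark (an assumption; "1.234" is ambiguous).
        dec = None if num.count(sep) > 1 or len(tail) == 3 else sep
    else:
        dec = None
    for thousands in {".", ","} - {dec}:
        num = num.replace(thousands, "")
    return num.replace(dec, ".") if dec else num


@lru_cache(maxsize=100_000)
def parse_amount(raw: str) -> Optional[float]:
    """Parse one currency cell with decimal-mark auto-detection; None if unparseable."""
    s = raw.strip()
    for prefix in _PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    # Drop spaces (including no-break spaces) used as thousands separators.
    s = s.strip().replace(" ", "").replace("\u00a0", "")
    if not _NUMERIC.fullmatch(s):
        return None
    try:
        return float(_normalise(s))
    except ValueError:
        return None
```

Because export columns repeat the same strings heavily, lru_cache turns most lookups into dictionary hits (parse_amount.cache_info() reports the hit rate). The sketch resolves the "1.234" ambiguity by fiat, returning 1234.0.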

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
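A rough sketch of a tied-confidence arbiter, assuming the detector returns (encoding, confidence) pairs: the language probe is consulted only when the top candidates are within tie_eps of each other, and it prefers the decoding whose letters fall in the expected script. PROBES, tie_eps and arbitrate() are hypothetical names, and the code-point ranges are illustrative:

```python
from typing import Optional

# Hypothetical probe alphabets as inclusive Unicode code-point ranges.
PROBES = {
    "cyrillic": [(0x0400, 0x04FF)],
    "ee_latin": [(0x0100, 0x017F)],  # Latin Extended-A: ł, ź, č, ő, ...
}


def coverage(text: str, ranges) -> float:
    """Fraction of alphabetic characters that fall inside the probe ranges."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hits = sum(any(lo <= ord(c) <= hi for lo, hi in ranges) for c in letters)
    return hits / len(letters)


def arbitrate(raw: bytes, candidates: list[tuple[str, float]],
              probe: str, tie_eps: float = 0.02) -> Optional[str]:
    """Pick an encoding; only use the language probe to break near-ties."""
    decodable = []
    for enc, conf in candidates:
        try:
            decodable.append((enc, conf, raw.decode(enc)))
        except (UnicodeDecodeError, LookupError):
            continue  # candidate cannot decode these bytes at all
    if not decodable:
        return None
    best = max(conf for _, conf, _ in decodable)
    tied = [(enc, text) for enc, conf, text in decodable if best - conf <= tie_eps]
    if len(tied) == 1:
        return tied[0][0]
    ranges = PROBES[probe]
    # max() keeps the first of equal scores, so candidate order breaks residual ties.
    return max(tied, key=lambda pair: coverage(pair[1], ranges))[0]
```

Latin-1 decodes any byte string, which is why a confidence-only pick can land on it; the probe flips the decision only inside the tie window.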

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist
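For the auto-generated robots.txt / sitemap.xml and the URL-substitution step, a stdlib-only sketch could look like the following. The page list mirrors the relative links in landing/index.html; the function names, and the assumption that https://datatools.app/ is the placeholder deploy.py rewrites, are hypothetical:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Apex chooser plus the three niche pages linked from landing/index.html.
PAGES = ("", "shopify-pet/", "bookkeeper/", "revops/")
# Hypothetical placeholder origin that gets rewritten at deploy time.
PLACEHOLDER = "https://datatools.app/"


def _base(base_url: str) -> str:
    return base_url.rstrip("/") + "/"


def substitute_urls(html: str, base_url: str) -> str:
    """Point canonical / og:url links at the real deploy origin."""
    return html.replace(PLACEHOLDER, _base(base_url))


def sitemap_xml(base_url: str, pages=PAGES) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = _base(base_url) + page
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def robots_txt(base_url: str) -> str:
    return f"User-agent: *\nAllow: /\nSitemap: {_base(base_url)}sitemap.xml\n"
```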

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy is installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"
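To pin down the strict-greater rule, here is a plain-Python sketch (list-of-rows input, None/NaN treated as missing) where rows and columns share one comparison, so a row or column sitting exactly at the threshold is kept. The function names are hypothetical, not the Missing Value Handler's API:

```python
import math
from typing import Any, Sequence


def is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values: Sequence[Any]) -> float:
    return sum(is_missing(v) for v in values) / len(values) if values else 0.0


def exceeds(fraction: float, threshold: float) -> bool:
    # Strict-greater: exactly-at-threshold survives, for rows and columns alike.
    return fraction > threshold


def drop_sparse_rows(rows: list[list[Any]], threshold: float) -> list[list[Any]]:
    return [row for row in rows if not exceeds(missing_fraction(row), threshold)]


def drop_sparse_columns(rows: list[list[Any]], threshold: float) -> list[list[Any]]:
    if not rows:
        return rows
    columns = list(zip(*rows))  # assumes all rows have the same length
    keep = [i for i, col in enumerate(columns)
            if not exceeds(missing_fraction(col), threshold)]
    return [[row[i] for i in keep] for row in rows]
```

This is a fraction-based rule, unlike pandas' `DataFrame.dropna(thresh=...)`, which instead specifies the minimum number of non-NA values a row must have to be kept.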

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00
parent d18b95880d
commit 966af8ef94
89 changed files with 12039 additions and 284 deletions

landing/index.html Normal file

@@ -0,0 +1,236 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>DataTools — Local CSV / Excel Cleaning for Shopify, Bookkeepers, and RevOps</title>
<meta name="description" content="One desktop tool. Three workflows. Clean Shopify customer exports, reconcile messy bank statements, or dedupe lead lists across HubSpot and LinkedIn — all locally. $49 one-time." />
<link rel="canonical" href="https://datatools.app/" />
<link rel="stylesheet" href="_shared/styles.css" />
<meta property="og:title" content="DataTools — Local CSV / Excel Cleaning" />
<meta property="og:description" content="One desktop tool, three niche workflows. Runs entirely offline. $49 one-time." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://datatools.app/" />
<style>
/* Apex-page-only tweaks: persona cards are slightly bigger and use
per-card accent borders so the visitor visually identifies which
card matches their work in <2 seconds. */
.persona-grid {
display: grid; gap: 24px;
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
margin-top: 28px;
}
.persona-card {
background: var(--surface);
border: 1px solid var(--rule);
border-radius: var(--radius);
padding: 28px;
display: flex; flex-direction: column;
transition: transform 0.08s ease, border-color 0.15s ease, box-shadow 0.2s ease;
text-decoration: none;
color: inherit;
}
.persona-card:hover {
transform: translateY(-2px);
border-color: var(--card-accent, var(--accent));
box-shadow: var(--shadow);
text-decoration: none;
}
.persona-card.shopify { --card-accent: #6ee7b7; }
.persona-card.bookkeeper{ --card-accent: #7dd3fc; }
.persona-card.revops { --card-accent: #c4b5fd; }
.persona-card .pill {
display: inline-block;
background: rgba(255,255,255,0.04);
color: var(--card-accent, var(--accent));
border: 1px solid var(--card-accent, var(--accent));
padding: 4px 10px; border-radius: 999px;
font-size: 12px; font-weight: 600;
letter-spacing: 0.04em;
margin-bottom: 12px;
align-self: flex-start;
}
.persona-card h3 {
color: var(--text);
font-size: 22px;
margin-bottom: 12px;
}
.persona-card p {
color: var(--text-soft);
flex: 1;
margin-bottom: 16px;
}
.persona-card .pain {
font-size: 14px; color: var(--text-mute);
margin: 8px 0 18px;
}
.persona-card .pain li { margin-bottom: 4px; }
.persona-card .open {
color: var(--card-accent, var(--accent));
font-weight: 600;
font-size: 15px;
}
.persona-card .open::after {
content: " →";
transition: margin-left 0.15s ease;
}
.persona-card:hover .open::after { margin-left: 4px; }
</style>
</head>
<body>
<!-- Sticky brand bar (no buy CTA on the apex — visitor hasn't picked a niche yet) -->
<div class="buybar">
<div class="buybar-inner">
<div class="brand"><span class="brand-mark"></span> DataTools</div>
<div>
<span class="price-tag">Pick your workflow ↓</span>
</div>
</div>
</div>
<section class="hero">
<div class="container">
<div class="eyebrow">For Shopify operators · bookkeepers · marketing & RevOps agencies</div>
<h1>Local CSV / Excel cleaning.<br /><strong>One tool. Three workflows.</strong></h1>
<p class="lead">
DataTools is a desktop app that fixes the data-cleaning headaches
every small business hits — duplicates Excel can't catch,
international phones it can't parse, dates and currencies in three
different formats per export. One $49 download. Works on Mac,
Windows, and Linux. <strong>Your data never leaves your
computer.</strong>
</p>
<div class="persona-grid">
<a class="persona-card shopify" href="shopify-pet/">
<span class="pill">🛍️ Shopify operator</span>
<h3>Customer / vendor / subscriber export cleanup</h3>
<p>
Klaviyo-import-ready customer lists in 30 seconds. Catches
cross-device duplicates, standardizes international phones
and addresses, fixes the disguised nulls that break product
feeds.
</p>
<ul class="pain">
<li>· Stop paying Klaviyo per-contact fees for phantom dupes</li>
<li>· Repair feeds rejected by Google Merchant / Meta</li>
<li>· Unify orders from Shopify + Etsy + Amazon + Faire</li>
<li>· Resolve EU VAT OSS country-name drift</li>
</ul>
<span class="open">Open the Shopify demo &amp; pricing</span>
</a>
<a class="persona-card bookkeeper" href="bookkeeper/">
<span class="pill">📒 Bookkeeper / accountant</span>
<h3>Bank-export reconciliation with audit trail</h3>
<p>
Catches the duplicate transaction QuickBooks imported twice
when Jan and Feb exports overlap. Standardizes dates,
amounts, and vendor casing. Hands you a row-level audit log
to share with the client.
</p>
<ul class="pain">
<li>· Catch month-overlap re-import dupes</li>
<li>· Consolidate vendors for clean 1099 reports</li>
<li>· Produce hand-off-ready audit trail</li>
<li>· Multi-currency books (EUR / GBP / BRL)</li>
</ul>
<span class="open">Open the bookkeeper demo &amp; pricing</span>
</a>
<a class="persona-card revops" href="revops/">
<span class="pill">🪢 Marketing / RevOps</span>
<h3>Lead-list dedup across HubSpot, LinkedIn, scrapes</h3>
<p>
One canonical lead per real person — across HubSpot,
LinkedIn, Apollo, ZoomInfo, and manual scrapes.
International phones (50+ country codes), per-row country
column, fuzzy match with merge.
</p>
<ul class="pain">
<li>· Stop paying HubSpot tier price for cross-source dupes</li>
<li>· Protect sender reputation from invalid emails</li>
<li>· Skip the 4–8 week GDPR review on cloud cleaners</li>
<li>· Suppression-list sync across 5+ platforms</li>
</ul>
<span class="open">Open the RevOps demo &amp; pricing</span>
</a>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">What's the same across all three</div>
<h2>One engine. Same six tools. Same $49.</h2>
<p>
The persona pages above are positioning, not different products.
Whichever you buy, you get the full bundle: Deduplicator, Text
Cleaner, Format Standardizer, Missing-Value Handler, Column
Mapper, and Pipeline Runner — pre-tuned with a saved pipeline
that matches your workflow.
</p>
<div class="grid">
<div class="card">
<span class="icon">🔒</span>
<h3>Local-first</h3>
<p>Desktop app. No cloud upload, no SaaS account, no subscription. Verify zero outbound calls in your browser's network tab.</p>
</div>
<div class="card">
<span class="icon">📋</span>
<h3>Auditable</h3>
<p>Every cell change is logged with the original value, the new value, and which rule fired. Hand the audit CSV to your client.</p>
</div>
<div class="card">
<span class="icon">🌍</span>
<h3>International</h3>
<p>50+ country codes, per-row country awareness, EU comma decimals, parens-negative amounts, locale-aware month names.</p>
</div>
<div class="card">
<span class="icon">⚙️</span>
<h3>Repeatable</h3>
<p>Save your cleanup as a JSON pipeline. Re-run on next week's export with one CLI command. Same cleanup, zero re-config.</p>
</div>
<div class="card">
<span class="icon">📦</span>
<h3>Cross-platform</h3>
<p>Mac · Windows · Linux installers. Code-signed for macOS Gatekeeper. Free updates for the v1.x line.</p>
</div>
<div class="card">
<span class="icon">💰</span>
<h3>$49 one-time</h3>
<p>No subscription. No per-client license. No row caps. No AI black box.</p>
</div>
</div>
</div>
</section>
<section>
<div class="container" style="text-align: center;">
<h2>Pick your workflow above to try the live demo.</h2>
<p class="muted">Or read the docs first — every tool has a CLI, every pipeline is JSON, every change is audited.</p>
</div>
</section>
<footer>
<div class="container">
<div>
<p><strong>DataTools</strong> — local data-cleaning for Shopify, bookkeepers, and RevOps teams.</p>
<p class="muted">© 2026 · Built solo · Shipped from a small office.</p>
</div>
<div>
<p>
<a href="shopify-pet/">For Shopify operators</a> ·
<a href="bookkeeper/">For bookkeepers</a> ·
<a href="revops/">For RevOps agencies</a><br />
<a href="mailto:hello@datatools.app">Email support</a>
</p>
</div>
</div>
</footer>
</body>
</html>