Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler  src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper          src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner        src/core/pipeline.py + cli_pipeline.py + GUI
                            with soft tool-dependency graph (recommended,
                            not enforced) and JSON save/load for repeatable
                            weekly cleanups.
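A minimal sketch of what a soft (advisory) dependency check and JSON save/load could look like. The tool names, the RECOMMENDED_BEFORE table, and the functions order_warnings, save_pipeline, and load_pipeline are hypothetical illustrations, not the contents of src/core/pipeline.py.

```python
import json

# Hypothetical advisory graph: tool -> tools recommended to run before it.
RECOMMENDED_BEFORE = {
    "deduplicator": ["text_cleaner", "format_standardizer"],
    "format_standardizer": ["text_cleaner"],
}


def order_warnings(steps):
    """Return advisory messages; the pipeline is never rejected."""
    names = [step["tool"] for step in steps]
    messages = []
    for index, name in enumerate(names):
        for dep in RECOMMENDED_BEFORE.get(name, []):
            # Warn only when the recommended predecessor runs later.
            if dep in names[index + 1:]:
                messages.append(f"{dep} is recommended before {name}")
    return messages


def save_pipeline(steps, path):
    """Write the step list to a JSON file so it can be re-run later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"version": 1, "steps": steps}, handle, indent=2)


def load_pipeline(path):
    """Read the step list back from a saved JSON pipeline."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["steps"]
```

Because the check only returns messages, a caller can print them and still run the steps in the order the user chose, which is what "recommended, not enforced" implies.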
Format Standardizer reworked for 1 GB international files:
• Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
• Per-row country / address columns drive parsing
• Audit cap (default 10 k rows, ~50 MB RAM)
• standardize_file(): chunked streaming entry point (~165 k rows/sec)
• currency_decimal="auto" for EU comma-decimal locales
• R$ / kr / zł multi-char currency prefixes
• cli_format.py with auto-stream above 100 MB inputs
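For illustration, one way an "auto" decimal-separator heuristic with memoisation might be written. This is a sketch under stated assumptions: parse_amount, PREFIXES, and the tie-breaking rules for ambiguous inputs are hypothetical and are not the Format Standardizer's actual logic.

```python
from decimal import Decimal
from functools import lru_cache

# Assumed prefix list, including the multi-character ones named above.
PREFIXES = ("R$", "kr", "zł", "$", "€", "£")


@lru_cache(maxsize=4096)  # repeated cell values are parsed only once
def parse_amount(text):
    """Parse a currency string, guessing the decimal separator."""
    s = text.strip()
    for prefix in PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):].strip()
            break
    s = s.replace(" ", "").replace("\u00a0", "")
    comma, dot = s.rfind(","), s.rfind(".")
    if comma != -1 and dot != -1:
        # Both present: whichever comes last is the decimal separator.
        decimal_sep = "," if comma > dot else "."
    elif comma != -1:
        # Lone comma: assume thousands grouping if it looks like "1,234".
        digits_after = len(s) - comma - 1
        grouped = digits_after == 3 or s.count(",") > 1
        decimal_sep = None if grouped else ","
    elif dot != -1:
        # Lone dot: decimal unless repeated ("1.234.567").
        decimal_sep = None if s.count(".") > 1 else "."
    else:
        decimal_sep = None
    if decimal_sep is None:
        cleaned = s.replace(",", "").replace(".", "")
    else:
        thousands_sep = "." if decimal_sep == "," else ","
        cleaned = s.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)
```

A lone separator followed by exactly three digits is inherently ambiguous; this sketch resolves a lone comma as grouping and a lone dot as decimal, whereas a production heuristic would more likely decide from the whole column.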
Encoding detection arbiter + language-aware probe:
Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
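The tie-breaking idea can be sketched as follows: when the top candidates are within a small confidence margin, decode with each and prefer the one whose output has the best script coverage. The function names, the 0.05 margin, and the (encoding, confidence) input shape are assumptions made for this example, not the project's detector.

```python
def cyrillic_coverage(text):
    """Fraction of alphabetic characters in the basic Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum("\u0400" <= ch <= "\u04ff" for ch in letters)
    return hits / len(letters)


def arbitrate(raw, candidates, tie_margin=0.05):
    """Pick an encoding from (name, confidence) pairs, probing on ties."""
    best = max(conf for _, conf in candidates)
    tied = [enc for enc, conf in candidates if best - conf <= tie_margin]
    if len(tied) == 1:
        return tied[0]  # clear winner, no probe needed

    def score(enc):
        try:
            return cyrillic_coverage(raw.decode(enc))
        except (UnicodeDecodeError, LookupError):
            return -1.0  # undecodable or unknown codec loses the tie

    return max(tied, key=score)
```

A similar probe for Eastern European Latin text would count letters with carons, ogoneks, and acutes instead of Cyrillic code points.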
Distribution-readiness assets:
• streamlit_app.py — Streamlit Community Cloud entry shim
• src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
100-row cap + watermark, free-vs-paid boundary enforced at surface
• samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
• landing/ — 4 static HTML pages (apex chooser + 3 niche),
shared CSS, deploy.py URL-substitution script,
auto-generated robots.txt + sitemap.xml + 404.html + favicon
• docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
— full strategy + measurement + deployment + master checklist
Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0 xfailed
Tier-1 corpora added:
  • missing-corpus        3 use cases + 16 edge cases
  • column-mapper-corpus  3 use cases + 5 edge cases
  • format-cleaner intl   20-row 13-country stress fixture
Engine hardening flushed out by the corpora:
• interpolate guards against object-dtype columns
• mean/median skip all-NaN columns (silences numpy warning)
• fillna runs under future.no_silent_downcasting (silences pandas warning)
• mojibake test no longer skips when ftfy is installed (monkeypatch path)
• drop-row threshold semantics: strict-greater (consistent across rows / cols)
• currency_decimal validator allow-set updated for "auto"
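To make the strict-greater threshold concrete, here is a small stdlib-only sketch (the real engine presumably works on pandas frames). is_missing and drop_sparse are hypothetical names. A row or column sitting exactly at the threshold is kept, and the same function serves both axes.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_sparse(vectors, threshold):
    """Drop a vector only if its missing fraction is strictly greater
    than threshold; a vector exactly at the threshold is kept."""
    kept = []
    for vec in vectors:
        if not vec:
            kept.append(vec)  # nothing to measure, keep as-is
            continue
        missing_fraction = sum(is_missing(v) for v in vec) / len(vec)
        if missing_fraction > threshold:
            continue
        kept.append(vec)
    return kept
```

Passing rows applies the rule row-wise; passing the transposed data, for example `[list(c) for c in zip(*rows)]`, applies the identical rule column-wise.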
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
237 lines
9.0 KiB
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>DataTools — Local CSV / Excel Cleaning for Shopify, Bookkeepers, and RevOps</title>
<meta name="description" content="One desktop tool. Three workflows. Clean Shopify customer exports, reconcile messy bank statements, or dedupe lead lists across HubSpot and LinkedIn — all locally. $49 one-time." />
<link rel="canonical" href="https://datatools.app/" />
<link rel="stylesheet" href="_shared/styles.css" />
<meta property="og:title" content="DataTools — Local CSV / Excel Cleaning" />
<meta property="og:description" content="One desktop tool, three niche workflows. Runs entirely offline. $49 one-time." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://datatools.app/" />
<style>
/* Apex-page–only tweaks: persona cards are slightly bigger and use
per-card accent borders so the visitor visually identifies which
card matches their work in <2 seconds. */
.persona-grid {
display: grid; gap: 24px;
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
margin-top: 28px;
}
.persona-card {
background: var(--surface);
border: 1px solid var(--rule);
border-radius: var(--radius);
padding: 28px;
display: flex; flex-direction: column;
transition: transform 0.08s ease, border-color 0.15s ease, box-shadow 0.2s ease;
text-decoration: none;
color: inherit;
}
.persona-card:hover {
transform: translateY(-2px);
border-color: var(--card-accent, var(--accent));
box-shadow: var(--shadow);
text-decoration: none;
}
.persona-card.shopify { --card-accent: #6ee7b7; }
.persona-card.bookkeeper{ --card-accent: #7dd3fc; }
.persona-card.revops { --card-accent: #c4b5fd; }
.persona-card .pill {
display: inline-block;
background: rgba(255,255,255,0.04);
color: var(--card-accent, var(--accent));
border: 1px solid var(--card-accent, var(--accent));
padding: 4px 10px; border-radius: 999px;
font-size: 12px; font-weight: 600;
letter-spacing: 0.04em;
margin-bottom: 12px;
align-self: flex-start;
}
.persona-card h3 {
color: var(--text);
font-size: 22px;
margin-bottom: 12px;
}
.persona-card p {
color: var(--text-soft);
flex: 1;
margin-bottom: 16px;
}
.persona-card .pain {
font-size: 14px; color: var(--text-mute);
margin: 8px 0 18px;
}
.persona-card .pain li { margin-bottom: 4px; }
.persona-card .open {
color: var(--card-accent, var(--accent));
font-weight: 600;
font-size: 15px;
}
.persona-card .open::after {
content: " →";
transition: margin-left 0.15s ease;
}
.persona-card:hover .open::after { margin-left: 4px; }
</style>
</head>
<body>
<!-- Sticky brand bar (no buy CTA on the apex — visitor hasn't picked a niche yet) -->
<div class="buybar">
<div class="buybar-inner">
<div class="brand"><span class="brand-mark">●</span> DataTools</div>
<div>
<span class="price-tag">Pick your workflow ↓</span>
</div>
</div>
</div>
<section class="hero">
<div class="container">
<div class="eyebrow">For Shopify operators · bookkeepers · marketing & RevOps agencies</div>
<h1>Local CSV / Excel cleaning.<br /><strong>One tool. Three workflows.</strong></h1>
<p class="lead">
DataTools is a desktop app that fixes the data-cleaning headaches
every small business hits — duplicates Excel can't catch,
international phones it can't parse, dates and currencies in three
different formats per export. One $49 download. Works on Mac,
Windows, and Linux. <strong>Your data never leaves your
computer.</strong>
</p>
<div class="persona-grid">
<a class="persona-card shopify" href="shopify-pet/">
<span class="pill">🛍️ Shopify operator</span>
<h3>Customer / vendor / subscriber export cleanup</h3>
<p>
Klaviyo-import-ready customer lists in 30 seconds. Catches
cross-device duplicates, standardizes international phones
and addresses, fixes the disguised nulls that break product
feeds.
</p>
<ul class="pain">
<li>· Stop paying Klaviyo per-contact fees for phantom dupes</li>
<li>· Repair feeds rejected by Google Merchant / Meta</li>
<li>· Unify orders from Shopify + Etsy + Amazon + Faire</li>
<li>· Resolve VAT-MOSS country-name drift</li>
</ul>
<span class="open">Open the Shopify demo & pricing</span>
</a>
<a class="persona-card bookkeeper" href="bookkeeper/">
<span class="pill">📒 Bookkeeper / accountant</span>
<h3>Bank-export reconciliation with audit trail</h3>
<p>
Catches the duplicate transaction QuickBooks imported twice
when Jan and Feb exports overlap. Standardizes dates,
amounts, and vendor casing. Hands you a row-level audit log
to share with the client.
</p>
<ul class="pain">
<li>· Catch month-overlap re-import dupes</li>
<li>· Consolidate vendors for clean 1099 reports</li>
<li>· Produce a hand-off-ready audit trail</li>
<li>· Handle multi-currency books (EUR / GBP / BRL)</li>
</ul>
<span class="open">Open the bookkeeper demo & pricing</span>
</a>
<a class="persona-card revops" href="revops/">
<span class="pill">🪢 Marketing / RevOps</span>
<h3>Lead-list dedup across HubSpot, LinkedIn, scrapes</h3>
<p>
One canonical lead per real person — across HubSpot,
LinkedIn, Apollo, ZoomInfo, and manual scrapes.
International phones (50+ country codes), per-row country
column, fuzzy match with merge.
</p>
<ul class="pain">
<li>· Stop paying HubSpot tier price for cross-source dupes</li>
<li>· Protect sender reputation from invalid emails</li>
<li>· Skip the 4–8 week GDPR review on cloud cleaners</li>
<li>· Sync suppression lists across 5+ platforms</li>
</ul>
<span class="open">Open the RevOps demo & pricing</span>
</a>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">What's the same across all three</div>
<h2>One engine. Same six tools. Same $49.</h2>
<p>
The persona pages above are positioning, not different products.
Whichever you buy, you get the full bundle: Deduplicator, Text
Cleaner, Format Standardizer, Missing-Value Handler, Column
Mapper, and Pipeline Runner — pre-tuned with a saved pipeline
that matches your workflow.
</p>
<div class="grid">
<div class="card">
<span class="icon">🔒</span>
<h3>Local-first</h3>
<p>Desktop app. No cloud upload, no SaaS account, no subscription. Verify zero outbound calls in your browser's network tab.</p>
</div>
<div class="card">
<span class="icon">📋</span>
<h3>Auditable</h3>
<p>Every cell change is logged with the original value, the new value, and which rule fired. Hand the audit CSV to your client.</p>
</div>
<div class="card">
<span class="icon">🌍</span>
<h3>International</h3>
<p>50+ country codes, per-row country awareness, EU comma decimals, parens-negative amounts, locale-aware month names.</p>
</div>
<div class="card">
<span class="icon">⚙️</span>
<h3>Repeatable</h3>
<p>Save your cleanup as a JSON pipeline. Re-run on next week's export with one CLI command. Same cleanup, zero re-config.</p>
</div>
<div class="card">
<span class="icon">📦</span>
<h3>Cross-platform</h3>
<p>Mac · Windows · Linux installers. Code-signed for macOS Gatekeeper. Free updates for the v1.x line.</p>
</div>
<div class="card">
<span class="icon">💰</span>
<h3>$49 one-time</h3>
<p>No subscription. No per-client license. No row caps. No AI black-box.</p>
</div>
</div>
</div>
</section>
<section>
<div class="container" style="text-align: center;">
<h2>Pick your workflow above to try the live demo.</h2>
<p class="muted">Or read the docs first — every tool has a CLI, every pipeline is JSON, every change is audited.</p>
</div>
</section>
<footer>
<div class="container">
<div>
<p><strong>DataTools</strong> — local data-cleaning for Shopify, bookkeepers, and RevOps teams.</p>
<p class="muted">© 2026 · Built solo · Shipped from a small office.</p>
</div>
<div>
<p>
<a href="shopify-pet/">For Shopify operators</a> ·
<a href="bookkeeper/">For bookkeepers</a> ·
<a href="revops/">For RevOps agencies</a><br />
<a href="mailto:hello@datatools.app">Email support</a>
</p>
</div>
</div>
</footer>
</body>
</html>