datatools-dev/landing/index.html
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs
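The streaming bullets above pair two ideas. First, the input is read in fixed-size chunks so memory stays flat regardless of file size. Second, the per-value parser is memoised so a spelling that repeats across millions of rows is parsed only once. Below is a minimal stdlib sketch, assuming a CSV input and a single boolean column. `normalize_bool`, `stream_standardize`, and the 50,000-row chunk size are hypothetical. They are not the real `standardize_file()` signature or defaults.

```python
import csv
import io
from functools import lru_cache
from itertools import islice
from typing import TextIO

# Hypothetical lookup sets: raw spellings seen in exports -> canonical value.
_TRUTHY = {"y", "yes", "true", "t", "1"}
_FALSY = {"n", "no", "false", "f", "0"}


@lru_cache(maxsize=4096)
def normalize_bool(raw: str) -> str:
    """Map one raw cell to 'true'/'false'; unknown values pass through unchanged.

    Cached because large exports repeat the same few spellings many times.
    """
    key = raw.strip().lower()
    if key in _TRUTHY:
        return "true"
    if key in _FALSY:
        return "false"
    return raw


def stream_standardize(src: TextIO, dst: TextIO, column: str,
                       chunk_rows: int = 50_000) -> int:
    """Copy src to dst, normalising one column, holding at most chunk_rows rows in memory.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    total = 0
    while True:
        chunk = list(islice(reader, chunk_rows))  # next batch of rows, [] at EOF
        if not chunk:
            break
        for row in chunk:
            row[column] = normalize_bool(row[column])
        writer.writerows(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.StringIO("id,subscribed\n1,Yes\n2,n\n3,maybe\n")
    dst = io.StringIO()
    print(stream_standardize(src, dst, "subscribed", chunk_rows=2))
    print(dst.getvalue())
```

Bounding `maxsize` keeps the cache from growing without limit on high-cardinality columns such as emails.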

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"
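The strict-greater rule above means a row or column is dropped only when its missing fraction is strictly greater than the threshold. A row or column that sits exactly at the threshold is kept, and rows and columns use the same comparison. Below is a minimal stdlib sketch, assuming the threshold is a 0 to 1 fraction and that `None` or blank strings count as missing. `drop_sparse_rows` and `drop_sparse_columns` are hypothetical names, not the `src/core/missing.py` API.

```python
from __future__ import annotations

from typing import Optional, Sequence

Row = Sequence[Optional[str]]


def is_missing(cell: Optional[str]) -> bool:
    # Assumption: None and blank/whitespace-only strings count as missing.
    return cell is None or cell.strip() == ""


def missing_fraction(cells: Sequence[Optional[str]]) -> float:
    return sum(is_missing(c) for c in cells) / len(cells) if cells else 0.0


def drop_sparse_rows(rows: list[Row], threshold: float) -> list[Row]:
    """Keep rows whose missing fraction is <= threshold (drop only when strictly greater)."""
    return [r for r in rows if missing_fraction(r) <= threshold]


def drop_sparse_columns(rows: list[Row], threshold: float) -> list[list[Optional[str]]]:
    """Apply the same strict-greater rule column-wise, so rows and columns agree."""
    if not rows:
        return []
    keep = [j for j in range(len(rows[0]))
            if missing_fraction([r[j] for r in rows]) <= threshold]
    return [[r[j] for j in keep] for r in rows]
```

With a threshold of 0.5, a two-cell row with one blank (exactly 50% missing) survives. Under a greater-or-equal rule, the same row would be dropped.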

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>DataTools — Local CSV / Excel Cleaning for Shopify, Bookkeepers, and RevOps</title>
<meta name="description" content="One desktop tool. Three workflows. Clean Shopify customer exports, reconcile messy bank statements, or dedupe lead lists across HubSpot and LinkedIn — all locally. $49 one-time." />
<link rel="canonical" href="https://datatools.app/" />
<link rel="stylesheet" href="_shared/styles.css" />
<meta property="og:title" content="DataTools — Local CSV / Excel Cleaning" />
<meta property="og:description" content="One desktop tool, three niche workflows. Runs entirely offline. $49 one-time." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://datatools.app/" />
<style>
/* Apex-page-only tweaks: persona cards are slightly bigger and use
per-card accent borders so the visitor visually identifies which
card matches their work in <2 seconds. */
.persona-grid {
display: grid; gap: 24px;
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
margin-top: 28px;
}
.persona-card {
background: var(--surface);
border: 1px solid var(--rule);
border-radius: var(--radius);
padding: 28px;
display: flex; flex-direction: column;
transition: transform 0.08s ease, border-color 0.15s ease, box-shadow 0.2s ease;
text-decoration: none;
color: inherit;
}
.persona-card:hover {
transform: translateY(-2px);
border-color: var(--card-accent, var(--accent));
box-shadow: var(--shadow);
text-decoration: none;
}
.persona-card.shopify { --card-accent: #6ee7b7; }
.persona-card.bookkeeper { --card-accent: #7dd3fc; }
.persona-card.revops { --card-accent: #c4b5fd; }
.persona-card .pill {
display: inline-block;
background: rgba(255,255,255,0.04);
color: var(--card-accent, var(--accent));
border: 1px solid var(--card-accent, var(--accent));
padding: 4px 10px; border-radius: 999px;
font-size: 12px; font-weight: 600;
letter-spacing: 0.04em;
margin-bottom: 12px;
align-self: flex-start;
}
.persona-card h3 {
color: var(--text);
font-size: 22px;
margin-bottom: 12px;
}
.persona-card p {
color: var(--text-soft);
flex: 1;
margin-bottom: 16px;
}
.persona-card .pain {
font-size: 14px; color: var(--text-mute);
margin: 8px 0 18px;
}
.persona-card .pain li { margin-bottom: 4px; }
.persona-card .open {
color: var(--card-accent, var(--accent));
font-weight: 600;
font-size: 15px;
}
.persona-card .open::after {
content: " →";
transition: margin-left 0.15s ease;
}
.persona-card:hover .open::after { margin-left: 4px; }
</style>
</head>
<body>
<!-- Sticky brand bar (no buy CTA on the apex — visitor hasn't picked a niche yet) -->
<div class="buybar">
<div class="buybar-inner">
<div class="brand"><span class="brand-mark"></span> DataTools</div>
<div>
<span class="price-tag">Pick your workflow ↓</span>
</div>
</div>
</div>
<section class="hero">
<div class="container">
<div class="eyebrow">For Shopify operators · bookkeepers · marketing &amp; RevOps agencies</div>
<h1>Local CSV / Excel cleaning.<br /><strong>One tool. Three workflows.</strong></h1>
<p class="lead">
DataTools is a desktop app that fixes the data-cleaning headaches
every small business hits — duplicates Excel can't catch,
international phones it can't parse, dates and currencies in three
different formats per export. One $49 download. Works on Mac,
Windows, and Linux. <strong>Your data never leaves your
computer.</strong>
</p>
<div class="persona-grid">
<a class="persona-card shopify" href="shopify-pet/">
<span class="pill">🛍️ Shopify operator</span>
<h3>Customer / vendor / subscriber export cleanup</h3>
<p>
Klaviyo-import-ready customer lists in 30 seconds. Catches
cross-device duplicates, standardizes international phones
and addresses, fixes the disguised nulls that break product
feeds.
</p>
<ul class="pain">
<li>· Stop paying Klaviyo per-contact fees for phantom dupes</li>
<li>· Repair feeds rejected by Google Merchant / Meta</li>
<li>· Unify orders from Shopify + Etsy + Amazon + Faire</li>
<li>· Resolve country-name drift in EU VAT OSS filings</li>
</ul>
<span class="open">Open the Shopify demo &amp; pricing</span>
</a>
<a class="persona-card bookkeeper" href="bookkeeper/">
<span class="pill">📒 Bookkeeper / accountant</span>
<h3>Bank-export reconciliation with audit trail</h3>
<p>
Catches the duplicate transaction QuickBooks imported twice
when Jan and Feb exports overlap. Standardizes dates,
amounts, and vendor casing. Hands you a row-level audit log
to share with the client.
</p>
<ul class="pain">
<li>· Catch month-overlap re-import dupes</li>
<li>· Consolidate vendors for clean 1099 reports</li>
<li>· Produce hand-off-ready audit trail</li>
<li>· Handle multi-currency books (EUR / GBP / BRL)</li>
</ul>
<span class="open">Open the bookkeeper demo &amp; pricing</span>
</a>
<a class="persona-card revops" href="revops/">
<span class="pill">🪢 Marketing / RevOps</span>
<h3>Lead-list dedup across HubSpot, LinkedIn, scrapes</h3>
<p>
One canonical lead per real person — across HubSpot,
LinkedIn, Apollo, ZoomInfo, and manual scrapes.
Parses international phones (50+ country codes) using a per-row
country column, then fuzzy-matches and merges the duplicates.
</p>
<ul class="pain">
<li>· Stop cross-source dupes from pushing you into a pricier HubSpot tier</li>
<li>· Protect sender reputation from invalid emails</li>
<li>· Skip the 4–8 week GDPR review that cloud cleaners trigger</li>
<li>· Sync suppression lists across 5+ platforms</li>
</ul>
<span class="open">Open the RevOps demo &amp; pricing</span>
</a>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">What's the same across all three</div>
<h2>One engine. Same six tools. Same $49.</h2>
<p>
The three persona pages differ only in positioning; they are not
separate products. Whichever page you buy from, you get the full
bundle: Deduplicator, Text Cleaner, Format Standardizer,
Missing-Value Handler, Column Mapper, and Pipeline Runner, plus a
saved pipeline pre-tuned for your workflow.
</p>
<div class="grid">
<div class="card">
<span class="icon">🔒</span>
<h3>Local-first</h3>
<p>Desktop app. No cloud upload, no SaaS account, no subscription. Verify zero outbound calls in your browser's network tab.</p>
</div>
<div class="card">
<span class="icon">📋</span>
<h3>Auditable</h3>
<p>Every cell change is logged with the original value, the new value, and which rule fired. Hand the audit CSV to your client.</p>
</div>
<div class="card">
<span class="icon">🌍</span>
<h3>International</h3>
<p>50+ country codes, per-row country awareness, EU comma decimals, negative amounts in parentheses, locale-aware month names.</p>
</div>
<div class="card">
<span class="icon">⚙️</span>
<h3>Repeatable</h3>
<p>Save your cleanup as a JSON pipeline. Re-run on next week's export with one CLI command. Same cleanup, zero re-config.</p>
</div>
<div class="card">
<span class="icon">📦</span>
<h3>Cross-platform</h3>
<p>Mac · Windows · Linux installers. Code-signed for macOS Gatekeeper. Free updates for the v1.x line.</p>
</div>
<div class="card">
<span class="icon">💰</span>
<h3>$49 one-time</h3>
<p>No subscription. No per-client license. No row caps. No AI black box.</p>
</div>
</div>
</div>
</section>
<section>
<div class="container" style="text-align: center;">
<h2>Pick your workflow above to try the live demo.</h2>
<p class="muted">Or read the docs first — every tool has a CLI, every pipeline is JSON, every change is audited.</p>
</div>
</section>
<footer>
<div class="container">
<div>
<p><strong>DataTools</strong> — local data-cleaning for Shopify, bookkeepers, and RevOps teams.</p>
<p class="muted">© 2026 · Built solo · Shipped from a small office.</p>
</div>
<div>
<p>
<a href="shopify-pet/">For Shopify operators</a> ·
<a href="bookkeeper/">For bookkeepers</a> ·
<a href="revops/">For RevOps agencies</a><br />
<a href="mailto:hello@datatools.app">Email support</a>
</p>
</div>
</div>
</footer>
</body>
</html>