Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.
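
Illustrative sketch only, not the shipped src/core/pipeline.py: one way a "recommended, not enforced" dependency graph and JSON save/load could fit together. RECOMMENDED_BEFORE, order_warnings, save_pipeline, load_pipeline, and the JSON layout are hypothetical names chosen for this example; the step names are the ones printed by cli_pipeline.

```python
import json

# Hypothetical soft dependency graph: step -> steps that usually run first.
RECOMMENDED_BEFORE = {
    "dedup": ["text_clean", "format_standardize", "missing"],
    "format_standardize": ["text_clean"],
    "missing": ["text_clean"],
}


def order_warnings(steps):
    """Return advisory messages for out-of-order steps; never raises."""
    warnings = []
    seen = set()
    for step in steps:
        for dep in RECOMMENDED_BEFORE.get(step, []):
            # Only warn when the dependency is in the pipeline but runs later.
            if dep in steps and dep not in seen:
                warnings.append(f"{step} usually runs after {dep}")
        seen.add(step)
    return warnings


def save_pipeline(path, steps):
    """Persist the step list so the same cleanup can be re-run next week."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"version": 1, "steps": steps}, fh, indent=2)


def load_pipeline(path):
    """Load a step list written by save_pipeline."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["steps"]
```

Because the graph is soft, a pipeline in an unusual order still runs; the caller decides whether to show the warnings.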

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs
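
Illustrative sketch only, not the shipped standardizer: one plausible reading of currency_decimal="auto" together with multi-character prefixes. parse_amount, guess_decimal_sep, and the heuristics in the comments are assumptions made for this example.

```python
from decimal import Decimal

# Prefixes named in this commit plus common single-character symbols.
# Longest first so that "R$" is stripped before "$" is tried.
PREFIXES = ("R$", "kr", "zł", "$", "£", "€", "¥")


def guess_decimal_sep(digits):
    """Guess the decimal mark of an amount string (assumed heuristics)."""
    dots, commas = digits.count("."), digits.count(",")
    if dots and commas:
        # Both present: whichever comes last is the decimal mark.
        return "," if digits.rfind(",") > digits.rfind(".") else "."
    if commas:
        tail = digits[digits.rfind(",") + 1:]
        # Assumption: a lone comma followed by exactly 3 digits is grouping.
        return "." if (commas > 1 or len(tail) == 3) else ","
    # Dots only (or no separator): repeated dots can only be grouping.
    return "," if dots > 1 else "."


def parse_amount(text, currency_decimal="auto"):
    """Strip a currency prefix and return the amount as a Decimal."""
    s = text.strip()
    for prefix in PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    s = s.replace("\u00a0", "").replace(" ", "")
    sep = guess_decimal_sep(s) if currency_decimal == "auto" else currency_decimal
    grouping = "." if sep == "," else ","
    return Decimal(s.replace(grouping, "").replace(sep, "."))
```

A single dot is always read as a decimal mark here, so "1.234" stays 1.234; a real implementation would likely lean on the per-row country column to settle that case.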

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.
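
Illustrative sketch only: the commit does not show how the arbiter works, so this is a guess at the general shape of a tied-confidence tie-break with a Cyrillic coverage probe. arbitrate, cyrillic_coverage, and tie_margin are hypothetical; the (encoding, confidence) pairs are assumed to come from whatever upstream detector is in use.

```python
def cyrillic_coverage(text):
    """Fraction of alphabetic characters that fall in the Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0400" <= ch <= "\u04ff" for ch in letters) / len(letters)


def arbitrate(raw, candidates, tie_margin=0.05):
    """Pick an encoding from (encoding, confidence) pairs.

    A clear winner is returned as-is. When several candidates sit within
    tie_margin of the best score, the one whose decoding looks most like
    Cyrillic text wins.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_conf = ranked[0][1]
    tied = [enc for enc, conf in ranked if best_conf - conf <= tie_margin]
    if len(tied) == 1:
        return tied[0]

    def score(enc):
        try:
            return cyrillic_coverage(raw.decode(enc))
        except UnicodeDecodeError:
            return -1.0  # cannot decode at all: worst possible score

    return max(tied, key=score)
```

An EE-Latin probe would presumably follow the same pattern with a different character set.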

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0 xfailed

Tier-1 corpora added:
  • missing-corpus         3 use cases + 16 edge cases
  • column-mapper-corpus   3 use cases + 5 edge cases
  • format-cleaner intl    20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy is installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"
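
Illustrative sketch only, in plain Python rather than the pandas code the engine uses: what "strict-greater" threshold semantics and "skip all-NaN columns" mean in practice. is_missing, missing_fraction, drop_sparse_rows, and fill_mean are hypothetical names for this example.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    """Share of missing entries in a row or column (0.0 for empty input)."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_missing(v)) / len(values)


def drop_sparse_rows(rows, threshold):
    """Drop a row only when its missing fraction is strictly greater than
    the threshold; a row sitting exactly at the threshold is kept."""
    return [row for row in rows if not missing_fraction(row) > threshold]


def fill_mean(column):
    """Fill gaps with the column mean; an all-missing column is returned
    unchanged because there is nothing to average."""
    present = [v for v in column if not is_missing(v)]
    if not present:
        return list(column)
    mean = sum(present) / len(present)
    return [mean if is_missing(v) else v for v in column]
```

With a threshold of 0.5, a two-cell row with one missing value survives and a fully empty row is dropped; the same comparison applies to columns.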
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
382 lines
19 KiB
HTML
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>DataTools for Shopify — Clean Customer & Product Exports Locally · $49</title>
  <meta name="description" content="Clean Shopify customer, product, and subscriber exports — locally. Klaviyo-import-ready in 30 seconds. Catches duplicates Excel misses. Your data never leaves your computer. $49 one-time." />
  <meta name="keywords" content="shopify customer cleanup, shopify csv cleaner, shopify product feed cleaner, klaviyo deduplicate, shopify customer dedup tool, shopify pet supplies" />
  <link rel="canonical" href="https://datatools.app/shopify/" />
  <link rel="stylesheet" href="../_shared/styles.css" />

  <!-- Persona accent: Shopify pet → mint green (default in shared sheet) -->

  <!-- Open Graph -->
  <meta property="og:title" content="DataTools for Shopify — Clean Customer & Product Exports Locally" />
  <meta property="og:description" content="Klaviyo-import-ready in 30 seconds. Local. No upload. $49 one-time." />
  <meta property="og:type" content="product" />
  <meta property="og:url" content="https://datatools.app/shopify/" />

  <!-- Schema.org Product -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "DataTools for Shopify",
    "operatingSystem": "Windows, macOS, Linux",
    "applicationCategory": "BusinessApplication",
    "offers": {
      "@type": "Offer",
      "price": "49",
      "priceCurrency": "USD"
    },
    "description": "Clean Shopify customer, product, and subscriber CSV exports locally. Six-tool data-cleaning bundle: dedupe, text-clean, format-standardize, missing-value handle, column-map, pipeline.",
    "softwareVersion": "1.0"
  }
  </script>
</head>
<body>

  <!-- ============= Sticky buy bar ============= -->
  <div class="buybar">
    <div class="buybar-inner">
      <div class="brand"><span class="brand-mark">●</span> DataTools <span class="muted">/ for Shopify</span></div>
      <div>
        <span class="price-tag">$49 — one-time, no subscription</span>
        <a class="btn" href="https://gumroad.com/l/datatools?from=shopify-pet" rel="noopener">Get DataTools →</a>
      </div>
    </div>
  </div>

  <!-- ============= Hero ============= -->
  <section class="hero">
    <div class="container">
      <div class="eyebrow">For Shopify operators · pet supplies · subscription stores · DTC</div>
      <h1>Klaviyo-import-ready customer lists.<br /><strong>In 30 seconds. Locally.</strong></h1>
      <p class="lead">
        Your Shopify customer export is a mess of formatting drift, disguised
        duplicates, and inconsistent phone numbers. DataTools fixes all of it
        in one pass — fuzzy-dedupes the same customer Klaviyo would charge
        you for twice, standardises phones across your international
        subscribers, and hands you a cleaned CSV. <strong>Your data never
        leaves your computer.</strong>
      </p>
      <div class="cta-row">
        <a class="btn btn-large" href="https://gumroad.com/l/datatools?from=shopify-pet" rel="noopener">Get DataTools — $49 →</a>
        <a class="btn btn-ghost btn-large" href="#demo">Try the live demo ↓</a>
        <span class="price-note">One-time payment · cross-platform · runs offline</span>
      </div>
      <div class="stats">
        <div class="stat"><div class="num">6</div><div class="label">tools, one bundle</div></div>
        <div class="stat"><div class="num">1 GB</div><div class="label">customer file in 2.5 min</div></div>
        <div class="stat"><div class="num">0</div><div class="label">cloud uploads ever</div></div>
      </div>
    </div>
  </section>

  <!-- ============= Pain points ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">If any of these sound like your Tuesday</div>
      <h2>Six pains DataTools fixes in one pass</h2>
      <div class="grid">
        <div class="card">
          <span class="icon">💸</span>
          <h3>Klaviyo / Mailchimp / Omnisend bills you for every duplicate</h3>
          <p>The same customer signs up three times: once with a typo, once with a plus-tag, once on mobile. Your subscriber list has a 10–18 % duplicate rate and you're paying for every one of them, every month, forever.</p>
          <p class="muted"><strong>What it costs:</strong> $30–$300/mo per percent of dupes on a 50 k-list — recurring.</p>
        </div>
        <div class="card">
          <span class="icon">📵</span>
          <h3>Your product feed got rejected by Google Merchant Center</h3>
          <p>Smart quotes from a copy-paste in product titles. NBSP in SKU. Inconsistent attribute casing. Feed bounces, the launch sits for 24–72 hours while you try to find the bad row in a 12,000-line CSV.</p>
          <p class="muted"><strong>What it costs:</strong> 1–3 days of delayed campaign × the campaign value.</p>
        </div>
        <div class="card">
          <span class="icon">🪢</span>
          <h3>Orders from Shopify + Etsy + Amazon + Faire don't speak the same language</h3>
          <p>Each platform's export uses different column names for "customer email" / "ship country" / "order total." Merging takes hours of manual rename and copy-paste before the analysis can even begin.</p>
          <p class="muted"><strong>What it costs:</strong> 4–8 hours per month manually merging exports.</p>
        </div>
        <div class="card">
          <span class="icon">🔁</span>
          <h3>Subscription churn looks higher than it is</h3>
          <p>Pet-box subscribers cancel, then re-sub three months later under a different email or device. Your cohort report says churn is 20 % when it's actually 12 % — and you're over-paying for acquisition because LTV is mis-calculated.</p>
          <p class="muted"><strong>What it costs:</strong> wrong CAC ceiling for the next year of paid ads.</p>
        </div>
        <div class="card">
          <span class="icon">🌍</span>
          <h3>VAT MOSS / EU tax breaks because country is spelled three ways</h3>
          <p>Your UK customers are tagged <code>UK</code>, <code>U.K.</code>, and <code>United Kingdom</code> — all in one export. The VAT report aggregates them as three different markets. Compliance friction every quarter.</p>
          <p class="muted"><strong>What it costs:</strong> compliance risk + repeated manual normalization.</p>
        </div>
        <div class="card">
          <span class="icon">🔒</span>
          <h3>Cloud cleaners want you to upload your customer list</h3>
          <p>Your customer list is your single most valuable business asset. Uploading it to a SaaS to clean it is the privacy story you do not want. DataTools is desktop-only — your list never leaves your computer.</p>
          <p class="muted"><strong>What it costs:</strong> nothing — and that's the point.</p>
        </div>
      </div>
    </div>
  </section>

  <!-- ============= Live demo ============= -->
  <section id="demo">
    <div class="container">
      <div class="eyebrow">Live demo · runs in your browser</div>
      <h2>Try it on a real-looking Shopify customer export</h2>
      <p>
        The demo below loads a sample 15-row Shopify customer file with
        pollution we've seen in actual stores: smart quotes from copy-paste,
        duplicates with email-case drift, international phones from the UK,
        Spain, Germany, Australia, and Japan, and the usual mess of
        <code>N/A</code> / <code>(blank)</code> / <code>?</code> sentinels.
        Click <strong>Run pipeline</strong> and watch every column get
        cleaned in under a second.
      </p>
      <div class="demo-frame">
        <iframe
          src="https://demo.datatools.app/?p=shopify-pet"
          loading="lazy"
          title="DataTools live demo — Shopify pet supplies"
          sandbox="allow-scripts allow-same-origin allow-downloads allow-forms"></iframe>
        <div class="demo-caption">
          Demo runs on free hosting (Streamlit Community Cloud). Capped at
          100 input rows · output watermarked with one trailing row. The
          paid product has no caps and runs entirely offline.
        </div>
      </div>
    </div>
  </section>

  <!-- ============= Built for Shopify ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">Built for the Shopify operator</div>
      <h2>Six workflows you do every week</h2>
      <div class="grid">
        <div class="card">
          <span class="icon">🧹</span>
          <h3>Customer-list cleanup</h3>
          <p>Catches the same customer who shows up as <code>john@gmail.com</code>, <code>John@Gmail.com</code>, and <code>j.ohn@gmail.com</code>. Fuzzy match merges the spellings, exact match catches the obvious ones.</p>
        </div>
        <div class="card">
          <span class="icon">📦</span>
          <h3>Product catalogue dedup</h3>
          <p>SKU whitespace, near-identical product names, copy-paste smart quotes in titles — gone. Audit log shows every change.</p>
        </div>
        <div class="card">
          <span class="icon">🛒</span>
          <h3>Abandoned-cart hygiene</h3>
          <p>Before re-engagement: dedupe across email + phone, drop sentinels-as-missing, format dates so your sequence triggers fire correctly.</p>
        </div>
        <div class="card">
          <span class="icon">📥</span>
          <h3>Subscriber-list import to Klaviyo</h3>
          <p>Klaviyo charges per contact. Every duplicate you don't catch costs you for the life of the subscription. Catch them once, pay once.</p>
        </div>
        <div class="card">
          <span class="icon">🔗</span>
          <h3>Multi-channel order consolidation</h3>
          <p>Orders from Shopify + Etsy + a wholesale spreadsheet, each with a different column for "customer email." Column-mapper aligns them; dedup merges across channels.</p>
        </div>
        <div class="card">
          <span class="icon">⚙️</span>
          <h3>Repeatable pipeline</h3>
          <p>Save the cleanup as a JSON file. Drop next week's export on it. Same cleanup, zero re-configuration. Automatable via the CLI.</p>
        </div>
      </div>
    </div>
  </section>

  <!-- ============= Privacy moat ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">The thing every cloud cleaner can't say</div>
      <h2>Your customer list never leaves your computer.</h2>
      <p>
        DataTools is a desktop app. There's no upload step, no SaaS account,
        no subscription, no "trust our security policy." The first thing you
        can do after install is open your browser's network tab, run the
        cleaner on your real customer file, and verify zero outbound
        requests.
      </p>
      <div class="callout">
        <strong>Why it matters for Shopify:</strong> your customer list is
        your single most valuable business asset. Cloud cleaners require
        you to upload it. We don't.
      </div>
      <div class="terminal"><span class="prompt">$</span> python -m src.cli_pipeline customers.csv --apply
Reading customers.csv...
47,832 rows, 14 columns
Executing pipeline:
<span class="ok">✓</span> text_clean (140 ms) {cells_changed: 12,408}
<span class="ok">✓</span> format_standardize (810 ms) {cells_changed: 31,202}
<span class="ok">✓</span> missing (95 ms) {sentinels_standardized: 8,129}
<span class="ok">✓</span> dedup (3.1 s) {duplicates_removed: 2,347}

Initial rows: 47,832 → Final rows: 45,485
Total elapsed: 4.2 s
<span class="prompt">$</span> # zero network calls. zero. promise.</div>
    </div>
  </section>

  <!-- ============= Audit moat ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">For when your client asks "what changed?"</div>
      <h2>Every change auditable. Every cell logged.</h2>
      <p>
        Every modification is recorded with the original value, the new
        value, and which rule fired. Hand the audit CSV to your accountant,
        your marketing manager, or your boss along with the cleaned file.
        No <em>"I trust the AI"</em> hand-waving — they see exactly what
        happened.
      </p>
      <div class="callout">
        <strong>Real example:</strong> the demo above standardized 27
        cells across 15 customers. The audit log lists each one — row,
        column, before, after, which standardizer fired. The dedup audit
        lists every duplicate group with the survivor and its losers.
      </div>
    </div>
  </section>

  <!-- ============= International ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">If you sell internationally — most pet brands do</div>
      <h2>Phones, addresses, and currencies from anywhere on Earth.</h2>
      <p>
        Your subscriber from London entered her phone as <code>020 7946
        0958</code>. Your Tokyo customer entered <code>03-3210-7000</code>.
        Your German wholesale buyer wrote <code>€2.410,75</code>. Excel
        thinks all of them are mistakes. DataTools knows what country each
        row is from (per-row country column) and parses every one correctly
        to E.164 phones, ISO dates, and numeric amounts.
      </p>
      <ul class="bullets">
        <li><strong>50+ country codes</strong> via Google's libphonenumber.</li>
        <li><strong>Currency auto-detect</strong> for $ / £ / € / ¥ / R$ / kr / zł — including the EU comma-decimal that breaks Excel.</li>
        <li><strong>Address shape detection</strong> for US, UK, Canada, Germany, Australia.</li>
        <li><strong>Locale-aware month names</strong> in English, French, German.</li>
      </ul>
    </div>
  </section>

  <!-- ============= What you get ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">In the bundle</div>
      <h2>Six tools. One pipeline. One $49 download.</h2>
      <div class="grid">
        <div class="card"><h3>1 · Deduplicator</h3><p>Fuzzy match (Jaro-Winkler), 5 normalizers, survivor rules, interactive review.</p></div>
        <div class="card"><h3>2 · Text Cleaner</h3><p>Whitespace, smart chars, NBSP, BOM, line endings, case ops.</p></div>
        <div class="card"><h3>3 · Format Standardizer</h3><p>Dates, phones, emails, addresses, names, currencies, booleans.</p></div>
        <div class="card"><h3>4 · Missing Value Handler</h3><p>Disguised-null detection, profile, mean/median/mode/ffill, drop strategies.</p></div>
        <div class="card"><h3>5 · Column Mapper</h3><p>Fuzzy auto-rename, target schema, type coercion, required-field defaults.</p></div>
        <div class="card"><h3>6 · Pipeline Runner</h3><p>Chain tools in recommended order, save/load JSON, automate weekly cleanups.</p></div>
      </div>
    </div>
  </section>

  <!-- ============= Pricing ============= -->
  <section>
    <div class="container">
      <div class="eyebrow">Pricing — pay once, own it</div>
      <h2>$49. No subscription. No ceiling on rows or files.</h2>
      <div class="pricing">
        <div class="card featured">
          <div class="row"><div class="price">$49</div><div class="price-suffix">one-time</div></div>
          <h3>DataTools for Shopify</h3>
          <ul>
            <li>All 6 tools, full pipeline</li>
            <li>Mac · Windows · Linux installers</li>
            <li>Code-signed (no Gatekeeper warnings)</li>
            <li>Free updates for the v1.x line</li>
            <li>Bonus: 3 ready-made Shopify pipelines</li>
          </ul>
          <a class="btn btn-large" href="https://gumroad.com/l/datatools?from=shopify-pet" rel="noopener">Buy on Gumroad →</a>
        </div>
        <div class="card">
          <div class="row"><div class="price">$149</div><div class="price-suffix">one-time</div></div>
          <h3>Full DataTools Suite</h3>
          <p class="muted">Available when 3+ bundles ship. Includes everything in the Shopify pack plus the Bookkeeper and RevOps bundles. Save $48.</p>
          <a class="btn btn-ghost btn-large" href="#" aria-disabled="true">Coming when ready</a>
        </div>
      </div>
    </div>
  </section>

  <!-- ============= FAQ ============= -->
  <section>
    <div class="container">
      <h2>Questions</h2>

      <details class="faq">
        <summary>Does this work with Shopify Plus?</summary>
        <p>Yes — the input is just CSV / Excel from any source. Your Shopify Plus exports work the same as the standard plan, the same as a Shopify-to-CSV pipeline you've stitched together yourself. The cleaner doesn't care.</p>
      </details>

      <details class="faq">
        <summary>How does this compare to Excel's "Remove Duplicates"?</summary>
        <p>Excel does <em>exact</em> deduplication. <code>John@Gmail.com</code> and <code>john@gmail.com</code> are different customers to Excel. DataTools fuzzy-matches across case, whitespace, formatting, and even close-but-not-identical strings. The demo above merges 4 customer pairs Excel would leave duplicated.</p>
      </details>

      <details class="faq">
        <summary>How big a file can it handle?</summary>
        <p>1 GB CSV with international phones + addresses processes in about 2.5 minutes on a typical workstation. Streaming mode keeps memory bounded regardless of input size — we tested it on 26 million rows.</p>
      </details>

      <details class="faq">
        <summary>Do I need to know Python to use it?</summary>
        <p>No. The GUI is a browser interface that opens automatically when you double-click the app. It loads your file, you click Run, you download the cleaned file. The CLI is there for power users who want to script weekly cleanups.</p>
      </details>

      <details class="faq">
        <summary>What about my privacy?</summary>
        <p>Your customer list never leaves your computer. There is no cloud component, no telemetry, no "anonymous usage stats." When the app is running you can confirm zero outbound network requests in your browser's developer tools.</p>
      </details>

      <details class="faq">
        <summary>What's your refund policy?</summary>
        <p>Try the live demo above on the sample dataset before you buy. If DataTools still doesn't fit your workflow, email within 14 days for a refund, no questions asked.</p>
      </details>

      <details class="faq">
        <summary>Will there be updates?</summary>
        <p>Yes. The v1.x line is included free for everyone who buys DataTools today. We ship a patch every 30 days adding country support, edge-case fixes, and small features.</p>
      </details>
    </div>
  </section>

  <!-- ============= Final CTA ============= -->
  <section>
    <div class="container" style="text-align: center;">
      <h2>Stop deduplicating customers by hand.</h2>
      <p class="lead" style="margin: 0 auto 28px;">One $49 download. Mac, Windows, or Linux. Runs offline. Catches the duplicates Excel misses, standardizes the phones from your international customers, and saves a pipeline you can re-run on next week's export.</p>
      <a class="btn btn-large" href="https://gumroad.com/l/datatools?from=shopify-pet" rel="noopener">Get DataTools — $49 →</a>
    </div>
  </section>

  <!-- ============= Footer ============= -->
  <footer>
    <div class="container">
      <div>
        <p><strong>DataTools</strong> — local data-cleaning for Shopify, bookkeepers, and RevOps teams.</p>
        <p class="muted">© 2026 · Built solo · Shipped from a small office.</p>
      </div>
      <div>
        <p>
          <a href="../bookkeeper/">For bookkeepers</a> ·
          <a href="../revops/">For RevOps agencies</a><br />
          <a href="https://gumroad.com/l/datatools?from=shopify-pet">Buy on Gumroad</a> ·
          <a href="mailto:hello@datatools.app">Email support</a>
        </p>
      </div>
    </div>
  </footer>

</body>
</html>