datatools-dev/landing/bookkeeper/index.html
Michael 966af8ef94 feat: 3 new tools, format streaming, distribution-ready demo + landing pages
Tools shipped this batch (4 → 6 of 9 Ready):
  04 Missing Value Handler   src/core/missing.py + cli_missing.py + GUI
  05 Column Mapper           src/core/column_mapper.py + cli_column_map.py + GUI
  09 Pipeline Runner         src/core/pipeline.py + cli_pipeline.py + GUI
                             with soft tool-dependency graph (recommended,
                             not enforced) and JSON save/load for repeatable
                             weekly cleanups.
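A minimal sketch of how a soft dependency graph with JSON save/load could behave. The tool names, the RECOMMENDED_BEFORE table, and the JSON layout are hypothetical and are not taken from src/core/pipeline.py; the only point illustrated is that ordering advice yields warnings and never blocks a run.

```python
import json

# Hypothetical advice table: tool -> tools recommended to run before it.
RECOMMENDED_BEFORE = {
    "dedup": ["text_clean", "format"],
    "column_map": ["missing"],
}


def order_warnings(steps):
    """Return advisory messages for steps scheduled before a recommended predecessor."""
    position = {name: i for i, name in enumerate(steps)}
    warnings = []
    for name, i in position.items():
        for earlier in RECOMMENDED_BEFORE.get(name, []):
            # Warn only when the predecessor is present but scheduled later.
            if earlier in position and position[earlier] > i:
                warnings.append(f"{earlier} is recommended before {name}")
    return warnings


def save_pipeline(steps):
    """Serialise the step list to JSON text (assumed layout)."""
    return json.dumps({"version": 1, "steps": steps}, indent=2)


def load_pipeline(text):
    """Load the step list back; ordering is advisory, so nothing is rejected."""
    return json.loads(text)["steps"]


steps = ["dedup", "text_clean", "missing", "column_map"]
restored = load_pipeline(save_pipeline(steps))
print(order_warnings(restored))
```

Because the graph is advisory, a pipeline saved in a non-recommended order still loads and runs; the caller decides whether to surface the warnings.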

Format Standardizer reworked for 1 GB international files:
  • Vectorised dispatch + LRU cache over phone/date/currency/boolean/email
  • Per-row country / address columns drive parsing
  • Audit cap (default 10 k rows, ~50 MB RAM)
  • standardize_file(): chunked streaming entry point (~165 k rows/sec)
  • currency_decimal="auto" for EU comma-decimal locales
  • R$ / kr / zł multi-char currency prefixes
  • cli_format.py with auto-stream above 100 MB inputs
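One way the "auto" decimal detection could work, shown as a standalone sketch. parse_amount is a hypothetical helper, not the code in the Format Standardizer, and the heuristic is an assumption: when both separators appear, the rightmost is the decimal mark; a lone separator followed by exactly three digits is treated as a thousands separator.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Parse a currency string into a Decimal, guessing the decimal separator."""
    s = text.strip()
    # Accounting convention: (89.50) means negative; a minus sign also counts.
    negative = (s.startswith("(") and s.endswith(")")) or "-" in s
    # Strip currency symbols, prefixes such as R$ or kr, and whitespace.
    digits = re.sub(r"[^\d.,]", "", s)
    if not any(ch.isdigit() for ch in digits):
        raise ValueError(f"no digits in {text!r}")

    last = max(digits.rfind("."), digits.rfind(","))
    decimal_at = None
    if last >= 0:
        sep = digits[last]
        other = "," if sep == "." else "."
        tail = digits[last + 1:]
        # Rightmost separator is the decimal mark if the other kind also
        # appears, or if it occurs once and is not followed by 3 digits.
        if other in digits or (digits.count(sep) == 1 and len(tail) != 3):
            decimal_at = last

    if decimal_at is None:
        whole, frac = re.sub(r"[.,]", "", digits), ""
    else:
        whole = re.sub(r"[.,]", "", digits[:decimal_at])
        frac = digits[decimal_at + 1:]

    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if negative else value


for raw in ["$1,234.56", "1.234,56 €", "($89.50)", "R$ 250,00", "kr 1.250,50"]:
    print(raw, "->", parse_amount(raw))
```

The three-digit rule is ambiguous by nature ("1.234" could be either reading), which is why a real implementation would likely also use the per-row country column mentioned above.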

Encoding detection arbiter + language-aware probe:
  Closes the last 4 xfails (cp1250 / mac_iceland / shift_jis_2004 / lying-BOM)
  via tied-confidence arbiter + Cyrillic / EE-Latin coverage probes.

Distribution-readiness assets:
  • streamlit_app.py — Streamlit Community Cloud entry shim
  • src/gui/app_demo.py — single-page demo, ?p=<persona> routing,
    100-row cap + watermark, free-vs-paid boundary enforced at surface
  • samples/demo/ — 3 niche datasets + pre-tuned pipeline JSONs
  • landing/ — 4 static HTML pages (apex chooser + 3 niche),
    shared CSS, deploy.py URL-substitution script,
    auto-generated robots.txt + sitemap.xml + 404.html + favicon
  • docs/PLAN.md, DEMO-PLAN.md, DEPLOYMENT.md, POST-LAUNCH.md, NEXT-STEPS.md
    — full strategy + measurement + deployment + master checklist

Test counts:
  before: 1,520 passed · 4 skipped · 17 xfailed
  after:  1,729 passed · 0 skipped · 0  xfailed

Tier-1 corpora added:
  • missing-corpus           3 use cases + 16 edge cases
  • column-mapper-corpus     3 use cases + 5 edge cases
  • format-cleaner intl      20-row 13-country stress fixture

Engine hardening flushed out by the corpora:
  • interpolate guards against object-dtype columns
  • mean/median skip all-NaN columns (silences numpy warning)
  • fillna runs under future.no_silent_downcasting (silences pandas warning)
  • mojibake test no longer skips when ftfy installed (monkeypatch path)
  • drop-row threshold semantics: strict-greater (consistent across rows / cols)
  • currency_decimal validator allow-set updated for "auto"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:31:26 +00:00

355 lines
18 KiB
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>DataTools for Bookkeepers — Reconcile Bank Exports With An Audit Trail · $49</title>
<meta name="description" content="Reconcile messy bank exports. Catch duplicate transactions QuickBooks imported twice. Standardize dates, amounts, and vendor casing — locally. Every change auditable. $49 one-time." />
<meta name="keywords" content="reconcile bank export csv, quickbooks duplicate transactions, vendor list cleanup, bookkeeper csv tool, bank export deduplicator, bookkeeper audit trail" />
<link rel="canonical" href="https://datatools.app/bookkeeper/" />
<link rel="stylesheet" href="../_shared/styles.css" />
<!-- Persona accent: Bookkeeper → calm steel-blue -->
<style>
:root {
--accent: #7dd3fc;
--accent-ink: #042c43;
}
</style>
<!-- Open Graph -->
<meta property="og:title" content="DataTools for Bookkeepers — Reconcile Bank Exports With An Audit Trail" />
<meta property="og:description" content="Catch duplicate transactions. Standardize dates and amounts. Hand your client an audit trail. $49 one-time." />
<meta property="og:type" content="product" />
<meta property="og:url" content="https://datatools.app/bookkeeper/" />
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "DataTools for Bookkeepers",
"operatingSystem": "Windows, macOS, Linux",
"applicationCategory": "BusinessApplication",
"offers": {
"@type": "Offer",
"price": "49",
"priceCurrency": "USD"
},
"description": "Reconcile bank exports, dedupe vendor lists, and produce a hand-off-ready audit trail. Six-tool data-cleaning bundle for bookkeepers and freelance accountants.",
"softwareVersion": "1.0"
}
</script>
</head>
<body>
<div class="buybar">
<div class="buybar-inner">
<div class="brand"><span class="brand-mark"></span> DataTools <span class="muted">/ for Bookkeepers</span></div>
<div>
<span class="price-tag">$49 — one-time, no subscription</span>
<a class="btn" href="https://gumroad.com/l/datatools?from=bookkeeper" rel="noopener">Get DataTools →</a>
</div>
</div>
</div>
<section class="hero">
<div class="container">
<div class="eyebrow">For bookkeepers · freelance accountants · small-firm partners</div>
<h1>Reconcile messy bank exports.<br /><strong>Hand your client an audit trail.</strong></h1>
<p class="lead">
The Jan and Feb exports overlap and you've got the same transaction
booked twice. Vendor names are <em>"Amazon"</em>, <em>"amazon.com"</em>,
and <em>"AMAZON.COM*4F2X9"</em> in three different rows. Dates are a
smoosh of <code>01/15/2025</code>, <code>2025-01-15</code>, and
<code>Jan 18 2025</code>. DataTools fixes all of it in one pass —
and produces a row-by-row CSV showing every change so your client
can verify your work.
</p>
<div class="cta-row">
<a class="btn btn-large" href="https://gumroad.com/l/datatools?from=bookkeeper" rel="noopener">Get DataTools — $49 →</a>
<a class="btn btn-ghost btn-large" href="#demo">Try the live demo ↓</a>
<span class="price-note">One-time payment · cross-platform · runs offline</span>
</div>
<div class="stats">
<div class="stat"><div class="num">6</div><div class="label">tools, one bundle</div></div>
<div class="stat"><div class="num">100 %</div><div class="label">auditable changes</div></div>
<div class="stat"><div class="num">0</div><div class="label">cloud uploads ever</div></div>
</div>
</div>
</section>
<!-- ============= Pain points ============= -->
<section>
<div class="container">
<div class="eyebrow">If you've spent a Saturday on this, you already know</div>
<h2>Six pains DataTools fixes in one pass</h2>
<div class="grid">
<div class="card">
<span class="icon">📅</span>
<h3>Jan and Feb bank exports overlap — the same transaction posts twice</h3>
<p>QuickBooks (or any reconciler) silently double-counts the month-boundary rows. Your client's books understate cash by 1&ndash;4 % and nobody notices until tax season.</p>
<p class="muted"><strong>What it costs:</strong> 2&ndash;4 hours per month per client + reconciliation errors that can compound.</p>
</div>
<div class="card">
<span class="icon">📒</span>
<h3>1099 reports break because vendors are spelled three ways</h3>
<p>"Amazon", "amazon.com", "AMAZON.COM*4F2X9" become three separate vendors in QBO. You ship three 1099s instead of one — and the 1099-NEC threshold breaks both ways.</p>
<p class="muted"><strong>What it costs:</strong> 1&ndash;2 hours per 1099 cycle + IRS-paper-trail risk.</p>
</div>
<div class="card">
<span class="icon">🛡️</span>
<h3>"Show me what you changed" — your liability hangs on the answer</h3>
<p>Cloud cleaners that "just clean your data" don't give you a row-level audit log. Your professional indemnity insurance hates that. Your client's auditor hates that. You hate explaining it.</p>
<p class="muted"><strong>What it costs:</strong> per-firm liability premium + 24&ndash;48 hr audit-response window stress.</p>
</div>
<div class="card">
<span class="icon">👥</span>
<h3>Per-client SaaS pricing destroys your margins at 10+ clients</h3>
<p>$30/mo per client × 20 clients = $600/mo, every month, for tooling. DataTools is a one-time desktop license you use on every client's books for the same $49. Forever.</p>
<p class="muted"><strong>What it costs:</strong> the difference between a $30/mo/client subscription and $49 once.</p>
</div>
<div class="card">
<span class="icon">🌍</span>
<h3>Multi-currency books break standard parsers</h3>
<p>Your client has EU customers. Their amounts come in as <code>€1.234,56</code> (comma decimal). Standard import tools see "1.234" as the whole-dollar amount and drop the rest. Parens-negative <code>($89.50)</code> gets read as positive.</p>
<p class="muted"><strong>What it costs:</strong> 30&ndash;60 min per multi-currency client per month + occasional silent errors.</p>
</div>
<div class="card">
<span class="icon">🔒</span>
<h3>Your client's books are too sensitive for a cloud cleaner</h3>
<p>One "vendor breach" email to your clients ends the relationship. DataTools is desktop-only. No upload, no SaaS account, no third party seeing a single transaction. Verifiable in your browser's network tab.</p>
<p class="muted"><strong>What it costs:</strong> nothing — and that's exactly the point.</p>
</div>
</div>
</div>
</section>
<section id="demo">
<div class="container">
<div class="eyebrow">Live demo · runs in your browser</div>
<h2>Try it on a sample bank export with a known overlap</h2>
<p>
The demo below loads a 25-row export combining January and February
activity, with the month-boundary rows duplicated across exports —
the exact scenario where QuickBooks (or any reconciler) silently
double-counts transactions. Click <strong>Run pipeline</strong> and
watch the dedup catch every overlap, dates land in ISO format, and
the parens-negative amounts (<code>($89.50)</code>) become proper
negative numbers.
</p>
<div class="demo-frame">
<iframe
src="https://demo.datatools.app/?p=bookkeeper"
loading="lazy"
title="DataTools live demo — Bookkeeper"
sandbox="allow-scripts allow-same-origin allow-downloads allow-forms"></iframe>
<div class="demo-caption">
Demo runs on free hosting. Capped at 100 input rows · output
watermarked. The paid product has no caps and runs entirely offline.
</div>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">Built for the bookkeeper's actual day</div>
<h2>Four workflows the rest of the industry works around</h2>
<div class="grid">
<div class="card">
<span class="icon">🏦</span>
<h3>Bank export reconciliation</h3>
<p>Two months of activity overlap at the boundary. The same transaction posts twice — once in each export — with different formatting. DataTools dedups on Date + Amount + fuzzy Vendor and catches all of them.</p>
</div>
<div class="card">
<span class="icon">📒</span>
<h3>Vendor list consolidation</h3>
<p>QuickBooks has <code>amazon.com</code>. Your spreadsheet has <code>Amazon</code>. The bank statement has <code>AMAZON.COM*4F2X9</code>. Standardize the casing, fuzzy-match across sources, hand the client one clean vendor list.</p>
</div>
<div class="card">
<span class="icon">👥</span>
<h3>Customer master cleanup pre-migration</h3>
<p>Before moving from one accounting system to another, the customer master needs to be deduped, standardized, and audited. One tool, one pipeline, one CSV in / clean CSV out.</p>
</div>
<div class="card">
<span class="icon">🧾</span>
<h3>Expense report dedup</h3>
<p>Same receipt scanned twice. Same Uber ride entered manually and then imported from the corporate card. Catch them once — and produce the audit log that proves the duplicate <em>was</em> a duplicate.</p>
</div>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">The feature your liability insurance cares about</div>
<h2>Every change auditable. Period.</h2>
<p>
Every cell DataTools modifies is logged with the original value, the
new value, and which rule fired. When your client asks why a
transaction got merged or a date got reformatted, you don't say
"the AI did it." You hand them the CSV.
</p>
<div class="callout">
<strong>Why this matters specifically to bookkeepers:</strong> your
professional liability hangs on traceability. Cloud cleaners that
"just clean your data" without a row-level audit are unsafe at any
price. DataTools writes the audit by default, downloadable as a
separate CSV alongside the cleaned file.
</div>
<div class="terminal"><span class="prompt">$</span> head -5 client_jan2025_changes.csv
row,column,field_type,old,new
0,"Date ",date,"01/15/2025","2025-01-15"
0,Description,name," AMAZON.COM*4F2X9 PURCHASE","Amazon.com*4F2X9 Purchase"
0,Amount,currency,"-$129.99","-129.99"
1,"Date ",date,"Jan 18 2025","2025-01-18"
<span class="prompt">$</span> # one row of audit per cell change. handed to the client. signed off.</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">The thing every cloud reconciler can't say</div>
<h2>Your client's books never leave your computer.</h2>
<p>
Your clients trust you with their books. That trust is one
"we noticed our data appeared in a vendor breach" email away from
gone. DataTools is a desktop app — no upload, no SaaS, no
subscription, no third party seeing a single transaction.
</p>
<div class="callout">
<strong>Confirm it yourself.</strong> Open your browser's network
tab when DataTools is running. Click around. Run the pipeline.
Zero outbound requests. Ever.
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">If your clients run multi-currency books</div>
<h2>$ £ € ¥ R$ kr zł — handled.</h2>
<p>
Standardize <code>$1,234.56</code>, <code>1.234,56 €</code> (EU
decimal), <code>($89.50)</code> (parens-negative),
<code>R$ 250,00</code>, <code>kr 1.250,50</code>, and the rest of
the long tail. Output is canonical numeric (your import tool's
favorite shape) with optional ISO 4217 prefix
(<code>USD 1234.56</code>) when you need to preserve the
currency.
</p>
<ul class="bullets">
<li><strong>Auto-detect</strong> EU comma decimal so your French and German clients' books reconcile without per-locale config.</li>
<li><strong>Parens-negative</strong> handled — accounting convention, not just a math style.</li>
<li><strong>Multi-character prefixes</strong> like <code>R$</code> (Brazilian Real) and <code>kr</code> (Nordic) detected before the single-symbol regex so they don't get bucketed as USD.</li>
</ul>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">In the bundle</div>
<h2>Six tools. One pipeline. One $49 download.</h2>
<div class="grid">
<div class="card"><h3>1 · Deduplicator</h3><p>Fuzzy match (Jaro-Winkler), explicit strategies for Date+Amount+Vendor, survivor rules.</p></div>
<div class="card"><h3>2 · Text Cleaner</h3><p>Header whitespace, smart quotes from copy-paste, em-dash sentinels.</p></div>
<div class="card"><h3>3 · Format Standardizer</h3><p>ISO dates, numeric amounts (parens-negative), vendor casing, multi-currency.</p></div>
<div class="card"><h3>4 · Missing Value Handler</h3><p>Disguised-null detection: <code>&ndash;</code>, <code>N/A</code>, <code>(blank)</code>, <code>?</code>.</p></div>
<div class="card"><h3>5 · Column Mapper</h3><p>Project to your accounting tool's required schema, coerce types, drop extras.</p></div>
<div class="card"><h3>6 · Pipeline Runner</h3><p>Save the cleanup. Run it on next month's export with one command. Same audit, automated.</p></div>
</div>
</div>
</section>
<section>
<div class="container">
<div class="eyebrow">Pricing — pay once, own it</div>
<h2>$49. No subscription. No per-client license.</h2>
<div class="pricing">
<div class="card featured">
<div class="row"><div class="price">$49</div><div class="price-suffix">one-time</div></div>
<h3>DataTools for Bookkeepers</h3>
<ul>
<li>All 6 tools, full pipeline</li>
<li>Mac · Windows · Linux installers</li>
<li>Code-signed (no Gatekeeper warnings)</li>
<li>Free updates for the v1.x line</li>
<li>Bonus: ready-made bank-reconcile and vendor-cleanup pipelines</li>
<li><strong>Use on any number of clients</strong> — no seat limits</li>
</ul>
<a class="btn btn-large" href="https://gumroad.com/l/datatools?from=bookkeeper" rel="noopener">Buy on Gumroad →</a>
</div>
<div class="card">
<div class="row"><div class="price">$199</div><div class="price-suffix">one-time</div></div>
<h3>+ Priority email support</h3>
<p class="muted">Available post-launch. 24-hour async response on edge cases. Same product. Targeted at bookkeepers whose own time is &gt; $200/hr.</p>
<a class="btn btn-ghost btn-large" href="#" aria-disabled="true">Coming soon</a>
</div>
</div>
</div>
</section>
<section>
<div class="container">
<h2>Questions</h2>
<details class="faq">
<summary>Does this replace QuickBooks / Xero?</summary>
<p>No — DataTools cleans the data <em>before</em> it goes into your accounting system, or after you export it for analysis. It sits alongside QB/Xero, not in place of them. Think of it as the import-clean-up step that should have shipped with the bank export feature in the first place.</p>
</details>
<details class="faq">
<summary>Can I use it on multiple clients without paying again?</summary>
<p>Yes. The license is per-bookkeeper, not per-client. Run it on every client's books for the same $49.</p>
</details>
<details class="faq">
<summary>What does the audit log look like in court?</summary>
<p>It's a CSV with five columns per change: <code>row, column, field_type, old, new</code>. Plus a JSON pipeline file describing exactly which rules ran in which order. Together they reproduce the cleanup deterministically — your client (or their auditor) can verify it on their machine.</p>
</details>
<details class="faq">
<summary>How does it handle Excel-only weirdness like serial dates?</summary>
<p>Excel serial dates (the number 45306 = 2024-01-15) are detected and converted automatically. So are Unix timestamps in seconds and milliseconds, RFC 2822 dates from email exports, partial-precision dates (<code>2024-01</code>, <code>2024-Q1</code>), and locale-specific month names in English/French/German.</p>
</details>
<details class="faq">
<summary>What about my clients' privacy?</summary>
<p>Your clients' books never leave your computer. The cleaner is a desktop app with zero network code in the data path. You can verify this in your browser's network tab.</p>
</details>
<details class="faq">
<summary>What's your refund policy?</summary>
<p>Try the live demo above on the sample dataset before you buy. If DataTools doesn't fit your workflow within 14 days, email for a refund — no questions asked.</p>
</details>
</div>
</section>
<section>
<div class="container" style="text-align: center;">
<h2>Stop reconciling bank exports by hand.</h2>
<p class="lead" style="margin: 0 auto 28px;">One $49 download. Catches the duplicate transactions QuickBooks imported twice, standardizes dates, amounts, and vendor casing, and hands you a row-level audit log to share with your client.</p>
<a class="btn btn-large" href="https://gumroad.com/l/datatools?from=bookkeeper" rel="noopener">Get DataTools — $49 →</a>
</div>
</section>
<footer>
<div class="container">
<div>
<p><strong>DataTools</strong> — local data-cleaning for Shopify, bookkeepers, and RevOps teams.</p>
<p class="muted">© 2026 · Built solo · Shipped from a small office.</p>
</div>
<div>
<p>
<a href="../shopify-pet/">For Shopify operators</a> ·
<a href="../revops/">For RevOps agencies</a><br />
<a href="https://gumroad.com/l/datatools?from=bookkeeper">Buy on Gumroad</a> ·
<a href="mailto:hello@datatools.app">Email support</a>
</p>
</div>
</div>
</footer>
</body>
</html>