feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README): folder-mode bundle, locked to 127.0.0.1,
  per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string (landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases)
- marketing/community-posts/: 9 drafts (3 posts × 3 niches: bookkeeper,
  revops, shopify-pet), covering story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch onboarding
  × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with pointers
  to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher +
  hooks get tracked; only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,7 +5,12 @@ __pycache__/
 logs/
 *.egg-info/
 dist/
-build/
+# PyInstaller writes intermediate artifacts to build/build/<spec>/ when the
+# spec lives in build/. The spec, launcher, and hooks themselves are source
+# and should be committed; only the generated artifacts are ignored.
+build/build/
+build/__pycache__/
+build/dist/
 .pytest_cache/

 # Claude Code agent worktrees + local settings
--- a/build/README.md
+++ b/build/README.md
@@ -0,0 +1,206 @@
+# Build — DataTools desktop installer
+
+> Cross-platform PyInstaller bundle for Mac / Windows / Linux. The
+> single deliverable the buyer downloads from Gumroad.
+> **Owner**: Michael · **Updated**: 2026-05-01
+
+This directory is the build pipeline. Source of truth for the bundle
+shape, hidden-import lists, per-platform recipes, and the launcher
+that boots Streamlit inside the bundle.
+
+## Files
+
+```
+build/
+├── launcher.py           Entry point PyInstaller wraps. Boots a local
+│                         Streamlit server, opens browser, locks server
+│                         to 127.0.0.1 so the privacy claim holds.
+├── datatools.spec        PyInstaller spec — hidden imports, data files,
+│                         Mac .app bundle config.
+├── hooks/                PyInstaller hooks for libs the static analyser
+│   └── hook-streamlit.py misses (Streamlit's dynamic imports).
+├── icon.icns             macOS app icon (TODO: produce from a 1024×1024
+│                         PNG. Optional — bundle still builds without).
+├── icon.ico              Windows app icon (TODO).
+└── README.md             this file
+```
+
+## Per-platform recipe
+
+Each platform builds on its own machine — PyInstaller does **not**
+cross-compile. Pick the platform that matches the bundle you need.
+GitHub Actions matrix runners are the simplest way to produce all
+three from one push (see "CI build" below).
+
+### Mac (Intel + Apple Silicon, universal2)
+
+```bash
+# One-time:
+pyenv install 3.12
+pyenv local 3.12
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+pip install pyinstaller
+
+# Build:
+pyinstaller build/datatools.spec --clean
+
+# Output:
+#   dist/DataTools/         — folder mode (faster cold start)
+#   dist/DataTools.app/     — macOS .app bundle (drag-drop into /Applications)
+
+# Sign + notarize (after Apple Developer Program enrollment per BUSINESS.md §10):
+codesign --deep --force --options runtime \
+  --sign "Developer ID Application: <YOUR-NAME> (<TEAMID>)" \
+  dist/DataTools.app
+
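+# Notarize (notarytool takes a .zip/.dmg/.pkg, so zip the .app first):
+ditto -c -k --keepParent dist/DataTools.app dist/DataTools.zip
+xcrun notarytool submit dist/DataTools.zip \
+  --apple-id "<YOUR-APPLE-ID>" --team-id "<TEAMID>" \
+  --password "<APP-SPECIFIC-PASSWORD>" \
+  --wait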
+
+# Staple the notarization ticket so Gatekeeper sees it offline:
+xcrun stapler staple dist/DataTools.app
+
+# Wrap for distribution:
+hdiutil create -volname "DataTools" -srcfolder dist/DataTools.app \
+  -ov -format UDZO dist/DataTools-1.0.0-mac.dmg
+```
+
+### Windows
+
+```powershell
+# One-time:
+py -3.12 -m venv .venv
+.venv\Scripts\activate
+pip install -r requirements.txt
+pip install pyinstaller
+
+# Build:
+pyinstaller build\datatools.spec --clean
+
+# Output:
+#   dist\DataTools\          — folder mode
+#   dist\DataTools\DataTools.exe
+
+# Wrap with Inno Setup (free):
+#   1. Install Inno Setup (https://jrsoftware.org/isdl.php)
+#   2. Create installer.iss next to this README:
+#        [Setup]
+#        AppName=DataTools
+#        AppVersion=1.0.0
+#        DefaultDirName={autopf}\DataTools
+#        OutputDir=..\dist
+#        OutputBaseFilename=DataTools-1.0.0-win-setup
+#        Compression=lzma
+#        SolidCompression=yes
+#        [Files]
+#        Source: "..\dist\DataTools\*"; DestDir: "{app}"; Flags: recursesubdirs
+#        [Icons]
+#        Name: "{autoprograms}\DataTools"; Filename: "{app}\DataTools.exe"
+#   3. Compile: ISCC.exe build\installer.iss
+
+# Code-sign (optional but reduces SmartScreen warnings):
+#   Use signtool with a code-signing cert (Sectigo / DigiCert).
+#   Without signing, buyer sees "Windows protected your PC" once;
+#   they click "More info → Run anyway." Acceptable for v1.
+```
+
+### Linux (AppImage)
+
+```bash
+python3.12 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+pip install pyinstaller
+
+pyinstaller build/datatools.spec --clean
+# dist/DataTools/ — folder mode
+
+# Wrap as AppImage (single-file portable app):
+#   1. Download appimagetool from https://appimage.org/
+#   2. Set up the AppDir layout:
+#        DataTools.AppDir/
+#        ├── AppRun                     -> usr/bin/DataTools
+#        ├── DataTools.desktop          (icon + entry config)
+#        ├── icon.png
+#        └── usr/bin/                   -> dist/DataTools/*
+#   3. ./appimagetool DataTools.AppDir dist/DataTools-1.0.0-linux-x86_64.AppImage
+```
+
+## CI build (recommended once the spec is stable)
+
+`.github/workflows/build.yml` (template):
+
+```yaml
+name: Build installers
+on:
+  workflow_dispatch:
+  push:
+    tags: [ 'v*' ]
+jobs:
+  build:
+    strategy:
+      matrix:
+        os: [macos-latest, windows-latest, ubuntu-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with: { python-version: '3.12' }
+      - run: pip install -r requirements.txt pyinstaller
+      - run: pyinstaller build/datatools.spec --clean
+      - uses: actions/upload-artifact@v4
+        with:
+          name: DataTools-${{ matrix.os }}
+          path: dist/
+```
+
+Mac code-signing in CI requires the cert + private key as a GitHub
+secret (encoded with `base64`). Detailed walkthrough belongs in a
+later doc — for v1, sign locally and upload to GitHub Releases.
+
+## Common pitfalls
+
+| Symptom | Fix |
+|---|---|
+| Bundle is 800+ MB | Check the `excludes` list in `datatools.spec`. `matplotlib` / `scipy` / `tkinter` are the usual suspects. |
+| App launches, browser opens, page is blank | Streamlit's static assets aren't bundled. Re-run with `--log-level=DEBUG` and confirm the static dir was collected by `collect_data_files('streamlit')`. |
+| App launches but logs `ImportError: streamlit.runtime.X` | Add `X` to `hidden_imports` in the spec or to `hook-streamlit.py`. |
+| Mac Gatekeeper says "DataTools is damaged and can't be opened" | The bundle wasn't signed + notarized. Don't ship to buyers without these — see the Mac recipe above. |
+| Windows SmartScreen blocks first launch | Buyer clicks "More info → Run anyway". Code-signing reduces but doesn't eliminate this; for v1 it's an accepted friction. |
+| Bundle works on dev machine but crashes on a clean machine | Likely a missing C runtime. On Windows, ship the [VC++ redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) inside the installer alongside the bundle. |
+
+## Testing the bundle
+
+Smoke-test on a **clean** machine (or VM) — your dev machine has too
+much state to trust:
+
+```
+1. Boot a clean Mac / Win / Linux VM.
+2. Copy the .dmg / .exe / .AppImage onto it.
+3. Install / drag-drop into Applications / chmod +x.
+4. Double-click the app icon.
+5. Browser should open to http://127.0.0.1:8501 (or the next free port, up to 8550) within 5 seconds.
+6. Drop samples/demo/shopify_pet_customers.csv into the
+   Pipeline Runner page; click Run; AFTER preview should appear.
+7. Confirm in the network tab: zero outbound calls except to
+   127.0.0.1 and the Streamlit static asset paths (also local).
+```
+
+Step 7 is the privacy-claim integrity check from
+`docs/POST-LAUNCH.md` §6 — do this once per release, then trust it.
+
+## Versioning
+
+Bump the version string in three places per release:
+
+- `datatools.spec` (CFBundleVersion + CFBundleShortVersionString)
+- the Inno Setup `AppVersion` and `OutputBaseFilename` lines
+- the `.dmg` and AppImage filenames
+
+A single source of truth (e.g. `src/__init__.py`) is a future
+refactor — for v1 the three-spot update is fine.
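The versioning note at the end of `build/README.md` could later collapse into a single lookup. A minimal sketch, assuming the version is stored as `__version__ = "1.0.0"` in a file such as `src/__init__.py` (the README names that file only as an example). `parse_version`, `read_version`, and `release_names` are hypothetical helpers, not part of this commit; the artifact names mirror the ones used in the per-platform recipes.

```python
import re
from pathlib import Path

# Matches a line like: __version__ = "1.0.0"
_VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def parse_version(source: str) -> str:
    """Extract the __version__ string from Python source text."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_version(init_file: Path) -> str:
    """Read the version from a file such as src/__init__.py (assumed location)."""
    return parse_version(init_file.read_text(encoding="utf-8"))


def release_names(version: str) -> dict[str, str]:
    """Derive the per-platform artifact names from one version string."""
    return {
        "mac": f"DataTools-{version}-mac.dmg",
        "windows": f"DataTools-{version}-win-setup",
        "linux": f"DataTools-{version}-linux-x86_64.AppImage",
    }
```

The spec could then call `read_version(REPO / "src" / "__init__.py")` for the two `CFBundle*` keys, leaving only the Inno Setup script to update by hand.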
--- a/build/datatools.spec
+++ b/build/datatools.spec
@@ -0,0 +1,153 @@
+# PyInstaller spec for DataTools.
+#
+# Build (from the repo root, after ``pip install pyinstaller``):
+#
+#     pyinstaller build/datatools.spec
+#
+# Output: ``dist/DataTools/`` (folder mode) containing ``DataTools.exe``
+# (or the platform equivalent) on every platform, plus ``dist/DataTools.app``
+# on macOS from the ``BUNDLE`` block at the bottom of this file. See
+# ``build/README.md`` for the full per-platform recipe.
+#
+# Why folder-mode (one-dir) is the default:
+#   * Streamlit's static assets + Python interpreter + ~300 MB of deps
+#     compress poorly into onefile. Onefile mode unpacks every launch
+#     to a temp dir — adds 5-15 s startup latency that confuses
+#     non-technical buyers ("did it crash?").
+#   * Folder mode lets the installer (Inno Setup on Win, .dmg on Mac)
+#     run a one-time copy. Subsequent launches are instant.
+#
+# Cross-platform note: this single spec file is built ON each target
+# platform. Cross-compilation isn't supported — Mac builds need a
+# Mac, Windows builds need a Windows machine (or a Windows GitHub
+# Actions runner). See build/README.md for the matrix recipe.
+
+# -*- mode: python ; coding: utf-8 -*-
+
+from pathlib import Path
+from PyInstaller.utils.hooks import (
+    collect_all,
+    collect_data_files,
+    collect_submodules,
+)
+
+# Repo root from this spec's location (PyInstaller sets SPECPATH).
+REPO = Path(SPECPATH).resolve().parent
+
+# ----- Hidden imports ------------------------------------------------
+# PyInstaller's static analyser misses everything Streamlit reaches
+# through ``importlib`` and the per-tool registries our app uses. We
+# exhaustively pull every submodule of the libraries that bridge
+# user code to runtime — better a 50 MB-bigger bundle than a runtime
+# ImportError on the buyer's machine.
+
+hidden_imports: list[str] = []
+hidden_imports += collect_submodules("streamlit")
+hidden_imports += collect_submodules("pandas")
+hidden_imports += collect_submodules("phonenumbers")
+hidden_imports += collect_submodules("rapidfuzz")
+hidden_imports += collect_submodules("charset_normalizer")
+hidden_imports += collect_submodules("openpyxl")
+hidden_imports += collect_submodules("loguru")
+
+# Our own engine + GUI modules. Even though we import them directly
+# at the top of ``launcher.py`` / ``app.py``, the Streamlit
+# session-state and per-page page discovery layers re-import via
+# names that PyInstaller doesn't see.
+hidden_imports += collect_submodules("src")
+
+# ----- Data files ---------------------------------------------------
+# Streamlit's static assets (the JS / CSS / fonts the browser fetches
+# from the bundled HTTP server) are NOT Python files; PyInstaller
+# can't auto-find them.
+
+datas: list[tuple[str, str]] = []
+
+# Streamlit's runtime assets.
+datas += collect_data_files("streamlit", include_py_files=False)
+
+# phonenumbers ships its country/area-code metadata as resources.
+datas += collect_data_files("phonenumbers", include_py_files=False)
+
+# Our application files. PyInstaller's bundler treats source as code
+# (.pyc) by default; we add it again as data so the launcher's
+# ``Path(sys._MEIPASS) / "src" / "gui" / "app.py"`` resolution works.
+datas += [
+    (str(REPO / "src"),                       "src"),
+    (str(REPO / "samples" / "demo"),          "samples/demo"),
+    (str(REPO / ".streamlit" / "config.toml"),".streamlit"),
+]
+
+# ----- Analysis ------------------------------------------------------
+
+a = Analysis(
+    [str(REPO / "build" / "launcher.py")],
+    pathex=[str(REPO)],
+    binaries=[],
+    datas=datas,
+    hiddenimports=hidden_imports,
+    hookspath=[str(REPO / "build" / "hooks")],
+    hooksconfig={},
+    runtime_hooks=[],
+    excludes=[
+        # Ship-trim — PyInstaller pulls these in but we never need
+        # them, and they add ~80 MB combined.
+        "tkinter",
+        "matplotlib",
+        "scipy",
+        "IPython",
+        "jupyter",
+        "notebook",
+        "test",
+        "tests",
+    ],
+    noarchive=False,
+)
+
+pyz = PYZ(a.pure)
+
+exe = EXE(
+    pyz,
+    a.scripts,
+    [],
+    exclude_binaries=True,
+    name="DataTools",
+    debug=False,
+    bootloader_ignore_signals=False,
+    strip=False,
+    upx=True,
+    console=False,        # GUI app — no terminal window on Win/Mac
+    disable_windowed_traceback=False,
+    icon=str(REPO / "build" / "icon.icns") if (REPO / "build" / "icon.icns").exists() else None,
+)
+
+coll = COLLECT(
+    exe,
+    a.binaries,
+    a.datas,
+    strip=False,
+    upx=True,
+    upx_exclude=[],
+    name="DataTools",
+)
+
+# macOS .app bundle wrapper. PyInstaller produces it only on Mac;
+# this block is a no-op on Win/Linux.
+import sys as _sys
+if _sys.platform == "darwin":
+    app = BUNDLE(
+        coll,
+        name="DataTools.app",
+        icon=str(REPO / "build" / "icon.icns") if (REPO / "build" / "icon.icns").exists() else None,
+        bundle_identifier="com.datatools.desktop",
+        info_plist={
+            "CFBundleDisplayName": "DataTools",
+            "CFBundleVersion": "1.0.0",
+            "CFBundleShortVersionString": "1.0.0",
+            "NSHighResolutionCapable": True,
+            # If this is True, macOS hides the app's Dock icon. We want
+            # the icon visible so the buyer can see the app is running
+            # while the browser tab is open.
+            "LSUIElement": False,
+        },
+    )
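The spec passes `icon.icns` to `EXE` on every platform, while `build/README.md` lists a separate `icon.ico` for Windows. One way to pick the icon per platform is sketched below. `pick_icon` is a hypothetical helper, not part of this commit, and returning `None` on Linux is an assumption.

```python
from __future__ import annotations

import sys
from pathlib import Path


def pick_icon(build_dir: Path, platform: str = sys.platform) -> str | None:
    """Return the icon path for *platform*, or None if the file is absent."""
    if platform == "darwin":
        candidate = build_dir / "icon.icns"
    elif platform.startswith("win"):
        candidate = build_dir / "icon.ico"
    else:
        return None  # assumption: no executable icon is set on Linux
    return str(candidate) if candidate.exists() else None
```

In the spec this would replace the inline conditional, for example `icon=pick_icon(REPO / "build")`, and the bundle would still build when neither icon file exists.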
--- a/build/hooks/hook-streamlit.py
+++ b/build/hooks/hook-streamlit.py
@@ -0,0 +1,30 @@
+"""PyInstaller hook for Streamlit.
+
+The runtime needs three things PyInstaller's static analyser misses:
+
+1. Every submodule of ``streamlit`` (the framework reaches into
+   ``streamlit.runtime`` / ``streamlit.web`` / ``streamlit.elements``
+   via dynamic import).
+2. The static front-end assets (JS / CSS / fonts) under
+   ``streamlit/static/``.
+3. The vendored config / proto schemas under
+   ``streamlit/runtime/scriptrunner/`` etc.
+
+The main spec already calls ``collect_submodules('streamlit')`` and
+``collect_data_files('streamlit')``, so this hook is mostly
+belt-and-braces. PyInstaller picks hooks up by name, though, and a
+missing hook can produce confusing runtime errors when Streamlit
+upgrades. Keeping it explicit here documents the dependency.
+"""
+
+from PyInstaller.utils.hooks import collect_all, collect_data_files, collect_submodules
+
+datas, binaries, hiddenimports = collect_all("streamlit")
+
+# Belt-and-braces: explicitly include the static directory.
+datas += collect_data_files("streamlit", subdir="static", include_py_files=False)
+
+# Some Streamlit components are loaded by name from the registry.
+hiddenimports += collect_submodules("streamlit.elements")
+hiddenimports += collect_submodules("streamlit.runtime")
+hiddenimports += collect_submodules("streamlit.web")
--- a/build/launcher.py
+++ b/build/launcher.py
@@ -0,0 +1,138 @@
+"""DataTools desktop launcher.
+
+This is the entry point PyInstaller wraps for Mac / Windows / Linux
+installers. Double-clicking the produced binary boots a local
+Streamlit server (``127.0.0.1:<port>``, where the port is 8501 or the
+next free one), opens the user's default browser at that URL, and keeps
+the server alive until the window is closed or the binary is killed.
+
+Why a launcher instead of pointing PyInstaller at ``src/gui/app.py``:
+
+  * Streamlit's CLI normally bootstraps the server via the
+    ``streamlit run`` command. PyInstaller-bundled apps can't shell
+    out to ``streamlit`` because the CLI script lives inside the
+    bundle. We invoke Streamlit's bootstrap directly via
+    :func:`streamlit.web.bootstrap.run`.
+  * A free port has to be picked at runtime — buyers will have other
+    services running on 8501.
+  * The "open browser" step is the buyer's only feedback that
+    something happened; without it they'd see a black terminal flash
+    on Windows and conclude the app didn't start.
+
+Local-dev equivalent (no installer):
+
+    streamlit run src/gui/app.py
+"""
+
+from __future__ import annotations
+
+import os
+import socket
+import sys
+import threading
+import time
+import webbrowser
+from pathlib import Path
+
+
+def _find_free_port(start: int = 8501, span: int = 50) -> int:
+    """Return a TCP port that's free on the loopback interface.
+
+    Prefer 8501 (Streamlit's traditional default, which the buyer
+    recognises from any docs they've read) and fall back to the next
+    free port in a small range, so the URL looks stable across
+    restarts. Only if the whole range is busy do we ask the OS for
+    an ephemeral port (port=0).
+    """
+    for offset in range(span):
+        port = start + offset
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            try:
+                s.bind(("127.0.0.1", port))
+                return port
+            except OSError:
+                continue
+    # Last resort: kernel-assigned ephemeral port.
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+        s.bind(("127.0.0.1", 0))
+        return s.getsockname()[1]
+
+
+def _resolve_app_path() -> Path:
+    """Locate ``src/gui/app.py`` whether running from source or a frozen bundle.
+
+    PyInstaller sets ``sys._MEIPASS`` to the bundle's resource
+    directory in both one-dir and one-file mode. Frozen builds
+    resolve against it; source mode walks up from this file.
+    """
+    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
+        # Frozen: app.py was bundled as a data file (see datatools.spec).
+        return Path(sys._MEIPASS) / "src" / "gui" / "app.py"  # type: ignore[attr-defined]
+    return Path(__file__).resolve().parent.parent / "src" / "gui" / "app.py"
+
+
+def _open_browser_when_ready(url: str, delay: float = 1.5) -> None:
+    """Open the buyer's default browser to *url* after a short delay.
+
+    The delay gives Streamlit's HTTP server time to bind. Without it,
+    the browser races the server and renders a "couldn't connect"
+    page that confuses non-technical buyers. 1.5 s is conservative
+    on slow Windows machines; faster machines will see a brief
+    blank tab.
+    """
+    def _open() -> None:
+        time.sleep(delay)
+        webbrowser.open(url, new=2)
+    threading.Thread(target=_open, daemon=True).start()
+
+
+def main() -> int:
+    """Boot the local Streamlit server and open the browser."""
+    app_path = _resolve_app_path()
+    if not app_path.exists():
+        sys.stderr.write(
+            f"DataTools could not find its UI script at {app_path}.\n"
+            "This is usually a bundle-build error. Re-install or "
+            "contact support@datatools.app.\n"
+        )
+        return 2
+
+    port = _find_free_port()
+    url = f"http://127.0.0.1:{port}/"
+
+    # Pre-set Streamlit options the bundle ships locked. Binding to
+    # 127.0.0.1 enforces "no network exposure" (unset, Streamlit listens
+    # on all interfaces); the landing-page privacy claim depends on it.
+    # Assigned outright, not via setdefault, so inherited env vars can't win.
+    os.environ["STREAMLIT_SERVER_ADDRESS"] = "127.0.0.1"
+    os.environ["STREAMLIT_SERVER_PORT"] = str(port)
+    os.environ["STREAMLIT_SERVER_HEADLESS"] = "true"
+    os.environ["STREAMLIT_BROWSER_GATHER_USAGE_STATS"] = "false"
+
+    # Print before opening the browser so the terminal log doesn't
+    # scroll behind the new browser tab on macOS.
+    print(f"DataTools is running at {url}")
+    print("Close this window or press Ctrl+C to stop.")
+
+    _open_browser_when_ready(url)
+
+    # Streamlit's bootstrap entry point — equivalent to running
+    # ``streamlit run app.py`` but in-process so PyInstaller's bundled
+    # interpreter handles it without shelling out to a separate script.
+    from streamlit.web import bootstrap
+    bootstrap.run(
+        str(app_path),
+        is_hello=False,
+        args=[],
+        flag_options={
+            "server.address": "127.0.0.1",
+            "server.port": port,
+            "server.headless": True,
+            "browser.gatherUsageStats": False,
+        },
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
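`_open_browser_when_ready` in `build/launcher.py` waits a fixed 1.5 s before opening the browser. An alternative is to poll the loopback port until a connection succeeds, which avoids both the "couldn't connect" page on slow machines and the idle wait on fast ones. This is a minimal sketch, not the launcher's current behaviour; `wait_for_port` is a hypothetical helper and the timeout values are illustrative.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server has bound and is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused): wait briefly and retry.
            time.sleep(interval)
    return False
```

Inside the `_open` thread, `time.sleep(delay)` could become `wait_for_port("127.0.0.1", port)`, followed by the existing `webbrowser.open(url, new=2)` call.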
--- a/docs/NEXT-STEPS.md
+++ b/docs/NEXT-STEPS.md
@@ -37,6 +37,10 @@ Status legend:
 | 🟢 | 3 niche landing pages + apex chooser + shared CSS | `landing/` |
 | 🟢 | Landing-page deploy script (URL-substitution + sitemap + 404 + favicon) | `landing/deploy.py` |
 | 🟢 | Strategic plan + demo plan + post-launch measurement plan + deployment doc | `docs/PLAN.md`, `DEMO-PLAN.md`, `POST-LAUNCH.md`, `DEPLOYMENT.md` |
+| 🟢 | PyInstaller bundle scaffold (spec + launcher + Streamlit hook + README) | `build/` |
+| 🟢 | Customer-facing copy single-source-of-truth (landing + demo + email subjects + Gumroad listing) | `marketing/COPY.md` |
+| 🟢 | 9 niche-community post drafts (3 posts × 3 niches: bookkeeper, revops, shopify-pet) | `marketing/community-posts/` |
+| 🟢 | 18 email drafts (Gumroad delivery + 5-touch onboarding × 3 niches) | `marketing/emails/` |

 ---

@@ -122,16 +126,17 @@ can hit.
 | **Cost** | $99/year. |
 | **Blocked by** | Nothing — start ASAP because of the 1–2 week approval window. The pipeline waits on this; nothing else does. |

-### 2.2 — 🟡 PyInstaller spec + cross-platform build (1–3 days first time)
+### 2.2 — 🟢 PyInstaller spec + cross-platform build *(scaffold shipped — runs need per-OS hosts)*

 | | |
 |---|---|
-| **What** | A `build/datatools.spec` that bundles the Streamlit GUI + all 6 tools + samples into one app. Mac `.dmg`, Windows `.exe` installer, Linux AppImage. |
+| **What** | `build/datatools.spec` + `build/launcher.py` + `build/hooks/hook-streamlit.py` bundle the Streamlit GUI + all 6 tools + samples into one app. Folder-mode (one-dir) by default; Mac `.dmg`, Windows `.exe` installer, Linux AppImage. Per-platform recipe in `build/README.md`. |
 | **Why** | The buyer's deliverable. Without this, there is nothing to attach to the Gumroad listing. |
-| **External dependency** | None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner. |
+| **External dependency** | `pip install pyinstaller`. None for Linux/Mac builds. Windows builds need a Windows machine or a CI matrix runner. |
 | **Cost** | $0 (GitHub Actions matrix runners are free for public repos). |
 | **Blocked by** | Nothing for the spec; 2.1 for the signed Mac build. |
-| **Watch out for** | Streamlit's bundle size lands around 300–500 MB per `DECISIONS.md` §4c — accepted tradeoff. |
+| **Watch out for** | Streamlit's bundle size lands around 300–500 MB per `DECISIONS.md` §4c — accepted tradeoff. PyInstaller cross-compilation isn't supported — Mac builds need a Mac, Windows builds need a Windows host. |
+| **Where it lives** | `build/datatools.spec`, `build/launcher.py`, `build/hooks/hook-streamlit.py`, `build/README.md` |

 ### 2.3 — 🟡 macOS sign + notarize (30 min once Apple Dev is approved)

@@ -143,16 +148,17 @@ can hit.
 | **Cost** | $0 incremental over 2.1. |
 | **Blocked by** | 2.1 + 2.2. |

-### 2.4 — 🔴 Refund policy + license + Gumroad listing copy (1 hour)
+### 2.4 — 🟢 Refund policy + license + Gumroad listing copy *(drafted in COPY.md)*

 | | |
 |---|---|
-| **What** | A clear refund policy (14-day no-questions per the FAQ already on the landing pages) + a software licence text + the Gumroad listing description. |
+| **What** | A clear refund policy (30-day no-questions) + a software licence text + the Gumroad listing description. |
 | **Why** | Required by Gumroad's terms; surfaces on the listing page; protects against buyer disputes. |
 | **External dependency** | None — operator authoring. |
 | **Cost** | $0. |
 | **Blocked by** | Nothing. |
-| **Hint** | Most of the copy is already in the landing pages' FAQ section — paste it into Gumroad. |
+| **Where it lives** | `marketing/COPY.md` § 5 (Gumroad listing — full title / tagline / description / bullets / refund text / tags). Refund window is also referenced in COPY.md § 0 so it stays consistent across surfaces. |
+| **Still to author** | A short licence text (one-time perpetual use, no redistribution) — not in COPY.md yet. Recommend Polyform Strict 1.0.0 or a 10-line bespoke text. |

 ### 2.5 — 🟠 Activate the Gumroad listing (15 min)

@@ -172,16 +178,17 @@ Per `PLAN.md` §3 and `BUSINESS.md` §7 channel priorities. The strict
 no-touch constraint of `DECISIONS.md` §1 #8 makes channel choice
 matter — these are the only ones that fit.

-### 3.1 — 🔴 First niche-community post (30 min)
+### 3.1 — 🟢 First niche-community post *(9 drafts ready — pick one and personalize)*

 | | |
 |---|---|
-| **What** | One value-first post in one niche-relevant community (e.g. r/shopify, IndieHackers Shopify chat, a Slack/Discord that allows it). Lead with the demo URL, not the buy URL. |
+| **What** | One value-first post in one niche-relevant community (e.g. r/Bookkeeping, r/revops, r/shopify; IndieHackers; niche Slacks/Discords). Lead with the demo URL, not the buy URL. |
 | **Why** | Marketplaces alone don't drive discovery. Communities are the only first-touch channel that works under no-touch. |
 | **External dependency** | Account in the chosen community; understand its self-promotion rules. |
 | **Cost** | $0. |
 | **Blocked by** | 1.4 (demo URL must work). |
-| **Hint** | Pick the persona with the most familiar community to the operator. Don't try all three at once — see `POST-LAUNCH.md` §2 "decide ONE thing" rule. |
+| **Hint** | Pick the niche the operator knows best. Don't post all three drafts in the same community in the same week — see `marketing/community-posts/README.md` for cadence guidance. |
+| **Where it lives** | `marketing/community-posts/{bookkeeper,revops,shopify-pet}/0{1-story,2-tip,3-soft-offer}.md` — 3 posts × 3 niches = 9 drafts. |

 ### 3.2 — 🟡 First long-tail SEO blog post (4–6 hours)

@@ -204,15 +211,17 @@ matter — these are the only ones that fit.
 | **Blocked by** | 1.4. |
 | **Hint** | The Gumroad webhook captures `?from=<persona>` automatically — no extra wiring. |

-### 3.4 — 🟡 Email autoresponder (post-purchase delivery + 3-touch onboarding) (2–3 hours)
+### 3.4 — 🟢 Email autoresponder *(18 drafts ready — paste into provider)*

 | | |
 |---|---|
-| **What** | Gumroad's built-in delivery email plus three follow-up emails (day 1, day 7, day 14): "are you running into X?", "here's an advanced trick", "save your pipeline as JSON for next week". |
-| **Why** | Increases activation, reduces refund risk, surfaces support questions while volume is small. |
-| **External dependency** | Gumroad delivery is built-in. The 3-touch sequence needs a free email service (Resend's free tier or Mailchimp's free tier). |
+| **What** | Gumroad's built-in delivery email plus a **5-touch** onboarding sequence (Day 1, 3, 7, 14, 30) per niche. Per-niche segmentation via Gumroad's "What do you do?" custom field at checkout. |
+| **Why** | Increases activation, reduces refund risk, surfaces support questions while volume is small. The Day-1 email in particular drives buyers from "I bought it" to "I ran it" — buyers who don't open within 72h refund at ~3× the rate of buyers who do. |
+| **External dependency** | Gumroad delivery is built-in. The 5-touch sequence needs an email service that supports tag-based drips (Buttondown is the cheapest fit; ConvertKit if you want an HTML editor; Resend if you'll script it). |
 | **Cost** | $0–$30/month per `BUSINESS.md` §9. |
 | **Blocked by** | 2.5. |
+| **Where it lives** | `marketing/emails/{bookkeeper,revops,shopify-pet}/{00-delivery,01-day1,02-day3,03-day7,04-day14,05-day30}.md` — 6 emails × 3 niches = 18 drafts. Variables (`{{first_name}}`, `{{download_url}}`, `{{sample_file_url}}`, `{{landing_page}}`) are listed in `marketing/emails/README.md`. |
+| **Sequence policy** | Pause if buyer replies (until you reply); kill on refund request; skip Day 14 + 30 if buyer has already engaged via support. See `marketing/emails/README.md` for full quiet rules. |

 ---

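The sequence policy in the 3.4 table (pause on reply, kill on refund request, skip Day 14 and Day 30 after support engagement) can be written as one decision function if the drip is scripted rather than configured in a provider. This is a sketch under assumptions: `BuyerState` and its field names are hypothetical, and the full quiet rules live in `marketing/emails/README.md`, which is not shown here.

```python
from dataclasses import dataclass

# Onboarding touches from the 3.4 table: Day 1, 3, 7, 14, 30.
TOUCH_DAYS = (1, 3, 7, 14, 30)


@dataclass
class BuyerState:
    refund_requested: bool = False
    awaiting_operator_reply: bool = False  # buyer replied, operator has not answered yet
    engaged_via_support: bool = False


def should_send(day: int, buyer: BuyerState) -> bool:
    """Decide whether the onboarding email for *day* goes out."""
    if day not in TOUCH_DAYS:
        raise ValueError(f"day {day} is not an onboarding touch")
    if buyer.refund_requested:
        return False  # kill the sequence on a refund request
    if buyer.awaiting_operator_reply:
        return False  # pause until the operator replies
    if buyer.engaged_via_support and day in (14, 30):
        return False  # late touches are redundant after support contact
    return True
```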
--- a/marketing/COPY.md
+++ b/marketing/COPY.md
@@ -0,0 +1,190 @@
+# DataTools — copy single-source-of-truth
+
+Every customer-facing string lives here. If it appears on a landing
+page, in an email, on Gumroad, in the GUI's marketing chrome, or in a
+community post — change it here first, then propagate.
+
+Why a SoT: positioning drift across 3 niches × 4 surfaces (landing,
+email, Gumroad, social) is the single biggest source of buyer confusion
+in v1. One file means one diff to ship a wording change everywhere.
+
+How to use: copy a row's value into the target surface verbatim. If a
+surface needs a variation, add it as a sub-row (e.g. `H1 → bookkeeper
+short`) rather than editing in place.
+
+---
+
+## 0 · Universal (all niches)
+
+| Slot | Value |
+|------|-------|
+| Product name | DataTools |
+| Product tagline (one-liner) | Six CSV tools that turn 4-hour cleanup jobs into a 30-second pipeline. Local. No subscription. |
+| Price (display) | **$49** |
+| Price (qualifier) | one-time, lifetime updates for v1.x |
+| Refund window | 30-day no-questions refund |
+| Privacy claim | Your data never leaves your computer. |
+| Audit claim | Every change logged to a CSV-format audit trail. |
+| Format claim | $ £ € ¥ R$ kr zł and 50+ phone-country codes — handled. |
+| Support email | support@datatools.app |
+| Distribution URL | https://datatools.gumroad.com/l/datatools |
+
+---
+
+## 1 · Niche positioning
+
+| Niche | Audience | One-line pain | One-line promise |
+|-------|----------|---------------|-------------------|
+| **bookkeeper** | Solo bookkeepers, small-firm partners doing client reconciliations | Bank exports come in 50 different shapes; QuickBooks won't import them; you can't show your client what you changed | Reconcile messy bank exports — and hand your client an audit trail |
+| **revops** | RevOps / SDR-ops at 5-50-person SaaS, doing list hygiene before HubSpot/Salesforce import | You're paying per-contact for duplicates you imported last campaign | Dedupe lead lists across HubSpot, LinkedIn, and manual scrapes — locally |
+| **shopify-pet** | Shopify store owners (pet niche is the lead vertical), prepping Klaviyo / Mailchimp imports | Customer exports are full of duplicates and bad phone numbers; Klaviyo silently drops them | Klaviyo-import-ready customer lists in 30 seconds — locally |
+
+---
+
+## 2 · Landing page strings
+
+Each niche page uses the same skeleton. Edits to a slot go to all 3
+unless marked `(niche-only)`.
+
+### Hero — H1 (per niche)
+
+| Niche | H1 |
+|-------|----|
+| bookkeeper | Reconcile messy bank exports.<br>**Hand your client an audit trail.** |
+| revops | Dedupe lead lists across HubSpot, LinkedIn,<br>**and manual scrapes — locally.** |
+| shopify-pet | Klaviyo-import-ready customer lists.<br>**In 30 seconds. Locally.** |
+
+### Hero — sub-head (per niche)
+
+| Niche | Sub-head |
+|-------|----------|
+| bookkeeper | Six tools, one pipeline, one $49 download. Runs on your laptop — your client's books never touch a server. |
+| revops | Six tools, one pipeline, one $49 download. Runs on your laptop — prospect data never leaves your machine. |
+| shopify-pet | Six tools, one pipeline, one $49 download. Runs on your laptop — customer data never leaves your machine. |
+
+### CTAs
+
+| Surface | Label |
+|---------|-------|
+| Hero primary | Buy DataTools — $49 |
+| Hero secondary | Try the demo (no install) |
+| Mid-page | Run it on your own file → $49 |
+| Footer | Get DataTools |
+| FAQ-end | Still on the fence? Try the demo. |
+
+### Sections (universal H2s, copy verbatim)
+
+- Five pains DataTools fixes in one pass *(revops uses: "before you import to HubSpot")*
+- Try it on a real-looking sample *(per niche; bookkeeper: "bank export with a known overlap"; revops: "3-vendor lead list"; shopify-pet: "Shopify customer export")*
+- Workflows you run every week *(bookkeeper: "the rest of the industry tax-codes around"; revops: "every campaign")*
+- Your data never leaves your computer.
+- Every change auditable. Period.
+- $ £ € ¥ R$ kr zł — handled.
+- Six tools. One pipeline. One $49 download.
+- $49. No subscription. *(append per niche: bookkeeper "No per-client license."; revops "No per-campaign fee."; shopify-pet "No ceiling on rows or files.")*
+- Questions
+- *(closing CTA banner — see below)*
+
+### Closing CTA banner (per niche)
+
+| Niche | Banner |
+|-------|--------|
+| bookkeeper | Stop reconciling bank exports by hand. |
+| revops | Stop paying twice for the same contact. |
+| shopify-pet | Stop deduplicating customers by hand. |
+
+---
+
+## 3 · Demo CTAs (in-app)
+
+The hosted demo at `/demo` shows live tool runs. CTAs sit at the top
+of the demo page and after each tool completes.
+
+| Slot | Copy |
+|------|------|
+| Demo banner top | You're using the hosted demo. To run this on your own files, get the $49 desktop version. |
+| Per-tool footer | Liked what just happened? Run it on your own file → **$49 desktop install** |
+| Demo end-of-flow | That's six tools in one pass. Get the desktop version — $49, no subscription. |
+| Demo "buy" button | Get DataTools — $49 |
+
+---
+
+## 4 · Email subject lines (per niche)
+
+Subjects are the highest-leverage copy. One per touch, per niche.
+Body copy lives in `marketing/emails/<niche>/`.
+
+### Gumroad delivery (Day 0)
+
+| Niche | Subject |
+|-------|---------|
+| bookkeeper | Your DataTools download (start here) |
+| revops | Your DataTools download (start here) |
+| shopify-pet | Your DataTools download (start here) |
+
+### 5-touch onboarding sequence (Days 1, 3, 7, 14, 30)
+
+| # | Day | bookkeeper | revops | shopify-pet |
+|---|-----|------------|--------|-------------|
+| 1 | 1 | Try it on this messy bank export first | Try it on this 3-vendor lead list first | Try it on this Shopify customer export first |
+| 2 | 3 | The audit trail your client will actually open | The dedupe rule that catches LinkedIn drift | The phone-format step Klaviyo cares about |
+| 3 | 7 | One pipeline, every client, every month | Run it before every HubSpot import | Run it before every Klaviyo sync |
+| 4 | 14 | Two-minute trick: the gate report | Two-minute trick: the confidence tiers | Two-minute trick: hidden-character cleanup |
+| 5 | 30 | Heard from a fellow bookkeeper? | Heard from another RevOps lead? | Heard from another store owner? |
+
+---
+
+## 5 · Gumroad listing
+
+| Slot | Value |
+|------|-------|
+| Product title | DataTools — Local CSV cleanup pipeline · $49 |
+| Tagline | Six CSV tools that turn a 4-hour cleanup job into a 30-second pipeline. Runs on your laptop. No subscription. |
+| Cover image alt | Six DataTools panels — analyzer, dedupe, format, gate, text-clean, splitter — running locally |
+| Description (H2 1) | What you get |
+| Description body 1 | A desktop install (Mac, Windows, Linux) bundling six CSV tools you'd otherwise stitch together from Excel macros, regex, and luck. One pipeline. Audit trail per file. Files up to 1 GB. |
+| Description (H2 2) | Why local |
+| Description body 2 | Your data never touches a server. No upload. No "we promise we won't look." Run the pipeline, get the cleaned CSV + the audit log, close the app. Done. |
+| Description (H2 3) | What's in the box |
+| Description bullets | Analyzer (find what's broken) · Format standardizer (phones, addresses, currencies) · Dedupe (fuzzy matching across columns) · Gate (block bad rows from your import) · Text cleaner (hidden chars, encoding) · Splitter (chunk huge files for upload limits) |
+| Description (H2 4) | Who it's for |
+| Description body 4 | Bookkeepers reconciling client bank exports. RevOps deduping lead lists before HubSpot. Shopify owners prepping customer data for Klaviyo. Anyone with a 50k-row CSV they don't want to clean by hand again. |
+| Refund text | 30-day no-questions refund. Email support@datatools.app. |
+| Tags | csv, data cleaning, dedupe, bookkeeping, revops, shopify, local, privacy |
+
+---
+
+## 6 · One-liners (for social, signatures, podcasts)
+
+Pick the line that matches the medium. Don't mix-and-match across one
+campaign — pick one and let it land.
+
+- "Six CSV tools that turn a 4-hour cleanup job into a 30-second pipeline."
+- "Local CSV cleanup. Your data never leaves your computer."
+- "$49, one-time, six tools, one pipeline. Mac/Win/Linux."
+- "I built the CSV cleanup pipeline I wanted to stop doing by hand."
+- "Bank exports, lead lists, Shopify customers — same six steps, every time."
+
+---
+
+## 7 · Banned phrases
+
+These over-promise or trip professional buyers' BS detector. Don't use:
+
+- ~~"AI-powered"~~ — not what we do; sets the wrong expectation.
+- ~~"Enterprise-grade"~~ — meaningless; says "expensive" without backing it up.
+- ~~"Revolutionary" / "game-changing"~~ — every SaaS landing page uses these. Skip.
+- ~~"99.9% uptime"~~ — local app; not relevant; reads as cargo-culted.
+- ~~"GDPR-compliant"~~ — true (local, no transfer) but the claim invites legal scrutiny we don't need; say "local" instead.
+- ~~"Free trial"~~ — there's the demo, but the desktop app is paid-only; "trial" implies time-bombed and we don't ship that.
+
+---
+
+## 8 · Change log
+
+When you change a slot here, add a line below so the next person
+ships from a known state.
+
+| Date | Slot | Old → New | Why |
+|------|------|-----------|-----|
+| 2026-05-01 | (initial) | — | First SoT extracted from landing pages 1.0 |
--- a/marketing/community-posts/README.md
+++ b/marketing/community-posts/README.md
@@ -0,0 +1,32 @@
+# Community posts
+
+Three drafts per niche, each value-first, ready to personalize:
+
+1. **`01-story.md`** — "Here's how I solved X." Concrete, narrative, no
+   pitch in the body. The product gets one mention at the end, in
+   context. Goes in subreddits / Slacks / forums where direct
+   promotion is banned. Lead with usefulness; the link is dessert.
+
+2. **`02-tip.md`** — A standalone tactical tip the reader can use
+   *without* DataTools. The product appears as "if you don't want to
+   do this by hand…" — earned, not pushed. Cross-post-safe.
+
+3. **`03-soft-offer.md`** — The one post where the product is the
+   subject. Goes in `/r/<niche>` "what are you working on" threads,
+   IndieHackers launches, and niche newsletters that allow paid-tool
+   posts. Still leads with the problem, not the features.
+
+## Personalization checklist before posting
+
+- [ ] Replace `{{your-name}}` and `{{your-context}}` in the opener
+- [ ] Match the community's tone (Reddit ≠ LinkedIn ≠ niche Slack)
+- [ ] Add a community-specific opener line ("Long-time lurker, first post" / "Saw the thread about X yesterday — figured I'd share")
+- [ ] Confirm the community's promo rules; if no-promo, drop the link from `01` / `02` and only mention "I built a thing, DM me if curious"
+- [ ] Vary the URL (use the niche-specific landing page, not the generic Gumroad URL)
+
+## Cadence guidance
+
+- Don't post all 3 drafts in the same community in the same week. Stagger:
+  Week 1 → `01-story`. Week 4 → `02-tip`. Week 8 → `03-soft-offer`.
+- Reply to commenters within 24h. The post itself sells less than the
+  comment thread that follows.
--- a/marketing/community-posts/bookkeeper/01-story.md
+++ b/marketing/community-posts/bookkeeper/01-story.md
@@ -0,0 +1,39 @@
+# Bookkeeper · Post 1 — Story
+
+**Where to post:** r/Bookkeeping, r/QuickBooks, AAT forums, ICB
+member groups, Bookkeeping Slacks/Discords.
+
+**Format:** longish post, ~400 words. Subject line / title goes
+first; everything below is the body.
+
+**Tone:** "fellow bookkeeper venting + sharing what worked" — not
+salesy, not preachy.
+
+---
+
+## Title
+
+How I cut my month-end bank reconciliation from 4 hours to 30 minutes (the boring 3-step version)
+
+## Body
+
+I've been doing month-end reconciliation for {{your-client-count}} clients for {{your-years}} years and the part I hated most was the bank export cleanup. Not the reconciliation itself — the *cleanup before* the reconciliation.
+
+You know the drill: client sends you a CSV from their bank. Half the dates are `MM/DD/YYYY`, the other half `DD-MM-YY`. The merchant column has trailing whitespace, weird unicode hyphens, and the same vendor spelled four ways ("Amzn Mktp", "AMAZON MARKETPLACE", "Amazon.com*1A2B3", "AMZN Mktplace"). QuickBooks chokes on the import, so you fix it by hand. Every. Single. Month.
+
+Last quarter I sat down and wrote out the steps I do every single time. There were 11. I automated the 8 that were deterministic. Here are the 3 that matter most — you can do these with built-in tools, no purchase required:
+
+**1. Normalize dates first, before anything else.**
+Excel's `TEXT(DATEVALUE(A2), "yyyy-mm-dd")` works for ~80% of bank exports. The other 20% have at least one row with a value Excel parses wrong (it'll silently swap day/month). Sort by date afterwards and *visually scan* for any row that now lands in the wrong month or year — that's your tell.
+
+**2. Standardize merchant names with a fuzzy match, not a regex.**
+A regex won't catch "Amzn Mktp" → "Amazon". A fuzzy match will. Excel has no built-in worksheet function for it, though Power Query's merge in recent versions has a fuzzy-matching option; Google Sheets needs an add-on. The threshold I use is 0.85 — high enough to avoid false positives, low enough to catch the spelling drift.
+
+**3. Keep an audit trail of every change.**
+This is the one most bookkeepers skip and then regret 6 months later when the client asks "wait, why did you re-classify that?". Add a sidecar CSV: `original_value, new_value, rule_applied, timestamp`. Four columns, append-only, never delete.
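
For anyone scripting step 3 instead of keeping the sidecar by hand, here is a minimal Python sketch. It assumes the columns `original_value, new_value, rule_applied, timestamp` from step 3; the `log_change` helper and the file naming are hypothetical illustrations, not DataTools' actual implementation.

```python
import csv
import os
from datetime import datetime, timezone

AUDIT_COLUMNS = ["original_value", "new_value", "rule_applied", "timestamp"]


def log_change(audit_path, original_value, new_value, rule_applied):
    """Append one change record to the sidecar CSV; existing rows are never rewritten."""
    is_new = not os.path.exists(audit_path) or os.path.getsize(audit_path) == 0
    # Mode "a" only ever adds to the end of the file, which keeps the log append-only.
    with open(audit_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(AUDIT_COLUMNS)  # header goes in on the first write only
        writer.writerow([
            original_value,
            new_value,
            rule_applied,
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
        ])
```

Calling `log_change("march-export.audit.csv", "Amzn Mktp", "Amazon", "merchant_fuzzy_match")` once per edit builds the trail as you go; the rule names are whatever labels you already use for your own steps.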
+
+Doing those three turned a 4-hour job into roughly 30 minutes for me. The rest I eventually wrapped into a desktop tool I built called DataTools (the audit trail thing was the bit I needed and couldn't find anywhere — figured other bookkeepers might want it too). It's $49 if you want to skip the spreadsheet wrangling, but the 3 steps above will get you most of the way without it.
+
+Happy to share the audit-trail CSV template if anyone wants it — just reply.
+
+— {{your-name}}
--- a/marketing/community-posts/bookkeeper/02-tip.md
+++ b/marketing/community-posts/bookkeeper/02-tip.md
@@ -0,0 +1,27 @@
+# Bookkeeper · Post 2 — Tip
+
+**Where to post:** LinkedIn (your own feed), AAT/ICB Facebook
+groups, accountancy newsletters' "tip submission" inboxes.
+
+**Format:** short, ~150 words. Practical. Reads as "thing I learned"
+not "thing I'm selling".
+
+---
+
+## Title
+
+The 30-second check that catches 90% of bank-export errors before they hit QuickBooks
+
+## Body
+
+If you do client bank reconciliations, do this once before every import:
+
+Open the export. Sort by amount. Scroll to the bottom. Look at the totals row.
+
+Most banks add a totals row at the bottom of the CSV that *isn't* a transaction. If you import it, QuickBooks treats it as a real entry and your books are off by exactly the value of the totals row — usually a five-figure number that takes you 40 minutes to track down.
+
+Same trick catches blank rows the bank inserts as section breaks (especially Wells Fargo, Chase, and most UK challenger banks). One sort, one scroll, two seconds of looking — saves the rest of your evening.
+
+If you're doing this for 20+ clients a month and want to automate the whole pre-import scrub (this trick + ~10 others), I built a $49 desktop tool called DataTools that does it: datatools.app/bookkeeper. No subscription, runs locally so client data stays on your machine.
+
+— {{your-name}}
--- a/marketing/community-posts/bookkeeper/03-soft-offer.md
+++ b/marketing/community-posts/bookkeeper/03-soft-offer.md
@@ -0,0 +1,39 @@
+# Bookkeeper · Post 3 — Soft offer
+
+**Where to post:** IndieHackers "show what you're working on", r/SideProject,
+r/Bookkeeping (only in monthly "self-promo" threads — read each
+sub's rules), bookkeeping newsletter "tools" sections.
+
+**Format:** ~250 words. Pitches the product but leads with the
+problem and is honest about the scope.
+
+---
+
+## Title
+
+I built a desktop CSV cleanup tool for bookkeepers who hate the bank-export reconciliation grind
+
+## Body
+
+Quick context: I do {{your-context — e.g., "books for 12 small clients" or "side-bookkeeping for a few non-profits"}} and the part I dreaded most every month was cleaning bank exports before importing them to QuickBooks. Different bank, different format, every time.
+
+I built **DataTools** — a desktop app (Mac/Win/Linux) that runs the same six cleanup steps every export needs:
+
+- Normalizes dates, currencies, account-number formats
+- Fuzzy-matches merchant-name variants ("Amzn Mktp" = "Amazon")
+- Flags duplicate transactions across re-exported date ranges
+- Strips trailing whitespace, hidden chars, BOM markers — the stuff QuickBooks chokes on silently
+- Generates a per-file audit trail your client can open in Excel: every change, every rule that fired, timestamped
+- Splits oversized exports for tools with row limits
+
+It runs **locally** — your client's bank data never goes to a server. (This was the whole reason I built it instead of using one of the cloud "data cleaning" SaaS tools.)
+
+It's **$49 one-time**, no subscription, no per-client license. v1.x updates included.
+
+If you want to try before you buy: there's a hosted demo with sample bank exports at the link below. The demo is identical to the desktop app — same UI, same six tools, just running in your browser on synthetic data.
+
+→ datatools.gumroad.com (or the bookkeeper landing page: datatools.app/bookkeeper)
+
+Happy to answer questions in the thread.
+
+— {{your-name}}
--- a/marketing/community-posts/revops/01-story.md
+++ b/marketing/community-posts/revops/01-story.md
@@ -0,0 +1,39 @@
+# RevOps · Post 1 — Story
+
+**Where to post:** r/revops, r/sales, RevGenius Slack, Modern Sales Pros,
+Pavilion communities, LinkedIn (your own feed).
+
+**Format:** ~400 words. Tactical war-story style. Don't pitch in the body.
+
+---
+
+## Title
+
+We were paying HubSpot for 4,200 duplicate contacts. Here's the dedupe pipeline that caught them.
+
+## Body
+
+Last quarter I ran a count on our HubSpot instance: ~4,200 contacts that were almost-certainly the same person as another contact already in the system. Our HubSpot bill is per-marketing-contact, so this was a real number. ($X/month — pick your tier.)
+
+The problem is that HubSpot's native "find duplicates" tool is exact-match-only on a small set of fields. It misses:
+
+- "Sarah O'Brien" vs "Sarah Obrien" (apostrophe / no-apostrophe)
+- "+1 (415) 555-0143" vs "415-555-0143" vs "4155550143" (phone formats)
+- "sarah@acme.com" vs "Sarah@acme.com" (case)
+- Same person from a LinkedIn scrape (no phone) + a webform fill (no LinkedIn URL) + a trade-show import (only email + company)
+
+Here's the 4-step pipeline I run before *every* HubSpot import now. You can build the first 3 with Python + pandas + rapidfuzz; the 4th is the one that matters and is the easiest to skip:
+
+**Step 1 — Normalize before comparing.** Lowercase emails, normalize phones to E.164, trim whitespace, normalize unicode (NFKC). This alone catches ~40% of dupes.
+
+**Step 2 — Fuzzy-match on name + company, blocked by email domain.** Don't fuzzy-match across the whole list (O(n²) and full of false positives). Block by email domain first — only compare contacts within the same company. Use rapidfuzz token-set ratio at threshold 85.
+
+**Step 3 — Cross-source merge logic.** When LinkedIn-source and webform-source records match, *the LinkedIn one wins on title/company* (more recent), *the webform one wins on phone/email* (verified). Document this rule somewhere your team can read it.
+
+**Step 4 — Confidence tiers, not yes/no.** Auto-merge only at 95% confidence or higher. Queue anything from 85 up to 95 for manual review. Treat everything below 85 as separate contacts and leave it alone. The manual queue is the magic — it catches the cases the algorithm doesn't dare touch and trains you on what your data actually looks like.
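
Steps 1, 2, and 4 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the DataTools implementation: it uses the standard library's `difflib` as a stand-in for rapidfuzz's token-set ratio (the scores will differ somewhat), the field names `id`, `name`, `company`, and `email` are hypothetical, and "below 85" is read as "discard the candidate match", not "delete the rows". Step 3's cross-source merge rules are left out.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Step 1: NFKC-normalize, trim, and lowercase a field."""
    return unicodedata.normalize("NFKC", text or "").strip().lower()


def email_domain(email):
    """Blocking key: the part of the email after the last '@'."""
    return normalize(email).rpartition("@")[2]


def similarity(a, b):
    """Step 2 scorer on a 0-100 scale (difflib stands in for rapidfuzz here)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def tier(score):
    """Step 4: map a score to an action instead of a yes/no answer."""
    if score >= 95:
        return "auto_merge"
    if score >= 85:
        return "manual_review"
    return "keep_separate"


def candidate_pairs(contacts):
    """Yield (id_a, id_b, score, tier) for contacts that share an email domain."""
    blocks = defaultdict(list)
    for contact in contacts:
        blocks[email_domain(contact["email"])].append(contact)
    for domain, members in blocks.items():
        if not domain:
            continue  # rows without an email cannot be blocked this way
        # Comparing only inside a block avoids the O(n^2) pass over the whole list.
        for a, b in combinations(members, 2):
            key_a = normalize(f"{a['name']} {a['company']}")
            key_b = normalize(f"{b['name']} {b['company']}")
            score = similarity(key_a, key_b)
            yield a["id"], b["id"], score, tier(score)
```

Rows with no email fall outside the blocks entirely, so they need a separate key before they can be compared at all.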
+
+I eventually wrapped all this into a desktop tool I called DataTools because I got tired of re-running the script every campaign. Local-only, $49 if anyone wants it: datatools.app/revops. But the 4-step framework above is the real takeaway — works regardless of what tool you use.
+
+What does your dedupe pipeline look like?
+
+— {{your-name}}
--- a/marketing/community-posts/revops/02-tip.md
+++ b/marketing/community-posts/revops/02-tip.md
@@ -0,0 +1,27 @@
+# RevOps · Post 2 — Tip
+
+**Where to post:** LinkedIn, RevGenius Slack #tips channel,
+RevOps Co-op, Modern Sales Pros.
+
+**Format:** ~150 words. Tactical. One idea, one sentence-of-pitch
+at the bottom.
+
+---
+
+## Title
+
+The 30-second pre-import check that catches LinkedIn-scrape duplicates before they hit HubSpot
+
+## Body
+
+Before you import a LinkedIn scrape (Apollo, Lusha, Cognism — same problem) into HubSpot:
+
+Open the file. Sort by `email`. Look for blanks.
+
+LinkedIn-sourced rows often have *no email* — just name + company + LinkedIn URL. If you import them as-is, HubSpot creates a new contact for each one. The next time someone fills your webform with the same name + company, HubSpot creates *another* new contact, because there's no key to match on.
+
+Two-minute fix: before import, generate a synthetic dedupe key as `lower(first_name)|lower(last_name)|domain(company_url)`. Sort by it. Anything with >1 row is a likely dupe — review and merge before HubSpot ever sees it.
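
If the list is too big to eyeball in a spreadsheet, the same key can be built in a short Python sketch. The column names `first_name`, `last_name`, and `company_url` follow the key described above; the helper names are hypothetical, and the domain extraction is deliberately simple (it only strips a leading `www.`).

```python
from collections import defaultdict
from urllib.parse import urlparse


def company_domain(company_url):
    """Return the bare host of a company URL, e.g. 'https://www.acme.com/about' -> 'acme.com'."""
    url = (company_url or "").strip().lower()
    if "//" not in url:
        url = "//" + url  # lets urlparse treat a bare 'acme.com' as a host
    host = urlparse(url).netloc
    return host[4:] if host.startswith("www.") else host


def dedupe_key(row):
    """Build lower(first_name)|lower(last_name)|domain(company_url) for one row."""
    first = (row.get("first_name") or "").strip().lower()
    last = (row.get("last_name") or "").strip().lower()
    return f"{first}|{last}|{company_domain(row.get('company_url'))}"


def likely_duplicates(rows):
    """Group rows by key and keep only the keys that occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[dedupe_key(row)].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Every group this returns is a review candidate, not a confirmed duplicate: two different people can share a name at a large company.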
+
+If you're doing this monthly across multiple lead sources and want to automate it (plus phone normalization, fuzzy matching, the whole pipeline), I built a $49 desktop tool: datatools.app/revops. Local — your prospect list never goes to a server.
+
+— {{your-name}}
--- a/marketing/community-posts/revops/03-soft-offer.md
+++ b/marketing/community-posts/revops/03-soft-offer.md
@@ -0,0 +1,35 @@
+# RevOps · Post 3 — Soft offer
+
+**Where to post:** IndieHackers, r/revops monthly self-promo,
+RevGenius #tools-and-software, LinkedIn (your own feed).
+
+**Format:** ~250 words.
+
+---
+
+## Title
+
+DataTools — a $49 desktop CSV pipeline for the lead-list cleanup you do before every HubSpot import
+
+## Body
+
+Built this for myself first. {{your-context — e.g., "I run RevOps at a 30-person SaaS"}} and the part of the job I dreaded was the pre-import scrub: LinkedIn export + Apollo pull + last quarter's webform list, deduped against each other and against what's already in HubSpot. Six tabs in a Google Sheet, regexes I half-remember, vlookups, an hour and a half.
+
+**DataTools** does the six steps as one pipeline:
+
+- **Format standardizer** — phones to E.164 (50+ country codes, per-row country awareness), emails lowercased, URLs canonicalized
+- **Dedupe** — fuzzy matching with confidence tiers (95+ auto, 85-95 manual queue, <85 dropped), blocked by email domain so it scales to 50k-row lists
+- **Gate** — block bad rows from your import with a per-rule report ("142 rows missing email, 38 rows with malformed phones, 12 rows with corporate-blacklist domains")
+- **Text cleaner** — strips hidden chars, BOMs, weird unicode
+- **Analyzer** — finds problems before you process (mixed encodings, inconsistent delimiters, near-duplicate rows)
+- **Splitter** — chunk huge files for tools with row limits
+
+Runs **locally** — Mac/Win/Linux. Your prospect data never goes to a server. (This was the actual reason I shipped it instead of using Clearbit / cloud tools — legal didn't want a third party touching prospect data after the {{2024 / 2025}} compliance review.)
+
+**$49 one-time.** No subscription. No per-record fee. v1.x updates included.
+
+Demo (with synthetic data) and download: datatools.app/revops
+
+Happy to answer questions in the thread.
+
+— {{your-name}}
--- a/marketing/community-posts/shopify-pet/01-story.md
+++ b/marketing/community-posts/shopify-pet/01-story.md
@@ -0,0 +1,49 @@
+# Shopify-pet · Post 1 — Story
+
+**Where to post:** r/shopify, r/ecommerce, Shopify community forums,
+pet-business Facebook groups (Pet Industry Distributors Association,
+Pet Boss Nation), Klaviyo community Slack.
+
+**Format:** ~400 words. Owner-to-owner tone.
+
+---
+
+## Title
+
+Why my Klaviyo flows were skipping 18% of my customers (and the CSV cleanup that fixed it)
+
+## Body
+
+Background: I run {{your-store-context — e.g., "a 4-year-old pet supplements store doing about $X/month"}}. Last summer I noticed the open rate on my "abandoned cart" Klaviyo flow was lower than usual. Klaviyo's dashboard said the flow was firing fine. Took me a week to figure out the actual problem:
+
+**Klaviyo was silently dropping 18% of my customers because their phone numbers weren't formatted correctly.** Not "wrong" — just not in the format Klaviyo's SMS module accepts. So the SMS part of the flow never sent, and the email-only fallback didn't kick in for half of those.
+
+The root cause was the Shopify customer export. Customers had entered their phones every which way:
+
+- `(415) 555-0143` — works
+- `415.555.0143` — Klaviyo: "invalid"
+- `4155550143` — Klaviyo: "invalid for this country"
+- `+44 20 7946 0958` — works only if the country field is set; for ~30% of my customers it wasn't
+- `415-555-0143 ext 12` — Klaviyo: "invalid"
+
+The fix is a one-time CSV cleanup before each Klaviyo sync:
+
+**1. Pull the Shopify customer export.**
+Customers > Export > "All customers" > CSV.
+
+**2. Run every phone number through E.164 normalization.**
+E.164 is the international format Klaviyo (and basically every other SMS platform) wants: `+14155550143`. Python's `phonenumbers` library does this if you're scripting; spreadsheet add-ons exist but they're painful at >5k rows.
+
+**3. Default the country code per row.**
+If the customer's address country is "United States", default the phone country to US. This catches the rows that are missing `+1` but are obviously American.
+
+**4. Drop or quarantine anything still un-parseable.**
+Don't import broken rows hoping Klaviyo will figure it out. It won't.
+
+**5. Re-import the cleaned CSV to a Shopify customer segment** (or push directly to Klaviyo via their API).
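
To make steps 2 to 4 concrete, here is a deliberately narrow Python sketch that handles US and Canadian numbers only, using nothing but the standard library. For real international coverage the `phonenumbers` library mentioned above is the better choice. The function name and the country labels are assumptions for illustration.

```python
import re


def to_e164_nanp(raw_phone, address_country):
    """Return +1XXXXXXXXXX for a US/Canada number, or None so the row can be quarantined."""
    if address_country not in ("United States", "Canada"):
        return None  # other countries are out of scope for this sketch
    # Drop a trailing extension such as "ext 12" or "x12" before counting digits.
    main = re.sub(r"(?i)\s*(?:ext\.?|x)\s*\d+\s*$", "", raw_phone.strip())
    digits = re.sub(r"\D", "", main)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # a country code was already present
    # North American numbers: 10 digits, area code and exchange both start with 2-9.
    if re.fullmatch(r"[2-9]\d{2}[2-9]\d{6}", digits):
        return "+1" + digits
    return None
```

Rows that come back as `None` are the ones to hold back in step 4; writing them to a separate file keeps them available for manual follow-up.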
+
+I eventually wrapped this whole pipeline into a desktop app called DataTools because doing it monthly was tedious. $49, runs locally so customer data stays on my machine, datatools.app/shopify-pet if you're curious. But the 5 steps above are what actually matters — works regardless of tool.
+
+Anyone else seeing low SMS deliverability? I'd bet money it's this.
+
+— {{your-name}}
--- a/marketing/community-posts/shopify-pet/02-tip.md
+++ b/marketing/community-posts/shopify-pet/02-tip.md
@@ -0,0 +1,28 @@
+# Shopify-pet · Post 2 — Tip
+
+**Where to post:** LinkedIn, Shopify Discord, pet-business Facebook
+groups, niche e-comm newsletters' "tip" inboxes.
+
+**Format:** ~150 words.
+
+---
+
+## Title
+
+The hidden character in your Shopify customer export that breaks Klaviyo imports (and how to spot it)
+
+## Body
+
+Open your Shopify customer export. Look at the email column.
+
+Some of your emails have an invisible character in them — usually a zero-width space (`U+200B`) or a non-breaking space (`U+00A0`) — copied in from a customer typing on their phone. Visually identical to a normal email. Klaviyo treats them as different addresses, so:
+
+- Your "duplicate customer" check passes when it shouldn't
+- The customer gets emailed twice
+- Your unsubscribes don't propagate (the unsub list has the *clean* email; the next campaign send reaches them via the *invisible-char* email)
+
+Spot it: in Excel, put `=LEN(A2)` in the column next to your emails. Any row where the result is larger than the number of characters you can see has a hidden char in it.
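
For a scripted version of the same check, a minimal Python sketch can name the offending character instead of just counting it. It relies on Unicode general categories from the standard library; the function name is hypothetical, and flagging every non-ASCII space is an assumption that suits an email column but may be too strict for free-text fields.

```python
import unicodedata


def hidden_characters(value):
    """Return (index, codepoint, name) for each invisible or unusual-space character."""
    findings = []
    for index, char in enumerate(value):
        category = unicodedata.category(char)
        # Cf covers format characters (zero-width space, BOM), Cc covers control
        # characters, and Zs covers space separators such as the non-breaking space.
        if category in ("Cf", "Cc") or (category == "Zs" and char != " "):
            name = unicodedata.name(char, "UNKNOWN")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings
```

An empty list means the value is clean; anything else tells you where the character sits and which one it is.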
+
+If you want to automate the cleanup (plus phone normalization, dedupe, the whole pre-Klaviyo scrub), I built a $49 desktop tool: datatools.app/shopify-pet. Local — your customer list never leaves your computer.
+
+— {{your-name}}
--- a/marketing/community-posts/shopify-pet/03-soft-offer.md
+++ b/marketing/community-posts/shopify-pet/03-soft-offer.md
@@ -0,0 +1,35 @@
+# Shopify-pet · Post 3 — Soft offer
+
+**Where to post:** IndieHackers, r/shopify monthly self-promo, Shopify
+community "apps & tools" forum, pet-business newsletters.
+
+**Format:** ~250 words.
+
+---
+
+## Title
+
+DataTools — a $49 desktop tool that gets your Shopify customer export Klaviyo-import-ready in 30 seconds
+
+## Body
+
+Built this for my own store and figured fellow Shopify owners might want it.
+
+The problem: Shopify's customer CSV export is *almost* Klaviyo-ready, but not quite. Phones in five different formats. Hidden whitespace in addresses. Duplicate-customer rows from the same person ordering twice with slightly different emails. Country fields blank for half your international orders. You either fix it by hand every month or accept that ~15-20% of your list is broken.
+
+**DataTools** is six CSV tools as one pipeline:
+
+- **Format standardizer** — phones to E.164 (Klaviyo-ready), addresses normalized, currencies in your store's locale
+- **Dedupe** — fuzzy matching catches "Sarah O'Brien" = "sarah obrien" = "Sarah OBrien" before they become 3 customers in Klaviyo
+- **Text cleaner** — strips zero-width spaces, BOMs, weird unicode the customer typed on their phone
+- **Gate** — quarantine rows that won't survive the import (missing email, malformed phone) so you know what got dropped and why
+- **Analyzer** — runs first, tells you what's wrong before you start fixing
+- **Splitter** — chunks oversized exports for tools with row limits
+
+Runs **locally** on Mac/Win/Linux. Customer data never goes to a server — that was the whole point. No subscription. **$49 one-time**, v1.x updates included.
+
+Demo (with synthetic data) and download: datatools.app/shopify-pet
+
+Built by a fellow Shopify store owner. Happy to answer questions in the thread.
+
+— {{your-name}}
--- a/marketing/emails/README.md
+++ b/marketing/emails/README.md
@@ -0,0 +1,60 @@
+# Email sequences
+
+Per niche (`bookkeeper/`, `revops/`, `shopify-pet/`):
+
+- **`00-delivery.md`** — Day 0 Gumroad delivery email. Triggered when
+  Gumroad confirms the purchase. Job #1: get the buyer to download
+  and open the app inside the first 24h. Buyers who don't open within
+  72h refund at ~3× the rate of buyers who do.
+- **`01-day1.md`** — Day 1 nudge with a sample file matched to the
+  niche. The Day-1 email is the highest-leverage one in the
+  sequence; it converts "I bought it" into "I used it".
+- **`02-day3.md`** — Day 3 deep-dive on one specific feature the
+  niche cares about most.
+- **`03-day7.md`** — Day 7 workflow framing. "Use it every {month /
+  campaign / sync}, not as a one-off."
+- **`04-day14.md`** — Day 14 power-user tip. Surfaces a non-obvious
+  feature; converts "I use it" into "I rely on it".
+- **`05-day30.md`** — Day 30 referral / review ask.
+
+## Sender setup
+
+- **From:** `support@datatools.app` (single-sender to keep replies in
+  one inbox; don't fan out to per-niche aliases until volume warrants)
+- **Reply-To:** same — every email expects a reply pathway
+- **List provider:** Gumroad's built-in for delivery; Buttondown or
+  ConvertKit for the 5-touch sequence (Gumroad's drip is too crude
+  for niche segmentation)
+- **Segmentation:** customers self-tag at checkout (Gumroad custom
+  field "What do you do?"). Map: `bookkeeper`, `revops`,
+  `shopify-pet`, `other`. `other` gets a generic sequence (not
+  drafted yet — Tier C).
+
+## Variables
+
+All emails use these placeholders. Set them at sequence-import time,
+not per-email:
+
+- `{{first_name}}` — Gumroad provides; fall back to "there" if blank
+- `{{download_url}}` — niche-specific download URL from Gumroad
+- `{{sample_file_url}}` — niche-specific sample CSV (`samples/demo/...`)
+- `{{landing_page}}` — niche-specific landing page URL
+- `{{support_email}}` — `support@datatools.app`
+
+## Cadence and quiet rules
+
+- Don't send between 10pm-7am buyer-local-time (Buttondown supports
+  TZ-aware send; ConvertKit doesn't out of the box)
+- If the buyer replies to *any* email in the sequence, pause the
+  remaining touches until you've replied to them. A drip that
+  ignores a customer reply reads as worse than no drip.
+- If the buyer requests a refund, kill the sequence immediately.
+- Day 14 + Day 30 emails are skippable if the buyer has already
+  emailed support with a feature request or bug report — they're
+  engaged enough; don't pile on.
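
The quiet rules above can be written down as one small decision function, which helps when checking that the list provider's automation actually matches them. This is a sketch only: the function, its parameters, and the returned labels are hypothetical, and `local_hour` is assumed to be the buyer's local hour (0-23) as supplied by whatever provider is in use.

```python
def send_decision(touch_day, local_hour, refunded=False,
                  awaiting_our_reply=False, contacted_support=False):
    """Decide what to do with one scheduled touch, checking the strictest rule first."""
    if refunded:
        return "cancel_sequence"  # a refund request kills the sequence immediately
    if awaiting_our_reply:
        return "pause"  # the buyer replied and has not been answered yet
    if touch_day in (14, 30) and contacted_support:
        return "skip"  # already engaged; don't pile on
    if local_hour >= 22 or local_hour < 7:
        return "defer"  # inside the 10pm-7am quiet window
    return "send"
```

The order of the checks matters: a refunded buyer should never receive a deferred email later just because the quiet-hours rule happened to run first.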
+
+## Subject lines
+
+Subjects are owned by `marketing/COPY.md` § 4. Don't edit subjects
+in-line in the email files; edit COPY.md and re-propagate. Same
+discipline applies to the closing CTA — owned by COPY.md § 0.
--- a/marketing/emails/bookkeeper/00-delivery.md
+++ b/marketing/emails/bookkeeper/00-delivery.md
@@ -0,0 +1,34 @@
+# Bookkeeper · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** buyer downloads + opens the app within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes so you don't lose this email under the next 200:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link above auto-detects your OS.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine, no network calls. If your browser doesn't open automatically, the terminal window shows the URL.
+
+**3. Drop in a real bank export.** Don't bother with the bundled samples — DataTools is built for messy real-world files. Pull last month's bank export from any client, drag it into the analyzer, and click "Run all". You'll see what the pipeline catches in about 20 seconds.
+
+If something doesn't work: just reply to this email. I read every reply (it goes to my own inbox, not a queue).
+
+If you want to refund: also just reply. 30-day no-questions; no form to fill out.
+
+Tomorrow I'll send a sample bank export with a few of the tricky cases pre-built in, so you can see what the gate report looks like on a known input. After that you'll get four more short emails over the next month (days 3, 7, 14, and 30), most with one tip each — feel free to unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. If you have a bookkeeper friend who'd find this useful, the share-friendly landing page is {{landing_page}}.
--- a/marketing/emails/bookkeeper/01-day1.md
+++ b/marketing/emails/bookkeeper/01-day1.md
@@ -0,0 +1,31 @@
+# Bookkeeper · Day 1 — Try it on this messy bank export first
+
+**Subject:** Try it on this messy bank export first
+**Send:** Day 1, ~9am buyer-local-time
+**Goal:** convert "I bought it" → "I ran it on something"
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a sample bank export I built specifically to break things.
+
+→ **{{sample_file_url}}** (260 KB CSV, 1,400 rows of synthetic data — no real account info)
+
+It's modeled after real exports I've seen from US, UK, and Canadian banks. Hidden in there:
+
+- Mixed date formats (some `MM/DD/YYYY`, some `DD-MM-YY`, one row in `YYYY-MM-DD`)
+- Six different spellings of "Amazon" across the merchant column
+- Trailing whitespace + non-breaking spaces in the description column
+- Three obvious duplicate transactions and two non-obvious ones (different timestamps, same amount + merchant)
+- A totals row at the bottom that's not a transaction
+- One row with currency in `€` instead of `$`
+
+Drop it into DataTools, click **"Run all"** in the analyzer, and look at the gate report. It'll catch all of the above and tell you exactly what changed and why.
+
+The audit trail (a sidecar CSV called `<filename>.audit.csv`) is the part most bookkeepers are surprised by. Open it in Excel — every change has a row: original value, new value, rule that fired, timestamp. That's the file you hand to your client when they ask "wait, why did you re-classify that?".
+
+Try it once on the sample, then once on a real client export. Reply and tell me what it caught (or missed) — I'm building the v1.1 detector list from real-world feedback.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/bookkeeper/02-day3.md
+++ b/marketing/emails/bookkeeper/02-day3.md
@@ -0,0 +1,35 @@
+# Bookkeeper · Day 3 — The audit trail your client will actually open
+
+**Subject:** The audit trail your client will actually open
+**Send:** Day 3
+**Goal:** deepen feature understanding around the audit trail (the
+real differentiator vs. spreadsheet workflow)
+
+---
+
+Hi {{first_name}},
+
+Most "data cleaning" tools spit out a clean file and call it done. The thing your *client* needs — and what protects you in a year when they ask "why did you change that?" — is the audit trail.
+
+Here's the file DataTools writes alongside every cleaned export. It's a CSV called `<filename>.audit.csv` and it sits next to the cleaned file in your output folder.
+
+Five columns, append-only:
+
+| original_value | new_value | rule_applied | confidence | timestamp |
+|----------------|-----------|--------------|------------|-----------|
+| `AMZN Mktp` | `Amazon` | `merchant_canonicalize` | 0.94 | 2026-05-04T09:12:03 |
+| `  Starbucks  ` | `Starbucks` | `whitespace_strip` | 1.00 | 2026-05-04T09:12:03 |
+| `01/02/26` | `2026-02-01` | `date_normalize_dmy` | 0.88 | 2026-05-04T09:12:03 |
+
+Why this matters in a real client conversation:
+
+- **The client asks "why is this Amazon when my statement says AMZN Mktp?"** — open the audit CSV, point at the `merchant_canonicalize` row. Done in 10 seconds.
+- **A reviewer (auditor, accountant, you in 6 months) asks "what changed?"** — the audit CSV is the answer. Diffable, openable in Excel, no proprietary format.
+- **You spot a wrong rule firing** — the `confidence` column tells you which rules to tune. Anything <0.90 is worth eyeballing.
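
A minimal sketch of the "eyeball anything under 0.90" check, assuming the audit CSV uses the five column names from the table above. The helper name `low_confidence_rows` and the inline sample are illustrative and not part of DataTools.

```python
import csv
import io

def low_confidence_rows(audit_file, threshold=0.90):
    """Return the audit rows whose confidence is below the threshold."""
    reader = csv.DictReader(audit_file)
    return [row for row in reader if float(row["confidence"]) < threshold]

# Inline stand-in for <filename>.audit.csv, mirroring the table above.
SAMPLE = """original_value,new_value,rule_applied,confidence,timestamp
AMZN Mktp,Amazon,merchant_canonicalize,0.94,2026-05-04T09:12:03
  Starbucks  ,Starbucks,whitespace_strip,1.00,2026-05-04T09:12:03
01/02/26,2026-02-01,date_normalize_dmy,0.88,2026-05-04T09:12:03
"""

flagged = low_confidence_rows(io.StringIO(SAMPLE))
for row in flagged:
    print(row["rule_applied"], row["original_value"], "->", row["new_value"])
```

On a real file, replace the `io.StringIO` stand-in with `open(path, newline="")`.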
+
+One workflow change worth making: when you send the cleaned file to QuickBooks, send the audit CSV to the client at the same time, in a folder labeled "month-end audit trail". Most clients won't open it. The 10% that do will trust you forever.
+
+Reply if you want me to walk through the audit format on a call — happy to do a quick screen-share for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/bookkeeper/03-day7.md
+++ b/marketing/emails/bookkeeper/03-day7.md
@@ -0,0 +1,32 @@
+# Bookkeeper · Day 7 — One pipeline, every client, every month
+
+**Subject:** One pipeline, every client, every month
+**Send:** Day 7
+**Goal:** reframe from one-off tool to monthly workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on 1-2 client exports and confirmed it does what the landing page promised.
+
+The thing buyers tell me they wish they'd done from day one: **set it up as a workflow, not a one-off.**
+
+The pattern that works:
+
+**1. Make a folder per client.** Inside each client folder, a subfolder per month: `Acme Co/2026-05/`. Drop the raw export here.
+
+**2. Save your DataTools settings as a per-client preset.** The "Save settings" button in the analyzer drops a `.datatools-preset.json` file. Stash that in the client folder. Next month, load the preset and the analyzer pre-configures with the rules you tuned for that client (e.g., your "Amazon Marketplace" canonical name, your client's specific merchant aliases).
+
+**3. Run the pipeline. Get three files back:** the cleaned CSV, the audit CSV, the gate report. Move them into `Acme Co/2026-05/cleaned/`.
+
+**4. Import the cleaned CSV to QuickBooks. Email the audit CSV to the client.**
+
+Total elapsed time per client per month, after the first: 3-5 minutes. The first month per client is longer (~15 min) because you're tuning the preset.
+
+The buyers who do this are the ones still emailing me 3 months later — usually with feature requests for the next client they want to onboard. The buyers who only ever run it ad-hoc tend to drift back to spreadsheets within 2 months.
+
+If you want, reply with a sanitized export and I'll show you what your starting preset should look like — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/bookkeeper/04-day14.md
+++ b/marketing/emails/bookkeeper/04-day14.md
@@ -0,0 +1,35 @@
+# Bookkeeper · Day 14 — Two-minute trick: the gate report
+
+**Subject:** Two-minute trick: the gate report
+**Send:** Day 14
+**Goal:** surface the gate tool — non-obvious, high-value once seen
+
+---
+
+Hi {{first_name}},
+
+The tool inside DataTools that buyers find last is the **gate** — and it's the one that quietly does the most for you.
+
+What it does: before any row gets written to the cleaned CSV, the gate runs a per-row pass-through check. Rows that fail get *quarantined* into a separate file (`<filename>.quarantine.csv`) instead of being silently dropped or silently passed through.
+
+Default rules (you can add your own):
+
+- Missing required fields (date, amount)
+- Amount in unexpected currency without a flag
+- Date outside the export's stated range (catches the "totals row" issue from Day 1)
+- Duplicate of another row already in the file (per the dedupe pass)
+- Confidence below your threshold on a field that got auto-corrected
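
The quarantine idea can be sketched as follows. This is an illustration only: it covers three of the five default rules, and the field names (`date`, `amount`, `confidence`), the `reason` labels, and the `gate` function itself are assumptions rather than DataTools internals.

```python
from datetime import date

REQUIRED_FIELDS = ("date", "amount")

def gate(rows, start, end, min_confidence=0.85):
    """Split rows into (passed, quarantined). No row is ever dropped."""
    passed, quarantined = [], []
    for row in rows:
        reason = None
        if any(not row.get(field) for field in REQUIRED_FIELDS):
            reason = "missing_required_field"
        elif not (start <= date.fromisoformat(row["date"]) <= end):
            reason = "date_out_of_range"
        elif float(row.get("confidence", 1.0)) < min_confidence:
            reason = "low_confidence"
        if reason:
            quarantined.append({**row, "reason": reason})
        else:
            passed.append(row)
    return passed, quarantined

rows = [
    {"date": "2026-04-03", "amount": "12.50", "confidence": "0.97"},
    {"date": "", "amount": "4810.22"},  # e.g. a totals row with no date
    {"date": "2026-04-09", "amount": "8.00", "confidence": "0.80"},
    {"date": "2025-12-31", "amount": "30.00"},
]
passed, quarantined = gate(rows, date(2026, 4, 1), date(2026, 4, 30))
```

The property worth checking is that `len(passed) + len(quarantined)` always equals the input row count, which is what rules out a silent drop.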
+
+The 2-minute workflow:
+
+1. Run the pipeline as usual.
+2. Open `<filename>.quarantine.csv`. (It'll be tiny — typically 0-5% of rows.)
+3. Eyeball it. Anything that's a real transaction, fix-and-re-include manually. Anything that's a totals row / blank row / corrupt row — confirm it's correctly quarantined and delete it.
+4. Re-run the pipeline on the fixed-up version (or just append the manually-fixed rows to the cleaned CSV).
+
+The reason this matters: silent drops are the worst possible failure mode for a bookkeeper. You'd rather a row come out wrong (you'll catch it on review) than disappear (you won't catch it for months). The gate makes the silent-drop case impossible.
+
+Set the gate's confidence threshold to `0.85` for client work. Lower (0.75) for personal / exploratory; higher (0.92+) only if you've spent time tuning your client's preset.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/bookkeeper/05-day30.md
+++ b/marketing/emails/bookkeeper/05-day30.md
@@ -0,0 +1,26 @@
+# Bookkeeper · Day 30 — Heard from a fellow bookkeeper?
+
+**Subject:** Heard from a fellow bookkeeper?
+**Send:** Day 30
+**Goal:** referral / review ask. Last touch in the sequence.
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one (very small) favor?
+
+**Pick one of these. Whichever is easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — even a single line helps the next bookkeeper trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the bookkeeper landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one bookkeeper friend who'd benefit: {{landing_page}}. No referral commission scheme, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or what's broken. The 30-day refund window is still open and I'd rather refund a buyer who didn't get value than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/00-delivery.md
+++ b/marketing/emails/revops/00-delivery.md
@@ -0,0 +1,34 @@
+# RevOps · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** download + first run within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects your OS.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. (Yes, even if you're on the corporate VPN. Especially then.)
+
+**3. Drop in a real lead list.** Don't bother with the bundled samples — the gate report only gets interesting when the data is real. Pull last quarter's webform export, or your most recent Apollo / LinkedIn pull, drag it into the analyzer, and click **"Run all"**. You'll see what the dedupe + format pipeline does in about 30 seconds.
+
+If something doesn't work: just reply. I read every reply.
+
+Refund: also just reply. 30-day no-questions; no form.
+
+Tomorrow I'll send a sample 3-vendor lead list (HubSpot + LinkedIn + Apollo, synthetic data) so you can see the dedupe confidence tiers in action on a known input. After that you'll get four more emails over the next month (days 3, 7, 14, and 30): practical tips, no upsell. Unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. If you have a RevOps friend who'd find this useful: {{landing_page}}.
--- a/marketing/emails/revops/01-day1.md
+++ b/marketing/emails/revops/01-day1.md
@@ -0,0 +1,36 @@
+# RevOps · Day 1 — Try it on this 3-vendor lead list first
+
+**Subject:** Try it on this 3-vendor lead list first
+**Send:** Day 1, ~9am buyer-local-time
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a synthetic 3-vendor lead list (HubSpot + LinkedIn scrape + Apollo pull) that I built specifically to break naive dedupe.
+
+→ **{{sample_file_url}}** (1.2 MB CSV, 4,800 rows — fully synthetic, no real prospects)
+
+What's hidden in there:
+
+- The same person from 3 sources, with intentionally inconsistent fields:
+  - HubSpot row: full email + company; no LinkedIn URL
+  - LinkedIn row: name + title + LinkedIn URL; no email
+  - Apollo row: email + phone + company; misspelled name
+- ~120 obvious duplicates (same email, different case)
+- ~80 cross-source duplicates (different keys, same person — these are the ones HubSpot's native dedupe misses)
+- ~40 phone numbers in 5 different formats per country (+1, +44, +61)
+- One row in every 200 with a hidden zero-width space in the email
+
+Drop it into DataTools, click **"Run all"** in the analyzer, then run the **dedupe** tool with the default 0.85 threshold.
+
+Look at three things in the output:
+
+1. **The cleaned CSV** — what your import would look like
+2. **The audit CSV** — every change, every rule, confidence per change
+3. **The manual-review queue** (`<filename>.review.csv`) — the 0.85-0.95 confidence range. This is where the real dedupe value is; auto-merging this range is what gets people in trouble.
+
+Try it once on the sample, then once on a real list. Reply and tell me what it caught (or missed) — the v1.1 fuzzy-matching tuning comes from real-world feedback.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/02-day3.md
+++ b/marketing/emails/revops/02-day3.md
@@ -0,0 +1,36 @@
+# RevOps · Day 3 — The dedupe rule that catches LinkedIn drift
+
+**Subject:** The dedupe rule that catches LinkedIn drift
+**Send:** Day 3
+**Goal:** deepen feature understanding around the cross-source dedupe
+
+---
+
+Hi {{first_name}},
+
+The thing native HubSpot / Salesforce dedupe can't do, and the thing DataTools is actually best at: **cross-source matching**, where the same person shows up via LinkedIn, a webform, and a trade-show import — with no shared key.
+
+The rule that does the work is in the dedupe tool's **"Block by domain, fuzzy on name+title"** mode. Here's what it does:
+
+**Step 1 — Block.** Group rows by email domain. (LinkedIn rows with no email get bucketed by `domain(linkedin_url)` — usually their company website if they listed it.) This avoids the O(n²) explosion and rules out cross-company false positives.
+
+**Step 2 — Within each block, fuzzy-match on `first_name + last_name + title`.** Token-set ratio at 0.85 default. Catches:
+
+- "Sarah O'Brien, VP Marketing" = "sarah obrien, vp of marketing"
+- "Mike Chen, Head of Sales" = "Michael Chen, Sales Lead" (this one needs a 0.78 threshold; configurable)
+- "J. Smith, Director" = "Jane Smith, Director" (only with a strong company-name match)
+
+**Step 3 — Confidence-tier the merge.** ≥0.95 auto-merges. 0.85-0.95 goes to `<filename>.review.csv` for you to eyeball. <0.85 stays unmerged.
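
Steps 1 and 2 can be sketched roughly as below. Two assumptions are worth flagging: every row here has an email (the LinkedIn-URL fallback is not modeled), and the score is a stdlib approximation of a token-set ratio built on `difflib.SequenceMatcher`, not the matcher DataTools actually ships. The names `token_key` and `candidate_pairs` are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def token_key(row):
    """Lowercase name + title, drop punctuation, sort the unique tokens."""
    text = f'{row["name"]} {row["title"]}'.lower()
    tokens = re.sub(r"[^a-z0-9 ]", "", text).split()
    return " ".join(sorted(set(tokens)))

def candidate_pairs(rows):
    """Yield (left, right, score) for every pair sharing an email domain."""
    blocks = defaultdict(list)
    for row in rows:
        domain = row["email"].rsplit("@", 1)[-1].lower()
        blocks[domain].append(row)  # Step 1: block by domain
    for block in blocks.values():
        for left, right in combinations(block, 2):
            # Step 2: fuzzy score, computed only inside a block
            score = SequenceMatcher(None, token_key(left), token_key(right)).ratio()
            yield left, right, score

rows = [
    {"name": "Sarah O'Brien", "title": "VP Marketing", "email": "sarah@acme.com"},
    {"name": "sarah obrien", "title": "vp of marketing", "email": "s.obrien@ACME.com"},
    {"name": "Sarah O'Brien", "title": "VP Marketing", "email": "sarah@other.io"},
]
pairs = list(candidate_pairs(rows))
```

Only the two `acme.com` rows are compared. The identical name at `other.io` never becomes a candidate, which is the cross-company protection that blocking buys.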
+
+**Step 4 — Field-precedence on merge.** When records merge, you choose which source wins per field. Default precedence (configurable):
+
+- `title`, `company`, `linkedin_url` → LinkedIn wins (more recent)
+- `email`, `phone` → Webform wins (verified)
+- `lifecycle_stage`, `owner` → HubSpot wins (your CRM is canonical)
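
A minimal sketch of per-field precedence, using the default mapping listed above. The fallback when the preferred source has no value (take the first source that does) is an assumption for illustration, as are the `merge_records` name and the sample records.

```python
PRECEDENCE = {
    "title": "linkedin", "company": "linkedin", "linkedin_url": "linkedin",
    "email": "webform", "phone": "webform",
    "lifecycle_stage": "hubspot", "owner": "hubspot",
}

def merge_records(records):
    """records maps a source name to that source's fields for one person."""
    merged = {}
    fields = {field for rec in records.values() for field in rec}
    for field in sorted(fields):
        preferred = PRECEDENCE.get(field)
        value = records.get(preferred, {}).get(field)
        if not value:
            # Assumed fallback: first source that has any value for the field.
            value = next((rec[field] for rec in records.values() if rec.get(field)), None)
        merged[field] = value
    return merged

records = {
    "hubspot": {"email": "sarah@acme.com", "company": "Acme", "owner": "dana"},
    "linkedin": {"title": "VP Marketing", "company": "Acme Corp"},
    "webform": {"email": "sarah.obrien@acme.com", "phone": "+14155550143"},
}
merged = merge_records(records)
```

Here `company` comes from LinkedIn, `email` from the webform, and `owner` from HubSpot, matching the default precedence.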
+
+**One trap to avoid:** don't run dedupe before format standardization. If phone formats are inconsistent across sources, the dedupe tool sees "+14155550143" and "(415) 555-0143" as different keys. Always run **analyzer → format → text-clean → dedupe → gate** in that order. The pipeline UI enforces this; the per-tool runs don't.
+
+Reply if you want me to walk through the precedence config on a screen-share — happy to do this for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/03-day7.md
+++ b/marketing/emails/revops/03-day7.md
@@ -0,0 +1,34 @@
+# RevOps · Day 7 — Run it before every HubSpot import
+
+**Subject:** Run it before every HubSpot import
+**Send:** Day 7
+**Goal:** reframe from one-off tool to per-campaign workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on a real list once or twice and confirmed the dedupe catches more than HubSpot's native check.
+
+The thing that turns DataTools into a recurring cost saver instead of a one-off purchase: **make it the gate on every import.**
+
+The pattern that works:
+
+**1. One DataTools run per campaign source.** Webform pull → DataTools. LinkedIn scrape → DataTools. Apollo export → DataTools. Each run produces a "clean" CSV.
+
+**2. Concatenate the cleaned CSVs.** Standard pandas `concat` or just paste in Excel.
+
+**3. One more DataTools run on the concatenation.** This is the cross-source dedupe pass — the one that catches the same person across the three sources.
+
+**4. Compare against your current HubSpot export.** DataTools' dedupe against your existing CRM as the second source catches the people you already paid for last quarter and don't need to import again.
+
+**5. Import only the residue** (the rows that survived every pass above) into HubSpot.
+
+The buyers running this pipeline tell me they've cut their HubSpot marketing-contact bill 15-25% within two months. Not because their pipeline got smaller — because they stopped paying for duplicates.
+
+**One thing to set up once:** save your dedupe settings as a `.datatools-preset.json` and commit it to your RevOps team's repo (or a shared Drive folder). Same preset every campaign means consistent results across whoever's running it that week.
+
+If you want, reply with a sanitized lead list and I'll suggest a starting preset for your sources — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/04-day14.md
+++ b/marketing/emails/revops/04-day14.md
@@ -0,0 +1,34 @@
+# RevOps · Day 14 — Two-minute trick: the confidence tiers
+
+**Subject:** Two-minute trick: the confidence tiers
+**Send:** Day 14
+**Goal:** surface the manual-review queue — non-obvious, high-value
+
+---
+
+Hi {{first_name}},
+
+The single most-skipped feature in DataTools is also the one with the highest payoff per minute: the **manual-review queue**.
+
+Here's what's happening under the hood: every dedupe decision DataTools makes has a confidence score (0.0 to 1.0). The dedupe tool by default puts decisions into three buckets:
+
+- **≥0.95** → auto-merge (cleaned CSV)
+- **0.85 - 0.95** → manual-review queue (`<filename>.review.csv`)
+- **<0.85** → unmerged (kept as separate rows)
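
The three buckets reduce to two cut-offs. A small sketch, with the function name `tier` and the bucket labels chosen for illustration:

```python
def tier(confidence, auto_merge=0.95, review=0.85):
    """Map a dedupe confidence score to one of the three buckets."""
    if confidence >= auto_merge:
        return "auto_merge"
    if confidence >= review:
        return "review"
    return "unmerged"

scores = [0.99, 0.95, 0.91, 0.85, 0.84]
buckets = [tier(score) for score in scores]
```

Exposing both cut-offs as parameters is what makes later retuning a one-line change.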
+
+The 0.85-0.95 bucket is the magic. It's the range where a tuned algorithm catches *most* duplicates but where the wrong choice is a real cost (merging two genuinely different people = lost prospect; not merging two duplicates = paid contact you didn't need).
+
+The 2-minute workflow:
+
+1. Run dedupe.
+2. Open `<filename>.review.csv`. Each row is a candidate merge with: confidence, the two records side-by-side, the rule that fired.
+3. Eyeball each row. Mark `keep_merge` (Y/N) in the rightmost column.
+4. Re-run dedupe with the `--apply-review-decisions <filename>.review.csv` flag (or click "Apply review decisions" in the GUI).
+5. Final cleaned CSV reflects your manual choices.
+
+For a 5,000-row lead list, the review queue is typically 20-60 rows. ~3 minutes of work. The output is dramatically better than auto-merge-everything-≥0.85, which is what most tools (including HubSpot's) do silently.
+
+**Pro move:** save your `keep_merge` decisions over time. After 3-4 campaigns you'll have a corpus of "yes-merges" and "no-merges" you can use to retune the auto-merge threshold for *your* data. Most teams find their sweet spot is somewhere in 0.88-0.92.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/revops/05-day30.md
+++ b/marketing/emails/revops/05-day30.md
@@ -0,0 +1,26 @@
+# RevOps · Day 30 — Heard from another RevOps lead?
+
+**Subject:** Heard from another RevOps lead?
+**Send:** Day 30
+**Goal:** referral / review ask
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one small favor?
+
+**Pick the one that's easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next RevOps lead trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the RevOps landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one RevOps friend who'd benefit: {{landing_page}}. No referral commission, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/00-delivery.md
+++ b/marketing/emails/shopify-pet/00-delivery.md
@@ -0,0 +1,34 @@
+# Shopify-pet · Day 0 — Delivery email
+
+**Subject:** Your DataTools download (start here)
+**Send:** immediately on Gumroad purchase confirmation
+**Goal:** download + first run within 24h
+
+---
+
+Hi {{first_name}},
+
+Thanks for buying DataTools. Your download:
+
+→ **{{download_url}}**
+
+Three things to do in the next 5 minutes:
+
+**1. Download the installer for your OS** (Mac `.dmg`, Windows `.exe`, or Linux `.tar.gz`). About 280 MB. The link auto-detects your OS.
+
+**2. Run it.** First launch takes ~5 seconds; a browser tab opens to `127.0.0.1:8501`. That's the app — running locally on your machine. No data leaves the box. Your customer list never goes to a server.
+
+**3. Drop in a real Shopify customer export.** Don't bother with the bundled samples. Customers > Export > "All customers" > CSV in Shopify admin. Drag it into DataTools' analyzer, click **"Run all"**. You'll see what it catches — typically a few hundred phone-format issues, some hidden-character emails, and a handful of cross-row duplicates — in about 30 seconds.
+
+If something doesn't work: reply to this email. Goes to my inbox.
+
+Refund: also reply. 30-day no-questions; no form.
+
+Tomorrow I'll send a sample Shopify customer export with the tricky cases pre-built in, so you can see what the cleanup catches on a known input. After that you'll get four more emails over the next month (days 3, 7, 14, and 30), one tip each. Unsubscribe at the bottom of any of them.
+
+Welcome aboard.
+
+— Michael
+{{support_email}}
+
+P.S. Got a fellow store owner who'd find this useful? {{landing_page}}.
--- a/marketing/emails/shopify-pet/01-day1.md
+++ b/marketing/emails/shopify-pet/01-day1.md
@@ -0,0 +1,32 @@
+# Shopify-pet · Day 1 — Try it on this Shopify customer export first
+
+**Subject:** Try it on this Shopify customer export first
+**Send:** Day 1, ~9am buyer-local-time
+
+---
+
+Hi {{first_name}},
+
+Yesterday's email had your download. Today's email has a *file* — a synthetic Shopify customer export I built specifically to break things Klaviyo silently chokes on.
+
+→ **{{sample_file_url}}** (480 KB CSV, 2,200 rows — fully synthetic, no real customer data)
+
+What's hidden in there:
+
+- Phone numbers in 6 different formats (`(415) 555-0143`, `415.555.0143`, `4155550143`, `+44 20 7946 0958` without country field, `+1-415-555-0143 ext 12`, `415 555 0143`)
+- Email addresses with embedded zero-width spaces (each looks identical to a clean email; Klaviyo treats them as different addresses)
+- ~80 obvious customer duplicates (same email, different case)
+- ~40 cross-row duplicates (different email, same name + same shipping address — usually the same person ordering with two emails)
+- Shipping addresses with mixed `St.` / `Street` / `St` / `STREET` for the same street name
+- 12 customers from outside North America with country field blank
+
+Drop it into DataTools. Click **"Run all"** in the analyzer. Then run **format → text-clean → dedupe → gate** in that order.
+
+Look at the **gate report** at the end — it'll tell you exactly which rows would have broken Klaviyo, with a one-line "why" per row.
+
+If you want to see the difference: import the **raw** file to a test Klaviyo list, then import the **cleaned** file to a different test list. Compare the SMS-deliverable count. The delta is what you've been losing every month.
+
+Reply and tell me what it caught (or missed) — v1.1 detector improvements come from real-world feedback.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/02-day3.md
+++ b/marketing/emails/shopify-pet/02-day3.md
@@ -0,0 +1,33 @@
+# Shopify-pet · Day 3 — The phone-format step Klaviyo cares about
+
+**Subject:** The phone-format step Klaviyo cares about
+**Send:** Day 3
+**Goal:** deepen feature understanding around the format standardizer
+
+---
+
+Hi {{first_name}},
+
+The single biggest source of "Klaviyo dropped this customer silently" is phone formatting. DataTools fixes this in one tool — the **format standardizer** — but the *settings* matter.
+
+Klaviyo (and basically every modern SMS platform) wants phones in **E.164** format: `+` then country code then number, no spaces, no dashes, no extension. Like: `+14155550143`.
+
+Three settings in DataTools' format standardizer that get this right:
+
+**1. Set "Phone output format" to `E.164`.** Default is `national` (`(415) 555-0143`) — fine for display, broken for Klaviyo. Change it once; the preset remembers.
+
+**2. Set "Default country" per row, not per file.** This is the non-obvious one. For each customer:
+- If the `country` field has a value (e.g., "Canada", "CA", "Canadá"), use it.
+- If blank, fall back to the country in the *shipping address*.
+- If still blank, fall back to the file-level default (you set this — typically your store's primary market).
+
+DataTools does this automatically when you check "Use per-row country detection". *Skip this and ~30% of international customers will end up with US country codes prepended to their numbers — which Klaviyo accepts but routes wrong, and your SMS never arrives.*
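
A rough sketch of the per-row fallback and why it matters. It assumes the country values are already ISO codes (mapping "Canada" or "Canadá" to `CA` is not shown), uses a tiny illustrative calling-code table, and strips a leading `0` as the national trunk prefix, which is not correct for every country. The helpers `pick_country` and `to_e164` are hypothetical and not DataTools functions.

```python
import re

# Illustrative subset; a real table would cover every market the store ships to.
CALLING_CODES = {"US": "1", "CA": "1", "GB": "44", "AU": "61"}

def pick_country(row, file_default="US"):
    """Per-row fallback: country field, then shipping country, then file default."""
    return row.get("country") or row.get("shipping_country") or file_default

def to_e164(raw_phone, country):
    """Return +<country code><number>, or None if the phone can't be parsed."""
    raw = re.sub(r"(?i)\s*(ext|x)\.?\s*\d+$", "", raw_phone.strip())  # drop extension
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        # Already international: keep the digits, just remove separators.
        return "+" + digits if 8 <= len(digits) <= 15 else None
    code = CALLING_CODES.get(country)
    if code is None:
        return None
    number = code + digits.lstrip("0")  # assumed trunk-prefix handling
    return "+" + number if 8 <= len(number) <= 15 else None

row = {"country": "", "shipping_country": "GB", "phone": "020 7946 0958"}
per_row = to_e164(row["phone"], pick_country(row))
file_level = to_e164(row["phone"], "US")
```

With per-row detection the UK number gets `+44`. With only the file-level default it gets `+1`, which still looks like a valid number but belongs to the wrong country. Returning `None` for unparseable input is the hook for quarantining instead of guessing.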
+
+**3. Set "Quarantine un-parseable phones" to ON.** Don't drop them silently; don't pass them to Klaviyo broken. Send them to `<filename>.quarantine.csv` so you can fix the worst 10-20 by hand and re-include them.
+
+The combination (E.164 + per-row country + quarantine) typically takes a Shopify export from "60-70% of phones survive Klaviyo's import" to "97-99%". On a 10,000-customer list, that's roughly 2,700 to 3,900 more customers reachable per campaign.
+
+Reply if you want me to walk through these settings on a screen-share — happy to do this for any buyer in the first 30 days.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/03-day7.md
+++ b/marketing/emails/shopify-pet/03-day7.md
@@ -0,0 +1,35 @@
+# Shopify-pet · Day 7 — Run it before every Klaviyo sync
+
+**Subject:** Run it before every Klaviyo sync
+**Send:** Day 7
+**Goal:** reframe from one-off tool to per-sync workflow
+
+---
+
+Hi {{first_name}},
+
+A week in. By now you've probably run DataTools on a real customer export once or twice and seen the cleanup catch things you'd been losing in Klaviyo for months.
+
+The thing that turns DataTools into a recurring win instead of a one-off purchase: **run it before every sync, not just the first time.**
+
+The pattern that works for most stores:
+
+**1. Pick a cadence.** Most stores I talk to do this monthly; high-volume stores do it weekly. The cadence should match your "I'm planning a campaign" rhythm.
+
+**2. The Sunday-morning ritual:**
+- Pull a fresh customer export from Shopify (Customers > Export > "All customers")
+- Drop into DataTools
+- Run the pipeline (analyzer → format → text-clean → dedupe → gate)
+- Review the gate quarantine file (typically 0.5-2% of rows)
+- Push the cleaned CSV to Klaviyo (their CSV import or via their API)
+
+**3. Save your settings as a preset.** The "Save settings" button writes a `.datatools-preset.json`. Keep it in your store's Drive / Notion / wherever your shop docs live. Next month, load preset, run pipeline, done in 4 minutes.
+
+**4. After 3 months, retune the preset.** Look at your manual-review queue across the 3 runs. If you're consistently approving 0.86-confidence merges, drop the auto-merge threshold to 0.85. If you're rejecting 0.92 merges, keep the threshold above 0.92 (0.94 is a reasonable setting). The preset improves with use.
+
+The store owners doing this monthly tell me their open rates go up 8-15% in the first 90 days — not from new content, just from the email actually reaching the inbox.
+
+If you want, reply with a sanitized export and I'll suggest a starting preset for your store — happy to do this for the first 50 buyers.
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/04-day14.md
+++ b/marketing/emails/shopify-pet/04-day14.md
@@ -0,0 +1,32 @@
+# Shopify-pet · Day 14 — Two-minute trick: hidden-character cleanup
+
+**Subject:** Two-minute trick: hidden-character cleanup
+**Send:** Day 14
+**Goal:** surface the text cleaner — non-obvious, high-value
+
+---
+
+Hi {{first_name}},
+
+The tool inside DataTools that buyers find last is the **text cleaner** — and on Shopify customer exports it's usually the one with the most "wait, that was a problem?" moments.
+
+What it catches: invisible characters that got into your customer data when customers typed on their phones. The most common offenders:
+
+- **Zero-width space** (`U+200B`) inside emails — Klaviyo treats `sarah@acme.com` (with hidden char) and `sarah@acme.com` (without) as different addresses
+- **Non-breaking space** (`U+00A0`) inside addresses — Shopify accepts it, Klaviyo accepts it, but USPS address validation fails on it
+- **Byte order mark (BOM)** (`U+FEFF`) at the start of CSV cells, usually from a customer pasting from Word or a PDF
+- **Right-to-left mark** (`U+200F`) — rare, but appears in customer names from Hebrew/Arabic locales
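
A small sketch of how the four characters above can be located and cleaned. The cleanup policy here is an assumption for illustration: strip zero-width spaces and byte order marks, turn no-break spaces into plain spaces, and leave right-to-left marks alone. The names `find_hidden` and `clean` are hypothetical.

```python
HIDDEN = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u00a0": "NO-BREAK SPACE",
    "\ufeff": "BYTE ORDER MARK",
    "\u200f": "RIGHT-TO-LEFT MARK",
}

def find_hidden(cell):
    """Return (position, name) for each hidden character found in a cell."""
    return [(i, HIDDEN[ch]) for i, ch in enumerate(cell) if ch in HIDDEN]

def clean(cell):
    """Strip zero-width spaces and BOMs; replace no-break spaces with spaces."""
    return cell.replace("\u200b", "").replace("\ufeff", "").replace("\u00a0", " ")

dirty_email = "sarah\u200b@acme.com"
report = find_hidden(dirty_email)
cleaned = clean(dirty_email)
```

The two email strings print identically but compare as unequal until `clean` runs, which is why the cleanup has to happen before dedupe.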
+
+The 2-minute workflow:
+
+1. After the format standardizer pass, run the text cleaner.
+2. It produces an additional sidecar file: `<filename>.hidden-chars.csv` — every cell where it found a hidden char, with a "what was hidden where" annotation.
+3. Skim it. Most are fine to silently strip (zero-width spaces, BOMs). For rare ones (right-to-left marks in a name), confirm before stripping — sometimes they're load-bearing.
+4. Click "Apply cleanup". The text cleaner replaces the hidden chars in the cleaned CSV.
+
+The reason this matters: **dedupe runs after text-clean.** Two emails with a hidden char difference look identical in the GUI but get treated as two separate customers — and your dedupe pass won't catch them unless the text cleaner ran first.
+
+The pipeline order baked into the GUI is: `analyzer → format → text-clean → dedupe → gate`. Stick to it; per-tool runs out of order are the most common source of "wait, why didn't dedupe catch this?".
+
+— Michael
+{{support_email}}
--- a/marketing/emails/shopify-pet/05-day30.md
+++ b/marketing/emails/shopify-pet/05-day30.md
@@ -0,0 +1,26 @@
+# Shopify-pet · Day 30 — Heard from another store owner?
+
+**Subject:** Heard from another store owner?
+**Send:** Day 30
+**Goal:** referral / review ask
+
+---
+
+Hi {{first_name}},
+
+A month in. If DataTools earned its $49 — would you do me one small favor?
+
+**Pick the one that's easiest.**
+
+1. **Gumroad review** (60 seconds): {{download_url}}#reviews — every line helps the next Shopify owner trust the listing enough to click "buy".
+2. **Reply to this email with one sentence I can quote** on the landing page. Anonymous if you prefer; I'll never use a name without explicit permission.
+3. **Share the landing page** with one fellow store owner who'd benefit: {{landing_page}}. No referral commission, just a link.
+
+If DataTools *didn't* earn its $49 — also reply. Tell me what's missing or broken. The 30-day refund window is still open and I'd rather refund than have an unhappy customer in the wild.
+
+Either way, this is the last automated email you'll get from me. After this you only hear from me when there's a v1.x update or if you reply to one of the previous emails.
+
+Thanks for being an early buyer — the first 50 customers shape the next 5,000.
+
+— Michael
+{{support_email}}