feat: Tier B operator scaffolding — bundle, copy SoT, posts, emails

Pick up and finish yesterday's cut-off Tier B pass.

- build/: PyInstaller scaffold (datatools.spec + launcher.py +
  hook-streamlit.py + README) — folder-mode bundle, locked
  127.0.0.1, per-OS recipe
- marketing/COPY.md: single source of truth for every customer-facing
  string — landing H1/sub/CTAs, demo CTAs, email subjects, Gumroad
  listing, banned phrases
- marketing/community-posts/: 9 drafts (3 posts × 3 niches:
  bookkeeper, revops, shopify-pet) — story / tip / soft-offer
- marketing/emails/: 18 drafts (Gumroad delivery + 5-touch
  onboarding × 3 niches), per-niche segmentation guidance
- docs/NEXT-STEPS.md: flip 2.2 / 2.4 / 3.1 / 3.4 to done with
  pointers to the new assets; add Phase 0 inventory rows
- .gitignore: narrow `build/` ignore so PyInstaller spec + launcher
  + hooks get tracked, only generated artifacts (build/build/,
  build/__pycache__/, build/dist/) stay ignored

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:04:37 +00:00
parent 966af8ef94
commit e1f364f010
36 changed files with 1741 additions and 15 deletions

206
build/README.md Normal file

@@ -0,0 +1,206 @@
# Build — DataTools desktop installer
> Cross-platform PyInstaller bundle for Mac / Windows / Linux. The
> single deliverable the buyer downloads from Gumroad.
> **Owner**: Michael · **Updated**: 2026-05-01
This directory is the build pipeline. Source of truth for the bundle
shape, hidden-import lists, per-platform recipes, and the launcher
that boots Streamlit inside the bundle.
## Files
```
build/
├── launcher.py            Entry point PyInstaller wraps. Boots a local
│                          Streamlit server, opens browser, locks server
│                          to 127.0.0.1 so the privacy claim holds.
├── datatools.spec         PyInstaller spec: hidden imports, data files,
│                          Mac .app bundle config.
├── hooks/                 PyInstaller hooks for libs the static analyser
│   └── hook-streamlit.py  misses (Streamlit's dynamic imports).
├── icon.icns              macOS app icon (TODO: produce from a 1024×1024
│                          PNG. Optional; bundle still builds without).
├── icon.ico               Windows app icon (TODO).
└── README.md              this file
```
## Per-platform recipe
Each platform builds on its own machine — PyInstaller does **not**
cross-compile. Pick the platform that matches the bundle you need.
GitHub Actions matrix runners are the simplest way to produce all
three from one push (see "CI build" below).
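### Mac (Intel or Apple Silicon, host architecture)
```bash
# One-time:
pyenv install 3.12
pyenv local 3.12
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install pyinstaller
# Build. The spec sets no target_arch, so the result matches the build
# machine's architecture. A universal2 build would need
# target_arch="universal2" in EXE() plus a universal2 Python and
# universal2 wheels for every binary dependency.
pyinstaller build/datatools.spec --clean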
# Output:
# dist/DataTools/ — folder mode (faster cold start)
# dist/DataTools.app/ — macOS .app bundle (drag-drop into /Applications)
# Sign + notarize (after Apple Developer Program enrollment per BUSINESS.md §10):
codesign --deep --force --options runtime \
--sign "Developer ID Application: <YOUR-NAME> (<TEAMID>)" \
dist/DataTools.app
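# Notarize. notarytool accepts a .zip, .dmg, or .pkg, not a bare .app,
# so zip the signed bundle first:
ditto -c -k --keepParent dist/DataTools.app dist/DataTools.zip
xcrun notarytool submit dist/DataTools.zip \
  --apple-id "<YOUR-APPLE-ID>" \
  --team-id "<TEAMID>" \
  --password "<APP-SPECIFIC-PASSWORD>" \
  --wait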
# Staple the notarization ticket so Gatekeeper sees it offline:
xcrun stapler staple dist/DataTools.app
# Wrap for distribution:
hdiutil create -volname "DataTools" -srcfolder dist/DataTools.app \
-ov -format UDZO dist/DataTools-1.0.0-mac.dmg
```
### Windows
```powershell
# One-time:
py -3.12 -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
pip install pyinstaller
# Build:
pyinstaller build\datatools.spec --clean
# Output:
# dist\DataTools\ — folder mode
# dist\DataTools\DataTools.exe
# Wrap with Inno Setup (free):
# 1. Install Inno Setup (https://jrsoftware.org/isdl.php)
# 2. Create installer.iss next to this README. Relative paths in the
#    script resolve from the script's own folder (build\), so the
#    repo-root dist\ folder is one level up:
# [Setup]
# AppName=DataTools
# AppVersion=1.0.0
# DefaultDirName={autopf}\DataTools
# OutputDir=..\dist
# OutputBaseFilename=DataTools-1.0.0-win-setup
# Compression=lzma
# SolidCompression=yes
# [Files]
# Source: "..\dist\DataTools\*"; DestDir: "{app}"; Flags: recursesubdirs
# [Icons]
# Name: "{autoprograms}\DataTools"; Filename: "{app}\DataTools.exe"
# 3. Compile: ISCC.exe build\installer.iss
# Code-sign (optional but reduces SmartScreen warnings):
# Use signtool with a code-signing cert (Sectigo / DigiCert).
# Without signing, buyer sees "Windows protected your PC" once;
# they click "More info → Run anyway." Acceptable for v1.
```
### Linux (AppImage)
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install pyinstaller
pyinstaller build/datatools.spec --clean
# dist/DataTools/ — folder mode
# Wrap as AppImage (single-file portable app):
# 1. Download appimagetool from https://appimage.org/
# 2. Set up the AppDir layout:
# DataTools.AppDir/
# ├── AppRun -> usr/bin/DataTools
# ├── DataTools.desktop (icon + entry config)
# ├── icon.png
# └── usr/bin/ (copy of dist/DataTools/*)
# 3. ./appimagetool DataTools.AppDir dist/DataTools-1.0.0-linux-x86_64.AppImage
```
## CI build (recommended once the spec is stable)
`.github/workflows/build.yml` (template):
```yaml
name: Build installers
on:
  workflow_dispatch:
  push:
    tags: [ 'v*' ]
jobs:
  build:
    strategy:
      matrix:
        os: [macos-latest, windows-latest, ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install -r requirements.txt pyinstaller
      - run: pyinstaller build/datatools.spec --clean
      - uses: actions/upload-artifact@v4
        with:
          name: DataTools-${{ matrix.os }}
          path: dist/
Mac code-signing in CI requires the cert + private key as a GitHub
secret (encoded with `base64`). Detailed walkthrough belongs in a
later doc — for v1, sign locally and upload to GitHub Releases.
## Common pitfalls
| Symptom | Fix |
|---|---|
| Bundle is 800+ MB | Check the `excludes` list in `datatools.spec`. `matplotlib` / `scipy` / `tkinter` are the usual suspects. |
| App launches, browser opens, page is blank | Streamlit's static assets aren't bundled. Re-run with `--log-level=DEBUG` and confirm the static dir was collected by `collect_data_files('streamlit')`. |
| App launches but logs `ImportError: streamlit.runtime.X` | Add `X` to `hidden_imports` in the spec or to `hook-streamlit.py`. |
| Mac Gatekeeper says "DataTools is damaged and can't be opened" | The bundle wasn't signed + notarized. Don't ship to buyers without these — see the Mac recipe above. |
| Windows SmartScreen blocks first launch | Buyer clicks "More info → Run anyway". Code-signing reduces but doesn't eliminate this; for v1 it's an accepted friction. |
| Bundle works on dev machine but crashes on a clean machine | Likely a missing C runtime. On Windows, ship the [VC++ redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) inside the installer and run it alongside the bundle install. |
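
For the oversized-bundle row, it can help to see which parts of the output folder are actually large before editing `excludes`. The following is a minimal sketch, not part of the build pipeline: `dir_size` and `largest_entries` are hypothetical helper names, and the `dist/DataTools` path is assumed from the recipes above (on PyInstaller 6 and later most content sits under an `_internal/` subfolder, so pointing the script there is likely more informative).

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files at or under *path*."""
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def largest_entries(bundle: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return the *top* biggest direct children of *bundle*, largest first."""
    sizes = [(child.name, dir_size(child)) for child in bundle.iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Assumed location: the folder-mode output of the recipes above.
    bundle_dir = Path("dist/DataTools")
    if bundle_dir.is_dir():
        for name, size in largest_entries(bundle_dir):
            print(f"{size / 1_048_576:8.1f} MB  {name}")
```

A package that dominates the listing but is never imported by the app is a candidate for the `excludes` list.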
## Testing the bundle
Smoke-test on a **clean** machine (or VM) — your dev machine has too
much state to trust:
```
1. Boot a clean Mac / Win / Linux VM.
2. Copy the .dmg / .exe / .AppImage onto it.
3. Install / drag-drop into Applications / chmod +x.
4. Double-click the app icon.
5. Browser should open to http://127.0.0.1:8501 (or the next free
   port up to 8550) within 5 seconds.
6. Drop samples/demo/shopify_pet_customers.csv into the
Pipeline Runner page; click Run; AFTER preview should appear.
7. Confirm in the network tab: zero outbound calls except to
127.0.0.1 and the Streamlit static asset paths (also local).
```
Step 7 is the privacy-claim integrity check from
`docs/POST-LAUNCH.md` §6 — do this once per release, then trust it.
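
Part of steps 5 and 7 can be scripted. The sketch below is an assumption-laden helper, not an existing project script: `wait_for_port` and `can_connect` are hypothetical names, and it only checks inbound reachability (that the server answers on loopback and not on another interface). It does not replace the browser network-tab check for outbound calls.

```python
import socket
import time


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a single TCP connection attempt to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Poll until host:port accepts a TCP connection or *timeout* elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if can_connect(host, port, timeout=0.5):
            return True
        time.sleep(0.1)
    return False
```

On the test VM, `wait_for_port("127.0.0.1", 8501)` should return `True` shortly after launch, while `can_connect("<the VM's LAN address>", 8501)` should return `False` if the server is bound to loopback only. Adjust the port if the launcher had to fall back from 8501.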
## Versioning
Bump the version string in each of these places per release:
- `datatools.spec` (CFBundleVersion + CFBundleShortVersionString)
- the Inno Setup `AppVersion` and `OutputBaseFilename` lines
- the AppImage filename
- the `.dmg` filename in the Mac recipe
A single source of truth (e.g. `src/__init__.py`) is a future
refactor; for v1 the manual update is fine.

153
build/datatools.spec Normal file

@@ -0,0 +1,153 @@
# PyInstaller spec for DataTools.
#
# Build (from the repo root, after ``pip install pyinstaller``):
#
# pyinstaller build/datatools.spec
#
# Output: ``dist/DataTools/`` (folder mode) containing the
# ``DataTools`` executable (``DataTools.exe`` on Windows), plus
# ``dist/DataTools.app`` on macOS via the BUNDLE block at the bottom
# of this file. See ``build/README.md`` for the full per-platform
# recipe.
#
# Why folder-mode (one-dir) is the default:
# * Streamlit's static assets + Python interpreter + ~300 MB of deps
# compress poorly into onefile. Onefile mode unpacks every launch
# to a temp dir — adds 5-15 s startup latency that confuses
# non-technical buyers ("did it crash?").
# * Folder mode lets the installer (Inno Setup on Win, .dmg on Mac)
# run a one-time copy. Subsequent launches are instant.
#
# Cross-platform note: this single spec file is built ON each target
# platform. Cross-compilation isn't supported — Mac builds need a
# Mac, Windows builds need a Windows machine (or a Windows GitHub
# Actions runner). See build/README.md for the matrix recipe.
# -*- mode: python ; coding: utf-8 -*-
from pathlib import Path
from PyInstaller.utils.hooks import (
    collect_all,
    collect_data_files,
    collect_submodules,
)
# Repo root from this spec's location (PyInstaller sets SPECPATH).
REPO = Path(SPECPATH).resolve().parent
# ----- Hidden imports ------------------------------------------------
# PyInstaller's static analyser misses everything Streamlit reaches
# through ``importlib`` and the per-tool registries our app uses. We
# exhaustively pull every submodule of the libraries that bridge
# user code to runtime — better a 50 MB-bigger bundle than a runtime
# ImportError on the buyer's machine.
hidden_imports: list[str] = []
hidden_imports += collect_submodules("streamlit")
hidden_imports += collect_submodules("pandas")
hidden_imports += collect_submodules("phonenumbers")
hidden_imports += collect_submodules("rapidfuzz")
hidden_imports += collect_submodules("charset_normalizer")
hidden_imports += collect_submodules("openpyxl")
hidden_imports += collect_submodules("loguru")
# Our own engine + GUI modules. Even though we import them directly
# at the top of ``launcher.py`` / ``app.py``, the Streamlit
# session-state and per-page page discovery layers re-import via
# names that PyInstaller doesn't see.
hidden_imports += collect_submodules("src")
# ----- Data files ---------------------------------------------------
# Streamlit's static assets (the JS / CSS / fonts the browser fetches
# from the bundled HTTP server) are NOT Python files; PyInstaller
# can't auto-find them.
datas: list[tuple[str, str]] = []
# Streamlit's runtime assets.
datas += collect_data_files("streamlit", include_py_files=False)
# phonenumbers ships its country/area-code metadata as resources.
datas += collect_data_files("phonenumbers", include_py_files=False)
# Our application files. PyInstaller's bundler treats source as code
# (.pyc) by default; we add it again as data so the launcher's
# ``Path(sys._MEIPASS) / "src" / "gui" / "app.py"`` resolution works.
datas += [
    (str(REPO / "src"), "src"),
    (str(REPO / "samples" / "demo"), "samples/demo"),
    (str(REPO / ".streamlit" / "config.toml"), ".streamlit"),
]
# ----- Analysis ------------------------------------------------------
a = Analysis(
    [str(REPO / "build" / "launcher.py")],
    pathex=[str(REPO)],
    binaries=[],
    datas=datas,
    hiddenimports=hidden_imports,
    hookspath=[str(REPO / "build" / "hooks")],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[
        # Ship-trim: PyInstaller pulls these in but we never need
        # them, and they add ~80 MB combined.
        "tkinter",
        "matplotlib",
        "scipy",
        "IPython",
        "jupyter",
        "notebook",
        "test",
        "tests",
    ],
    noarchive=False,
)
pyz = PYZ(a.pure)
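# Pick the icon format the host platform expects (.icns on macOS,
# .ico elsewhere); fall back to PyInstaller's default when the file
# is absent.
import sys as _sys

_icon_path = REPO / "build" / ("icon.icns" if _sys.platform == "darwin" else "icon.ico")
_icon = str(_icon_path) if _icon_path.exists() else None

exe = EXE(
    pyz,
    a.scripts,
    [],
    exclude_binaries=True,
    name="DataTools",
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=False,  # GUI app: no terminal window on Win/Mac
    disable_windowed_traceback=False,
    icon=_icon,
)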
coll = COLLECT(
    exe,
    a.binaries,
    a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name="DataTools",
)
# macOS .app bundle wrapper. PyInstaller produces it only on Mac;
# this block is a no-op on Win/Linux.
import sys as _sys

if _sys.platform == "darwin":
    app = BUNDLE(
        coll,
        name="DataTools.app",
        icon=str(REPO / "build" / "icon.icns") if (REPO / "build" / "icon.icns").exists() else None,
        bundle_identifier="com.datatools.desktop",
        info_plist={
            "CFBundleDisplayName": "DataTools",
            "CFBundleVersion": "1.0.0",
            "CFBundleShortVersionString": "1.0.0",
            "NSHighResolutionCapable": True,
            # LSUIElement=True would hide the app's Dock icon. We want
            # the Dock icon so the buyer can see the app is running
            # while the browser tab is open.
            "LSUIElement": False,
        },
    )

build/hooks/hook-streamlit.py Normal file

@@ -0,0 +1,30 @@
"""PyInstaller hook for Streamlit.
The runtime needs three things PyInstaller's static analyser misses:
1. Every submodule of ``streamlit`` (the framework reaches into
``streamlit.runtime`` / ``streamlit.web`` / ``streamlit.elements``
via dynamic import).
2. The static front-end assets (JS / CSS / fonts) under
``streamlit/static/``.
3. The vendored config / proto schemas under
``streamlit/runtime/scriptrunner/`` etc.
The main spec already calls ``collect_submodules('streamlit')`` and
``collect_data_files('streamlit')``, so this hook is mostly
belt-and-braces. PyInstaller picks hooks up by name, though, and a
missing hook can produce confusing runtime errors when Streamlit
upgrades. Keeping it explicit here documents the dependency.
"""
from PyInstaller.utils.hooks import collect_all, collect_data_files, collect_submodules
datas, binaries, hiddenimports = collect_all("streamlit")
# Belt-and-braces: explicitly include the static directory.
datas += collect_data_files("streamlit", subdir="static", include_py_files=False)
# Some Streamlit components are loaded by name from the registry.
hiddenimports += collect_submodules("streamlit.elements")
hiddenimports += collect_submodules("streamlit.runtime")
hiddenimports += collect_submodules("streamlit.web")

138
build/launcher.py Normal file

@@ -0,0 +1,138 @@
"""DataTools desktop launcher.
This is the entry point PyInstaller wraps for Mac / Windows / Linux
installers. Double-clicking the produced binary boots a local
Streamlit server (``127.0.0.1:<random-free-port>``), opens the user's
default browser at that URL, and keeps the server alive until the
window is closed or the binary is killed.
Why a launcher instead of pointing PyInstaller at ``src/gui/app.py``:
* Streamlit's CLI normally bootstraps the server via the
``streamlit run`` command. PyInstaller-bundled apps can't shell
out to ``streamlit`` because the CLI script lives inside the
bundle. We invoke Streamlit's bootstrap directly via
:func:`streamlit.web.bootstrap.run`.
* A free port has to be picked at runtime — buyers will have other
services running on 8501.
* The "open browser" step is the buyer's only feedback that
something happened; without it they'd see a black terminal flash
on Windows and conclude the app didn't start.
Local-dev equivalent (no installer):
streamlit run src/gui/app.py
"""
from __future__ import annotations
import os
import socket
import sys
import threading
import time
import webbrowser
from pathlib import Path
def _find_free_port(start: int = 8501, span: int = 50) -> int:
    """Return a TCP port that's free on the loopback interface.

    Prefer 8501 (Streamlit's traditional default, which the buyer
    recognises from any docs they've read) and fall back to the next
    free port in a small range, so the buyer's URL looks stable across
    restarts within one session. Only when the whole range is busy do
    we ask the OS for an ephemeral port (port=0).

    The port is released again before Streamlit binds it, so another
    process could in principle claim it in between.
    """
    for offset in range(span):
        port = start + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue
    # Last resort: kernel-assigned ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
def _resolve_app_path() -> Path:
    """Locate ``src/gui/app.py`` whether running from source or a frozen bundle.

    In a frozen build PyInstaller sets ``sys._MEIPASS`` to the directory
    that holds the bundled resources: a temp directory in ``onefile``
    mode, or the bundle's own contents directory in the folder mode this
    project ships. Source mode walks up from this file.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # Frozen: app.py was bundled as a data file (see datatools.spec).
        return Path(sys._MEIPASS) / "src" / "gui" / "app.py"  # type: ignore[attr-defined]
    return Path(__file__).resolve().parent.parent / "src" / "gui" / "app.py"
def _open_browser_when_ready(url: str, delay: float = 1.5) -> None:
    """Open the buyer's default browser to *url* after a short delay.

    The delay gives Streamlit's HTTP server time to bind. Without it,
    the browser races the server and renders a "couldn't connect"
    page that confuses non-technical buyers. 1.5 s is a fixed guess
    chosen to be conservative for slow Windows machines, not a
    readiness check; the tab may stay blank for a moment while the
    page loads.
    """
    def _open() -> None:
        time.sleep(delay)
        webbrowser.open(url, new=2)

    threading.Thread(target=_open, daemon=True).start()
def main() -> int:
    """Boot the local Streamlit server and open the browser."""
    app_path = _resolve_app_path()
    if not app_path.exists():
        # print() instead of sys.stderr.write(): in a windowed
        # (console=False) build sys.stderr can be None, and print()
        # then does nothing rather than raising AttributeError.
        print(
            f"DataTools could not find its UI script at {app_path}.\n"
            "This is usually a bundle-build error. Re-install or "
            "contact support@datatools.app.",
            file=sys.stderr,
        )
        return 2
    port = _find_free_port()
    url = f"http://127.0.0.1:{port}/"
    # Options the bundle ships locked. ``server.address`` = 127.0.0.1
    # enforces "no network exposure": left unset, Streamlit listens on
    # all interfaces, which would expose the GUI to the LAN. The privacy
    # claim on the landing pages depends on this, so assign directly
    # instead of using setdefault(), which would let a value already in
    # the buyer's environment win.
    os.environ["STREAMLIT_SERVER_ADDRESS"] = "127.0.0.1"
    os.environ["STREAMLIT_SERVER_PORT"] = str(port)
    os.environ["STREAMLIT_SERVER_HEADLESS"] = "true"
    os.environ["STREAMLIT_BROWSER_GATHER_USAGE_STATS"] = "false"
    # Assumption worth verifying against the pinned Streamlit version:
    # outside site-packages (as in a frozen bundle) Streamlit can default
    # ``global.developmentMode`` to true and then reject an explicit
    # ``server.port``, so pin it off.
    os.environ.setdefault("STREAMLIT_GLOBAL_DEVELOPMENT_MODE", "false")
    # Print before opening the browser so the terminal log doesn't
    # scroll behind the new browser tab on macOS. Only visible when run
    # from a terminal; the windowed bundle has no console.
    print(f"DataTools is running at {url}")
    print("Close this window or press Ctrl+C to stop.")
    _open_browser_when_ready(url)
    # Streamlit's bootstrap entry point: equivalent to running
    # ``streamlit run app.py`` but in-process so PyInstaller's bundled
    # interpreter handles it without shelling out to a separate script.
    from streamlit.web import bootstrap

    bootstrap.run(
        str(app_path),
        is_hello=False,
        args=[],
        flag_options={
            "server.address": "127.0.0.1",
            "server.port": port,
            "server.headless": True,
            "browser.gatherUsageStats": False,
        },
    )
    return 0


if __name__ == "__main__":
    sys.exit(main())