datatools-dev/docs/ARCHITECTURE.md
Michael 624f99653e docs(arch): end-to-end system + tech-stack diagrams
New ARCHITECTURE.md pulls the desktop app (TECHNICAL.md) and the
license server (LICENSE-SERVER.md) into a single picture — the two
were never reconciled into an end-to-end view before.

Contents:
  §1. System diagram (ASCII) showing operator laptop, license
      server stack (nginx → FastAPI → Postgres), Postmark, Gumroad,
      and the buyer's machine — with the three primary flows
      (sale, manual mint, offline activation) traced through it.
  §2. Tech stack diagram, layered: desktop / server / operator /
      external SaaS, with version pins.
  §3. Trust + isolation boundaries table — what crosses each one
      and what the threat model is.
  §4. "Where things are stored" — paths, tables, files.
  §5. Pointers to the deeper per-component docs.

ASCII over Mermaid since the repo's Gitea version is unknown and
plain text renders in every viewer / IDE / raw `cat`.

LICENSE-SERVER.md status flipped from "design proposal, not built"
to "deployed (PR 1 + PR 2 code merged)" — that was stale since
the PR 1 deploy yesterday.

TECHNICAL.md and ADMIN.md gain one-line pointers to ARCHITECTURE.md
so people land at the unified view when looking for "how does it
all fit together".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:59:05 +00:00

# ARCHITECTURE — end-to-end view
Stitches the desktop app (`TECHNICAL.md`) and the license server
(`LICENSE-SERVER.md`) into a single picture. Read this first for "how
does it all fit together"; drill into the per-component docs for
detail.
---
## 1. System diagram
```
┌────────────────────────────────────────────────────────────────────────┐
│  OPERATOR / DEVELOPER LAPTOP                                           │
│                                                                        │
│  git clone / push ←─── code lives in git.invixiom.com                  │
│  datatools-admin CLI ─── manual mints, list, revoke ──────┐            │
│  ssh -L 8090:127.0.0.1:8090 ── tunnel for /internal/* ────┤            │
└────────────────────────────────────────────────────────────────────────┘
  ┌─────────────────────────────────────────────────────────┘
  │  internal Bearer-auth API (over SSH tunnel only)
┌────────────────────────────────────────────────────────────────────────┐
│  LICENSE SERVER — 46.225.166.142                                       │
│  ────────────────────────────────────────────────────────────────────  │
│                                                                        │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │  nginx 1.24 (TLS termination, public reverse proxy)                │ │
│ │                                                                    │ │
│ │  datatools.unalogix.com          → static placeholder              │ │
│ │  licenses.datatools.unalogix.com → 127.0.0.1:8090 (FastAPI)        │ │
│ │  /internal/* on public surface   → blocked (404)                   │ │
│ └────────────────────────────┬───────────────────────────────────────┘ │
│                              │                                         │
│ ┌────────────────────────────▼───────────────────────────────────────┐ │
│ │  FastAPI app — datatools-api (Docker container, UID 10001)         │ │
│ │                                                                    │ │
│ │  ┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐     │ │
│ │  │ /webhooks/*      │  │ /internal/*      │  │ /health       │     │ │
│ │  │ (storefronts)    │  │ (Bearer-auth)    │  │ (public)      │     │ │
│ │  └────────┬─────────┘  └────────┬─────────┘  └───────────────┘     │ │
│ │           │                     │                                  │ │
│ │           ▼                     ▼                                  │ │
│ │      ┌────────────────────────────────────────┐                    │ │
│ │      │ SourceAdapter (Protocol) — normalized  │                    │ │
│ │      │ • ManualAdapter    • GumroadAdapter    │                    │ │
│ │      │ • (LemonSqueezy, Stripe — future)      │                    │ │
│ │      └────────────────┬───────────────────────┘                    │ │
│ │                       │ SaleEvent / RefundEvent                    │ │
│ │                       ▼                                            │ │
│ │      ┌────────────────────────────────────────┐                    │ │
│ │      │ mint_from_sale()                       │                    │ │
│ │      │ • Ed25519 sign via PyCA cryptography   │                    │ │
│ │      │ • idempotent on (source, order_id)     │                    │ │
│ │      └────────────────┬───────────────────────┘                    │ │
│ └───────────────────────┼────────────────────────────────────────────┘ │
│                         │ SQL                                          │
│ ┌───────────────────────▼────────────────────────────────────────────┐ │
│ │  Postgres 16 — datatools-postgres (container, vol pg_data)         │ │
│ │  • licenses       — authoritative customer record                  │ │
│ │  • gumroad_events — webhook audit log (idempotency, replay)        │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└───────────────────────┬────────────────────────────────┬───────────────┘
                        │                                │
            ┌───────────┘                                └──────────┐
            │ POST /email (httpx)                       Gumroad Ping│
            ▼                                                  POST │
  ┌───────────────────┐                               ┌─────────────▼──┐
  │ Postmark          │                               │ Gumroad        │
  │ (transactional    │                               │ (storefront,   │
  │  email)           │                               │  payments)     │
  └───────┬───────────┘                               └────────────────┘
          │ DKIM-signed email with license blob                  ▲
          ▼                                                      │
┌────────────────────────────────────────────────────────────────┴───────┐
│  BUYER'S MACHINE                                                       │
│                                                                        │
│  Receives email ──► copies DTLIC2: blob ──► pastes into desktop app    │
│                                                                        │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │  DataTools desktop (Python 3.12 + Streamlit + Typer CLIs)          │ │
│ │                                                                    │ │
│ │  ┌────────────────────────────────────────────────────────────┐    │ │
│ │  │ Activate screen — verifies blob signature                  │    │ │
│ │  │ against EMBEDDED Ed25519 public key                        │    │ │
│ │  │ (NO network call to the license server, ever)              │    │ │
│ │  └─────────────────────────┬──────────────────────────────────┘    │ │
│ │                            ▼                                       │ │
│ │  ~/.datatools/license.json (signed blob, mode 644, on disk)        │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│                                                                        │
│  Pays via web browser ─────► Gumroad ────► (kicks off the Ping)        │
└────────────────────────────────────────────────────────────────────────┘
```
**Three primary flows** can be traced through the diagram:
1. **Sale → fulfillment** (the automated path)
Buyer pays at Gumroad → Gumroad fires Ping to
`licenses.datatools.unalogix.com/webhooks/gumroad?secret=…` → nginx
→ FastAPI → audit-log row → adapter normalizes payload → `mint_from_sale`
writes the `licenses` row + Ed25519-signs the blob → Postmark emails
the buyer their blob. End-to-end latency: a few hundred milliseconds.
2. **Manual mint** (operator path — comps, support replacements)
Operator opens SSH tunnel → `datatools-admin mint` → `/internal/mint`
(Bearer-authed, never publicly reachable) → same `mint_from_sale`
path → blob returned in HTTP response. Operator delivers to buyer
out-of-band.
3. **Activation** (buyer path — fully offline)
Buyer pastes blob into desktop's Activate screen → desktop verifies
the Ed25519 signature against the public key **embedded in the
shipped binary** → license written to `~/.datatools/license.json`.
The desktop app makes **no network calls** to the license server at
any point. This preserves the "your data never leaves your computer"
promise (`DECISIONS.md §9b`).
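
The shared core of flows 1 and 2 can be sketched roughly as below. This is a minimal illustration, not the server's actual code: the `SaleEvent` fields, the adapter method name `parse`, the payload keys, and the table layout are all assumptions, SQLite stands in for Postgres, and `sign_stub` is a placeholder for the Ed25519 signing step. What it shows is the property named in the diagram: minting is idempotent on `(source, order_id)`, so a redelivered Ping returns the existing license instead of creating a second one.

```python
import sqlite3
from dataclasses import dataclass
from typing import Mapping, Protocol


@dataclass(frozen=True)
class SaleEvent:
    """Normalized sale, independent of the storefront (fields are assumed)."""
    source: str
    order_id: str
    email: str


class SourceAdapter(Protocol):
    def parse(self, payload: Mapping[str, str]) -> SaleEvent: ...


class GumroadAdapter:
    def parse(self, payload: Mapping[str, str]) -> SaleEvent:
        # Payload keys are illustrative; check them against real Ping bodies.
        return SaleEvent("gumroad", payload["sale_id"], payload["email"])


class ManualAdapter:
    def parse(self, payload: Mapping[str, str]) -> SaleEvent:
        return SaleEvent("manual", payload["order_id"], payload["email"])


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical, trimmed-down version of the `licenses` table.
    conn.execute(
        """CREATE TABLE licenses (
               source   TEXT NOT NULL,
               order_id TEXT NOT NULL,
               email    TEXT NOT NULL,
               blob     TEXT NOT NULL,
               UNIQUE (source, order_id))"""
    )


def sign_stub(event: SaleEvent) -> str:
    # Placeholder for the Ed25519 signing step; this is NOT a signature.
    return f"DTLIC2:unsigned-demo-{event.source}-{event.order_id}"


def mint_from_sale(conn: sqlite3.Connection, event: SaleEvent) -> str:
    """Return the blob for this sale, inserting a row only on first delivery."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO licenses (source, order_id, email, blob) "
            "VALUES (?, ?, ?, ?)",
            (event.source, event.order_id, event.email, sign_stub(event)),
        )
    row = conn.execute(
        "SELECT blob FROM licenses WHERE source = ? AND order_id = ?",
        (event.source, event.order_id),
    ).fetchone()
    return row[0]
```

Because the uniqueness constraint lives in the database rather than in application code, two concurrent deliveries of the same Ping would still end up with a single row in this sketch.
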
---
## 2. Tech stack
Layered view of what technology lives where. "External SaaS" entries
are services we depend on but don't operate.
```
┌────────────────────────────────────────────────────────────────────────┐
│ DESKTOP APP (shipped binary, runs on buyer's box)                      │
├──────────────────┬─────────────────────────────────────────────────────┤
│ GUI              │ Streamlit 1.35 — local web server, browser opens    │
│ CLI              │ Typer 0.12 — per-tool entry points                  │
│ Core logic       │ pandas 2.x, numpy, rapidfuzz, charset-normalizer    │
│ Crypto (verify)  │ PyCA cryptography — Ed25519 public-key verify only  │
│ Storage          │ ~/.datatools/license.json (file, mode 644)          │
│ i18n             │ JSON catalogs in src/i18n/                          │
│ Build            │ PyInstaller — one-file binary, per OS               │
│ Runtimes         │ Python 3.12 (bundled into installer)                │
│ Platforms        │ Windows · macOS · Linux                             │
└──────────────────┴─────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ LICENSE SERVER (this box; non-buyer-facing)                            │
├──────────────────┬─────────────────────────────────────────────────────┤
│ Edge             │ nginx 1.24 + Let's Encrypt (auto-renew via timer)   │
│ HTTP framework   │ FastAPI 0.119 + Starlette + Pydantic v2             │
│ ASGI server      │ uvicorn 0.39 (+uvloop, +httptools, +watchfiles)     │
│ Form parsing     │ python-multipart (for Gumroad form-encoded Pings)   │
│ ORM              │ SQLAlchemy 2.0                                      │
│ Migrations       │ Alembic 1.18 (one initial migration so far)         │
│ Database         │ Postgres 16-alpine (containerized, single node)     │
│ Database driver  │ psycopg 3.3 (with binary wheel)                     │
│ Crypto (sign)    │ PyCA cryptography — Ed25519 private-key sign        │
│ HTTP client      │ httpx 0.28 (Postmark calls, test mocking)           │
│ Config           │ Pydantic Settings + YAML (products.yaml)            │
│ Container        │ Docker + Docker Compose v2 plugin                   │
│ Image base       │ python:3.12-slim                                    │
│ Process user     │ UID 10001 (non-root `app` user defined in image)    │
│ Logging          │ stdlib `logging` to container stdout → docker logs  │
│ Host OS          │ Ubuntu 24.04 LTS                                    │
└──────────────────┴─────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ OPERATOR / DEVELOPER MACHINE                                           │
├──────────────────┬─────────────────────────────────────────────────────┤
│ Source control   │ git → self-hosted Gitea (git.invixiom.com)          │
│ Admin CLI        │ Typer (src/admin_cli.py)                            │
│ Server access    │ SSH tunnel for /internal/* (no public exposure)     │
│ Break-glass      │ scripts/generate_license.py (offline-only mints,    │
│                  │ used when the license server is unreachable)        │
│ Test runner      │ pytest 8.3 + SQLite in-memory (no docker required)  │
│ Smoke test       │ bash + docker compose (server/scripts/smoke.sh)     │
└──────────────────┴─────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL SaaS / dependencies                                           │
├──────────────────┬─────────────────────────────────────────────────────┤
│ Storefront       │ Gumroad — Ping webhook to /webhooks/gumroad         │
│ Transactional    │ Postmark — HTTP API for license-delivery emails     │
│ email            │ (LoggingEmailService fallback when token unset)     │
│ TLS CA           │ Let's Encrypt — ACME HTTP-01 challenge via certbot  │
│ Authoritative    │ supercp / cPanel (your DNS host for unalogix.com)   │
│ DNS              │ — Cloudflare front-door deferred                    │
│ Source hosting   │ Self-hosted Gitea (git.invixiom.com) — not on the   │
│                  │ datatools box; shares the same physical host        │
└──────────────────┴─────────────────────────────────────────────────────┘
```
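
The `LoggingEmailService` fallback noted in the External SaaS layer could be wired up along these lines. Only the `LoggingEmailService` name comes from the stack above; the `EmailService` protocol, `PostmarkEmailService`, `make_email_service`, and the `send_license` method are hypothetical names for this sketch, and the real Postmark HTTP call is deliberately left out.

```python
import logging
from typing import Optional, Protocol

logger = logging.getLogger("datatools.email")


class EmailService(Protocol):
    def send_license(self, to: str, blob: str) -> None: ...


class LoggingEmailService:
    """Fallback: records the delivery in the log instead of sending it."""

    def send_license(self, to: str, blob: str) -> None:
        logger.info("would email license to %s (%d chars)", to, len(blob))


class PostmarkEmailService:
    """Stand-in for the real sender; the HTTP request is omitted here."""

    def __init__(self, server_token: str) -> None:
        self.server_token = server_token

    def send_license(self, to: str, blob: str) -> None:
        raise NotImplementedError("the POST to Postmark would go here")


def make_email_service(server_token: Optional[str]) -> EmailService:
    # An unset or blank token selects the logging fallback.
    if server_token and server_token.strip():
        return PostmarkEmailService(server_token.strip())
    return LoggingEmailService()
```

Choosing the implementation once at startup keeps the mint path identical in development (no token, log only) and production (token present, real email).
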
---
## 3. Trust + isolation boundaries
Worth tracing explicitly because the threat model differs at each
boundary:
| Boundary | What crosses it | Trust model |
|---|---|---|
| Buyer ↔ Gumroad | Payment, buyer details | Out of scope — Gumroad's problem |
| Gumroad → license server (webhook) | POST carrying a shared secret in the URL query string | URL secret check; non-matching = 404 (no info leak); audit-log everything regardless |
| License server → Postmark | DKIM-signed transactional mail | Postmark verified-sender domain; HTTP API auth via server token |
| License server → Postgres | SQL over local docker bridge | Same compose project; password from on-disk secret file |
| Operator → license server (`/internal/*`) | Bearer token over SSH tunnel | Token only on disk + in the operator's env; nginx blocks `/internal/*` publicly as defense-in-depth |
| License server → buyer (email) | Plaintext blob in inbox | Buyer's email account hygiene; we deliberately don't encrypt — blob is self-protecting (signature) |
| Buyer → desktop app (activation) | Signed blob pasted in | Verified against pubkey **embedded in the shipped binary**; no network call |
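
A rough sketch of the two secret checks in the table, assuming a constant-time comparison (the table does not say how the comparison is actually done). The function names are hypothetical and framework-free; in the FastAPI app the equivalent logic would sit in front of the routes, and the audit-log write that happens regardless of the outcome is not shown.

```python
import hmac
from typing import Optional


def secrets_match(presented: Optional[str], expected: str) -> bool:
    """Constant-time comparison; a missing or empty value never matches."""
    if not presented or not expected:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


def webhook_status(query_secret: Optional[str], expected: str) -> int:
    # A wrong or missing secret is answered like an unknown route (404),
    # so a prober cannot tell that the webhook endpoint exists.
    return 200 if secrets_match(query_secret, expected) else 404


def bearer_ok(authorization: Optional[str], expected_token: str) -> bool:
    # Expects an Authorization header of the form "Bearer <token>".
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    return scheme.lower() == "bearer" and secrets_match(token.strip(), expected_token)
```
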
The single most important property to preserve: **the desktop app
never talks to the license server.** All trust in the desktop comes
from the embedded public key + the signed blob. This is what makes
the offline activation guarantee real, and what keeps a license-server
outage from breaking buyers who've already activated.
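
The offline activation path might look something like the sketch below. Only the `DTLIC2:` prefix and the `license.json` location come from this document; the layout after the prefix (base64url payload, a dot, base64url signature), the JSON payload, and the function names are assumptions made for illustration. The signature check is injected as a callable so the control flow can be shown without pinning down the key handling.

```python
import base64
import json
from pathlib import Path
from typing import Callable

PREFIX = "DTLIC2:"  # prefix is from the docs; the layout after it is assumed

# verify(payload, signature) must raise on a bad signature. In the real app
# this would wrap Ed25519 verification against the embedded public key.
Verifier = Callable[[bytes, bytes], None]


def _b64decode(text: str) -> bytes:
    # Restore the padding that compact base64url encodings usually drop.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def activate(blob: str, verify: Verifier, license_path: Path) -> dict:
    """Verify a pasted blob locally and persist it; no network access."""
    blob = blob.strip()
    if not blob.startswith(PREFIX):
        raise ValueError("not a DTLIC2 blob")
    payload_b64, sep, signature_b64 = blob[len(PREFIX):].partition(".")
    if not sep:
        raise ValueError("blob is missing its signature part")
    payload = _b64decode(payload_b64)
    verify(payload, _b64decode(signature_b64))  # raises if tampered with
    # Only reached when the signature checked out.
    license_path.parent.mkdir(parents=True, exist_ok=True)
    license_path.write_text(json.dumps({"blob": blob}), encoding="utf-8")
    return json.loads(payload)
```

With PyCA cryptography, `verify` could be a thin wrapper around `Ed25519PublicKey.from_public_bytes(key).verify(signature, payload)`, which raises `InvalidSignature` on mismatch. Nothing in the function opens a socket, which is the property this section asks to preserve.
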
---
## 4. Where things are stored
| Lives on… | Path / location | Contents |
|---|---|---|
| Buyer's machine | `~/.datatools/license.json` | Activated license blob |
| Buyer's inbox | Email delivered via Postmark | Delivery copy of the blob |
| License server | `licenses` table (Postgres) | Authoritative customer record — name, email, tier, blob, source, order ID, promotion, amount paid |
| License server | `gumroad_events` table | Append-only webhook delivery audit log |
| License server | `/srv/datatools-license/secrets/` | Postgres password, admin Bearer token, (PR 2) Postmark token + Gumroad secret |
| License server | `/etc/letsencrypt/live/datatools.unalogix.com/` | TLS cert + key |
| Operator's laptop | `~/.datatools-creator/issued.jsonl` | Creator-side issuance log (pre-server era, kept as a break-glass backup) |
| Operator's laptop | Git clone of this repo | Source code, including `server/config/products.yaml` |
| Gitea | This repo's commits | Everything except secrets |
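
For the on-disk secrets row, a small loader along the following lines is one way to read them, assuming one file per secret. The directory comes from the table above; the helper name `read_secret` and any file names passed to it (for example `admin_token`) are hypothetical.

```python
from pathlib import Path

SECRETS_DIR = Path("/srv/datatools-license/secrets")  # path from the table above


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Read one secret from its own file, dropping surrounding whitespace."""
    path = secrets_dir / name
    value = path.read_text(encoding="utf-8").strip()
    if not value:
        # Fail loudly rather than start up with an empty credential.
        raise ValueError(f"secret file {path} is empty")
    return value
```

A missing file surfaces as `FileNotFoundError` from `read_text`, so a misconfigured deployment fails at startup rather than at the first request.
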
---
## 5. Related docs
| Doc | Scope |
|---|---|
| `TECHNICAL.md` | Desktop app internals (core libs, GUI, CLIs) |
| `LICENSE-SERVER.md` | Server architecture rationale + DB schema |
| `SETUP-LICENSE-SERVER.md` | Server install runbook (DNS, packages, nginx, TLS, Postgres) |
| `ADMIN.md` | Day-2 operations (minting, rotation, inspection) |
| `DECISIONS.md` | Architecture decision records — `§9b` = no online activation check |
| `USER-GUIDE.md` | Buyer-facing documentation |