datatools-dev/docs/ARCHITECTURE.md
Michael 624f99653e docs(arch): end-to-end system + tech-stack diagrams
New ARCHITECTURE.md pulls the desktop app (TECHNICAL.md) and the
license server (LICENSE-SERVER.md) into a single picture — the two
were never reconciled into an end-to-end view before.

Contents:
  §1. System diagram (ASCII) showing operator laptop, license
      server stack (nginx → FastAPI → Postgres), Postmark, Gumroad,
      and the buyer's machine — with the three primary flows
      (sale, manual mint, offline activation) traced through it.
  §2. Tech stack diagram, layered: desktop / server / operator /
      external SaaS, with version pins.
  §3. Trust + isolation boundaries table — what crosses each one
      and what the threat model is.
  §4. "Where things are stored" — paths, tables, files.
  §5. Pointers to the deeper per-component docs.

ASCII over Mermaid since the repo's Gitea version is unknown and
plain text renders in every viewer / IDE / raw `cat`.

LICENSE-SERVER.md status flipped from "design proposal, not built"
to "deployed (PR 1 + PR 2 code merged)" — that was stale since
the PR 1 deploy yesterday.

TECHNICAL.md and ADMIN.md gain one-line pointers to ARCHITECTURE.md
so people land at the unified view when looking for "how does it
all fit together".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:59:05 +00:00


ARCHITECTURE — end-to-end view

Stitches the desktop app (TECHNICAL.md) and the license server (LICENSE-SERVER.md) into a single picture. Read this first for "how does it all fit together"; drill into the per-component docs for detail.


1. System diagram

┌────────────────────────────────────────────────────────────────────────┐
│                  OPERATOR / DEVELOPER LAPTOP                           │
│                                                                        │
│   git clone / push          ←─── code lives in git.invixiom.com        │
│   datatools-admin CLI       ─── manual mints, list, revoke ─────┐      │
│   ssh -L 8090:127.0.0.1:8090  ───── tunnel for /internal/* ─────┤      │
└────────────────────────────────────────────────────────────────────────┘
                                                                  │
        ┌─────────────────────────────────────────────────────────┘
        │
        │   internal Bearer-auth API (over SSH tunnel only)
        ▼
┌────────────────────────────────────────────────────────────────────────┐
│  LICENSE SERVER — 46.225.166.142                                       │
│  ─────────────────────────────────────────────────────────────────     │
│                                                                        │
│  ┌──────────────────────────────────────────────────────────────────┐ │
│  │  nginx 1.24  (TLS termination, public reverse proxy)             │ │
│  │                                                                  │ │
│  │  datatools.unalogix.com           → static placeholder           │ │
│  │  licenses.datatools.unalogix.com  → 127.0.0.1:8090 (FastAPI)     │ │
│  │  /internal/* on public surface    → blocked (404)                │ │
│  └────────────────────────────┬─────────────────────────────────────┘ │
│                               │                                        │
│  ┌────────────────────────────▼─────────────────────────────────────┐ │
│  │  FastAPI app — datatools-api (Docker container, UID 10001)       │ │
│  │                                                                  │ │
│  │   ┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐ │ │
│  │   │  /webhooks/*     │  │  /internal/*     │  │  /health      │ │ │
│  │   │  (storefronts)   │  │  (Bearer-auth)   │  │  (public)     │ │ │
│  │   └────────┬─────────┘  └────────┬─────────┘  └───────────────┘ │ │
│  │            │                     │                              │ │
│  │            ▼                     ▼                              │ │
│  │   ┌────────────────────────────────────────┐                    │ │
│  │   │  SourceAdapter (Protocol) — normalized │                    │ │
│  │   │   • ManualAdapter   • GumroadAdapter   │                    │ │
│  │   │   • (LemonSqueezy, Stripe — future)    │                    │ │
│  │   └────────────────┬───────────────────────┘                    │ │
│  │                    │ SaleEvent / RefundEvent                    │ │
│  │                    ▼                                            │ │
│  │   ┌────────────────────────────────────────┐                    │ │
│  │   │  mint_from_sale()                      │                    │ │
│  │   │   • Ed25519 sign via PyCA cryptography │                    │ │
│  │   │   • idempotent on (source, order_id)   │                    │ │
│  │   └────────────────┬───────────────────────┘                    │ │
│  └────────────────────┼─────────────────────────────────────────────┘ │
│                       │ SQL                                            │
│  ┌────────────────────▼─────────────────────────────────────────────┐ │
│  │  Postgres 16 — datatools-postgres (container, vol pg_data)       │ │
│  │   • licenses        — authoritative customer record              │ │
│  │   • gumroad_events  — webhook audit log (idempotency, replay)    │ │
│  └──────────────────────────────────────────────────────────────────┘ │
└───────────────────────┬────────────────────────────────┬───────────────┘
                        │                                ▲
            ┌───────────┘                                └──────────┐
            │ POST /email (httpx)                       Gumroad Ping│
            ▼                                            POST       │
   ┌───────────────────┐                              ┌─────────────┴──┐
   │  Postmark         │                              │   Gumroad      │
   │  (transactional   │                              │   (storefront, │
   │   email)          │                              │    payments)   │
   └───────┬───────────┘                              └────────────────┘
           │ DKIM-signed email with license blob                 ▲
           ▼                                                     │
┌────────────────────────────────────────────────────────────────┴───────┐
│                       BUYER'S MACHINE                                  │
│                                                                        │
│   Receives email ──► copies DTLIC2: blob ──► pastes into desktop app   │
│                                                                        │
│   ┌──────────────────────────────────────────────────────────────────┐ │
│   │  DataTools desktop (Python 3.12 + Streamlit + Typer CLIs)        │ │
│   │                                                                  │ │
│   │   ┌────────────────────────────────────────────────────────────┐ │ │
│   │   │  Activate screen — verifies blob signature                 │ │ │
│   │   │  against EMBEDDED Ed25519 public key                       │ │ │
│   │   │  (NO network call to the license server, ever)             │ │ │
│   │   └─────────────────────────┬──────────────────────────────────┘ │ │
│   │                             ▼                                   │ │
│   │  ~/.datatools/license.json    (signed blob, mode 644, on disk)  │ │
│   └──────────────────────────────────────────────────────────────────┘ │
│                                                                        │
│   Pays via web browser ─────► Gumroad ────► (kicks off the Ping)       │
└────────────────────────────────────────────────────────────────────────┘
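A minimal sketch of the SourceAdapter layer in the diagram, assuming a single `parse()` method and frozen-dataclass events. `SourceAdapter`, `ManualAdapter`, `GumroadAdapter`, `SaleEvent` and `RefundEvent` are names from the diagram, but their fields and methods here are illustrative. The Gumroad form field names (`sale_id`, `email`, `full_name`, `product_permalink`, `price` in cents, `refunded`) are assumptions to check against Gumroad's Ping documentation. The `normalize` helper is hypothetical.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class SaleEvent:
    source: str        # "gumroad", "manual", ...
    order_id: str      # storefront order ID; (source, order_id) is the idempotency key
    email: str
    name: str
    product: str
    amount_cents: int


@dataclass(frozen=True)
class RefundEvent:
    source: str
    order_id: str


Event = Union[SaleEvent, RefundEvent]


class SourceAdapter(Protocol):
    """Normalizes one storefront's payload into a SaleEvent or RefundEvent."""

    source: str

    def parse(self, payload: Mapping[str, str]) -> Event: ...


class GumroadAdapter:
    source = "gumroad"

    def parse(self, payload: Mapping[str, str]) -> Event:
        # ASSUMPTION: Gumroad Ping form field names; verify against Gumroad's docs.
        order_id = payload["sale_id"]
        if payload.get("refunded") == "true":
            return RefundEvent(self.source, order_id)
        return SaleEvent(
            source=self.source,
            order_id=order_id,
            email=payload["email"],
            name=payload.get("full_name", ""),
            product=payload["product_permalink"],
            amount_cents=int(payload.get("price", "0")),  # assumed to be in cents
        )


class ManualAdapter:
    source = "manual"

    def parse(self, payload: Mapping[str, str]) -> Event:
        # HYPOTHETICAL /internal/mint body: comps and replacements carry no payment.
        return SaleEvent(
            source=self.source,
            order_id=payload["order_id"],
            email=payload["email"],
            name=payload.get("name", ""),
            product=payload["product"],
            amount_cents=0,
        )


def normalize(adapter: SourceAdapter, payload: Mapping[str, str]) -> Event:
    """Everything downstream (mint_from_sale) only ever sees normalized events."""
    return adapter.parse(payload)
```

Because `SourceAdapter` is a `typing.Protocol`, adding LemonSqueezy or Stripe later would mean writing one more class with a matching `parse()`. Nothing downstream of the adapter would need to change.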

Three primary flows run through the diagram:

  1. Sale → fulfillment (the automated path). Buyer pays at Gumroad → Gumroad fires a Ping to licenses.datatools.unalogix.com/webhooks/gumroad?secret=… → nginx → FastAPI → audit-log row → adapter normalizes the payload → mint_from_sale writes the licenses row and Ed25519-signs the blob → Postmark emails the buyer their blob. End-to-end latency from Ping receipt to Postmark hand-off: a few hundred milliseconds. Inbox delivery then depends on Postmark and the buyer's mail provider.

  2. Manual mint (operator path: comps, support replacements). Operator opens the SSH tunnel → datatools-admin mint → /internal/mint (Bearer-authed, never publicly reachable) → same mint_from_sale path → blob returned in the HTTP response. The operator delivers it to the buyer out-of-band.

  3. Activation (buyer path, fully offline). Buyer pastes the blob into the desktop's Activate screen → desktop verifies the Ed25519 signature against the public key embedded in the shipped binary → license written to ~/.datatools/license.json. The desktop app makes no network calls to the license server at any point, which preserves the "your data never leaves your computer" promise (DECISIONS.md §9b).
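Flows 1 and 2 both end in mint_from_sale, which the diagram marks as idempotent on (source, order_id). One way to get that property is a UNIQUE constraint plus an insert that ignores conflicts. The sketch below runs against the stdlib sqlite3 module, since the repo's tests also use in-memory SQLite. `mint_from_sale_sketch`, the trimmed `licenses` columns and the injected `sign` callable are hypothetical. The real function goes through SQLAlchemy on Postgres, where `INSERT ... ON CONFLICT DO NOTHING` has the same meaning.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS licenses (
    id       INTEGER PRIMARY KEY,
    source   TEXT NOT NULL,
    order_id TEXT NOT NULL,
    email    TEXT NOT NULL,
    blob     TEXT NOT NULL,
    UNIQUE (source, order_id)  -- the idempotency key
)
"""


def mint_from_sale_sketch(
    conn: sqlite3.Connection,
    source: str,
    order_id: str,
    email: str,
    sign: Callable[[str, str], str],
) -> tuple[str, bool]:
    """Return (blob, created). A replayed (source, order_id) gets the stored blob back."""
    candidate = sign(source, order_id)  # signing is cheap; discarded on a duplicate
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO licenses (source, order_id, email, blob) VALUES (?, ?, ?, ?) "
            "ON CONFLICT (source, order_id) DO NOTHING",
            (source, order_id, email, candidate),
        )
        created = cur.rowcount == 1  # 0 when the row already existed
        (blob,) = conn.execute(
            "SELECT blob FROM licenses WHERE source = ? AND order_id = ?",
            (source, order_id),
        ).fetchone()
    return blob, created
```

A duplicate Gumroad Ping for the same sale therefore returns the original blob instead of minting a second license. A manual mint with the same order ID does not collide, because the key includes `source`.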


2. Tech stack

Layered view of what technology lives where. "External SaaS" entries are services we depend on but don't operate.

┌────────────────────────────────────────────────────────────────────────┐
│                  DESKTOP APP   (shipped binary, runs on buyer's box)   │
├──────────────────┬─────────────────────────────────────────────────────┤
│  GUI             │  Streamlit 1.35 — local web server, browser opens   │
│  CLI             │  Typer 0.12 — per-tool entry points                 │
│  Core logic      │  pandas 2.x, numpy, rapidfuzz, charset-normalizer   │
│  Crypto (verify) │  PyCA cryptography — Ed25519 public-key verify only │
│  Storage         │  ~/.datatools/license.json (file, mode 644)         │
│  i18n            │  JSON catalogs in src/i18n/                         │
│  Build           │  PyInstaller — one-file binary, per OS              │
│  Runtimes        │  Python 3.12 (bundled into installer)               │
│  Platforms       │  Windows · macOS · Linux                            │
└──────────────────┴─────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│                  LICENSE SERVER   (this box; non-buyer-facing)         │
├──────────────────┬─────────────────────────────────────────────────────┤
│  Edge            │  nginx 1.24 + Let's Encrypt (auto-renew via timer)  │
│  HTTP framework  │  FastAPI 0.119 + Starlette + Pydantic v2            │
│  ASGI server     │  uvicorn 0.39 (+uvloop, +httptools, +watchfiles)    │
│  Form parsing    │  python-multipart (for Gumroad form-encoded Pings)  │
│  ORM             │  SQLAlchemy 2.0                                     │
│  Migrations      │  Alembic 1.18 (one initial migration so far)        │
│  Database        │  Postgres 16-alpine (containerized, single node)    │
│  Database driver │  psycopg 3.3 (with binary wheel)                    │
│  Crypto (sign)   │  PyCA cryptography — Ed25519 private-key sign       │
│  HTTP client     │  httpx 0.28 (Postmark calls, test mocking)          │
│  Config          │  Pydantic Settings + YAML (products.yaml)           │
│  Container       │  Docker + Docker Compose v2 plugin                  │
│  Image base      │  python:3.12-slim                                   │
│  Process user    │  UID 10001 (non-root `app` user defined in image)   │
│  Logging         │  stdlib `logging` to container stdout → docker logs │
│  Host OS         │  Ubuntu 24.04 LTS                                   │
└──────────────────┴─────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────┐
│                  OPERATOR / DEVELOPER MACHINE                          │
├──────────────────┬─────────────────────────────────────────────────────┤
│  Source control  │  git → self-hosted Gitea (git.invixiom.com)         │
│  Admin CLI       │  Typer (src/admin_cli.py)                           │
│  Server access   │  SSH tunnel for /internal/* (no public exposure)    │
│  Break-glass     │  scripts/generate_license.py (offline-only mints,   │
│                  │  used when the license server is unreachable)       │
│  Test runner     │  pytest 8.3 + SQLite in-memory (no docker required) │
│  Smoke test      │  bash + docker compose (server/scripts/smoke.sh)    │
└──────────────────┴─────────────────────────────────────────────────────┘
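The operator path in this table (Admin CLI plus SSH tunnel) comes down to a Bearer-authenticated POST to a loopback address. Below is a stdlib-only sketch that assumes the tunnel from §1 is already open. `build_mint_request`, `send` and the JSON body fields are hypothetical stand-ins for whatever src/admin_cli.py actually sends. `<license-server>` is a placeholder.

```python
import json
import urllib.request

# Only reachable through the tunnel: ssh -L 8090:127.0.0.1:8090 <license-server>
TUNNEL_BASE = "http://127.0.0.1:8090"


def build_mint_request(token: str, email: str, product: str, order_id: str) -> urllib.request.Request:
    """Build (without sending) a Bearer-authenticated POST to /internal/mint."""
    # HYPOTHETICAL body fields; the real /internal/mint schema may differ.
    body = json.dumps({"email": email, "product": product, "order_id": order_id}).encode()
    return urllib.request.Request(
        f"{TUNNEL_BASE}/internal/mint",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """POST through the tunnel; per flow 2, the response carries the minted blob."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

If the tunnel is down, `send` fails with a connection error rather than reaching any public endpoint. That is the intended property: /internal/* has no public route.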

┌────────────────────────────────────────────────────────────────────────┐
│                  EXTERNAL SaaS / dependencies                          │
├──────────────────┬─────────────────────────────────────────────────────┤
│  Storefront      │  Gumroad — Ping webhook to /webhooks/gumroad        │
│  Transactional   │  Postmark — HTTP API for license-delivery emails    │
│   email          │   (LoggingEmailService fallback when token unset)   │
│  TLS CA          │  Let's Encrypt — ACME HTTP-01 challenge via certbot │
│  Authoritative   │  supercp / cPanel (the DNS host for unalogix.com)   │
│   DNS            │   — Cloudflare front-door deferred                  │
│  Source hosting  │  Self-hosted Gitea (git.invixiom.com) — not on the  │
│                  │   datatools box; shares the same physical host      │
└──────────────────┴─────────────────────────────────────────────────────┘

3. Trust + isolation boundaries

Worth tracing explicitly because the threat model differs at each boundary:

| Boundary | What crosses it | Trust model |
|---|---|---|
| Buyer ↔ Gumroad | Payment, buyer details | Out of scope (Gumroad's responsibility) |
| Gumroad → license server (webhook) | Form-encoded Ping POST with a shared secret in the URL query string | URL secret check; a mismatch returns 404 (no information leak); every delivery is audit-logged regardless |
| License server → Postmark | HTTPS API call carrying the license email, which Postmark DKIM-signs | Postmark verified-sender domain; API authenticated with the server token |
| License server → Postgres | SQL over the local Docker bridge network | Same Compose project; password read from an on-disk secret file |
| Operator → license server (/internal/*) | Bearer token over SSH tunnel | Token only on disk and in the operator's environment; nginx also blocks /internal/* on the public surface (defense in depth) |
| License server → buyer (email) | Plaintext blob in the buyer's inbox | Relies on the buyer's email-account hygiene; deliberately unencrypted, because the signature already makes the blob tamper-proof |
| Buyer → desktop app (activation) | Signed blob, pasted in | Verified against the public key embedded in the shipped binary; no network call |
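The webhook row's rule fits in a few lines: compare the URL secret, answer 404 on a mismatch, and audit-log every delivery. The sketch below uses a hypothetical helper name. It assumes the secret arrives as the `secret` query parameter shown in flow 1. `hmac.compare_digest` keeps the comparison constant-time. Only the path is logged, so the secret never lands in the audit table.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def check_gumroad_ping(url: str, expected_secret: str, audit_log: list[dict]) -> int:
    """Return the HTTP status the webhook handler would answer with (hypothetical helper)."""
    parts = urlsplit(url)
    supplied = parse_qs(parts.query).get("secret", [""])[0]
    # Refuse everything if no secret is configured; otherwise compare in constant time.
    ok = bool(expected_secret) and hmac.compare_digest(supplied.encode(), expected_secret.encode())
    # Audit every delivery, matching or not; log the path only, never the secret.
    audit_log.append({"path": parts.path, "secret_ok": ok})
    # 404 rather than 401/403, so a wrong secret reveals nothing about the endpoint.
    return 200 if ok else 404
```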

The single most important property to preserve: the desktop app never talks to the license server. All trust in the desktop comes from the embedded public key + the signed blob. This is what makes the offline activation guarantee real, and what keeps a license-server outage from breaking buyers who've already activated.


4. Where things are stored

| Lives on… | Path / location | Contents |
|---|---|---|
| Buyer's machine | ~/.datatools/license.json | Activated license blob |
| Buyer's mailbox | License-delivery email (sent via Postmark) | Delivery copy of the blob |
| License server | licenses table (Postgres) | Authoritative customer record: name, email, tier, blob, source, order ID, promotion, amount paid |
| License server | gumroad_events table (Postgres) | Append-only webhook-delivery audit log |
| License server | /srv/datatools-license/secrets/ | Postgres password, admin Bearer token, (PR 2) Postmark token + Gumroad secret |
| License server | /etc/letsencrypt/live/datatools.unalogix.com/ | TLS cert + key |
| Operator's laptop | ~/.datatools-creator/issued.jsonl | Creator-side issuance log (pre-server era, kept as a break-glass backup) |
| Operator's laptop | Git clone of this repo | Source code, including server/config/products.yaml |
| Gitea | This repo's commits | Everything except secrets |

5. Pointers to per-component docs

| Doc | Scope |
|---|---|
| TECHNICAL.md | Desktop app internals (core libs, GUI, CLIs) |
| LICENSE-SERVER.md | Server architecture rationale + DB schema |
| SETUP-LICENSE-SERVER.md | Server install runbook (DNS, packages, nginx, TLS, Postgres) |
| ADMIN.md | Day-2 operations (minting, rotation, inspection) |
| DECISIONS.md | Architecture decision records; §9b = no online activation check |
| USER-GUIDE.md | Buyer-facing documentation |