sec(license): Ed25519 sigs + production-safe tripwire

Two coupled hardening upgrades.

1. Asymmetric signatures (HMAC → Ed25519)

The previous HMAC scheme used a symmetric secret that any motivated
reverse engineer could pull out of the shipped binary and use to
mint blobs for any tier / name / email. With Ed25519, the binary
ships only the public verification key; the signing key never
leaves the seller's environment, so binary compromise no longer
yields forgery.
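The split can be sketched end to end with the `cryptography` package that the commit adds to requirements.txt. This is a minimal illustration, not the module's code. The helper names (`canonical_bytes`, `raw_pubkey_hex`, `verify`) and the sample payload are hypothetical, and the canonical JSON settings are copied from `_canonical_bytes` in the diff. Only `pubkey_hex` would need to exist on a buyer's machine.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(payload: dict) -> bytes:
    # Sorted keys, no whitespace, UTF-8: identical dicts give identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def raw_pubkey_hex(private_key: Ed25519PrivateKey) -> str:
    # 32 raw public-key bytes, hex-encoded (the form a build would embed).
    return private_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    ).hex()


def verify(payload: dict, signature_hex: str, pubkey_hex: str) -> bool:
    # Buyer side: needs only the public key, never the private one.
    public_key = Ed25519PublicKey.from_public_bytes(bytes.fromhex(pubkey_hex))
    try:
        public_key.verify(bytes.fromhex(signature_hex), canonical_bytes(payload))
        return True
    except (InvalidSignature, ValueError):  # bad signature or malformed hex
        return False


# Seller side: mint once, keep the private key offline, sign each license.
seller_key = Ed25519PrivateKey.generate()
pubkey_hex = raw_pubkey_hex(seller_key)
payload = {"email": "ada@example.com", "name": "Ada", "tier": "pro"}
signature_hex = seller_key.sign(canonical_bytes(payload)).hex()

# A key pair the seller never used, standing in for an attacker's key.
other_pubkey_hex = raw_pubkey_hex(Ed25519PrivateKey.generate())

print(verify(payload, signature_hex, pubkey_hex))                      # True
print(verify({**payload, "tier": "team"}, signature_hex, pubkey_hex))  # False
print(verify(payload, signature_hex, other_pubkey_hex))                # False
```

Because the verifier holds only the public half, pulling `pubkey_hex` out of a binary lets an attacker check signatures but not produce new ones, which is the property the HMAC scheme lacked.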

- src/license/crypto.py rewritten around
  cryptography.hazmat.primitives.asymmetric.ed25519. Same public
  API surface (sign/verify/encode_blob/decode_blob), same canonical
  JSON encoding — drop-in for the manager / cli / GUI layers.
- DATATOOLS_LICENSE_PRIVKEY (seller-side) and
  DATATOOLS_LICENSE_PUBKEY (build-time) env vars supply the keys as
  hex-encoded raw bytes; when unset, the in-source dev keypair
  (src/license/_dev_keypair.py) is the fallback, deterministically
  derived from a seed phrase for repro builds and tests.
- Blob prefix bumped DTLIC1: → DTLIC2:. Decoding a DTLIC1 blob
  surfaces a clear "old format" error rather than a confusing
  signature mismatch.
- scripts/generate_keypair.py mints fresh production keypairs for
  the seller (run once, stash the private key offline). Adds
  cryptography>=41,<46 to requirements.txt (was an undeclared
  transitive dep).
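The `DTLIC2:` envelope and the ordering of the prefix checks can be sketched with the standard library alone. The diff does not show the body of `encode_blob`, so this assumes standard (not URL-safe) base64 over compact, key-sorted JSON, with the signature stored under a `signature` key inside that JSON. The function names mirror the diff, but the code is illustrative. Testing the retired `DTLIC1:` prefix before the generic missing-prefix check is what lets a v1 blob get the targeted "old format" message.

```python
import base64
import json
from typing import Any

BLOB_PREFIX = "DTLIC2:"  # current Ed25519 format
OLD_PREFIX = "DTLIC1:"   # retired HMAC format, rejected with its own message


def encode_blob(payload_with_signature: dict[str, Any]) -> str:
    # Compact JSON, then base64, so the blob is one paste-able token.
    raw = json.dumps(payload_with_signature, sort_keys=True, separators=(",", ":"))
    return BLOB_PREFIX + base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_blob(blob: str) -> dict[str, Any]:
    s = blob.strip()  # tolerate whitespace picked up while copy-pasting
    if s.startswith(OLD_PREFIX):
        raise ValueError(f"License blob is the old {OLD_PREFIX!r} format.")
    if not s.startswith(BLOB_PREFIX):
        raise ValueError(f"License blob missing {BLOB_PREFIX!r} prefix.")
    try:
        # validate=True rejects non-alphabet characters instead of skipping them.
        raw = base64.b64decode(s[len(BLOB_PREFIX):], validate=True)
        data = json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all
        # subclass ValueError.
        raise ValueError("License blob does not decode cleanly.") from exc
    if not isinstance(data, dict):
        raise ValueError("License blob must encode a JSON object.")
    return data
```

A round trip such as `decode_blob(encode_blob(p)) == p` holds for any JSON-serialisable dict `p`; signature checking would happen after decoding, on the dict minus its `signature` key.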

2. Production-safe tripwire

assert_production_safe() refuses to boot a frozen / shipped build
when either:

- DATATOOLS_DEV_MODE=1 is set (would unconditionally bypass every
  license check — fine in source/test but catastrophic in a buyer
  install).
- The active verification key is still the embedded dev key (the
  build pipeline forgot to set DATATOOLS_LICENSE_PUBKEY).

No-op in source / pytest runs (sys.frozen is unset) so test
fixtures and dev workflows keep working without ceremony. Called
from src/cli_license_guard.guard() and from hide_streamlit_chrome
— so it fires on every CLI invocation and every GUI page load.
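The tripwire's decision logic can be written as a pure function so that each scenario can be exercised without freezing a binary. A minimal sketch, under these assumptions: `check_production_safe` and its keyword parameters are hypothetical, `DEV_PUBLIC_KEY_HEX` here is a placeholder rather than the real dev key, and the accepted truthy strings are a guess because the body of `_truthy_env` is not part of this diff. The key lookup follows `_pubkey_hex()`: a non-empty `$DATATOOLS_LICENSE_PUBKEY`, otherwise the embedded dev key.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping

# Placeholder: stands in for DEV_PUBLIC_KEY_HEX from _dev_keypair.py.
DEV_PUBLIC_KEY_HEX = "00" * 32


class ProductionBuildError(RuntimeError):
    """A shipped build is misconfigured in a way that defeats licensing."""


def _truthy(value: str | None) -> bool:
    # Assumed semantics; the diff does not show _truthy_env's body.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def active_pubkey_hex(env: Mapping[str, str]) -> str:
    # Same priority as _pubkey_hex(): non-empty env override, else dev key.
    return env.get("DATATOOLS_LICENSE_PUBKEY") or DEV_PUBLIC_KEY_HEX


def check_production_safe(*, frozen: bool, env: Mapping[str, str]) -> None:
    if not frozen:
        return  # source checkout / pytest: dev mode and dev key stay usable
    if _truthy(env.get("DATATOOLS_DEV_MODE")):
        raise ProductionBuildError("DATATOOLS_DEV_MODE is set in a shipped build.")
    if active_pubkey_hex(env) == DEV_PUBLIC_KEY_HEX:
        raise ProductionBuildError("Shipped build still verifies against the dev key.")


def assert_production_safe() -> None:
    # PyInstaller sets sys.frozen in bundled apps; it is absent otherwise.
    check_production_safe(frozen=bool(getattr(sys, "frozen", False)), env=os.environ)
```

Passing `frozen` and `env` explicitly leaves the thin `assert_production_safe()` wrapper as the only place that reads process state, so the four production-safe scenarios reduce to plain function calls.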

Tests: 49 license-layer unit tests (was 40); added Ed25519
wrong-key rejection, dev-keypair seed pin, blob v2 prefix, v1
rejection with clear message, and four production-safe scenarios
(no-op in source, fires on DEV_MODE in frozen, fires on dev key in
frozen, passes in frozen with prod pubkey). Total: 2024 → 2033.

Docs (REQUIREMENTS §17a, DEVELOPER licensing recipe, DECISIONS
§9b + decision log) updated with the new threat-model write-up,
key-storage workflow, and tripwire behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:34:48 +00:00
parent d32b58e61a
commit e534fb4989
12 changed files with 549 additions and 75 deletions

@@ -53,9 +53,14 @@ def guard(feature: str | None = None) -> None:
         InvalidLicenseError,
         LicenseError,
         UnsupportedFeatureError,
+        assert_production_safe,
         get_manager,
     )
+    # Refuse to run a misconfigured shipped build. No-op in
+    # development / pytest runs.
+    assert_production_safe()
     mgr = get_manager()
     if mgr.dev_mode:
         return

@@ -89,6 +89,12 @@ def hide_streamlit_chrome(*, gate_license: bool = True) -> None:
     can render its own form without recursion.
     """
     st.markdown(_HIDE_CHROME_CSS, unsafe_allow_html=True)
+    # Production-safe check runs first so a misconfigured shipped
+    # build refuses to render anything (rather than rendering a
+    # broken activation form that doesn't accept real blobs).
+    # No-op in source / pytest runs.
+    from src.license import assert_production_safe
+    assert_production_safe()
     # Imported lazily so this module stays importable in environments
     # where the i18n packs haven't been laid out (e.g. unit tests of
     # individual legacy helpers).

@@ -34,12 +34,21 @@ from .errors import (
     UnsupportedFeatureError,
 )
 from .features import FEATURES_BY_TIER, all_features_for_tier
-from .manager import LicenseManager, current_state, get_manager, require_feature
+from .manager import (
+    LicenseManager,
+    ProductionBuildError,
+    assert_production_safe,
+    current_state,
+    get_manager,
+    require_feature,
+)
 from .schema import FeatureFlag, License, Tier
 __all__ = [
     # Manager
     "LicenseManager",
+    "ProductionBuildError",
+    "assert_production_safe",
     "current_state",
     "get_manager",
     "require_feature",

@@ -0,0 +1,73 @@
+"""**Development-only** Ed25519 keypair embedded in the source tree.
+This pair lets developers run / test / sign locally without needing
+the production private key. Both values are deterministic from a
+seed string (``hashlib.sha256(SEED).digest()``) so any contributor
+checking out the source gets the same keys — which is exactly what
+makes this keypair unsafe for production.
+============================================================
+DO NOT SHIP THIS KEYPAIR.
+============================================================
+For shipped builds:
+1. Run ``scripts/generate_keypair.py`` to produce a fresh production
+   keypair.
+2. Stash the **private** key in your password manager / KMS.
+3. In the PyInstaller build pipeline, set the env var
+   ``DATATOOLS_LICENSE_PUBKEY=<production-pubkey-hex>`` so the
+   shipped binary verifies against the production key, not this dev
+   key.
+4. The production-safe runtime check (``assert_production_safe``)
+   refuses to start a frozen build that's still verifying against
+   this dev key — that's the tripwire that catches a missing build
+   step.
+The matching seed phrase below is in source on purpose; rotating
+the dev key means changing it here AND regenerating every test
+fixture that hard-codes a blob. The seed includes the words
+"DEV-seed-NOT-FOR-PRODUCTION" specifically so a string-grep against
+a shipped binary would flag a missing build override immediately.
+"""
+from __future__ import annotations
+import hashlib
+# The seed phrase. Hashed to 32 bytes → Ed25519 private-key seed.
+DEV_SEED_PHRASE: bytes = (
+    b"datatools-license-v2-DEV-seed-NOT-FOR-PRODUCTION"
+)
+# Derived constants. Computed once at import for self-test
+# (``test_dev_keypair_matches_seed`` in ``tests/test_license.py``)
+# without doing crypto work on every import.
+DEV_PRIVATE_KEY_HEX: str = (
+    "0bdc196f098b84ed155bacbd00061d4fff2cb68e10109f94332f1fc7de194cdb"
+)
+DEV_PUBLIC_KEY_HEX: str = (
+    "1cbef16b7826dd364ac0c7187d42c2ee00d76486e42389db05efa45dd1ade78a"
+)
+def _derive_from_seed() -> tuple[str, str]:
+    """Re-derive the dev keypair from the seed phrase. Used by the
+    unit test that pins the constants above to the seed."""
+    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
+        Ed25519PrivateKey,
+    )
+    from cryptography.hazmat.primitives import serialization
+    seed = hashlib.sha256(DEV_SEED_PHRASE).digest()
+    priv = Ed25519PrivateKey.from_private_bytes(seed)
+    priv_hex = priv.private_bytes(
+        encoding=serialization.Encoding.Raw,
+        format=serialization.PrivateFormat.Raw,
+        encryption_algorithm=serialization.NoEncryption(),
+    ).hex()
+    pub_hex = priv.public_key().public_bytes(
+        encoding=serialization.Encoding.Raw,
+        format=serialization.PublicFormat.Raw,
+    ).hex()
+    return priv_hex, pub_hex

@@ -1,85 +1,150 @@
-"""HMAC sign/verify for license blobs.
+"""Ed25519 sign/verify for license blobs.
-The signing secret is read from ``$DATATOOLS_LICENSE_SECRET`` if
-present, otherwise from the build-time constant below. Replace the
-constant at build time (via PyInstaller hook or a sed step in the
-build pipeline) so the shipped binary has a different secret from
-this repo's source tree.
+Asymmetric model:
-Threat model: honor-system DRM. A motivated reverse engineer can pull
-the secret out of the binary, sign their own licenses, and bypass the
-check. That's expected for $49 desktop software — the goal is to
-discourage casual sharing, not stop targeted piracy. The 30-day
-refund policy and the personal-name embedded in every license cover
-the same gap from a different angle.
+- **Private key** (32 bytes) lives with the seller only. It signs the
+  buyer's name/email/tier/etc into a license blob via
+  ``scripts/generate_license.py``.
+- **Public key** (32 bytes) is embedded in every shipped binary. The
+  binary uses it to verify blobs at activation time.
+The split means a motivated reverse engineer who pulls everything out
+of the binary still can't sign new licenses — they'd need the private
+key, which never leaves the seller's environment. This is the key
+upgrade vs. the v1 HMAC scheme: HMAC's symmetric secret was trivially
+extractable, so anyone with the binary could mint blobs for any tier.
+Keys come from (in priority order):
+1. ``$DATATOOLS_LICENSE_PRIVKEY`` / ``$DATATOOLS_LICENSE_PUBKEY`` —
+   hex-encoded raw bytes. The build pipeline sets the pubkey here.
+2. The dev-only constants in ``_dev_keypair.py`` — deterministic from
+   a seed, embedded in the source tree for local development and
+   testing. **Never** ship a binary that still uses these.
+A frozen / shipped build verifying against the dev key is a build
+configuration error — ``assert_production_safe`` (see
+``.manager``) fires loudly on startup in that case.
+Blob format: ``DTLIC2:`` + base64-encoded JSON. The version prefix
+bumped from ``DTLIC1`` to ``DTLIC2`` when we switched from HMAC to
+Ed25519, so old v1 blobs surface a clear "old format" error rather
+than a confusing "signature mismatch".
 """
 from __future__ import annotations
 import base64
-import hashlib
-import hmac
 import json
 import os
 from typing import Any
-# Build-time default. Replace via env var in shipped builds; keep this
-# constant non-empty so unit tests have a stable verification key.
-_DEFAULT_SECRET = (
-    "datatools-license-v1-development-secret-"
-    "replace-at-build-time-via-DATATOOLS_LICENSE_SECRET"
+from cryptography.exceptions import InvalidSignature
+from cryptography.hazmat.primitives.asymmetric.ed25519 import (
+    Ed25519PrivateKey,
+    Ed25519PublicKey,
 )
+from ._dev_keypair import DEV_PRIVATE_KEY_HEX, DEV_PUBLIC_KEY_HEX
-def _secret_bytes() -> bytes:
-    """Return the active HMAC secret as bytes."""
-    return os.environ.get("DATATOOLS_LICENSE_SECRET", _DEFAULT_SECRET).encode("utf-8")
+# ---------------------------------------------------------------------------
+# Key material
+# ---------------------------------------------------------------------------
+def _privkey_hex() -> str:
+    """Hex-encoded raw Ed25519 private-key bytes.
+    Read from ``$DATATOOLS_LICENSE_PRIVKEY`` first (where the seller
+    stashes their real key), falling back to the dev seed-derived
+    constant. The dev fallback only matters during testing /
+    development; a shipped build calling :func:`sign` is a bug (only
+    the seller's key-gen script does that).
+    """
+    return os.environ.get("DATATOOLS_LICENSE_PRIVKEY") or DEV_PRIVATE_KEY_HEX
+def _pubkey_hex() -> str:
+    """Hex-encoded raw Ed25519 public-key bytes.
+    Read from ``$DATATOOLS_LICENSE_PUBKEY`` first (the build
+    pipeline sets this), falling back to the dev key.
+    """
+    return os.environ.get("DATATOOLS_LICENSE_PUBKEY") or DEV_PUBLIC_KEY_HEX
+def _privkey() -> Ed25519PrivateKey:
+    return Ed25519PrivateKey.from_private_bytes(bytes.fromhex(_privkey_hex()))
+def _pubkey() -> Ed25519PublicKey:
+    return Ed25519PublicKey.from_public_bytes(bytes.fromhex(_pubkey_hex()))
+def is_using_dev_key() -> bool:
+    """True when the active **public** key matches the embedded dev key.
+    Used by :func:`.manager.assert_production_safe` to catch frozen
+    builds whose pubkey wasn't overridden at build time.
+    """
+    return _pubkey_hex() == DEV_PUBLIC_KEY_HEX
+# ---------------------------------------------------------------------------
+# Canonical encoding (shared with v1 — same bytes, same hash, same sig)
+# ---------------------------------------------------------------------------
 def _canonical_bytes(payload: dict[str, Any]) -> bytes:
-    """Canonical JSON encoding for the HMAC input.
+    """Canonical JSON encoding for the signature input.
     ``sort_keys=True`` + ``separators=(",", ":")`` produce a byte-for-
     byte deterministic representation across Python versions and OS
-    locales. Without that, two structurally-identical dicts could hash
-    to different signatures.
+    locales. Without that, two structurally-identical dicts could
+    produce different signatures.
     """
     return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
+# ---------------------------------------------------------------------------
+# Sign / verify
+# ---------------------------------------------------------------------------
 def sign(payload: dict[str, Any]) -> str:
-    """Compute the HMAC-SHA256 hex digest over *payload*.
+    """Produce an Ed25519 signature over *payload*, hex-encoded.
-    *payload* MUST NOT contain a ``signature`` key — that's the field
-    we're computing. The caller is responsible for stripping it.
+    Caller must strip any existing ``signature`` field — the function
+    signs whatever it's given, including a stale signature, which
+    would never verify because verify recomputes from a fresh
+    no-``signature`` canonical form.
     """
-    digest = hmac.new(_secret_bytes(), _canonical_bytes(payload), hashlib.sha256)
-    return digest.hexdigest()
+    sig_bytes = _privkey().sign(_canonical_bytes(payload))
+    return sig_bytes.hex()
-def verify(payload: dict[str, Any], signature: str) -> bool:
-    """Constant-time compare between the recomputed HMAC and *signature*.
-    Returns ``True`` on a match. Uses :func:`hmac.compare_digest` so a
-    timing oracle can't be used to recover the secret one byte at a
-    time — overkill for honor-system DRM, but free.
-    """
-    expected = sign(payload)
-    return hmac.compare_digest(expected.encode("ascii"), signature.encode("ascii"))
+def verify(payload: dict[str, Any], signature_hex: str) -> bool:
+    """Verify *signature_hex* against *payload*. Returns True/False;
+    never raises (a missing or malformed signature is just False)."""
+    try:
+        sig_bytes = bytes.fromhex(signature_hex)
+    except ValueError:
+        return False
+    try:
+        _pubkey().verify(sig_bytes, _canonical_bytes(payload))
+        return True
+    except InvalidSignature:
+        return False
+# ---------------------------------------------------------------------------
+# Blob encoding / decoding
+# ---------------------------------------------------------------------------
 # A "license blob" is the artifact the buyer pastes into the activation
 # form. It's a base64-encoded JSON dict containing every license field
 # *plus* the signature. We choose base64 over raw JSON so the blob is
 # one paste-able token (no whitespace surprises) and so a typo
 # truncates the blob into an obviously-invalid form rather than a
 # subtly-mutated payload.
-_BLOB_PREFIX = "DTLIC1:"
+# Buyers paste this whole token into the activation page. The prefix
+# is the version marker:
+# DTLIC1 — old HMAC scheme (no longer accepted)
+# DTLIC2 — Ed25519 (current)
+_BLOB_PREFIX = "DTLIC2:"
+_OLD_PREFIX = "DTLIC1:"
 def encode_blob(payload_with_signature: dict[str, Any]) -> str:
@@ -92,10 +157,15 @@ def encode_blob(payload_with_signature: dict[str, Any]) -> str:
 def decode_blob(blob: str) -> dict[str, Any]:
     """Reverse of :func:`encode_blob`. Raises ``ValueError`` on a
-    blob that doesn't carry the expected prefix or doesn't decode
-    cleanly — both surface as :class:`InvalidLicenseError` at the
-    manager layer."""
+    blob that doesn't carry the expected prefix, doesn't decode
+    cleanly, or carries the v1 prefix (which we no longer accept)."""
     s = blob.strip()
+    if s.startswith(_OLD_PREFIX):
+        raise ValueError(
+            f"License blob is the old {_OLD_PREFIX!r} format. v1 blobs "
+            "used a symmetric secret that has since been retired — "
+            "request a new blob from support."
+        )
     if not s.startswith(_BLOB_PREFIX):
         raise ValueError(
             f"License blob missing {_BLOB_PREFIX!r} prefix. "
@@ -6,6 +6,7 @@ constructor for full isolation.
 Lifecycle::
+    assert_production_safe() # guard against build-config errors
     mgr = get_manager()
     if not mgr.is_activated():
         mgr.activate_from_blob(blob, name, email)
@@ -17,6 +18,7 @@ from __future__ import annotations
 import os
 import re
+import sys
 import uuid
 from dataclasses import dataclass
 from datetime import datetime, timezone
@@ -468,3 +470,69 @@ def current_state() -> LicenseState:
 def require_feature(feature: str | FeatureFlag) -> License:
     return get_manager().require_feature(feature)
+# ---------------------------------------------------------------------------
+# Production-build sanity check
+# ---------------------------------------------------------------------------
+class ProductionBuildError(RuntimeError):
+    """Raised when a frozen / shipped build is misconfigured in a way
+    that would defeat licensing. Always loud, always fatal — the
+    binary must not boot in this state."""
+def _is_shipped_build() -> bool:
+    """True when running from a PyInstaller bundle (``sys.frozen``).
+    Set automatically by PyInstaller; not set in source / pytest
+    runs. The whole purpose of the prod-safe check is to enforce
+    invariants that only matter in a shipped build, so the rest of
+    the codebase can stay flexible.
+    """
+    return getattr(sys, "frozen", False)
+def assert_production_safe() -> None:
+    """Fail loudly if a shipped build is misconfigured.
+    Two tripwires:
+    1. ``DATATOOLS_DEV_MODE`` is set in a frozen build. The dev-mode
+       env var unconditionally bypasses license verification — if a
+       buyer's installer somehow ships it enabled (build pipeline
+       bug, mis-set environment), every license check is a no-op.
+       Refuse to start instead.
+    2. The active verification key is still the dev key. The build
+       pipeline is supposed to override
+       ``DATATOOLS_LICENSE_PUBKEY`` with the production key; if it
+       didn't, the binary will reject every legitimate license
+       (signed with the prod private key) AND would *accept*
+       anything signed with the dev key (which is checked into the
+       source tree). Refuse to start.
+    No-ops in non-frozen runs (development, tests) so the dev key
+    + dev mode keep working in those contexts. Production builds
+    call this from :func:`src.cli_license_guard.guard` and
+    :func:`src.gui.components.hide_streamlit_chrome`.
+    """
+    if not _is_shipped_build():
+        return
+    if _truthy_env("DATATOOLS_DEV_MODE"):
+        raise ProductionBuildError(
+            "DATATOOLS_DEV_MODE is set in a shipped build. This env "
+            "var disables every license check and must never be set "
+            "on a buyer machine. If you see this message in a release "
+            "build, the install was misconfigured — contact support."
+        )
+    if crypto.is_using_dev_key():
+        raise ProductionBuildError(
+            "Shipped build is verifying against the development "
+            "license key. The build pipeline must set "
+            "DATATOOLS_LICENSE_PUBKEY to the production public key "
+            "before packaging. This binary will reject every real "
+            "license blob — re-download from the official channel."
+        )