build: drop the local Python release method, return to CI-only installer builds
Removes the single-command Python packaging method (build/make_release.py + build/build_portable_zip.py + build/macos/build_zip.sh) and the portable .zip artifacts it produced. Release builds go back to the original GitHub Actions process: the CI matrix builds one installer per platform (.dmg / .exe / .AppImage) on tag push and attaches them to a GitHub Release.

Tesseract OCR bundling is preserved: the fetch helpers the workflow depends on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a standalone build/tesseract.py, which build.yml now imports.

Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README, es translations) updated to drop the portable-zip flavor and point at the new module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
100
build/README.md
@@ -23,14 +23,12 @@ build/
 ├── generate_icons.py        Builds icon.ico / icon.icns / icon.png from
 │                            src/gui/assets/datatools_icon_256.png. Run
 │                            once before pyinstaller (CI does this).
-├── build_portable_zip.py    Cross-platform: zips dist/DataTools/ into a
-│                            no-install portable download. Used by the
-│                            Windows + Linux portable artifacts.
+├── tesseract.py             Fetches the per-platform Tesseract binary +
+│                            eng.traineddata at build time. CI imports
+│                            fetch_tessdata + fetch_tesseract_for_platform.
 ├── macos/
-│   ├── build_dmg.sh         Wraps dist/DataTools.app into a .dmg with a
-│   │                        drag-to-/Applications layout (installer).
-│   └── build_zip.sh         Wraps dist/DataTools.app into a portable
-│                            .zip via ditto (preserves bundle metadata).
+│   └── build_dmg.sh         Wraps dist/DataTools.app into a .dmg with a
+│                            drag-to-/Applications layout (installer).
 ├── appimage/
 │   ├── AppRun               Entry point invoked when the AppImage runs.
 │   ├── datatools.desktop    Linux desktop-entry metadata.
@@ -43,17 +41,15 @@ build/
|
||||
|
||||
## Distribution outputs per platform
|
||||
|
||||
Each CI run produces two downloads per platform — an installer for
|
||||
buyers who want shortcuts wired automatically, and a portable .zip
|
||||
for buyers (or IT-locked-down machines) that can't run installers:
|
||||
Each CI run produces one installer per platform:
|
||||
|
||||
| Platform | Installer | Portable |
|
||||
|----------|----------------------------------------|------------------------------------------------|
|
||||
| macOS | `DataTools-<ver>-mac.dmg` | `DataTools-<ver>-mac-portable.zip` (ditto .app)|
|
||||
| Windows | `DataTools-<ver>-win-setup.exe` | `DataTools-<ver>-win-portable.zip` |
|
||||
| Linux | `DataTools-<ver>-linux-x86_64.AppImage`| (the AppImage IS the portable) |
|
||||
| Platform | Installer |
|
||||
|----------|----------------------------------------|
|
||||
| macOS | `DataTools-<ver>-mac.dmg` |
|
||||
| Windows | `DataTools-<ver>-win-setup.exe` |
|
||||
| Linux | `DataTools-<ver>-linux-x86_64.AppImage` (already portable) |
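The naming scheme in the table can be sketched as a small lookup. This is an illustration only: `INSTALLER_PATTERNS` and `installer_name` are hypothetical names that do not exist in the repository; the filename patterns themselves are copied from the table.

```python
# Hypothetical helper (not part of the build scripts): maps a platform
# key to the installer filename pattern listed in the table.
INSTALLER_PATTERNS = {
    "mac": "DataTools-{ver}-mac.dmg",
    "win": "DataTools-{ver}-win-setup.exe",
    "linux": "DataTools-{ver}-linux-x86_64.AppImage",
}


def installer_name(target: str, version: str) -> str:
    """Return the installer filename for ``target`` at ``version``."""
    try:
        pattern = INSTALLER_PATTERNS[target]
    except KeyError:
        raise ValueError(
            f"unknown target {target!r}; expected win, mac, or linux"
        ) from None
    return pattern.format(ver=version)


if __name__ == "__main__":
    # Print the three expected asset names for an example version.
    for target in ("mac", "win", "linux"):
        print(installer_name(target, "1.2.3"))
```

A check like this could be used to confirm that a Release carries exactly the three expected assets.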
|
||||
|
||||
All six outputs are self-contained: every dependency (Python, pandas,
|
||||
All three outputs are self-contained: every dependency (Python, pandas,
|
||||
streamlit, pdfplumber, **Tesseract OCR + `eng.traineddata`**, the lot)
|
||||
is frozen into the bundle. The buyer does not need to install Python,
|
||||
pip, Tesseract, or anything else first. With Tesseract bundled, each
|
||||
@@ -76,47 +72,44 @@ the resulting installers to a GitHub Release. Manual
|
||||
|
||||
## Releasing
|
||||
|
||||
### Single-command local build (recommended for one-developer workflow)
|
||||
### CI build (push tag → GitHub Release) — the release process
|
||||
|
||||
PyInstaller can't cross-compile, so a single machine produces one
|
||||
platform's packages. Run this on each target OS:
|
||||
|
||||
```bash
|
||||
# One-time setup per machine:
|
||||
pip install -r requirements.txt
|
||||
pip install pyinstaller pillow
|
||||
# Windows only: install Inno Setup from https://jrsoftware.org/isdl.php
|
||||
# Linux only: drop appimagetool onto PATH (see preflight output)
|
||||
|
||||
# Build everything for the current OS:
|
||||
python build/make_release.py
|
||||
```
|
||||
|
||||
Outputs land in `dist/`:
|
||||
- Windows host → `DataTools-<ver>-win-setup.exe` + `DataTools-<ver>-win-portable.zip`
|
||||
- macOS host → `DataTools-<ver>-mac.dmg` + `DataTools-<ver>-mac-portable.zip`
|
||||
- Linux host → `DataTools-<ver>-linux-x86_64.AppImage`
|
||||
|
||||
Useful flags:
|
||||
|
||||
```bash
|
||||
python build/make_release.py --preflight # check tooling, build nothing
|
||||
python build/make_release.py --clean # wipe dist/ first
|
||||
python build/make_release.py --skip-installer # just the portable zip
|
||||
python build/make_release.py --skip-portable # just the installer
|
||||
```
|
||||
|
||||
### CI build (push tag → GitHub Release)
|
||||
|
||||
If you have CI runners for all three OSes:
|
||||
Releases are built by GitHub Actions (`.github/workflows/build.yml`),
|
||||
not on a developer's machine. The matrix runs on
|
||||
macos-latest / windows-latest / ubuntu-latest, stages Tesseract
|
||||
(`build/tesseract.py`), runs PyInstaller, packages the per-platform
|
||||
installer, and attaches it to a GitHub Release on tag push:
|
||||
|
||||
1. Bump `__version__` in `src/__init__.py`.
|
||||
2. `git commit -am "release: vX.Y.Z" && git tag vX.Y.Z`.
|
||||
3. `git push && git push --tags`.
|
||||
4. CI builds all three platforms and creates a Release with the
|
||||
installers + portable zips attached.
|
||||
installers attached.
|
||||
5. Mirror the Release assets to Gumroad (manual until v2).
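Steps 1 and 2 only work if the tag and `__version__` agree. A minimal pre-push check might look like the sketch below. It assumes the tag is `v` followed by the exact `__version__` string; `parse_version` and `tag_matches_version` are hypothetical names, and the regex is the one used by the `_read_version` helper that this commit removes.

```python
import re

# Same pattern the removed _read_version() helper used to pull
# __version__ out of src/__init__.py.
_VERSION_RE = re.compile(r'__version__\s*=\s*["\']([^"\']+)["\']')


def parse_version(init_text: str) -> str:
    """Extract the version string from the text of src/__init__.py."""
    match = _VERSION_RE.search(init_text)
    if match is None:
        raise ValueError("could not parse __version__")
    return match.group(1)


def tag_matches_version(tag: str, init_text: str) -> bool:
    """True when ``tag`` is 'v' + the version declared in ``init_text``.

    Assumption: release tags are always 'v' followed by __version__.
    """
    return tag == f"v{parse_version(init_text)}"
```

From the repo root, `tag_matches_version("v1.2.3", Path("src/__init__.py").read_text(encoding="utf-8"))` (with `pathlib.Path` imported) would return `False` if the bump in step 1 was forgotten.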
|
||||
|
||||
A manual `workflow_dispatch` run does the same build but uploads the
|
||||
installers as workflow artifacts instead of creating a Release —
|
||||
useful for smoke-testing a build without cutting a tag.
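The split between the two publishing paths can be sketched as follows. This is an assumption about how the decision could be expressed, not a copy of the condition in `build.yml`, which this diff does not show. `GITHUB_EVENT_NAME` and `GITHUB_REF` are standard GitHub Actions environment variables; `publish_mode` is a hypothetical name.

```python
import os


def publish_mode(event_name: str, ref: str) -> str:
    """Return 'release' for a pushed v* tag, otherwise 'artifact'.

    Assumption: any other trigger (for example workflow_dispatch)
    only uploads workflow artifacts.
    """
    if event_name == "push" and ref.startswith("refs/tags/v"):
        return "release"
    return "artifact"


if __name__ == "__main__":
    # Outside GitHub Actions both variables are unset, so this
    # prints 'artifact'.
    print(publish_mode(
        os.environ.get("GITHUB_EVENT_NAME", ""),
        os.environ.get("GITHUB_REF", ""),
    ))
```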
|
||||
|
||||
### Local build (single platform, for testing)
|
||||
|
||||
PyInstaller can't cross-compile, so a local build produces only the
|
||||
current OS's installer. This mirrors what CI does, by hand — use it to
|
||||
debug the bundle before tagging. See the per-platform recipes below for
|
||||
the exact commands; the short version is:
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
pip install pyinstaller pillow
|
||||
python build/generate_icons.py
|
||||
python -c "import sys; sys.path.insert(0,'build'); \
|
||||
from tesseract import fetch_tessdata, fetch_tesseract_for_platform; \
|
||||
fetch_tessdata(); fetch_tesseract_for_platform('mac')" # win / mac / linux
|
||||
pyinstaller build/datatools.spec --clean --noconfirm
|
||||
# then run the matching packager: build/macos/build_dmg.sh,
|
||||
# build/installer.iss (iscc), or build/appimage/build.sh
|
||||
```
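The `python -c` one-liner above is easier to read as a short script. The sketch below is hypothetical (no such file exists in the repository): `detect_target` mirrors the `_detect_platform` logic this commit removes from `make_release.py`, and `stage_tesseract` makes the same two calls into `build/tesseract.py` that the one-liner does.

```python
import sys
from pathlib import Path


def detect_target(sys_platform: str = sys.platform) -> str:
    """Map a sys.platform value to win / mac / linux."""
    if sys_platform.startswith("win"):
        return "win"
    if sys_platform == "darwin":
        return "mac"
    if sys_platform.startswith("linux"):
        return "linux"
    raise ValueError(f"unsupported platform {sys_platform!r}")


def stage_tesseract(build_dir: Path, target: str) -> None:
    """Fetch tessdata and the Tesseract binary for ``target``."""
    # Make build/tesseract.py importable, as the one-liner does.
    sys.path.insert(0, str(build_dir))
    from tesseract import fetch_tessdata, fetch_tesseract_for_platform

    fetch_tessdata()
    fetch_tesseract_for_platform(target)
```

Run from the repo root, `stage_tesseract(Path("build"), detect_target())` stages the files for the current OS, so the platform key does not have to be typed by hand.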
|
||||
|
||||
## Signing (Phase 2 — needs accounts/credentials)
|
||||
|
||||
Both code-signing steps are intentionally not in CI yet because they
|
||||
@@ -321,17 +314,18 @@ The runtime resolver (in `src/`, owned by the runtime team) walks:
   (sourced from [tessdata_best](https://github.com/tesseract-ocr/tessdata_best)).
   `datatools.spec` copies it into `tesseract/tessdata/`.
 - **Binary** — fetched per-platform at build time by
-  `build/make_release.py` from pinned upstream URLs. Current pin:
-  **Tesseract 5.5.0**.
+  `build/tesseract.py` from pinned upstream URLs. Current pin:
+  **Tesseract 5.5.0**. CI imports `fetch_tessdata` +
+  `fetch_tesseract_for_platform` from this module before PyInstaller.
 
 **Updating Tesseract**:
 
 1. Bump the version pin and the per-platform fetch URLs in
-   `build/make_release.py`.
+   `build/tesseract.py`.
 2. If the model schema changed upstream, refresh
    `build/vendor/tessdata/eng.traineddata` from `tessdata_best` at the
    matching tag.
-3. Rebuild on each platform (`python build/make_release.py`) and
+3. Push a `v*` tag so CI rebuilds all three platforms, then
    smoke-test a scanned PDF through the PDF Extractor.
 4. Update `LICENSE_TESSERACT.txt` at the repo root if upstream license
    terms change (Apache-2.0 today).
@@ -1,69 +0,0 @@
|
||||
"""Wrap the PyInstaller folder build into a portable .zip.
|
||||
|
||||
Self-contained download: unzip → double-click the launcher → app runs.
|
||||
No installer, no Python install, no admin rights required.
|
||||
|
||||
Usage:
|
||||
python build/build_portable_zip.py <platform> <version>
|
||||
|
||||
Where ``platform`` is one of ``win`` / ``mac`` / ``linux``. The
|
||||
script just produces a generic ``dist/DataTools/`` zip; on macOS the
|
||||
preferred portable format is the ``ditto``-wrapped .app — see
|
||||
``build/macos/build_zip.sh`` for that flow. This helper exists mainly
|
||||
for Windows + Linux, where there's no .app bundle to wrap.
|
||||
|
||||
Output:
|
||||
dist/DataTools-<version>-<platform>-portable.zip
|
||||
|
||||
The zip root is the ``DataTools/`` folder so an unzip produces a
|
||||
self-contained dir the user can drop anywhere (Desktop, USB stick,
|
||||
network share). On Windows, the launcher is ``DataTools.exe`` inside
|
||||
that folder; on Linux, ``DataTools``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import shutil
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
REPO = Path(__file__).resolve().parent.parent
|
||||
DIST_DIR = REPO / "dist"
|
||||
BUNDLE_DIR = DIST_DIR / "DataTools"
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) < 3:
|
||||
sys.stderr.write(
|
||||
"usage: python build/build_portable_zip.py <platform> <version>\n"
|
||||
)
|
||||
return 2
|
||||
platform = sys.argv[1]
|
||||
version = sys.argv[2]
|
||||
|
||||
if not BUNDLE_DIR.is_dir():
|
||||
sys.stderr.write(
|
||||
f"Bundle dir not found at {BUNDLE_DIR}.\n"
|
||||
"Run ``pyinstaller build/datatools.spec --clean --noconfirm`` first.\n"
|
||||
)
|
||||
return 1
|
||||
|
||||
out_stem = DIST_DIR / f"DataTools-{version}-{platform}-portable"
|
||||
# ``make_archive`` takes a base name (no extension) and produces
|
||||
# ``<base>.zip``. ``root_dir`` = parent of what we want compressed,
|
||||
# ``base_dir`` = the folder name inside the archive root. This
|
||||
# combo yields a single top-level ``DataTools/`` directory inside
|
||||
# the .zip rather than dumping its contents loose.
|
||||
archive = shutil.make_archive(
|
||||
base_name=str(out_stem),
|
||||
format="zip",
|
||||
root_dir=str(DIST_DIR),
|
||||
base_dir="DataTools",
|
||||
)
|
||||
size_mb = Path(archive).stat().st_size / (1024 * 1024)
|
||||
print(f"wrote {archive} ({size_mb:.1f} MB)")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -105,7 +105,7 @@ datas += [
|
||||
]
|
||||
|
||||
# ----- Tesseract OCR bundle ----------------------------------------
|
||||
# ``build/make_release.py`` stages the per-platform Tesseract binary
|
||||
# ``build/tesseract.py`` stages the per-platform Tesseract binary
|
||||
# + its runtime libs (DLLs/dylibs/sos) into
|
||||
# ``build/_tesseract/<target>/`` and the shared eng.traineddata into
|
||||
# ``build/vendor/tessdata/``. We add both to ``datas`` so PyInstaller
|
||||
@@ -119,16 +119,16 @@ datas += [
|
||||
# from ``Path(sys._MEIPASS) / "tesseract" / ...``. Keep the two ends
|
||||
# in sync — if you rename "tesseract" here, update pdf_extract.py too.
|
||||
#
|
||||
# The orchestrator (make_release.py) sets DATATOOLS_TESS_STAGING to
|
||||
# the right per-platform dir before invoking PyInstaller. For ad-hoc
|
||||
# `pyinstaller build/datatools.spec` runs without the orchestrator,
|
||||
# fall back to the canonical staging path.
|
||||
# CI (.github/workflows/build.yml) sets DATATOOLS_TESS_STAGING to the
|
||||
# right per-platform dir before invoking PyInstaller. For ad-hoc
|
||||
# `pyinstaller build/datatools.spec` runs without that env var, fall
|
||||
# back to the canonical staging path.
|
||||
_tess_staging_env = os.environ.get("DATATOOLS_TESS_STAGING")
|
||||
if _tess_staging_env:
|
||||
_tess_staging = Path(_tess_staging_env)
|
||||
else:
|
||||
# Pick the obvious per-host staging dir as a fallback so spec-only
|
||||
# builds (without the orchestrator) still work in dev.
|
||||
# builds (without the CI env var) still work in dev.
|
||||
import sys as _sys_for_target
|
||||
_target_guess = (
|
||||
"win" if _sys_for_target.platform.startswith("win")
|
||||
@@ -149,8 +149,8 @@ else:
|
||||
# though, since the OCR feature will silently fail at runtime.
|
||||
print(
|
||||
f"WARNING: {_tess_staging} is empty or missing — OCR will be "
|
||||
"disabled in the bundle. Run build/make_release.py (which "
|
||||
"calls fetch_tesseract_for_platform) before pyinstaller, or "
|
||||
"disabled in the bundle. Run build/tesseract.py's "
|
||||
"fetch_tesseract_for_platform before pyinstaller, or "
|
||||
"pre-stage the binary manually."
|
||||
)
|
||||
|
||||
@@ -159,8 +159,8 @@ if (_tessdata / "eng.traineddata").exists():
|
||||
else:
|
||||
print(
|
||||
f"WARNING: {_tessdata}/eng.traineddata is missing — OCR will "
|
||||
"have no language data at runtime. Run build/make_release.py "
|
||||
"or fetch manually per build/vendor/README.md."
|
||||
"have no language data at runtime. Run build/tesseract.py's "
|
||||
"fetch_tessdata or fetch manually per build/vendor/README.md."
|
||||
)
|
||||
|
||||
# Bundle the Apache-2.0 LICENSE text alongside the binary. The docs
|
||||
|
||||
@@ -1,43 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# Wrap dist/DataTools.app into a no-install portable .zip.
|
||||
#
|
||||
# Usage:
|
||||
# bash build/macos/build_zip.sh <version>
|
||||
#
|
||||
# Why a portable .zip in addition to the .dmg:
|
||||
# * Buyers who don't want an installer can unzip and double-click the
|
||||
# .app directly — no drag-to-/Applications step, no installer
|
||||
# chrome. Self-contained: the .app holds Python + every dep.
|
||||
# * IT-locked-down machines often block .dmg auto-mount but allow
|
||||
# .zip download + extraction.
|
||||
#
|
||||
# Run after ``pyinstaller build/datatools.spec --clean --noconfirm``
|
||||
# has produced ``dist/DataTools.app``. Output goes to
|
||||
# ``dist/DataTools-<version>-mac-portable.zip``.
|
||||
#
|
||||
# Tesseract bundling: no-op here. The bundled Tesseract binary +
|
||||
# dylibs + tessdata are already inside DataTools.app/Contents/Resources/tesseract/
|
||||
# (placed by PyInstaller's BUNDLE/datas mechanism). ``ditto -c -k``
|
||||
# preserves the whole .app tree.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
VERSION="${1:-0.0.0-dev}"
|
||||
APP="dist/DataTools.app"
|
||||
ZIP="dist/DataTools-${VERSION}-mac-portable.zip"
|
||||
|
||||
if [[ ! -d "$APP" ]]; then
|
||||
echo "Error: $APP not found. Run pyinstaller build/datatools.spec first." >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ``ditto`` preserves the .app bundle's extended attributes and
|
||||
# resource forks (a plain ``zip`` strips them and can break code
|
||||
# signatures + Info.plist resolution on the buyer's machine).
|
||||
#
|
||||
# --sequesterRsrc keeps the AppleDouble metadata inside the archive
|
||||
# rather than as parallel ._ files on disk after extraction.
|
||||
rm -f "$ZIP"
|
||||
ditto -c -k --sequesterRsrc --keepParent "$APP" "$ZIP"
|
||||
|
||||
echo "Built $ZIP ($(du -h "$ZIP" | cut -f1))"
|
||||
@@ -1,40 +1,23 @@
|
||||
"""Single-command release builder for DataTools.
|
||||
"""Tesseract bundling helpers for the release build.
|
||||
|
||||
PyInstaller can't cross-compile — to produce a Windows .exe you run
|
||||
this on Windows, for a Mac .dmg you run it on macOS, for a Linux
|
||||
AppImage you run it on Linux. One script, one OS at a time.
|
||||
PDF Extractor OCR ships a per-platform Tesseract binary plus the English
|
||||
``eng.traineddata`` model inside the frozen PyInstaller bundle so scanned
|
||||
PDFs work without a separate user install. These helpers fetch the binary
|
||||
and tessdata at build time; the GitHub Actions workflow
|
||||
(``.github/workflows/build.yml``) imports ``fetch_tessdata`` and
|
||||
``fetch_tesseract_for_platform`` and runs them before PyInstaller.
|
||||
|
||||
What this script does (in order):
|
||||
1. Preflight — checks PyInstaller, Pillow, and the platform's
|
||||
packager (Inno Setup on Win / hdiutil + ditto on Mac /
|
||||
appimagetool on Linux) are reachable. Bails with install
|
||||
instructions if anything is missing.
|
||||
2. Generates icon.ico / icon.icns / icon.png from the PNG asset.
|
||||
3. Runs PyInstaller against build/datatools.spec.
|
||||
4. Wraps the PyInstaller output into:
|
||||
* Windows: DataTools-<ver>-win-setup.exe (Inno Setup)
|
||||
+ DataTools-<ver>-win-portable.zip
|
||||
* macOS: DataTools-<ver>-mac.dmg
|
||||
+ DataTools-<ver>-mac-portable.zip
|
||||
* Linux: DataTools-<ver>-linux-x86_64.AppImage
|
||||
5. Prints what landed in dist/ and the byte sizes.
|
||||
|
||||
Usage:
|
||||
python build/make_release.py # build everything for this OS
|
||||
python build/make_release.py --preflight # check tooling, don't build
|
||||
python build/make_release.py --skip-installer # only the portable zip
|
||||
python build/make_release.py --skip-portable # only the installer
|
||||
python build/make_release.py --clean # wipe dist/ first
|
||||
|
||||
Run from the repo root or from build/ — either works.
|
||||
Everything is staged under ``build/_tesseract/<platform>/`` (gitignored).
|
||||
The PyInstaller spec (``build/datatools.spec``) reads that staging dir plus
|
||||
``build/vendor/tessdata/`` and bundles them under ``<bundle>/tesseract/``,
|
||||
where the runtime discovery code in ``src/pdf_extract.py`` expects:
|
||||
Path(sys._MEIPASS) / "tesseract" / "tesseract[.exe]"
|
||||
Path(sys._MEIPASS) / "tesseract" / "tessdata" / "eng.traineddata"
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import platform
|
||||
import re
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
@@ -43,7 +26,6 @@ from pathlib import Path
|
||||
|
||||
REPO = Path(__file__).resolve().parent.parent
|
||||
BUILD = REPO / "build"
|
||||
DIST = REPO / "dist"
|
||||
|
||||
# Tesseract bundling. The runtime discovery code in
|
||||
# ``src/pdf_extract.py`` looks for the binary at
|
||||
@@ -95,119 +77,6 @@ def _run(cmd: list[str], cwd: Path | None = None, env: dict | None = None) -> No
|
||||
sys.exit(127)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Platform detection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _detect_platform() -> str:
|
||||
"""Return ``win`` / ``mac`` / ``linux`` based on sys.platform."""
|
||||
p = sys.platform
|
||||
if p.startswith("win"):
|
||||
return "win"
|
||||
if p == "darwin":
|
||||
return "mac"
|
||||
if p.startswith("linux"):
|
||||
return "linux"
|
||||
_err(f"unsupported platform {p!r}; this script handles win/mac/linux only.")
|
||||
sys.exit(2)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Version — single source of truth in src/__init__.py
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _read_version() -> str:
|
||||
init_py = (REPO / "src" / "__init__.py").read_text(encoding="utf-8")
|
||||
m = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', init_py)
|
||||
if not m:
|
||||
_err("could not parse __version__ from src/__init__.py")
|
||||
sys.exit(1)
|
||||
return m.group(1)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Preflight — check tooling before doing anything destructive
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _have_module(name: str) -> bool:
|
||||
try:
|
||||
__import__(name)
|
||||
return True
|
||||
except ImportError:
|
||||
return False
|
||||
|
||||
|
||||
def _have_command(name: str) -> bool:
|
||||
return shutil.which(name) is not None
|
||||
|
||||
|
||||
# Per-platform install hints. The error messages quote these so a buyer
|
||||
# building from source isn't left guessing what to install next.
|
||||
_INSTALL_HINTS = {
|
||||
"pyinstaller": "pip install pyinstaller",
|
||||
"pil": "pip install pillow",
|
||||
"iscc": "Inno Setup (Windows): https://jrsoftware.org/isdl.php — install, then re-open the shell so iscc lands on PATH.",
|
||||
"hdiutil": "ships with macOS — if it's missing your Mac install is broken.",
|
||||
"ditto": "ships with macOS — if it's missing your Mac install is broken.",
|
||||
"appimagetool": "Linux: download appimagetool-x86_64.AppImage from https://github.com/AppImage/AppImageKit/releases, chmod +x, drop on PATH.",
|
||||
}
|
||||
|
||||
|
||||
def preflight(target: str) -> None:
|
||||
"""Verify every tool the target build needs is reachable; exit if not."""
|
||||
_step(f"preflight ({target})")
|
||||
|
||||
missing: list[tuple[str, str]] = []
|
||||
|
||||
# Python-side deps — same on every platform. The ``_INSTALL_HINTS``
|
||||
# lookup uses lowercase keys so module name capitalization doesn't
|
||||
# need to match.
|
||||
for mod in ("PyInstaller", "PIL"):
|
||||
if not _have_module(mod):
|
||||
hint = _INSTALL_HINTS.get(mod.lower(), f"pip install {mod}")
|
||||
missing.append((mod.lower(), hint))
|
||||
else:
|
||||
_ok(f"{mod} importable")
|
||||
|
||||
# PyInstaller's CLI must also be reachable as a binary, not just as
|
||||
# an importable module — the spec is invoked via the ``pyinstaller``
|
||||
# command. ``python -m PyInstaller`` is a fine fallback so don't
|
||||
# hard-fail if only the CLI binary is missing.
|
||||
if _have_command("pyinstaller"):
|
||||
_ok("pyinstaller on PATH")
|
||||
else:
|
||||
_warn("pyinstaller binary not on PATH — will fall back to `python -m PyInstaller`")
|
||||
|
||||
# Platform-specific packagers.
|
||||
if target == "win":
|
||||
if _have_command("iscc"):
|
||||
_ok("Inno Setup (iscc) on PATH")
|
||||
else:
|
||||
missing.append(("iscc", _INSTALL_HINTS["iscc"]))
|
||||
elif target == "mac":
|
||||
for tool in ("hdiutil", "ditto"):
|
||||
if _have_command(tool):
|
||||
_ok(f"{tool} on PATH")
|
||||
else:
|
||||
missing.append((tool, _INSTALL_HINTS[tool]))
|
||||
elif target == "linux":
|
||||
if _have_command("appimagetool"):
|
||||
_ok("appimagetool on PATH")
|
||||
else:
|
||||
missing.append(("appimagetool", _INSTALL_HINTS["appimagetool"]))
|
||||
|
||||
if missing:
|
||||
_err("missing prerequisites:")
|
||||
for name, hint in missing:
|
||||
print(f" - {name}: {hint}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
_ok("all prerequisites present")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tesseract bundling — fetch the binary + tessdata at build time.
|
||||
#
|
||||
@@ -582,176 +451,3 @@ def fetch_tesseract_for_platform(target: str) -> Path:
|
||||
)
|
||||
sys.exit(1)
|
||||
return staging
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Build steps
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def step_generate_icons() -> None:
|
||||
_step("generate icons")
|
||||
_run([sys.executable, str(BUILD / "generate_icons.py")])
|
||||
|
||||
|
||||
def step_pyinstaller(clean: bool, *, target: str | None = None) -> None:
|
||||
_step("pyinstaller bundle")
|
||||
# Use ``python -m PyInstaller`` so we don't depend on the binary
|
||||
# being on PATH (Windows users frequently see this — pip's
|
||||
# Scripts/ dir isn't auto-added).
|
||||
cmd = [sys.executable, "-m", "PyInstaller",
|
||||
str(BUILD / "datatools.spec"),
|
||||
"--noconfirm"]
|
||||
if clean:
|
||||
cmd.append("--clean")
|
||||
# The spec reads ``DATATOOLS_TESS_STAGING`` to find the per-platform
|
||||
# tesseract staging dir. Passing it via env keeps the spec file
|
||||
# platform-agnostic — the spec doesn't need to detect win/mac/linux
|
||||
# itself; the orchestrator already did.
|
||||
env = os.environ.copy()
|
||||
if target:
|
||||
env["DATATOOLS_TESS_STAGING"] = str(TESSERACT_STAGING / target)
|
||||
_run(cmd, env=env)
|
||||
|
||||
|
||||
def step_package_win(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
|
||||
out: list[Path] = []
|
||||
if do_installer:
|
||||
_step("Windows installer (Inno Setup)")
|
||||
_run(["iscc", f"/DAppVersion={version}", str(BUILD / "installer.iss")])
|
||||
out.append(DIST / f"DataTools-{version}-win-setup.exe")
|
||||
if do_portable:
|
||||
_step("Windows portable .zip")
|
||||
_run([sys.executable, str(BUILD / "build_portable_zip.py"), "win", version])
|
||||
out.append(DIST / f"DataTools-{version}-win-portable.zip")
|
||||
return out
|
||||
|
||||
|
||||
def step_package_mac(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
|
||||
out: list[Path] = []
|
||||
if do_installer:
|
||||
_step("macOS DMG (installer)")
|
||||
_run(["bash", str(BUILD / "macos" / "build_dmg.sh"), version])
|
||||
out.append(DIST / f"DataTools-{version}-mac.dmg")
|
||||
if do_portable:
|
||||
_step("macOS portable .zip")
|
||||
_run(["bash", str(BUILD / "macos" / "build_zip.sh"), version])
|
||||
out.append(DIST / f"DataTools-{version}-mac-portable.zip")
|
||||
return out
|
||||
|
||||
|
||||
def step_package_linux(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
|
||||
# On Linux the AppImage IS the portable. We ignore the two flags
|
||||
# and always produce the single file — splitting wouldn't add
|
||||
# value.
|
||||
if not (do_installer or do_portable):
|
||||
return []
|
||||
_step("Linux AppImage")
|
||||
_run(["bash", str(BUILD / "appimage" / "build.sh"), version])
|
||||
return [DIST / f"DataTools-{version}-linux-x86_64.AppImage"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Orchestration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _summarise(outputs: list[Path]) -> None:
|
||||
_step("done — outputs")
|
||||
if not outputs:
|
||||
_warn("no files produced (everything skipped via flags)")
|
||||
return
|
||||
for p in outputs:
|
||||
if p.exists():
|
||||
size_mb = p.stat().st_size / (1024 * 1024)
|
||||
print(f" {p.relative_to(REPO)} ({size_mb:.1f} MB)")
|
||||
else:
|
||||
_warn(f"expected output missing: {p.relative_to(REPO)}")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="make_release.py",
|
||||
description=(
|
||||
"Build the installer + portable zip for the current OS. "
|
||||
"Cross-compilation isn't supported by PyInstaller — run "
|
||||
"this once per platform you want to target."
|
||||
),
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--platform", choices=("auto", "win", "mac", "linux"), default="auto",
|
||||
help="Override OS detection (mostly for testing). Default: auto.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--preflight", action="store_true",
|
||||
help="Check tooling and exit without building.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--clean", action="store_true",
|
||||
help="Wipe dist/ before building.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--skip-installer", action="store_true",
|
||||
help="Don't build the OS installer (.exe / .dmg).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--skip-portable", action="store_true",
|
||||
help="Don't build the portable .zip.",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
target = _detect_platform() if args.platform == "auto" else args.platform
|
||||
version = _read_version()
|
||||
do_installer = not args.skip_installer
|
||||
do_portable = not args.skip_portable
|
||||
|
||||
print(f"DataTools release builder")
|
||||
print(f" target: {target} (host: {platform.platform()})")
|
||||
print(f" version: {version}")
|
||||
print(f" installer: {'yes' if do_installer else 'no'}")
|
||||
print(f" portable: {'yes' if do_portable else 'no'}")
|
||||
print(f" dist dir: {DIST}")
|
||||
|
||||
if target != _detect_platform():
|
||||
_warn(
|
||||
f"--platform {target} but host is {_detect_platform()}. "
|
||||
"PyInstaller can't cross-compile — the bundle will be for "
|
||||
"the HOST, only the packaging step will follow your override. "
|
||||
"Useful only for testing the packager paths."
|
||||
)
|
||||
|
||||
preflight(target)
|
||||
if args.preflight:
|
||||
return 0
|
||||
|
||||
if args.clean and DIST.exists():
|
||||
_step(f"cleaning {DIST}")
|
||||
shutil.rmtree(DIST)
|
||||
|
||||
step_generate_icons()
|
||||
|
||||
# Stage Tesseract OCR before PyInstaller runs. The spec reads
|
||||
# ``build/_tesseract/<target>/`` + ``build/vendor/tessdata/`` and
|
||||
# bundles them under ``<bundle>/tesseract/`` so the runtime
|
||||
# discovery in src/pdf_extract.py finds them at:
|
||||
# Path(sys._MEIPASS) / "tesseract" / "tesseract[.exe]"
|
||||
# Path(sys._MEIPASS) / "tesseract" / "tessdata" / "eng.traineddata"
|
||||
fetch_tessdata()
|
||||
fetch_tesseract_for_platform(target)
|
||||
|
||||
step_pyinstaller(clean=args.clean, target=target)
|
||||
|
||||
if target == "win":
|
||||
outputs = step_package_win(version, do_installer, do_portable)
|
||||
elif target == "mac":
|
||||
outputs = step_package_mac(version, do_installer, do_portable)
|
||||
else:
|
||||
outputs = step_package_linux(version, do_installer, do_portable)
|
||||
|
||||
_summarise(outputs)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
11
build/vendor/README.md
vendored
@@ -4,9 +4,10 @@ This tree holds the third-party assets that get bundled into the
|
||||
PyInstaller artifacts but that we deliberately do **not** keep in git
|
||||
(too large / license-encumbered / re-fetchable on demand).
|
||||
|
||||
The build pipeline (`build/make_release.py`) populates everything in
|
||||
here before the PyInstaller step. The contents are git-ignored except
|
||||
for this README.
|
||||
The build's Tesseract helper (`build/tesseract.py`) populates
|
||||
everything in here before the PyInstaller step — CI
|
||||
(`.github/workflows/build.yml`) calls it ahead of the build. The
|
||||
contents are git-ignored except for this README.
|
||||
|
||||
## tessdata/
|
||||
|
||||
@@ -40,9 +41,9 @@ statements (the only OCR use case so far), the extra accuracy of the
|
||||
|
||||
### How it gets populated
|
||||
|
||||
`build/make_release.py::fetch_tessdata()` checks for
|
||||
`build/tesseract.py::fetch_tessdata()` checks for
|
||||
`build/vendor/tessdata/eng.traineddata` on every run. If it's
|
||||
missing, the script downloads it from the canonical URL above and
|
||||
missing, it downloads the file from the canonical URL above and
|
||||
caches it here. Subsequent builds reuse the cached file.
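The check-then-download behaviour described above follows a common fetch-if-missing pattern. The sketch below illustrates that pattern only; it is not the body of `fetch_tessdata()`, which this diff does not show. `ensure_cached` and its `download` parameter are hypothetical, and writing to a temporary `.part` file before renaming is an added safeguard, not something the README states.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlretrieve


def _default_download(url: str, dest: Path) -> None:
    """Download ``url`` to ``dest`` with the standard library."""
    urlretrieve(url, str(dest))


def ensure_cached(
    dest: Path,
    url: str,
    download: Optional[Callable[[str, Path], None]] = None,
) -> bool:
    """Download ``url`` to ``dest`` only if ``dest`` is missing.

    Returns True when a download happened, False on a cache hit.
    """
    if dest.exists():
        return False  # cached by an earlier build, so reuse it
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download never
    # leaves a truncated file at the final path.
    tmp = dest.with_suffix(dest.suffix + ".part")
    (download or _default_download)(url, tmp)
    tmp.replace(dest)
    return True
```

Passing a custom `download` callable keeps the function testable without network access.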
|
||||
|
||||
On CI, the directory is restored from the GitHub Actions cache so we
|
||||
|
||||