build: drop the local Python release method, return to CI-only installer builds

Removes the single-command Python packaging method (build/make_release.py
+ build/build_portable_zip.py + build/macos/build_zip.sh) and the portable
.zip artifacts it produced. Release builds go back to the original GitHub
Actions process: the CI matrix builds one installer per platform (.dmg /
.exe / .AppImage) on tag push and attaches them to a GitHub Release.

Tesseract OCR bundling is preserved: the fetch helpers the workflow depends
on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a
standalone build/tesseract.py, which build.yml now imports.

Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README,
es translations) updated to drop the portable-zip flavor and point at the
new module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:47:36 +00:00
parent 28ab51a869
commit fd9606c67b
13 changed files with 127 additions and 608 deletions

@@ -23,14 +23,12 @@ build/
├── generate_icons.py Builds icon.ico / icon.icns / icon.png from
│ src/gui/assets/datatools_icon_256.png. Run
│ once before pyinstaller (CI does this).
├── build_portable_zip.py Cross-platform: zips dist/DataTools/ into a
│ no-install portable download. Used by the
│ Windows + Linux portable artifacts.
├── tesseract.py Fetches the per-platform Tesseract binary +
│ eng.traineddata at build time. CI imports
│ fetch_tessdata + fetch_tesseract_for_platform.
├── macos/
│ ├── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
│ drag-to-/Applications layout (installer).
│ └── build_zip.sh Wraps dist/DataTools.app into a portable
│ .zip via ditto (preserves bundle metadata).
│ └── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
│ drag-to-/Applications layout (installer).
├── appimage/
│ ├── AppRun Entry point invoked when the AppImage runs.
│ ├── datatools.desktop Linux desktop-entry metadata.
@@ -43,17 +41,15 @@ build/
## Distribution outputs per platform
Each CI run produces two downloads per platform — an installer for
buyers who want shortcuts wired automatically, and a portable .zip
for buyers (or IT-locked-down machines) that can't run installers:
Each CI run produces one installer per platform:
| Platform | Installer | Portable |
|----------|----------------------------------------|------------------------------------------------|
| macOS | `DataTools-<ver>-mac.dmg` | `DataTools-<ver>-mac-portable.zip` (ditto .app)|
| Windows | `DataTools-<ver>-win-setup.exe` | `DataTools-<ver>-win-portable.zip` |
| Linux | `DataTools-<ver>-linux-x86_64.AppImage`| (the AppImage IS the portable) |
| Platform | Installer |
|----------|----------------------------------------|
| macOS | `DataTools-<ver>-mac.dmg` |
| Windows | `DataTools-<ver>-win-setup.exe` |
| Linux | `DataTools-<ver>-linux-x86_64.AppImage` (already portable) |
All six outputs are self-contained: every dependency (Python, pandas,
All three outputs are self-contained: every dependency (Python, pandas,
streamlit, pdfplumber, **Tesseract OCR + `eng.traineddata`**, the lot)
is frozen into the bundle. The buyer does not need to install Python,
pip, Tesseract, or anything else first. With Tesseract bundled, each
@@ -76,47 +72,44 @@ the resulting installers to a GitHub Release. Manual
## Releasing
### Single-command local build (recommended for one-developer workflow)
### CI build (push tag → GitHub Release) — the release process
PyInstaller can't cross-compile, so a single machine produces one
platform's packages. Run this on each target OS:
```bash
# One-time setup per machine:
pip install -r requirements.txt
pip install pyinstaller pillow
# Windows only: install Inno Setup from https://jrsoftware.org/isdl.php
# Linux only: drop appimagetool onto PATH (see preflight output)
# Build everything for the current OS:
python build/make_release.py
```
Outputs land in `dist/`:
- Windows host → `DataTools-<ver>-win-setup.exe` + `DataTools-<ver>-win-portable.zip`
- macOS host → `DataTools-<ver>-mac.dmg` + `DataTools-<ver>-mac-portable.zip`
- Linux host → `DataTools-<ver>-linux-x86_64.AppImage`
Useful flags:
```bash
python build/make_release.py --preflight # check tooling, build nothing
python build/make_release.py --clean # wipe dist/ first
python build/make_release.py --skip-installer # just the portable zip
python build/make_release.py --skip-portable # just the installer
```
### CI build (push tag → GitHub Release)
If you have CI runners for all three OSes:
Releases are built by GitHub Actions (`.github/workflows/build.yml`),
not on a developer's machine. The matrix runs on
macos-latest / windows-latest / ubuntu-latest, stages Tesseract
(`build/tesseract.py`), runs PyInstaller, packages the per-platform
installer, and attaches it to a GitHub Release on tag push:
1. Bump `__version__` in `src/__init__.py`.
2. `git commit -am "release: vX.Y.Z" && git tag vX.Y.Z`.
3. `git push && git push --tags`.
4. CI builds all three platforms and creates a Release with the
installers + portable zips attached.
installers attached.
5. Mirror the Release assets to Gumroad (manual until v2).
A manual `workflow_dispatch` run does the same build but uploads the
installers as workflow artifacts instead of creating a Release —
useful for smoke-testing a build without cutting a tag.
### Local build (single platform, for testing)
PyInstaller can't cross-compile, so a local build produces only the
current OS's installer. This mirrors what CI does, by hand — use it to
debug the bundle before tagging. See the per-platform recipes below for
the exact commands; the short version is:
```bash
pip install -r requirements.txt
pip install pyinstaller pillow
python build/generate_icons.py
python -c "import sys; sys.path.insert(0,'build'); \
from tesseract import fetch_tessdata, fetch_tesseract_for_platform; \
fetch_tessdata(); fetch_tesseract_for_platform('mac')" # win / mac / linux
pyinstaller build/datatools.spec --clean --noconfirm
# then run the matching packager: build/macos/build_dmg.sh,
# build/installer.iss (iscc), or build/appimage/build.sh
```
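The inline `python -c` staging step above can also be written as a small script that picks the platform key from the host instead of hard-coding `'mac'`. This is a minimal sketch, not something shipped in the repo: `detect_target` and `stage_tesseract` are hypothetical names, and it assumes `build/tesseract.py` exposes `fetch_tessdata()` and `fetch_tesseract_for_platform(target)` as described in this README.

```python
import sys


def detect_target(platform_name: str = sys.platform) -> str:
    """Map a sys.platform value to the win / mac / linux key."""
    if platform_name.startswith("win"):
        return "win"
    if platform_name == "darwin":
        return "mac"
    if platform_name.startswith("linux"):
        return "linux"
    raise ValueError(f"unsupported platform: {platform_name!r}")


def stage_tesseract(build_dir: str = "build") -> None:
    """Fetch tessdata and the host's Tesseract binary (hypothetical wrapper)."""
    sys.path.insert(0, build_dir)
    # Assumed API of build/tesseract.py; adjust if the module differs.
    from tesseract import fetch_tessdata, fetch_tesseract_for_platform

    fetch_tessdata()
    fetch_tesseract_for_platform(detect_target())
```

Calling `stage_tesseract()` from the repo root, before `pyinstaller`, would stand in for the one-liner; the import is deferred so `detect_target` stays usable on its own.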
## Signing (Phase 2 — needs accounts/credentials)
Both code-signing steps are intentionally not in CI yet because they
@@ -321,17 +314,18 @@ The runtime resolver (in `src/`, owned by the runtime team) walks:
(sourced from [tessdata_best](https://github.com/tesseract-ocr/tessdata_best)).
`datatools.spec` copies it into `tesseract/tessdata/`.
- **Binary** — fetched per-platform at build time by
`build/make_release.py` from pinned upstream URLs. Current pin:
**Tesseract 5.5.0**.
`build/tesseract.py` from pinned upstream URLs. Current pin:
**Tesseract 5.5.0**. CI imports `fetch_tessdata` +
`fetch_tesseract_for_platform` from this module before PyInstaller.
**Updating Tesseract**:
1. Bump the version pin and the per-platform fetch URLs in
`build/make_release.py`.
`build/tesseract.py`.
2. If the model schema changed upstream, refresh
`build/vendor/tessdata/eng.traineddata` from `tessdata_best` at the
matching tag.
3. Rebuild on each platform (`python build/make_release.py`) and
3. Push a `v*` tag so CI rebuilds all three platforms, then
smoke-test a scanned PDF through the PDF Extractor.
4. Update `LICENSE_TESSERACT.txt` at the repo root if upstream license
terms change (Apache-2.0 today).
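
When debugging a pin bump locally, it can help to confirm that the staging directories are populated before running PyInstaller, since a missing binary or model only surfaces later as OCR failing at runtime. The following is a rough sketch under stated assumptions: `staging_problems` is a hypothetical helper, and the paths (`build/_tesseract/<target>/` and `build/vendor/tessdata/eng.traineddata`) are the ones this document describes.

```python
from __future__ import annotations

from pathlib import Path


def staging_problems(repo_root: Path, target: str) -> list[str]:
    """Return human-readable problems with the Tesseract staging, if any."""
    problems: list[str] = []
    # Per-platform binary + runtime libs staged by build/tesseract.py.
    staging = repo_root / "build" / "_tesseract" / target
    if not staging.is_dir() or not any(staging.iterdir()):
        problems.append(f"{staging} is empty or missing")
    # Shared English model bundled by datatools.spec.
    model = repo_root / "build" / "vendor" / "tessdata" / "eng.traineddata"
    if not model.is_file():
        problems.append(f"{model} is missing")
    return problems
```

An empty list means both inputs the spec expects are present; it says nothing about whether the staged binary matches the new pin, so the scanned-PDF smoke test in step 3 is still needed.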

@@ -1,69 +0,0 @@
"""Wrap the PyInstaller folder build into a portable .zip.
Self-contained download: unzip → double-click the launcher → app runs.
No installer, no Python install, no admin rights required.
Usage:
python build/build_portable_zip.py <platform> <version>
Where ``platform`` is one of ``win`` / ``mac`` / ``linux``. The
script just produces a generic ``dist/DataTools/`` zip; on macOS the
preferred portable format is the ``ditto``-wrapped .app — see
``build/macos/build_zip.sh`` for that flow. This helper exists mainly
for Windows + Linux, where there's no .app bundle to wrap.
Output:
dist/DataTools-<version>-<platform>-portable.zip
The zip root is the ``DataTools/`` folder so an unzip produces a
self-contained dir the user can drop anywhere (Desktop, USB stick,
network share). On Windows, the launcher is ``DataTools.exe`` inside
that folder; on Linux, ``DataTools``.
"""
from __future__ import annotations
import shutil
import sys
from pathlib import Path
REPO = Path(__file__).resolve().parent.parent
DIST_DIR = REPO / "dist"
BUNDLE_DIR = DIST_DIR / "DataTools"
def main() -> int:
if len(sys.argv) < 3:
sys.stderr.write(
"usage: python build/build_portable_zip.py <platform> <version>\n"
)
return 2
platform = sys.argv[1]
version = sys.argv[2]
if not BUNDLE_DIR.is_dir():
sys.stderr.write(
f"Bundle dir not found at {BUNDLE_DIR}.\n"
"Run ``pyinstaller build/datatools.spec --clean --noconfirm`` first.\n"
)
return 1
out_stem = DIST_DIR / f"DataTools-{version}-{platform}-portable"
# ``make_archive`` takes a base name (no extension) and produces
# ``<base>.zip``. ``root_dir`` = parent of what we want compressed,
# ``base_dir`` = the folder name inside the archive root. This
# combo yields a single top-level ``DataTools/`` directory inside
# the .zip rather than dumping its contents loose.
archive = shutil.make_archive(
base_name=str(out_stem),
format="zip",
root_dir=str(DIST_DIR),
base_dir="DataTools",
)
size_mb = Path(archive).stat().st_size / (1024 * 1024)
print(f"wrote {archive} ({size_mb:.1f} MB)")
return 0
if __name__ == "__main__":
sys.exit(main())

@@ -105,7 +105,7 @@ datas += [
]
# ----- Tesseract OCR bundle ----------------------------------------
# ``build/make_release.py`` stages the per-platform Tesseract binary
# ``build/tesseract.py`` stages the per-platform Tesseract binary
# + its runtime libs (DLLs/dylibs/sos) into
# ``build/_tesseract/<target>/`` and the shared eng.traineddata into
# ``build/vendor/tessdata/``. We add both to ``datas`` so PyInstaller
@@ -119,16 +119,16 @@ datas += [
# from ``Path(sys._MEIPASS) / "tesseract" / ...``. Keep the two ends
# in sync — if you rename "tesseract" here, update pdf_extract.py too.
#
# The orchestrator (make_release.py) sets DATATOOLS_TESS_STAGING to
# the right per-platform dir before invoking PyInstaller. For ad-hoc
# `pyinstaller build/datatools.spec` runs without the orchestrator,
# fall back to the canonical staging path.
# CI (.github/workflows/build.yml) sets DATATOOLS_TESS_STAGING to the
# right per-platform dir before invoking PyInstaller. For ad-hoc
# `pyinstaller build/datatools.spec` runs without that env var, fall
# back to the canonical staging path.
_tess_staging_env = os.environ.get("DATATOOLS_TESS_STAGING")
if _tess_staging_env:
_tess_staging = Path(_tess_staging_env)
else:
# Pick the obvious per-host staging dir as a fallback so spec-only
# builds (without the orchestrator) still work in dev.
# builds (without the CI env var) still work in dev.
import sys as _sys_for_target
_target_guess = (
"win" if _sys_for_target.platform.startswith("win")
@@ -149,8 +149,8 @@ else:
# though, since the OCR feature will silently fail at runtime.
print(
f"WARNING: {_tess_staging} is empty or missing OCR will be "
"disabled in the bundle. Run build/make_release.py (which "
"calls fetch_tesseract_for_platform) before pyinstaller, or "
"disabled in the bundle. Run build/tesseract.py's "
"fetch_tesseract_for_platform before pyinstaller, or "
"pre-stage the binary manually."
)
@@ -159,8 +159,8 @@ if (_tessdata / "eng.traineddata").exists():
else:
print(
f"WARNING: {_tessdata}/eng.traineddata is missing OCR will "
"have no language data at runtime. Run build/make_release.py "
"or fetch manually per build/vendor/README.md."
"have no language data at runtime. Run build/tesseract.py's "
"fetch_tessdata or fetch manually per build/vendor/README.md."
)
# Bundle the Apache-2.0 LICENSE text alongside the binary. The docs

@@ -1,43 +0,0 @@
#!/usr/bin/env bash
# Wrap dist/DataTools.app into a no-install portable .zip.
#
# Usage:
# bash build/macos/build_zip.sh <version>
#
# Why a portable .zip in addition to the .dmg:
# * Buyers who don't want an installer can unzip and double-click the
# .app directly — no drag-to-/Applications step, no installer
# chrome. Self-contained: the .app holds Python + every dep.
# * IT-locked-down machines often block .dmg auto-mount but allow
# .zip download + extraction.
#
# Run after ``pyinstaller build/datatools.spec --clean --noconfirm``
# has produced ``dist/DataTools.app``. Output goes to
# ``dist/DataTools-<version>-mac-portable.zip``.
#
# Tesseract bundling: no-op here. The bundled Tesseract binary +
# dylibs + tessdata are already inside DataTools.app/Contents/Resources/tesseract/
# (placed by PyInstaller's BUNDLE/datas mechanism). ``ditto -c -k``
# preserves the whole .app tree.
set -euo pipefail
VERSION="${1:-0.0.0-dev}"
APP="dist/DataTools.app"
ZIP="dist/DataTools-${VERSION}-mac-portable.zip"
if [[ ! -d "$APP" ]]; then
echo "Error: $APP not found. Run pyinstaller build/datatools.spec first." >&2
exit 1
fi
# ``ditto`` preserves the .app bundle's extended attributes and
# resource forks (a plain ``zip`` strips them and can break code
# signatures + Info.plist resolution on the buyer's machine).
#
# --sequesterRsrc keeps the AppleDouble metadata inside the archive
# rather than as parallel ._ files on disk after extraction.
rm -f "$ZIP"
ditto -c -k --sequesterRsrc --keepParent "$APP" "$ZIP"
echo "Built $ZIP ($(du -h "$ZIP" | cut -f1))"

@@ -1,40 +1,23 @@
"""Single-command release builder for DataTools.
"""Tesseract bundling helpers for the release build.
PyInstaller can't cross-compile — to produce a Windows .exe you run
this on Windows, for a Mac .dmg you run it on macOS, for a Linux
AppImage you run it on Linux. One script, one OS at a time.
PDF Extractor OCR ships a per-platform Tesseract binary plus the English
``eng.traineddata`` model inside the frozen PyInstaller bundle so scanned
PDFs work without a separate user install. These helpers fetch the binary
and tessdata at build time; the GitHub Actions workflow
(``.github/workflows/build.yml``) imports ``fetch_tessdata`` and
``fetch_tesseract_for_platform`` and runs them before PyInstaller.
What this script does (in order):
1. Preflight checks PyInstaller, Pillow, and the platform's
packager (Inno Setup on Win / hdiutil + ditto on Mac /
appimagetool on Linux) are reachable. Bails with install
instructions if anything is missing.
2. Generates icon.ico / icon.icns / icon.png from the PNG asset.
3. Runs PyInstaller against build/datatools.spec.
4. Wraps the PyInstaller output into:
* Windows: DataTools-<ver>-win-setup.exe (Inno Setup)
+ DataTools-<ver>-win-portable.zip
* macOS: DataTools-<ver>-mac.dmg
+ DataTools-<ver>-mac-portable.zip
* Linux: DataTools-<ver>-linux-x86_64.AppImage
5. Prints what landed in dist/ and the byte sizes.
Usage:
python build/make_release.py # build everything for this OS
python build/make_release.py --preflight # check tooling, don't build
python build/make_release.py --skip-installer # only the portable zip
python build/make_release.py --skip-portable # only the installer
python build/make_release.py --clean # wipe dist/ first
Run from the repo root or from build/; either works.
Everything is staged under ``build/_tesseract/<platform>/`` (gitignored).
The PyInstaller spec (``build/datatools.spec``) reads that staging dir plus
``build/vendor/tessdata/`` and bundles them under ``<bundle>/tesseract/``,
where the runtime discovery code in ``src/pdf_extract.py`` expects:
Path(sys._MEIPASS) / "tesseract" / "tesseract[.exe]"
Path(sys._MEIPASS) / "tesseract" / "tessdata" / "eng.traineddata"
"""
from __future__ import annotations
import argparse
import os
import platform
import re
import shutil
import subprocess
import sys
@@ -43,7 +26,6 @@ from pathlib import Path
REPO = Path(__file__).resolve().parent.parent
BUILD = REPO / "build"
DIST = REPO / "dist"
# Tesseract bundling. The runtime discovery code in
# ``src/pdf_extract.py`` looks for the binary at
@@ -95,119 +77,6 @@ def _run(cmd: list[str], cwd: Path | None = None, env: dict | None = None) -> No
sys.exit(127)
# ---------------------------------------------------------------------------
# Platform detection
# ---------------------------------------------------------------------------
def _detect_platform() -> str:
"""Return ``win`` / ``mac`` / ``linux`` based on sys.platform."""
p = sys.platform
if p.startswith("win"):
return "win"
if p == "darwin":
return "mac"
if p.startswith("linux"):
return "linux"
_err(f"unsupported platform {p!r}; this script handles win/mac/linux only.")
sys.exit(2)
# ---------------------------------------------------------------------------
# Version — single source of truth in src/__init__.py
# ---------------------------------------------------------------------------
def _read_version() -> str:
init_py = (REPO / "src" / "__init__.py").read_text(encoding="utf-8")
m = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', init_py)
if not m:
_err("could not parse __version__ from src/__init__.py")
sys.exit(1)
return m.group(1)
# ---------------------------------------------------------------------------
# Preflight — check tooling before doing anything destructive
# ---------------------------------------------------------------------------
def _have_module(name: str) -> bool:
try:
__import__(name)
return True
except ImportError:
return False
def _have_command(name: str) -> bool:
return shutil.which(name) is not None
# Per-platform install hints. The error messages quote these so a buyer
# building from source isn't left guessing what to install next.
_INSTALL_HINTS = {
"pyinstaller": "pip install pyinstaller",
"pil": "pip install pillow",
"iscc": "Inno Setup (Windows): https://jrsoftware.org/isdl.php — install, then re-open the shell so iscc lands on PATH.",
"hdiutil": "ships with macOS — if it's missing your Mac install is broken.",
"ditto": "ships with macOS — if it's missing your Mac install is broken.",
"appimagetool": "Linux: download appimagetool-x86_64.AppImage from https://github.com/AppImage/AppImageKit/releases, chmod +x, drop on PATH.",
}
def preflight(target: str) -> None:
"""Verify every tool the target build needs is reachable; exit if not."""
_step(f"preflight ({target})")
missing: list[tuple[str, str]] = []
# Python-side deps — same on every platform. The ``_INSTALL_HINTS``
# lookup uses lowercase keys so module name capitalization doesn't
# need to match.
for mod in ("PyInstaller", "PIL"):
if not _have_module(mod):
hint = _INSTALL_HINTS.get(mod.lower(), f"pip install {mod}")
missing.append((mod.lower(), hint))
else:
_ok(f"{mod} importable")
# PyInstaller's CLI must also be reachable as a binary, not just as
# an importable module — the spec is invoked via the ``pyinstaller``
# command. ``python -m PyInstaller`` is a fine fallback so don't
# hard-fail if only the CLI binary is missing.
if _have_command("pyinstaller"):
_ok("pyinstaller on PATH")
else:
_warn("pyinstaller binary not on PATH — will fall back to `python -m PyInstaller`")
# Platform-specific packagers.
if target == "win":
if _have_command("iscc"):
_ok("Inno Setup (iscc) on PATH")
else:
missing.append(("iscc", _INSTALL_HINTS["iscc"]))
elif target == "mac":
for tool in ("hdiutil", "ditto"):
if _have_command(tool):
_ok(f"{tool} on PATH")
else:
missing.append((tool, _INSTALL_HINTS[tool]))
elif target == "linux":
if _have_command("appimagetool"):
_ok("appimagetool on PATH")
else:
missing.append(("appimagetool", _INSTALL_HINTS["appimagetool"]))
if missing:
_err("missing prerequisites:")
for name, hint in missing:
print(f" - {name}: {hint}", file=sys.stderr)
sys.exit(1)
_ok("all prerequisites present")
# ---------------------------------------------------------------------------
# Tesseract bundling — fetch the binary + tessdata at build time.
#
@@ -582,176 +451,3 @@ def fetch_tesseract_for_platform(target: str) -> Path:
)
sys.exit(1)
return staging
# ---------------------------------------------------------------------------
# Build steps
# ---------------------------------------------------------------------------
def step_generate_icons() -> None:
_step("generate icons")
_run([sys.executable, str(BUILD / "generate_icons.py")])
def step_pyinstaller(clean: bool, *, target: str | None = None) -> None:
_step("pyinstaller bundle")
# Use ``python -m PyInstaller`` so we don't depend on the binary
# being on PATH (Windows users frequently see this — pip's
# Scripts/ dir isn't auto-added).
cmd = [sys.executable, "-m", "PyInstaller",
str(BUILD / "datatools.spec"),
"--noconfirm"]
if clean:
cmd.append("--clean")
# The spec reads ``DATATOOLS_TESS_STAGING`` to find the per-platform
# tesseract staging dir. Passing it via env keeps the spec file
# platform-agnostic — the spec doesn't need to detect win/mac/linux
# itself; the orchestrator already did.
env = os.environ.copy()
if target:
env["DATATOOLS_TESS_STAGING"] = str(TESSERACT_STAGING / target)
_run(cmd, env=env)
def step_package_win(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
out: list[Path] = []
if do_installer:
_step("Windows installer (Inno Setup)")
_run(["iscc", f"/DAppVersion={version}", str(BUILD / "installer.iss")])
out.append(DIST / f"DataTools-{version}-win-setup.exe")
if do_portable:
_step("Windows portable .zip")
_run([sys.executable, str(BUILD / "build_portable_zip.py"), "win", version])
out.append(DIST / f"DataTools-{version}-win-portable.zip")
return out
def step_package_mac(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
out: list[Path] = []
if do_installer:
_step("macOS DMG (installer)")
_run(["bash", str(BUILD / "macos" / "build_dmg.sh"), version])
out.append(DIST / f"DataTools-{version}-mac.dmg")
if do_portable:
_step("macOS portable .zip")
_run(["bash", str(BUILD / "macos" / "build_zip.sh"), version])
out.append(DIST / f"DataTools-{version}-mac-portable.zip")
return out
def step_package_linux(version: str, do_installer: bool, do_portable: bool) -> list[Path]:
# On Linux the AppImage IS the portable. We ignore the two flags
# and always produce the single file — splitting wouldn't add
# value.
if not (do_installer or do_portable):
return []
_step("Linux AppImage")
_run(["bash", str(BUILD / "appimage" / "build.sh"), version])
return [DIST / f"DataTools-{version}-linux-x86_64.AppImage"]
# ---------------------------------------------------------------------------
# Orchestration
# ---------------------------------------------------------------------------
def _summarise(outputs: list[Path]) -> None:
_step("done — outputs")
if not outputs:
_warn("no files produced (everything skipped via flags)")
return
for p in outputs:
if p.exists():
size_mb = p.stat().st_size / (1024 * 1024)
print(f" {p.relative_to(REPO)} ({size_mb:.1f} MB)")
else:
_warn(f"expected output missing: {p.relative_to(REPO)}")
def main() -> int:
parser = argparse.ArgumentParser(
prog="make_release.py",
description=(
"Build the installer + portable zip for the current OS. "
"Cross-compilation isn't supported by PyInstaller — run "
"this once per platform you want to target."
),
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--platform", choices=("auto", "win", "mac", "linux"), default="auto",
help="Override OS detection (mostly for testing). Default: auto.",
)
parser.add_argument(
"--preflight", action="store_true",
help="Check tooling and exit without building.",
)
parser.add_argument(
"--clean", action="store_true",
help="Wipe dist/ before building.",
)
parser.add_argument(
"--skip-installer", action="store_true",
help="Don't build the OS installer (.exe / .dmg).",
)
parser.add_argument(
"--skip-portable", action="store_true",
help="Don't build the portable .zip.",
)
args = parser.parse_args()
target = _detect_platform() if args.platform == "auto" else args.platform
version = _read_version()
do_installer = not args.skip_installer
do_portable = not args.skip_portable
print(f"DataTools release builder")
print(f" target: {target} (host: {platform.platform()})")
print(f" version: {version}")
print(f" installer: {'yes' if do_installer else 'no'}")
print(f" portable: {'yes' if do_portable else 'no'}")
print(f" dist dir: {DIST}")
if target != _detect_platform():
_warn(
f"--platform {target} but host is {_detect_platform()}. "
"PyInstaller can't cross-compile — the bundle will be for "
"the HOST, only the packaging step will follow your override. "
"Useful only for testing the packager paths."
)
preflight(target)
if args.preflight:
return 0
if args.clean and DIST.exists():
_step(f"cleaning {DIST}")
shutil.rmtree(DIST)
step_generate_icons()
# Stage Tesseract OCR before PyInstaller runs. The spec reads
# ``build/_tesseract/<target>/`` + ``build/vendor/tessdata/`` and
# bundles them under ``<bundle>/tesseract/`` so the runtime
# discovery in src/pdf_extract.py finds them at:
# Path(sys._MEIPASS) / "tesseract" / "tesseract[.exe]"
# Path(sys._MEIPASS) / "tesseract" / "tessdata" / "eng.traineddata"
fetch_tessdata()
fetch_tesseract_for_platform(target)
step_pyinstaller(clean=args.clean, target=target)
if target == "win":
outputs = step_package_win(version, do_installer, do_portable)
elif target == "mac":
outputs = step_package_mac(version, do_installer, do_portable)
else:
outputs = step_package_linux(version, do_installer, do_portable)
_summarise(outputs)
return 0
if __name__ == "__main__":
sys.exit(main())

@@ -4,9 +4,10 @@ This tree holds the third-party assets that get bundled into the
PyInstaller artifacts but that we deliberately do **not** keep in git
(too large / license-encumbered / re-fetchable on demand).
The build pipeline (`build/make_release.py`) populates everything in
here before the PyInstaller step. The contents are git-ignored except
for this README.
The build's Tesseract helper (`build/tesseract.py`) populates
everything in here before the PyInstaller step — CI
(`.github/workflows/build.yml`) calls it ahead of the build. The
contents are git-ignored except for this README.
## tessdata/
@@ -40,9 +41,9 @@ statements (the only OCR use case so far), the extra accuracy of the
### How it gets populated
`build/make_release.py::fetch_tessdata()` checks for
`build/tesseract.py::fetch_tessdata()` checks for
`build/vendor/tessdata/eng.traineddata` on every run. If it's
missing, the script downloads it from the canonical URL above and
missing, it downloads the file from the canonical URL above and
caches it here. Subsequent builds reuse the cached file.
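
The check-then-download behaviour described above is a plain fetch-if-missing cache. A minimal sketch of that pattern follows; it is not the actual body of `fetch_tessdata()`, and `ensure_cached` plus the `download` callable are hypothetical names used only for illustration.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def ensure_cached(dest: Path, download: Callable[[Path], None]) -> bool:
    """Call download(dest) only if dest is missing; return True if it ran."""
    if dest.is_file():
        return False  # cached copy is reused, no network access
    dest.parent.mkdir(parents=True, exist_ok=True)
    download(dest)  # caller supplies the real fetch logic
    return True
```

Passing the download step in as a callable keeps the caching decision separate from the network code, which makes the reuse behaviour easy to exercise without hitting the upstream URL.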
On CI, the directory is restored from the GitHub Actions cache so we