build: drop the local Python release method, return to CI-only installer builds

Removes the single-command Python packaging method (build/make_release.py
+ build/build_portable_zip.py + build/macos/build_zip.sh) and the portable
.zip artifacts it produced. Release builds go back to the original GitHub
Actions process: the CI matrix builds one installer per platform (.dmg /
.exe / .AppImage) on tag push and attaches them to a GitHub Release.
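
A minimal sketch of the three assets a tagged release should carry, which can help when mirroring them by hand. It assumes tags have the strict `vX.Y.Z` shape and that `<ver>` in the file names is the tag with its leading `v` stripped. The names `expected_release_assets`, `ASSET_PATTERNS` and `TAG_RE` are hypothetical; only the file-name patterns are copied from the installer table in build/README.

```python
from __future__ import annotations

import re

# File-name patterns from the build/README installer table; "{ver}" stands in for <ver>.
# Assumption: <ver> is the release tag without its leading "v" (v1.4.0 -> 1.4.0).
ASSET_PATTERNS = {
    "mac": "DataTools-{ver}-mac.dmg",
    "win": "DataTools-{ver}-win-setup.exe",
    "linux": "DataTools-{ver}-linux-x86_64.AppImage",
}

# Assumption: release tags are plain vX.Y.Z (no pre-release suffixes).
TAG_RE = re.compile(r"^v(\d+\.\d+\.\d+)$")


def expected_release_assets(tag: str) -> dict[str, str]:
    """Hypothetical helper: map each CI platform to the installer it should attach."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vX.Y.Z release tag: {tag!r}")
    ver = match.group(1)
    return {platform: pattern.format(ver=ver) for platform, pattern in ASSET_PATTERNS.items()}


if __name__ == "__main__":
    # List the installers a v1.4.0 Release should have attached.
    for platform, name in expected_release_assets("v1.4.0").items():
        print(f"{platform:5} {name}")
```

Comparing this mapping against the assets actually attached to a Release is one way to spot a matrix job that failed before the assets are mirrored elsewhere.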

Tesseract OCR bundling is preserved: the fetch helpers the workflow depends
on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a
standalone build/tesseract.py, which build.yml now imports.
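
Because PyInstaller cannot cross-compile, a local run of the fetch step has to ask for the host's own Tesseract binary. A minimal sketch, assuming `fetch_tesseract_for_platform` accepts the `win` / `mac` / `linux` tokens listed in the build/README local-build recipe; `platform_token` is a hypothetical helper and is not part of build/tesseract.py.

```python
import sys


def platform_token(sys_platform: str = sys.platform) -> str:
    """Hypothetical helper: map sys.platform to the win/mac/linux token
    that fetch_tesseract_for_platform is assumed to accept."""
    if sys_platform == "win32":
        return "win"
    if sys_platform == "darwin":
        return "mac"
    if sys_platform.startswith("linux"):
        return "linux"
    # No Tesseract build is staged for other hosts (e.g. BSDs, Cygwin).
    raise RuntimeError(f"no Tesseract build for platform {sys_platform!r}")


if __name__ == "__main__":
    # Print the token for this host; it would be passed to fetch_tesseract_for_platform().
    print(platform_token())
```

With `build/` on `sys.path`, the local recipe could then call `fetch_tesseract_for_platform(platform_token())` instead of hard-coding `'mac'`.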

Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README,
es translations) updated to drop the portable-zip flavor and point at the
new module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:47:36 +00:00
parent 28ab51a869
commit fd9606c67b
13 changed files with 127 additions and 608 deletions


@@ -23,14 +23,12 @@ build/
 ├── generate_icons.py Builds icon.ico / icon.icns / icon.png from
 │ src/gui/assets/datatools_icon_256.png. Run
 │ once before pyinstaller (CI does this).
-├── build_portable_zip.py Cross-platform: zips dist/DataTools/ into a
-│ no-install portable download. Used by the
-│ Windows + Linux portable artifacts.
+├── tesseract.py Fetches the per-platform Tesseract binary +
+│ eng.traineddata at build time. CI imports
+│ fetch_tessdata + fetch_tesseract_for_platform.
 ├── macos/
-│ ├── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
-│ │ drag-to-/Applications layout (installer).
-│ └── build_zip.sh Wraps dist/DataTools.app into a portable
-│ .zip via ditto (preserves bundle metadata).
+│ └── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
+│ drag-to-/Applications layout (installer).
 ├── appimage/
 │ ├── AppRun Entry point invoked when the AppImage runs.
 │ ├── datatools.desktop Linux desktop-entry metadata.
@@ -43,17 +41,15 @@ build/
 ## Distribution outputs per platform
-Each CI run produces two downloads per platform — an installer for
-buyers who want shortcuts wired automatically, and a portable .zip
-for buyers (or IT-locked-down machines) that can't run installers:
+Each CI run produces one installer per platform:
-| Platform | Installer | Portable |
-|----------|----------------------------------------|------------------------------------------------|
-| macOS | `DataTools-<ver>-mac.dmg` | `DataTools-<ver>-mac-portable.zip` (ditto .app)|
-| Windows | `DataTools-<ver>-win-setup.exe` | `DataTools-<ver>-win-portable.zip` |
-| Linux | `DataTools-<ver>-linux-x86_64.AppImage`| (the AppImage IS the portable) |
+| Platform | Installer |
+|----------|----------------------------------------|
+| macOS | `DataTools-<ver>-mac.dmg` |
+| Windows | `DataTools-<ver>-win-setup.exe` |
+| Linux | `DataTools-<ver>-linux-x86_64.AppImage` (already portable) |
-All six outputs are self-contained: every dependency (Python, pandas,
+All three outputs are self-contained: every dependency (Python, pandas,
 streamlit, pdfplumber, **Tesseract OCR + `eng.traineddata`**, the lot)
 is frozen into the bundle. The buyer does not need to install Python,
 pip, Tesseract, or anything else first. With Tesseract bundled, each
@@ -76,47 +72,44 @@ the resulting installers to a GitHub Release. Manual
 ## Releasing
-### Single-command local build (recommended for one-developer workflow)
+### CI build (push tag → GitHub Release) — the release process
-PyInstaller can't cross-compile, so a single machine produces one
-platform's packages. Run this on each target OS:
-```bash
-# One-time setup per machine:
-pip install -r requirements.txt
-pip install pyinstaller pillow
-# Windows only: install Inno Setup from https://jrsoftware.org/isdl.php
-# Linux only: drop appimagetool onto PATH (see preflight output)
-# Build everything for the current OS:
-python build/make_release.py
-```
-Outputs land in `dist/`:
-- Windows host → `DataTools-<ver>-win-setup.exe` + `DataTools-<ver>-win-portable.zip`
-- macOS host → `DataTools-<ver>-mac.dmg` + `DataTools-<ver>-mac-portable.zip`
-- Linux host → `DataTools-<ver>-linux-x86_64.AppImage`
-Useful flags:
-```bash
-python build/make_release.py --preflight # check tooling, build nothing
-python build/make_release.py --clean # wipe dist/ first
-python build/make_release.py --skip-installer # just the portable zip
-python build/make_release.py --skip-portable # just the installer
-```
-### CI build (push tag → GitHub Release)
-If you have CI runners for all three OSes:
+Releases are built by GitHub Actions (`.github/workflows/build.yml`),
+not on a developer's machine. The matrix runs on
+macos-latest / windows-latest / ubuntu-latest, stages Tesseract
+(`build/tesseract.py`), runs PyInstaller, packages the per-platform
+installer, and attaches it to a GitHub Release on tag push:
 1. Bump `__version__` in `src/__init__.py`.
 2. `git commit -am "release: vX.Y.Z" && git tag vX.Y.Z`.
 3. `git push && git push --tags`.
 4. CI builds all three platforms and creates a Release with the
-installers + portable zips attached.
+installers attached.
 5. Mirror the Release assets to Gumroad (manual until v2).
+A manual `workflow_dispatch` run does the same build but uploads the
+installers as workflow artifacts instead of creating a Release —
+useful for smoke-testing a build without cutting a tag.
+### Local build (single platform, for testing)
+PyInstaller can't cross-compile, so a local build produces only the
+current OS's installer. This mirrors what CI does, by hand — use it to
+debug the bundle before tagging. See the per-platform recipes below for
+the exact commands; the short version is:
+```bash
+pip install -r requirements.txt
+pip install pyinstaller pillow
+python build/generate_icons.py
+python -c "import sys; sys.path.insert(0,'build'); \
+from tesseract import fetch_tessdata, fetch_tesseract_for_platform; \
+fetch_tessdata(); fetch_tesseract_for_platform('mac')" # win / mac / linux
+pyinstaller build/datatools.spec --clean --noconfirm
+# then run the matching packager: build/macos/build_dmg.sh,
+# build/installer.iss (iscc), or build/appimage/build.sh
+```
 ## Signing (Phase 2 — needs accounts/credentials)
 Both code-signing steps are intentionally not in CI yet because they
@@ -321,17 +314,18 @@ The runtime resolver (in `src/`, owned by the runtime team) walks:
 (sourced from [tessdata_best](https://github.com/tesseract-ocr/tessdata_best)).
 `datatools.spec` copies it into `tesseract/tessdata/`.
 - **Binary** — fetched per-platform at build time by
-`build/make_release.py` from pinned upstream URLs. Current pin:
-**Tesseract 5.5.0**.
+`build/tesseract.py` from pinned upstream URLs. Current pin:
+**Tesseract 5.5.0**. CI imports `fetch_tessdata` +
+`fetch_tesseract_for_platform` from this module before PyInstaller.
 **Updating Tesseract**:
 1. Bump the version pin and the per-platform fetch URLs in
-`build/make_release.py`.
+`build/tesseract.py`.
 2. If the model schema changed upstream, refresh
 `build/vendor/tessdata/eng.traineddata` from `tessdata_best` at the
 matching tag.
-3. Rebuild on each platform (`python build/make_release.py`) and
+3. Push a `v*` tag so CI rebuilds all three platforms, then
 smoke-test a scanned PDF through the PDF Extractor.
 4. Update `LICENSE_TESSERACT.txt` at the repo root if upstream license
 terms change (Apache-2.0 today).