build: drop the local Python release method, return to CI-only installer builds
Removes the single-command Python packaging method (build/make_release.py + build/build_portable_zip.py + build/macos/build_zip.sh) and the portable .zip artifacts it produced. Release builds go back to the original GitHub Actions process: the CI matrix builds one installer per platform (.dmg / .exe / .AppImage) on tag push and attaches them to a GitHub Release.

Tesseract OCR bundling is preserved: the fetch helpers the workflow depends on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a standalone build/tesseract.py, which build.yml now imports.

Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README, es translations) updated to drop the portable-zip flavor and point at the new module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
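The commit message says `build.yml` now imports `fetch_tessdata` and `fetch_tesseract_for_platform` from `build/tesseract.py`. As a rough illustration of that import pattern, here is a minimal sketch of a staging step. It assumes only what the README diff shows: both helpers exist, `fetch_tessdata()` is called with no arguments, and `fetch_tesseract_for_platform()` takes one of `"win"`, `"mac"` or `"linux"`. The `platform_key` and `stage_tesseract` names and the `sys.platform` mapping are hypothetical, not code from the repository.

```python
import sys
from pathlib import Path

# Hypothetical mapping from sys.platform values to the platform keys that the
# README passes to fetch_tesseract_for_platform ("win" / "mac" / "linux").
_PLATFORM_KEYS = {"win32": "win", "darwin": "mac", "linux": "linux"}


def platform_key(plat: str = sys.platform) -> str:
    """Return the Tesseract platform key for a sys.platform value."""
    for prefix, key in _PLATFORM_KEYS.items():
        if plat.startswith(prefix):
            return key
    raise ValueError(f"unsupported build platform: {plat!r}")


def stage_tesseract(repo_root: Path) -> None:
    """Stage eng.traineddata and the Tesseract binary before PyInstaller.

    Assumes build/tesseract.py exists under repo_root and exposes the two
    helpers named in the commit; their return values are ignored here.
    """
    # Same trick as the README one-liner: put build/ first on the import path.
    sys.path.insert(0, str(repo_root / "build"))
    from tesseract import fetch_tessdata, fetch_tesseract_for_platform

    fetch_tessdata()
    fetch_tesseract_for_platform(platform_key())
```

Called from the repository root as `stage_tesseract(Path.cwd())`, this would do the same work as the `python -c` one-liner in the Local build section, but picks the platform key from the running interpreter instead of a hard-coded `'mac'`.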
build/README.md
@@ -23,14 +23,12 @@ build/
 ├── generate_icons.py Builds icon.ico / icon.icns / icon.png from
 │ src/gui/assets/datatools_icon_256.png. Run
 │ once before pyinstaller (CI does this).
-├── build_portable_zip.py Cross-platform: zips dist/DataTools/ into a
-│ no-install portable download. Used by the
-│ Windows + Linux portable artifacts.
+├── tesseract.py Fetches the per-platform Tesseract binary +
+│ eng.traineddata at build time. CI imports
+│ fetch_tessdata + fetch_tesseract_for_platform.
 ├── macos/
-│ ├── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
-│ │ drag-to-/Applications layout (installer).
-│ └── build_zip.sh Wraps dist/DataTools.app into a portable
-│ .zip via ditto (preserves bundle metadata).
+│ └── build_dmg.sh Wraps dist/DataTools.app into a .dmg with a
+│ drag-to-/Applications layout (installer).
 ├── appimage/
 │ ├── AppRun Entry point invoked when the AppImage runs.
 │ ├── datatools.desktop Linux desktop-entry metadata.
@@ -43,17 +41,15 @@ build/
 
 ## Distribution outputs per platform
 
-Each CI run produces two downloads per platform — an installer for
-buyers who want shortcuts wired automatically, and a portable .zip
-for buyers (or IT-locked-down machines) that can't run installers:
+Each CI run produces one installer per platform:
 
-| Platform | Installer | Portable |
-|----------|----------------------------------------|------------------------------------------------|
-| macOS | `DataTools-<ver>-mac.dmg` | `DataTools-<ver>-mac-portable.zip` (ditto .app)|
-| Windows | `DataTools-<ver>-win-setup.exe` | `DataTools-<ver>-win-portable.zip` |
-| Linux | `DataTools-<ver>-linux-x86_64.AppImage`| (the AppImage IS the portable) |
+| Platform | Installer |
+|----------|----------------------------------------|
+| macOS | `DataTools-<ver>-mac.dmg` |
+| Windows | `DataTools-<ver>-win-setup.exe` |
+| Linux | `DataTools-<ver>-linux-x86_64.AppImage` (already portable) |
 
-All six outputs are self-contained: every dependency (Python, pandas,
+All three outputs are self-contained: every dependency (Python, pandas,
 streamlit, pdfplumber, **Tesseract OCR + `eng.traineddata`**, the lot)
 is frozen into the bundle. The buyer does not need to install Python,
 pip, Tesseract, or anything else first. With Tesseract bundled, each
@@ -76,47 +72,44 @@ the resulting installers to a GitHub Release. Manual
 
 ## Releasing
 
-### Single-command local build (recommended for one-developer workflow)
+### CI build (push tag → GitHub Release) — the release process
 
-PyInstaller can't cross-compile, so a single machine produces one
-platform's packages. Run this on each target OS:
-
-```bash
-# One-time setup per machine:
-pip install -r requirements.txt
-pip install pyinstaller pillow
-# Windows only: install Inno Setup from https://jrsoftware.org/isdl.php
-# Linux only: drop appimagetool onto PATH (see preflight output)
-
-# Build everything for the current OS:
-python build/make_release.py
-```
-
-Outputs land in `dist/`:
-- Windows host → `DataTools-<ver>-win-setup.exe` + `DataTools-<ver>-win-portable.zip`
-- macOS host → `DataTools-<ver>-mac.dmg` + `DataTools-<ver>-mac-portable.zip`
-- Linux host → `DataTools-<ver>-linux-x86_64.AppImage`
-
-Useful flags:
-
-```bash
-python build/make_release.py --preflight # check tooling, build nothing
-python build/make_release.py --clean # wipe dist/ first
-python build/make_release.py --skip-installer # just the portable zip
-python build/make_release.py --skip-portable # just the installer
-```
-
-### CI build (push tag → GitHub Release)
-
-If you have CI runners for all three OSes:
+Releases are built by GitHub Actions (`.github/workflows/build.yml`),
+not on a developer's machine. The matrix runs on
+macos-latest / windows-latest / ubuntu-latest, stages Tesseract
+(`build/tesseract.py`), runs PyInstaller, packages the per-platform
+installer, and attaches it to a GitHub Release on tag push:
 
 1. Bump `__version__` in `src/__init__.py`.
 2. `git commit -am "release: vX.Y.Z" && git tag vX.Y.Z`.
 3. `git push && git push --tags`.
 4. CI builds all three platforms and creates a Release with the
-installers + portable zips attached.
+installers attached.
 5. Mirror the Release assets to Gumroad (manual until v2).
 
+A manual `workflow_dispatch` run does the same build but uploads the
+installers as workflow artifacts instead of creating a Release —
+useful for smoke-testing a build without cutting a tag.
+
+### Local build (single platform, for testing)
+
+PyInstaller can't cross-compile, so a local build produces only the
+current OS's installer. This mirrors what CI does, by hand — use it to
+debug the bundle before tagging. See the per-platform recipes below for
+the exact commands; the short version is:
+
+```bash
+pip install -r requirements.txt
+pip install pyinstaller pillow
+python build/generate_icons.py
+python -c "import sys; sys.path.insert(0,'build'); \
+from tesseract import fetch_tessdata, fetch_tesseract_for_platform; \
+fetch_tessdata(); fetch_tesseract_for_platform('mac')" # win / mac / linux
+pyinstaller build/datatools.spec --clean --noconfirm
+# then run the matching packager: build/macos/build_dmg.sh,
+# build/installer.iss (iscc), or build/appimage/build.sh
+```
+
 ## Signing (Phase 2 — needs accounts/credentials)
 
 Both code-signing steps are intentionally not in CI yet because they
@@ -321,17 +314,18 @@ The runtime resolver (in `src/`, owned by the runtime team) walks:
 (sourced from [tessdata_best](https://github.com/tesseract-ocr/tessdata_best)).
 `datatools.spec` copies it into `tesseract/tessdata/`.
 - **Binary** — fetched per-platform at build time by
-`build/make_release.py` from pinned upstream URLs. Current pin:
-**Tesseract 5.5.0**.
+`build/tesseract.py` from pinned upstream URLs. Current pin:
+**Tesseract 5.5.0**. CI imports `fetch_tessdata` +
+`fetch_tesseract_for_platform` from this module before PyInstaller.
 
 **Updating Tesseract**:
 
 1. Bump the version pin and the per-platform fetch URLs in
-`build/make_release.py`.
+`build/tesseract.py`.
 2. If the model schema changed upstream, refresh
 `build/vendor/tessdata/eng.traineddata` from `tessdata_best` at the
 matching tag.
-3. Rebuild on each platform (`python build/make_release.py`) and
+3. Push a `v*` tag so CI rebuilds all three platforms, then
 smoke-test a scanned PDF through the PDF Extractor.
 4. Update `LICENSE_TESSERACT.txt` at the repo root if upstream license
 terms change (Apache-2.0 today).
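Steps 1 to 4 of the release process tie the pushed tag to `__version__` in `src/__init__.py`, and the Distribution outputs table fixes the three installer names. A hypothetical pre-push check could derive the expected Release assets from both. This sketch assumes `__version__` is a plain string assignment, that tags are `v` followed by that version, and that `<ver>` in the asset names is the bare version without the `v`; none of these helpers exist in the repository.

```python
import re

# Installer names from the Distribution outputs table; {ver} stands in for
# <ver>, assumed here to be the bare version without the leading "v".
ASSET_PATTERNS = {
    "mac": "DataTools-{ver}-mac.dmg",
    "win": "DataTools-{ver}-win-setup.exe",
    "linux": "DataTools-{ver}-linux-x86_64.AppImage",
}

# Assumes src/__init__.py contains a plain assignment like __version__ = "1.2.3".
_VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_source: str) -> str:
    """Extract the __version__ string from the text of src/__init__.py."""
    match = _VERSION_RE.search(init_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def expected_assets(tag: str, init_source: str) -> list[str]:
    """Check the tag against __version__ and return the three installer names."""
    version = read_version(init_source)
    if tag != f"v{version}":
        raise ValueError(f"tag {tag!r} does not match __version__ {version!r}")
    return [pattern.format(ver=version) for pattern in ASSET_PATTERNS.values()]
```

Run with the tag about to be pushed and the text of `src/__init__.py`, it fails fast on a forgotten version bump and yields the asset list to compare against the Release (or the Gumroad mirror) once CI finishes.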