build: drop the local Python release method, return to CI-only installer builds
Removes the single-command Python packaging method (build/make_release.py + build/build_portable_zip.py + build/macos/build_zip.sh) and the portable .zip artifacts it produced. Release builds go back to the original GitHub Actions process: the CI matrix builds one installer per platform (.dmg / .exe / .AppImage) on tag push and attaches them to a GitHub Release. Tesseract OCR bundling is preserved: the fetch helpers the workflow depends on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a standalone build/tesseract.py, which build.yml now imports. Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README, es translations) updated to drop the portable-zip flavor and point at the new module. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -298,7 +298,7 @@ All `DataToolsError` subclasses extend stdlib `ValueError` or `OSError` so exist
|
||||
|
||||
## PDF Extractor — bundled Tesseract
|
||||
|
||||
Frozen builds (installer / portable .zip / AppImage) ship Tesseract OCR inside the bundle so scanned PDFs work without a separate system install. Source / `pip` developer environments still resolve Tesseract from `PATH`.
|
||||
Frozen builds (installer / AppImage) ship Tesseract OCR inside the bundle so scanned PDFs work without a separate system install. Source / `pip` developer environments still resolve Tesseract from `PATH`.
|
||||
|
||||
**Runtime layout (frozen bundles)**:
|
||||
|
||||
@@ -318,13 +318,13 @@ Frozen builds (installer / portable .zip / AppImage) ship Tesseract OCR inside t
|
||||
**Where the bytes come from**:
|
||||
|
||||
- **Tessdata** is vendored at `build/vendor/tessdata/eng.traineddata` — the "best" English model from [tessdata_best](https://github.com/tesseract-ocr/tessdata_best). PyInstaller's spec copies it into `tesseract/tessdata/` inside the bundle.
|
||||
- **Tesseract binary** is fetched at build time by `build/make_release.py` — per-platform download URLs are pinned in that script. The current pin is **Tesseract 5.5.0**.
|
||||
- **Tesseract binary** is fetched at build time by `build/tesseract.py` — per-platform download URLs are pinned in that module. The current pin is **Tesseract 5.5.0**. CI (`.github/workflows/build.yml`) imports `fetch_tessdata` + `fetch_tesseract_for_platform` and runs them before PyInstaller.
|
||||
|
||||
**To update Tesseract**:
|
||||
|
||||
1. Bump the version pin + the per-platform fetch URLs in `build/make_release.py`.
|
||||
1. Bump the version pin + the per-platform fetch URLs in `build/tesseract.py`.
|
||||
2. If upstream changed the `eng.traineddata` schema, refresh `build/vendor/tessdata/eng.traineddata` from `tessdata_best` at the matching tag.
|
||||
3. Rebuild on each platform (`python build/make_release.py`) and smoke-test a scanned-PDF run through the PDF Extractor before tagging the release.
|
||||
3. Push a `v*` tag so CI rebuilds all three platforms, then smoke-test a scanned-PDF run through the PDF Extractor before publishing the release.
|
||||
4. Update `LICENSE_TESSERACT.txt` at the repo root if the upstream license terms change (Tesseract is Apache-2.0 today).
|
||||
|
||||
## Tests
|
||||
|
||||
Reference in New Issue
Block a user