- NEW LICENSE_TESSERACT.txt at the repo root: header noting it covers
the bundled Tesseract OCR binary (Apache 2.0, upstream
tesseract-ocr/tesseract, copyright Google + contributors) and the
eng.traineddata from tessdata_best (also Apache 2.0). Clarifies
DataTools itself remains proprietary. Full canonical Apache 2.0
license text included.
- README.md + README.es.md (Download section): bumped size estimate
~200 MB → ~300 MB, added a short paragraph stating Tesseract OCR
is bundled (no separate install required), with a link to the new
license file.
- docs/USER-GUIDE.md + docs/USER-GUIDE.es.md (§1.6 System
requirements): bumped disk estimate, added a paragraph stating
Tesseract 5.5 + eng.traineddata ship inside every installer /
portable / AppImage, with a source-install fallback hint pointing
developers to DEVELOPER.md.
- docs/DEVELOPER.md: new "PDF Extractor — bundled Tesseract" section
documenting the runtime layout (sys._MEIPASS / tesseract / …),
discovery order, source of bytes (build/vendor/tessdata + per-
platform fetch in make_release.py), version pin, update recipe.
- docs/TECHNICAL.md: new §3.10 "Bundled Tesseract (PDF Extractor
OCR)" — short version of the discovery order for the build
pipeline section.
- build/README.md: distribution-outputs paragraph now lists
Tesseract among bundled deps with the ~250-300 MB estimate; new
"Tesseract bundling" section: layout diagram, resolver order,
source of bytes + 5.5.0 pin, update steps, license-file ref.
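
Illustrative sketch of a resolver of the kind the DEVELOPER.md and build/README.md sections describe. This commit message does not spell out the discovery order, so the order below (explicit override, bundled copy, system PATH) is an assumption. The `DATATOOLS_TESSERACT` variable, the `find_tesseract` name, and the executable sitting directly under `tesseract/` are also hypothetical, since the layout is elided above.

```python
import os
import shutil
import sys
from pathlib import Path


def find_tesseract(env_var="DATATOOLS_TESSERACT"):
    """Return a Path to a tesseract executable, or None if none is found.

    Assumed order: explicit override, bundled copy, system PATH.
    """
    exe_name = "tesseract.exe" if os.name == "nt" else "tesseract"

    # 1. Explicit override (the variable name is a made-up example).
    override = os.environ.get(env_var)
    if override and Path(override).is_file():
        return Path(override)

    # 2. Bundled copy. PyInstaller sets sys._MEIPASS in frozen builds;
    #    the file location under tesseract/ is a guess.
    bundle_root = getattr(sys, "_MEIPASS", None)
    if bundle_root:
        candidate = Path(bundle_root) / "tesseract" / exe_name
        if candidate.is_file():
            return candidate

    # 3. Fall back to whatever is installed on PATH (source installs).
    found = shutil.which("tesseract")
    return Path(found) if found else None
```

Returning None instead of raising lets the caller decide whether a missing OCR binary is fatal or just disables the OCR path.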
Out-of-scope gaps noted by the docs sweep:
- docs/FUTURE-TOOLS.md §D still describes Tesseract bundling as a
high-risk packaging headache; now superseded. Worth a one-line
"(resolved — bundled as of v1.x)" callout in a future pass.
- USER-GUIDE §2 "What's included" table doesn't list PDF Extractor
at all (it shipped in b8aff86…967d3f6). Separate gap to close.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One-developer workflow: ``python build/make_release.py`` on each
target OS produces both the installer and a portable .zip for that
platform. A preflight step checks for PyInstaller / Pillow / iscc /
hdiutil / ditto / appimagetool and exits with install hints if
anything is missing, so a failed run never leaves a half-built dist/.
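
Rough sketch of a preflight check of this shape, not the actual make_release.py code. The function names and hint strings are hypothetical. It assumes Python packages are probed by import name (`PyInstaller`, `PIL`) and external tools by a PATH lookup.

```python
import importlib.util
import shutil
import sys


def missing_requirements(modules, executables):
    """Return one "name: hint" line per requirement that is not available.

    modules: top-level import name -> install hint
    executables: command name -> install hint
    """
    missing = []
    for name, hint in modules.items():
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(f"{name}: {hint}")
    for name, hint in executables.items():
        # shutil.which returns None when the command is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"{name}: {hint}")
    return missing


def preflight(modules, executables):
    """Exit before any build step runs if something is missing."""
    problems = missing_requirements(modules, executables)
    if problems:
        sys.exit("Preflight failed:\n  " + "\n  ".join(problems))


# Example call for a Windows host (the hints are illustrative):
# preflight(
#     {"PyInstaller": "pip install pyinstaller", "PIL": "pip install Pillow"},
#     {"iscc": "install Inno Setup and add it to PATH"},
# )
```

Collecting every missing item before exiting means one run reports all the install hints at once.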
New scripts:
- build/make_release.py — orchestrator, auto-detects host OS.
- build/generate_icons.py — icon.ico / icon.icns / icon.png from
src/gui/assets/datatools_icon_256.png (Pillow ships ICO + ICNS
writers; no platform tooling needed).
- build/build_portable_zip.py — Win/Linux portable zip via stdlib.
- build/macos/build_zip.sh — Mac portable .app via ditto so
bundle metadata survives.
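
Minimal sketch of a stdlib-only portable zip, along the lines of build/build_portable_zip.py. The function name and the choice to keep the bundle folder as the archive's top-level directory are assumptions, and the output path is assumed to sit outside the bundle directory.

```python
import zipfile
from pathlib import Path


def make_portable_zip(bundle_dir, zip_path):
    """Zip every file under bundle_dir, rooted at the bundle folder name."""
    bundle_dir = Path(bundle_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted traversal keeps the entry order stable between runs.
        for path in sorted(bundle_dir.rglob("*")):
            if path.is_file():
                arcname = Path(bundle_dir.name) / path.relative_to(bundle_dir)
                # Forward slashes keep entry names portable across platforms.
                zf.write(path, arcname.as_posix())
    return zip_path
```

zipfile follows symlinks rather than storing them as links, which fits with the macOS .app going through ditto instead.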
installer.iss now adds: a Quick Launch task (opt-in, for legacy
Win 7), an App Paths registry entry (so Win+R "DataTools" works),
SetupIconFile, UninstallDisplayIcon, AppSupportURL, AppUpdatesURL.
CI workflow uploads installer + portable per platform and attaches
both to GitHub Releases on tag push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stand up the seamless-download path for non-technical buyers:
* .github/workflows/build.yml — matrix CI (mac/win/linux) that builds
PyInstaller bundles and packages them per platform on tag push,
attaching the resulting installers to a GitHub Release.
* build/installer.iss — Inno Setup script for the Windows installer
(per-user install, optional desktop shortcut, runs on finish).
* build/macos/build_dmg.sh — wraps DataTools.app into a .dmg with a
drag-to-/Applications layout.
* build/appimage/{AppRun,datatools.desktop,build.sh} — AppImage recipe.
* src/__init__.py — single source of truth for __version__; the spec
now reads it instead of a hardcoded value, and CI passes it through to all packagers.
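
One way a spec or packaging script could read the version without importing the package, as a sketch only. How the real spec reads it is not stated here, and `read_version` is a hypothetical helper. It assumes `__version__` is assigned as a plain string literal on its own line.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "1.2.3"
_VERSION_RE = re.compile(
    r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE
)


def read_version(init_path):
    """Return the __version__ string defined in the given __init__.py."""
    text = Path(init_path).read_text(encoding="utf-8")
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no __version__ assignment found in {init_path}")
    return match.group(1)
```

Parsing the file as text avoids running package imports at build time. A plain `from src import __version__` would also work where importing is cheap.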
The buyer download path now lives in the top-level README. The
per-build README documents the Phase 2 step (signing/notarization),
which needs the owner's Apple Developer and Windows code-signing
credentials. Those are intentionally not in CI yet because they
require setup outside this repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>