End users no longer have to install Tesseract separately for OCR on
scanned PDFs — the engine ships inside the installer, portable .zip,
and AppImage for all three platforms.

Per-platform fetch in build/make_release.py (run before PyInstaller):
- Windows: download UB-Mannheim installer 5.5.0.20241111, extract
  with 7-Zip, copy tesseract.exe + required DLLs into the staging dir.
- macOS: ``brew install tesseract``, copy binary + every
  Homebrew-prefixed dylib resolved via ``otool -L`` (recurse one level
  for transitive deps), then ``install_name_tool`` rewrites IDs / load
  paths to @loader_path/... so the bundle is relocatable.
- Linux: ``apt-get install tesseract-ocr libtesseract5``, copy binary
  + every non-system .so from ldd output, then
  ``patchelf --set-rpath '$ORIGIN'``.
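
For illustration, the Linux "every non-system .so from ldd output"
filter could look like the sketch below. This is not the code in
build/make_release.py: the function name bundled_libs and the
SYSTEM_LIB_PREFIXES deny-list are assumptions, and the real script may
decide what counts as a system library differently.

```python
# Assumption: libraries that should keep coming from the host system.
# The actual deny-list used by make_release.py is not shown in this commit.
SYSTEM_LIB_PREFIXES = (
    "libc.so", "libm.so", "libdl.so", "libpthread.so", "librt.so",
    "libgcc_s.so", "libstdc++.so",
)


def bundled_libs(ldd_output: str) -> list[str]:
    """Return resolved paths of the shared libraries worth copying.

    Expects the text printed by `ldd <binary>`, one dependency per line.
    """
    paths = []
    for line in ldd_output.splitlines():
        name, sep, rest = line.strip().partition(" => ")
        if not sep:
            # linux-vdso and the dynamic loader have no "=>" mapping.
            continue
        path = rest.split(" (")[0].strip()
        if not path.startswith("/"):
            # Unresolved entries look like "libfoo.so => not found".
            continue
        if name.startswith(SYSTEM_LIB_PREFIXES):
            continue
        paths.append(path)
    return paths
```

Each returned path would then be copied next to the tesseract binary
before the rpath is rewritten.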

Wire-up:
- build/datatools.spec reads the DATATOOLS_TESS_STAGING env var (set
  by make_release) and adds the staging dir + tessdata + the
  LICENSE_TESSERACT.txt Apache 2.0 attribution to PyInstaller datas
  so they land at <bundle>/tesseract/{tesseract[.exe],tessdata/}
  and the license sits at the bundle root. Soft-warns when staging
  is empty so dev spec runs still complete.
- English tessdata pulled by fetch_tessdata() from
  tesseract-ocr/tessdata_best (eng.traineddata, ~16 MB). Cached at
  build/vendor/tessdata/.
- .github/workflows/build.yml: actions/cache@v4 step keyed on
  ``tesseract-${{ runner.os }}-5.5.0-tessdata_best-v1`` caches the
  staging dir and the vendored tessdata across runs; apt installs
  patchelf on the Linux runner; PyInstaller step now receives the
  DATATOOLS_TESS_STAGING env var.
- .gitignore: build/_tesseract/ and the .traineddata blob.
- TESSERACT_SKIP_FETCH=1 honored for offline / manual stages.
- Installer / .dmg / .zip / AppImage scripts: one-line comments
  confirming Tesseract rides along automatically via PyInstaller's
  datas (no extra packaging steps required in those scripts).
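
As a rough sketch of the spec-side wiring described above (not the
actual contents of build/datatools.spec), the datas entries could be
assembled like this. The helper names collect_tree and tesseract_datas
are hypothetical. The per-file walk is one possible approach, chosen
here so the sketch relies only on the standard library.

```python
import warnings
from pathlib import Path


def collect_tree(src_root, dest_root):
    """Return (source_file, dest_dir) pairs for every file under src_root."""
    root = Path(src_root)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            dest = (Path(dest_root) / path.relative_to(root).parent).as_posix()
            pairs.append((str(path), dest))
    return pairs


def tesseract_datas(staging_dir, tessdata_dir, license_file):
    """Build PyInstaller-style datas pairs; warn and return [] if nothing is staged."""
    staged = []
    # An empty string means the env var was unset; Path("") would be ".".
    if staging_dir and Path(staging_dir).is_dir():
        staged = collect_tree(staging_dir, "tesseract")
    if not staged:
        # Soft warning only, so developer spec runs still complete.
        warnings.warn("Tesseract staging is empty; bundle will ship without OCR")
        return []
    datas = staged + collect_tree(tessdata_dir, "tesseract/tessdata")
    datas.append((str(license_file), "."))  # license at the bundle root
    return datas


# Hypothetical use inside the spec file:
# datas = tesseract_datas(os.environ.get("DATATOOLS_TESS_STAGING", ""),
#                         "build/vendor/tessdata", "LICENSE_TESSERACT.txt")
```

Returning an empty list instead of raising mirrors the soft-warn
behaviour: a dev build without a staged engine still produces a bundle.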

Bundle-size delta: ~50-70 MB on disk per platform, ~25-40 MB
post-compression. Net installer size ~250-300 MB (was ~120 MB), an
accepted tradeoff for zero end-user OCR setup.

Reversal of the prior "don't bundle Tesseract" decision (option A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
94 lines
4.1 KiB
Plaintext
; Inno Setup script for DataTools (Windows installer).
;
; Compile from the repo root:
;   iscc /DAppVersion=3.0 build\installer.iss
;
; CI passes the version via /DAppVersion to keep src/__init__.py the
; single source of truth. Local manual builds: pass /DAppVersion or
; let the default kick in.
;
; What this installer wires up (covers the "easy launch" surface):
;   * Start Menu group: Start → DataTools → DataTools / Uninstall
;   * Desktop shortcut: optional, checked by default during install
;   * Quick Launch:     optional, off by default; only offered on
;                       Windows older than 7 (OnlyBelowVersion: 6.1
;                       in [Tasks]). Windows 10/11 users pin to the
;                       taskbar manually via right-click, because
;                       OS security policy forbids programmatic pinning.
;   * App Paths entry:  so ``DataTools`` typed into Win+R (or
;                       ``start DataTools`` in cmd) works.
;
; Self-contained: the installer contains a frozen PyInstaller bundle
; (Python + every runtime dep). No pre-install or post-install steps
; on the buyer's machine. UAC is NOT required because we install
; per-user by default; the prompt only fires if the buyer asks for an
; all-users install.

#ifndef AppVersion
#define AppVersion "0.0.0-dev"
#endif

[Setup]
AppId={{D4A07001-DA7A-4001-8001-DA7A70013700}}
AppName=DataTools
AppVersion={#AppVersion}
AppVerName=DataTools {#AppVersion}
AppPublisher=DataTools
AppPublisherURL=https://datatools.app
AppSupportURL=https://datatools.app/support
AppUpdatesURL=https://datatools.app/releases
DefaultDirName={autopf}\DataTools
DefaultGroupName=DataTools
DisableProgramGroupPage=yes
OutputDir=..\dist
OutputBaseFilename=DataTools-{#AppVersion}-win-setup
SetupIconFile=icon.ico
UninstallDisplayIcon={app}\DataTools.exe
Compression=lzma2/max
SolidCompression=yes
WizardStyle=modern
ArchitecturesInstallIn64BitMode=x64
PrivilegesRequired=lowest
PrivilegesRequiredOverridesAllowed=dialog
; Allow per-user install (no UAC prompt) when admin isn't available.
; Buyers without admin rights can still install without IT involvement.

ChangesAssociations=no
CloseApplications=force
RestartApplications=no

[Languages]
Name: "english"; MessagesFile: "compiler:Default.isl"

[Tasks]
Name: "desktopicon"; Description: "Create a &desktop shortcut"; GroupDescription: "Additional shortcuts:"
Name: "quicklaunchicon"; Description: "Create a &Quick Launch shortcut"; GroupDescription: "Additional shortcuts:"; Flags: unchecked; OnlyBelowVersion: 6.1

[Files]
; PyInstaller's dist/DataTools/ tree includes:
;   * DataTools.exe + frozen Python runtime
;   * tesseract/tesseract.exe + DLLs + tessdata/eng.traineddata
;     (bundled via build/datatools.spec datas; runtime discovery in
;     src/pdf_extract.py reads sys._MEIPASS / "tesseract" / ...).
;   * LICENSE_TESSERACT.txt at the bundle root (Apache-2.0).
; The recursesubdirs flag below picks all of those up, so no separate
; Files: entry is needed for tesseract/.
Source: "..\dist\DataTools\*"; DestDir: "{app}"; Flags: recursesubdirs ignoreversion

[Icons]
; Start Menu entries: created unconditionally so the app is always
; discoverable via Start search.
Name: "{group}\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"
Name: "{group}\Uninstall DataTools"; Filename: "{uninstallexe}"
; Desktop shortcut: controlled by the desktopicon task (checked by default).
Name: "{autodesktop}\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"; Tasks: desktopicon
; Quick Launch (legacy): the task is only offered on Windows older than 7.
Name: "{userappdata}\Microsoft\Internet Explorer\Quick Launch\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"; Tasks: quicklaunchicon

[Registry]
; App Paths: lets the buyer launch from Win+R (or ``start DataTools``
; in cmd) with just "DataTools" instead of a full path. Per-user hive
; so the per-user install path doesn't need admin to register.
Root: HKCU; Subkey: "Software\Microsoft\Windows\CurrentVersion\App Paths\DataTools.exe"; ValueType: string; ValueName: ""; ValueData: "{app}\DataTools.exe"; Flags: uninsdeletekey

[Run]
Filename: "{app}\DataTools.exe"; Description: "Launch DataTools"; Flags: nowait postinstall skipifsilent
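
The [Files] comment in the script above mentions runtime discovery in
src/pdf_extract.py via sys._MEIPASS. That file is not shown here, so the
following is only a sketch of how such a lookup might work. The function
name bundled_tesseract and its parameters are hypothetical.

```python
import sys
from pathlib import Path


def bundled_tesseract(base_dir=None, platform=sys.platform):
    """Return (binary, tessdata_dir) from the frozen bundle, or None if absent."""
    if base_dir is None:
        # sys._MEIPASS is set by the PyInstaller bootloader in frozen apps.
        base_dir = getattr(sys, "_MEIPASS", None)
    if base_dir is None:
        return None  # running from source, so there is no bundle to search
    root = Path(base_dir) / "tesseract"
    exe = root / ("tesseract.exe" if platform.startswith("win") else "tesseract")
    tessdata = root / "tessdata"
    if exe.is_file() and tessdata.is_dir():
        return exe, tessdata
    return None
```

A caller could fall back to a system-wide Tesseract when this returns
None, and otherwise point the TESSDATA_PREFIX environment variable at
the returned tessdata directory before invoking the binary.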