Files
datatools-dev/build/installer.iss
Michael 93ccada974 build: bundle Tesseract 5.5.0 + tessdata into every release artifact
End users no longer have to install Tesseract separately for OCR on
scanned PDFs — the engine ships inside the installer, portable .zip,
and AppImage for all three platforms.

Per-platform fetch in build/make_release.py (run before PyInstaller):
- Windows: download UB-Mannheim installer 5.5.0.20241111, extract
  with 7-Zip, copy tesseract.exe + required DLLs into the staging dir.
- macOS: ``brew install tesseract``, copy binary + every Homebrew-
  prefixed dylib resolved via otool -L (recurse one level for
  transitive deps), then install_name_tool rewrites IDs / load paths
  to @loader_path/... so the bundle is relocatable.
- Linux: ``apt-get install tesseract-ocr libtesseract5``, copy binary
  + every non-system .so from ldd output, patchelf --set-rpath '$ORIGIN'.

Wire-up:
- build/datatools.spec reads DATATOOLS_TESS_STAGING env var (set by
  make_release) and adds the staging dir + tessdata + the
  LICENSE_TESSERACT.txt Apache 2.0 attribution to PyInstaller datas
  so they land at <bundle>/tesseract/{tesseract[.exe],tessdata/}
  and the license sits at the bundle root. Soft-warns when staging
  is empty so dev spec runs still complete.
- English tessdata pulled by fetch_tessdata() from
  tesseract-ocr/tessdata_best (eng.traineddata, ~16 MB). Cached at
  build/vendor/tessdata/.
- .github/workflows/build.yml: actions/cache@v4 step keyed on
  ``tesseract-${runner.os}-5.5.0-tessdata_best-v1`` caches the
  staging dir and the vendored tessdata across runs; apt installs
  patchelf on the Linux runner; PyInstaller step now receives the
  DATATOOLS_TESS_STAGING env var.
- .gitignore: build/_tesseract/ and the .traineddata blob.
- TESSERACT_SKIP_FETCH=1 honored for offline / manual stages.
- Installer / .dmg / .zip / AppImage scripts: one-line comments
  confirming Tesseract rides along automatically via PyInstaller's
  datas (no extra packaging steps required in those scripts).

Bundle-size delta: ~50-70 MB on disk per platform, ~25-40 MB post-
compression. Net installer size ~250-300 MB (was ~120 MB) — accepted
tradeoff for zero end-user OCR setup.

Reversal of the prior "don't bundle Tesseract" decision (option A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 18:20:33 +00:00

94 lines
4.1 KiB
Plaintext

; Inno Setup script for DataTools — Windows installer.
;
; Compile from the repo root:
; iscc /DAppVersion=3.0 build\installer.iss
;
; CI passes the version via /DAppVersion to keep src/__init__.py the
; single source of truth. Local manual builds: pass /DAppVersion or
; let the default kick in.
;
; What this installer wires up (covers the "easy launch" surface):
; * Start Menu group: Start → DataTools → DataTools / Uninstall
; * Desktop shortcut: optional, checked by default during install
; * Quick Launch: optional, off by default (legacy Win 7 + power
; users who keep the bar enabled). Windows 10/11
; users pin to taskbar manually via right-click —
; OS security policy forbids programmatic pinning.
; * App Paths entry: so ``DataTools`` typed into Win+R / cmd works.
;
; Self-contained: the installer contains a frozen PyInstaller bundle
; (Python + every runtime dep). No pre-install or post-install steps
; on the buyer's machine. UAC is NOT required because we install
; per-user by default; the prompt only fires if the buyer asks for an
; all-users install.
#ifndef AppVersion
#define AppVersion "0.0.0-dev"
#endif
[Setup]
AppId={{D4A07001-DA7A-4001-8001-DA7A70013700}}
AppName=DataTools
AppVersion={#AppVersion}
AppVerName=DataTools {#AppVersion}
AppPublisher=DataTools
AppPublisherURL=https://datatools.app
AppSupportURL=https://datatools.app/support
AppUpdatesURL=https://datatools.app/releases
DefaultDirName={autopf}\DataTools
DefaultGroupName=DataTools
DisableProgramGroupPage=yes
OutputDir=..\dist
OutputBaseFilename=DataTools-{#AppVersion}-win-setup
SetupIconFile=icon.ico
UninstallDisplayIcon={app}\DataTools.exe
Compression=lzma2/max
SolidCompression=yes
WizardStyle=modern
ArchitecturesInstallIn64BitMode=x64
PrivilegesRequired=lowest
PrivilegesRequiredOverridesAllowed=dialog
; Allow per-user install (no UAC prompt) when admin isn't available.
; Buyers without admin rights can still install without IT involvement.
ChangesAssociations=no
CloseApplications=force
RestartApplications=no
[Languages]
Name: "english"; MessagesFile: "compiler:Default.isl"
[Tasks]
Name: "desktopicon"; Description: "Create a &desktop shortcut"; GroupDescription: "Additional shortcuts:"
Name: "quicklaunchicon"; Description: "Create a &Quick Launch shortcut"; GroupDescription: "Additional shortcuts:"; Flags: unchecked; OnlyBelowVersion: 6.1
[Files]
; PyInstaller's dist/DataTools/ tree includes:
; * DataTools.exe + frozen Python runtime
; * tesseract/tesseract.exe + DLLs + tessdata/eng.traineddata
; (bundled via build/datatools.spec datas; runtime discovery in
; src/pdf_extract.py reads sys._MEIPASS / "tesseract" / ...).
; * LICENSE_TESSERACT.txt at the bundle root (Apache-2.0).
; The recursesubdirs flag below picks all of those up — no separate
; Files: entry needed for tesseract/.
Source: "..\dist\DataTools\*"; DestDir: "{app}"; Flags: recursesubdirs ignoreversion
[Icons]
; Start Menu entries — created unconditionally so the app is always
; discoverable via Start search.
Name: "{group}\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"
Name: "{group}\Uninstall DataTools"; Filename: "{uninstallexe}"
; Desktop shortcut — opt-in via the Tasks page.
Name: "{autodesktop}\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"; Tasks: desktopicon
; Quick Launch (legacy) — only relevant on Win 7 and older.
Name: "{userappdata}\Microsoft\Internet Explorer\Quick Launch\DataTools"; Filename: "{app}\DataTools.exe"; IconFilename: "{app}\DataTools.exe"; Tasks: quicklaunchicon
[Registry]
; App Paths — lets the buyer launch from Win+R or cmd with just
; "DataTools" instead of a full path. Per-user hive so the per-user
; install path doesn't need admin to register.
Root: HKCU; Subkey: "Software\Microsoft\Windows\CurrentVersion\App Paths\DataTools.exe"; ValueType: string; ValueName: ""; ValueData: "{app}\DataTools.exe"; Flags: uninsdeletekey
[Run]
Filename: "{app}\DataTools.exe"; Description: "Launch DataTools"; Flags: nowait postinstall skipifsilent