A user tried ``brew install tesseract`` in PowerShell after
seeing install commands for all three OSes listed inline in
the OCR banner. That is an easy mistake to make when the
commands are crammed onto one line with ``·`` separators.
Two changes pre-empt this:
**OS-aware OCR banner.** The expander now detects the user's
platform via ``platform.system()`` and shows only the relevant
install instructions:
- **Windows**: UB-Mannheim installer link, numbered steps,
explicit "keep the Add to PATH checkbox on" callout, plus a
fallback paragraph telling the user how to set
``DATATOOLS_TESSERACT_PATH`` if they already installed
without PATH and don't want to reinstall.
- **macOS**: ``brew install tesseract`` with a Homebrew link.
- **Linux**: ``apt install tesseract-ocr`` with an "or your
distro's equivalent" hedge.
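A minimal sketch of the platform dispatch described above, for
illustration only. ``INSTALL_HINTS`` and ``install_hint`` are
hypothetical names, the hint strings are condensed paraphrases
of the banner text, and the fallback for unrecognised platforms
is an assumption:

```python
import platform

# Hypothetical, condensed hint text; the real banner has numbered steps and links.
INSTALL_HINTS = {
    "Windows": (
        "Run the UB-Mannheim installer and keep the Add to PATH checkbox on, "
        "or set DATATOOLS_TESSERACT_PATH to your existing tesseract.exe."
    ),
    "Darwin": "Run `brew install tesseract` (requires Homebrew).",
    "Linux": "Run `apt install tesseract-ocr`, or your distro's equivalent.",
}

# Assumed behaviour for platforms not listed above.
FALLBACK_HINT = "Install Tesseract OCR using your platform's package manager."


def install_hint(system=None):
    """Return the install hint for one platform only.

    platform.system() reports "Windows", "Darwin" (macOS) or "Linux".
    """
    if system is None:
        system = platform.system()
    return INSTALL_HINTS.get(system, FALLBACK_HINT)
```

Passing ``system`` explicitly keeps the function testable on any
development machine.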
**Robust binary discovery in ``ocr_available()``.** Discovery
now runs in three stages:
1. Honor ``DATATOOLS_TESSERACT_PATH`` env var if set — explicit
override for portable installs or non-default locations.
2. Try ``pytesseract``'s default PATH-based lookup.
3. If PATH lookup fails, probe known Windows install paths
(``C:\Program Files\Tesseract-OCR\tesseract.exe``,
the x86 variant, and ``%LOCALAPPDATA%\Programs\Tesseract-OCR\``)
via the new ``_autodetect_tesseract_path``. On hit, set
``pytesseract.pytesseract.tesseract_cmd`` so all subsequent
``image_to_data`` calls use the same binary without
re-discovering.
This means a user who runs the UB-Mannheim installer with
default options but forgets the PATH checkbox will still get
OCR working after a launcher restart, without env-var
gymnastics.
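The three-stage order can be sketched roughly as follows. This
is not the code in this commit: ``find_tesseract`` and
``windows_candidates`` are hypothetical names, ``shutil.which``
stands in for ``pytesseract``'s own PATH lookup, and the exact
x86 and ``%LOCALAPPDATA%`` file paths are assumptions. The real
code would assign the result to
``pytesseract.pytesseract.tesseract_cmd``.

```python
import ntpath
import os
import platform
import shutil

ENV_VAR = "DATATOOLS_TESSERACT_PATH"


def windows_candidates(env):
    """Known Windows install locations (x86 and LOCALAPPDATA paths are assumed)."""
    candidates = [
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    ]
    local = env.get("LOCALAPPDATA")
    if local:
        # ntpath joins with backslashes regardless of the host OS.
        candidates.append(
            ntpath.join(local, "Programs", "Tesseract-OCR", "tesseract.exe")
        )
    return candidates


def find_tesseract(env=None, system=None, exists=os.path.isfile, which=shutil.which):
    """Return a path to use for tesseract, or None if nothing was found."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system

    # Stage 1: the explicit override wins, even if the path does not resolve.
    override = env.get(ENV_VAR)
    if override:
        return override

    # Stage 2: ordinary PATH lookup.
    on_path = which("tesseract")
    if on_path:
        return on_path

    # Stage 3: probe default install locations, on Windows only.
    if system == "Windows":
        for candidate in windows_candidates(env):
            if exists(candidate):
                return candidate
    return None
```

The ``exists`` and ``which`` parameters are injected so the
Windows branch can be exercised on non-Windows machines.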
Tests (4 new, 85 total in the suite):
- Auto-detect returns None on non-Windows (no false positives
on dev laptops).
- Auto-detect finds the binary at a mocked
``C:\Program Files\Tesseract-OCR\tesseract.exe``.
- Auto-detect returns None when no candidate exists.
- ``DATATOOLS_TESSERACT_PATH`` env var beats both PATH lookup
and auto-detect (sets ``tesseract_cmd`` even when the path
doesn't resolve, so a real binary at a custom location works).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>