Removes the single-command Python packaging method (build/make_release.py
+ build/build_portable_zip.py + build/macos/build_zip.sh) and the portable
.zip artifacts it produced. Release builds go back to the original GitHub
Actions process: the CI matrix builds one installer per platform (.dmg /
.exe / .AppImage) on tag push and attaches them to a GitHub Release.
Tesseract OCR bundling is preserved: the fetch helpers the workflow depends
on (fetch_tessdata, fetch_tesseract_for_platform) are extracted into a
standalone build/tesseract.py, which build.yml now imports.
Docs (README, build/README, DEVELOPER, TECHNICAL, USER-GUIDE, vendor README,
es translations) updated to drop the portable-zip flavor and point at the
new module.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- NEW LICENSE_TESSERACT.txt at the repo root: header noting it covers
the bundled Tesseract OCR binary (Apache 2.0, upstream
tesseract-ocr/tesseract, copyright Google + contributors) and the
eng.traineddata from tessdata_best (also Apache 2.0). Clarifies
DataTools itself remains proprietary. Full canonical Apache 2.0
license text included.
- README.md + README.es.md (Download section): bumped size estimate
~200 MB → ~300 MB, added a short paragraph stating Tesseract OCR
is bundled (no separate install required), with a link to the new
license file.
- docs/USER-GUIDE.md + docs/USER-GUIDE.es.md (§1.6 System
requirements): bumped disk estimate, added a paragraph stating
Tesseract 5.5 + eng.traineddata ship inside every installer /
portable / AppImage, with a source-install fallback hint pointing
developers to DEVELOPER.md.
- docs/DEVELOPER.md: new "PDF Extractor — bundled Tesseract" section
documenting the runtime layout (sys._MEIPASS / tesseract / …),
discovery order, source of bytes (build/vendor/tessdata + per-
platform fetch in make_release.py), version pin, update recipe.
- docs/TECHNICAL.md: new §3.10 "Bundled Tesseract (PDF Extractor
OCR)" — short version of the discovery order for the build
pipeline section.
- build/README.md: distribution-outputs paragraph now lists
Tesseract among bundled deps with the ~250-300 MB estimate; new
"Tesseract bundling" section: layout diagram, resolver order,
source of bytes + 5.5.0 pin, update steps, license-file ref.
Out-of-scope gaps noted by the docs sweep:
- docs/FUTURE-TOOLS.md §D still describes Tesseract bundling as a
high-risk packaging headache; now superseded. Worth a one-line
"(resolved — bundled as of v1.x)" callout in a future pass.
- USER-GUIDE §2 "What's included" table doesn't list PDF Extractor
at all (it shipped in b8aff86…967d3f6). Separate gap to close.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-facing docs (USER-GUIDE en+es, README en+es):
- New short paragraph under §3.1 GUI noting the in-tool Help button
on every detail page, what it contains (When to use / Steps /
Examples / Tip), and that content lives in tools.<id>.help_md.
- One-line note in the README tool tables pointing at the same.
- Mention the sidebar +/- nav indicators replacing Streamlit's
default Material Symbols chevron.
Developer docs:
- DEVELOPER: new "Tool page header" subsection documenting
render_tool_header(tool_id), the help_md markdown skeleton, and
the fallback to help.missing_body when a tool's help is absent.
Update i18n authoring rules to list help.* keys and the per-tool
help_md field alongside name/description/page_title/page_caption.
- TECHNICAL: new §10c documenting the sidebar nav indicator swap —
CSS in _HIDE_CHROME_CSS plus _SWAP_NAV_SECTION_INDICATOR_JS
injected through the hide_streamlit_chrome() iframe bundle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repo READMEs now show both download flavors side-by-side with
first-launch warnings (SmartScreen, Gatekeeper) and link to the
deeper walkthrough.
USER-GUIDE §1 rewritten from a 9-line stub into six subsections:
- §1.1 Windows: installer (5 steps) + portable (4 steps)
- §1.2 macOS: DMG (5 steps incl. right-click-Open) + portable
- §1.3 Linux: AppImage flow (unchanged)
- §1.4 First-launch: port selection, localhost binding, browser open
- §1.5 How the GUI works
- §1.6 System requirements
§6 Troubleshooting picks up portable-specific items: Safari unzip
quirks, antivirus quarantine on Win portable, license file location.
docs/README and Spanish mirrors updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds README.es.md, docs/README.es.md, docs/USER-GUIDE.es.md, and
docs/CLI-REFERENCE.es.md mirroring the English client-facing set.
Each English doc gains a one-line language-switch banner pointing at
its Spanish counterpart; the docs index advertises both language sets
in the buyer-facing section. Internal docs (TECHNICAL, DECISIONS,
REQUIREMENTS, BUSINESS, RECOVERY) stay English-only by design — they
don't ship with the product.
The CLI itself emits English only, so CLI-REFERENCE.es.md notes that
flags and values are language-invariant while translating the prose.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>