User-facing docs (USER-GUIDE en+es, README en+es):
- New short paragraph under §3.1 GUI noting the in-tool Help button
on every detail page, what it contains (When to use / Steps /
Examples / Tip), and that content lives in tools.<id>.help_md.
- One-line note in the README tool tables pointing at the same.
- Mention the sidebar +/- nav indicators replacing Streamlit's
default Material Symbols chevron.
Developer docs:
- DEVELOPER: new "Tool page header" subsection documenting
render_tool_header(tool_id), the help_md markdown skeleton, and
the fallback to help.missing_body when a tool's help is absent.
Update i18n authoring rules to list help.* keys and the per-tool
help_md field alongside name/description/page_title/page_caption.
- TECHNICAL: new §10c documenting the sidebar nav indicator swap —
CSS in _HIDE_CHROME_CSS plus _SWAP_NAV_SECTION_INDICATOR_JS
injected through the hide_streamlit_chrome() iframe bundle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repo READMEs now show both download flavors side-by-side with
first-launch warnings (SmartScreen, Gatekeeper) and link to the
deeper walkthrough.
USER-GUIDE §1 rewritten from a 9-line stub into six subsections:
- §1.1 Windows: installer (5 steps) + portable (4 steps)
- §1.2 macOS: DMG (5 steps incl. right-click-Open) + portable
- §1.3 Linux: AppImage flow (unchanged)
- §1.4 First-launch: port selection, localhost binding, browser open
- §1.5 How the GUI works
- §1.6 System requirements
§6 Troubleshooting picks up portable-specific items: Safari unzip
quirks, antivirus quarantine on Win portable, license file location.
docs/README and Spanish mirrors updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds README.es.md, docs/README.es.md, docs/USER-GUIDE.es.md, and
docs/CLI-REFERENCE.es.md mirroring the English client-facing set.
Each English doc gains a one-line language-switch banner pointing at
its Spanish counterpart; the docs index advertises both language sets
in the buyer-facing section. Internal docs (TECHNICAL, DECISIONS,
REQUIREMENTS, BUSINESS, RECOVERY) stay English-only by design — they
don't ship with the product.
The CLI itself emits English only, so CLI-REFERENCE.es.md notes that
flags and values are language-invariant while translating the prose.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README + USER-GUIDE describe the sidebar picker and current coverage
(home + shared chrome, per-tool bodies pending). DEVELOPER gains a
how-to for adding packs and keys with the parity-test guarantee.
TECHNICAL §10b records the in-house-JSON architecture and locks in the
no-gettext decision (also logged in DECISIONS). REQUIREMENTS reflects
the new interface surface and updated test count. COPY.md adds a
"Language claim" slot so landing/email work can pick it up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stand up the seamless-download path for non-technical buyers:
* .github/workflows/build.yml — matrix CI (mac/win/linux) that builds
PyInstaller bundles and packages them per platform on tag push,
attaching the resulting installers to a GitHub Release.
* build/installer.iss — Inno Setup script for the Windows installer
(per-user install, optional desktop shortcut, runs on finish).
* build/macos/build_dmg.sh — wraps DataTools.app into a .dmg with a
drag-to-/Applications layout.
* build/appimage/{AppRun,datatools.desktop,build.sh} — AppImage recipe.
* src/__init__.py — single source of truth for __version__; the spec
reads it (was hardcoded), CI passes it through to all packagers.
Buyer download path now lives in the top-level README. Per-build
README documents the Phase 2 step (signing/notarization) that needs
the owner's Apple Developer + Windows code-signing credentials —
those are intentionally not in CI yet because they require setup
outside this repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New docs/REQUIREMENTS.md catalogs every shipped capability in 17 numbered
categories — file handling, input/output encodings, delimiters, line
endings, detectors, finding schema, confidence tiers, decisions,
performance targets (1 GB), tools, gate behavior, interfaces, platforms,
deps, test coverage, privacy. Linked from README and USER-GUIDE so a
buyer / integrator can scan compliance in under a minute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a Review & Normalize page that sits between upload and every tool
page. The analyzer now tags each finding with confidence (high/medium/low)
and a fix_action; the gate auto-applies high-confidence fixes, surfaces
medium/low ones for user review, and blocks tool pages on error-level
findings until resolved or waived.
Core (src/core/):
- analyze.py: Finding gains confidence, fix_action, pre_applied; new
detectors for encoding_uncertain, encoding_decode_failed; new top-
level encoding_override parameter.
- fixes.py: registry of fix algorithms keyed by fix_action id.
- normalize.py: auto_fix(), apply_decisions(), is_normalized(), and
the NormalizationResult / Decision dataclasses the gate consumes.
- io.py: detect_encoding tries strict UTF-8 first; repair_bytes now
transcodes UTF-16/32 to UTF-8 before NUL-strip (fixes UTF-16 corruption)
and normalizes line endings (fixes bare-CR parser crash); empty file
handled gracefully instead of EmptyDataError traceback.
GUI (src/gui/):
- pages/0_Review.py: gate page with per-finding decision controls,
encoding override picker (16 codepages + custom), and Advanced output
options (encoding, delimiter, line terminator) on the download.
- components.py: require_normalization_gate() helper.
- pages/1-9: gate guard wired on every tool page.
Test corpora:
- test-cases/encodings-corpus/: 31 encoded CSV fixtures + 9 reference
UTF-8 files + manifest, synced from Business/DataTools.
- test-cases/text-cleaner-corpus/test_data/17: synced malformed input
(unquoted $1,500.00) for the unquoted-delimiter detector.
Tests (94 new):
- test_normalize.py (48): finding fields, fix registry, auto_fix scope,
decision paths, gate idempotency, output-options helper.
- test_encodings_corpus.py (90, 16 xfailed): parametric detection +
decode + analyzer-no-crash sweep against the manifest.
- test_analyze.py: encoding override + encoding_uncertain detectors.
- test_corpus.py: pre-parse repair in the strict reader.
run_tests.py: new aliases --tool normalize, --tool encodings, --tool gate;
encodings corpus added to --fixtures category.
Docs: USER-GUIDE §3.3 covers the gate workflow, encoding override, and
output options; TECHNICAL §10.2.1-10.2.4 documents the analyzer schema,
gate API, Review page, and pre-parse repair pipeline; CLI-REFERENCE adds
the analyzer JSON schema with the new fields; README links to all of it.
Suite: 765 passed, 17 xfailed (was 458 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update README, CLI reference, and developer guide to cover delimiter
selector, inline checkboxes/dropdowns, live surviving rows preview,
multi-row survivors, and apply_review_decisions(). Remove dead link.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite README.md with project overview, quick-start, and CLI summary
- Add docs/CLI-REFERENCE.md with full flag reference and 8 recipe sections
- Add docs/DEVELOPER.md with architecture, data flow, and extension guides
- Rewrite src/core/__init__.py with public API exports and module docstring
- Add Streamlit GUI (src/gui/) with file upload, advanced options, interactive
match group review with side-by-side diff, and download buttons
- Add .gitignore, requirements.txt, all source code, tests, and sample data
- Add streamlit to requirements.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>