Files
datatools-dev/docs/RECOVERY.md
Michael db5ec084da docs+code: rename tool labels everywhere
Sweep follow-up to 93e43fc. Display labels now consistent across docs,
landing pages, CLI output, code comments, docstrings, and test prose.
Five parallel surfaces touched:

- docs (EN + ES): README, USER-GUIDE, CLI-REFERENCE, and 11 internal
  design/planning docs
- landing pages: index + bookkeeper/revops/shopify-pet
- src: CLI module docstrings, _TOOL_DISPLAY dicts in cli_analyze.py
  and gui/components/_legacy.py, core module headers, every tool
  page's module docstring
- tests: class/method/module docstrings and section-header comments
- test-cases READMEs

Page slugs (1_Deduplicator etc.), tool_id strings (01_deduplicator
etc.), Python class names (TestDeduplicatorWorkflow, FeatureFlag.*),
URL paths, anchor IDs, CSS classes, and asset filenames were left
intact since they're code identifiers / structural references.

All 2033 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:50:09 +00:00

6.4 KiB

Recovery

Creator-only. Full project rebuild guide. Version: 1.6 · Updated: 2026-05-01

If lost, this doc + the source ZIP rebuilds the project 100%.

1. Project layout

project-root/
├── README.md
├── docs/
│   ├── BUSINESS.md          # creator-only
│   ├── TECHNICAL.md         # creator-only
│   ├── DECISIONS.md         # creator-only — locked criteria + decision log
│   ├── DEVELOPER.md         # creator-only
│   ├── RECOVERY.md          # creator-only (this file)
│   ├── REQUIREMENTS.md
│   ├── USER-GUIDE.md        # ships to buyers
│   └── CLI-REFERENCE.md
├── src/
│   ├── core/                # shared logic — both CLI + GUI call into this
│   ├── cli.py               # Find Duplicates CLI
│   ├── cli_text_clean.py    # Clean Text CLI
│   ├── cli_analyze.py       # Analyzer CLI
│   └── gui/
│       ├── app.py           # Streamlit entry
│       ├── pages/           # one page per tool
│       └── components/      # shared widgets
├── samples/                 # messy_sales.csv, bank_export.xlsx
├── test-cases/              # corpora: text-cleaner, encodings, format-cleaner
├── tests/                   # pytest
├── demo/streamlit_app.py    # constrained Streamlit Community Cloud version
├── build/
│   ├── pyinstaller.spec     # cross-platform build spec
│   ├── launcher.py          # starts Streamlit, opens browser
│   ├── windows/installer.iss
│   ├── macos/{entitlements.plist, dmg_settings.py}
│   └── linux/AppImage/
├── ci/build.yml             # GitHub Actions matrix build
└── requirements.txt

2. Rebuild steps

From a complete ZIP backup

  1. Unzip into a clean directory.
  2. Push to GitHub.
  3. Tag a release → CI builds Windows / macOS / Linux artifacts.
  4. Connect repo to Streamlit Community Cloud → demo deploys.
  5. Local builds: see §3.

From documentation only (worst case)

  1. Read DECISIONS.md — understand why the project is what it is. §4c locks Streamlit; §4b locks UX standards. Non-negotiable.
  2. Read TECHNICAL.md §1-3 (architecture + build pipeline + Streamlit launcher pattern in §3.4).
  3. Read BUSINESS.md for product strategy + hosted demo as marketing asset.
  4. Recreate scripts using:
    • USER-GUIDE.md §2 (script table)
    • TECHNICAL.md §10 (04/06 boundary — do not relitigate)
    • TECHNICAL.md §11 (per-script functional specs; §11.1-11.3 are the v1 launch targets for Ready tools).
  5. Set up cross-platform build pipeline (§3 below).
  6. Recreate installer configs per TECHNICAL.md §3.5-3.7.
  7. Build constrained demo/streamlit_app.py (row limit, watermark, sample data).

3. Local build setup

Common

pip install -r requirements.txt pyinstaller
streamlit run src/gui/app.py    # verify GUI
python -m src.cli --help         # verify CLI

Windows

macOS

  1. xcode-select --install
  2. Enroll in Apple Developer Program ($99/yr — 1-2 wk first time).
  3. Generate Developer ID cert, install in Keychain.
  4. Generate app-specific password for notarytool.
  5. pyinstaller build/pyinstaller.spec
  6. codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/App.app
  7. Package as DMG.
  8. xcrun notarytool submit *.dmg --wait
  9. xcrun stapler staple *.dmg

Linux

  • Download appimagetool from https://appimage.github.io
  • pyinstaller build/pyinstaller.spec
  • Wrap as AppImage via assets in build/linux/AppImage/.

Streamlit + PyInstaller notes

  • Custom hook-streamlit.py required.
  • Hidden imports: streamlit, altair, pyarrow (and submodules where auto-detection fails).
  • The PyInstaller entry point is build/launcher.py, not the Streamlit script directly.
  • Budget 1-3 days first time. Reusable across all bundles.
git tag v1.0.0 && git push --tags
# GitHub Actions runs the matrix → 3 platform artifacts on Releases page.
# Manual: download → upload to Gumroad / Lemon Squeezy.

Hosted demo deployment

  • Connect GitHub repo to Streamlit Community Cloud (one-time, free).
  • Configure deployment → demo/streamlit_app.py.
  • Auto-updates on push to configured branch.
  • Custom domain optional via CNAME.

4. External dependencies

Item Source Cost
Python python.org/downloads Free
PyInstaller, Streamlit, Python libs pip install -r requirements.txt Free
Inno Setup (Windows) jrsoftware.org/isinfo.php Free
Apple Developer Program (macOS) developer.apple.com $99/yr
Xcode CLT (macOS) xcode-select --install Free
appimagetool (Linux) appimage.github.io Free
GitHub Actions (CI) github.com Free tier covers all 3 OS runners
Streamlit Community Cloud streamlit.io/cloud Free

5. Backup recommendation

  • Primary: GitHub repository (private). Source of truth.
  • Secondary: ZIP of full project tree on cloud storage (Drive / Dropbox / S3).
  • Apple Developer credentials: cert + app-specific password in a password manager. Re-issuable, not catastrophic.
  • Streamlit Community Cloud: stored as GitHub OAuth link in Streamlit UI. Re-authorize from new account if lost.
  • Back up after every meaningful change.
  • Always include RECOVERY.md + DECISIONS.md — irreplaceable context.

6. Recovery priorities (under time pressure)

  1. src/core/ + scripts — without these there is no product.
  2. DECISIONS.md — without this you'll re-litigate every settled call.
  3. TECHNICAL.md §10 (04/06 boundary) + §11 (per-script specs). Without these you'll rebuild dedup with weaker fuzzy than the v1 spec demands and lose to free Excel.
  4. src/gui/ — primary buyer surface; without it the product reverts to CLI-only and the persona refunds.
  5. PyInstaller spec + launcher + per-OS configs — recreating the Streamlit-PyInstaller integration is 1-3 days.
  6. Apple Developer Program enrollment — 1-2 wk lead. Start first if Mac matters.
  7. Hosted demo — important marketing asset, not blocking for desktop sales.
  8. Doc files (USER-GUIDE, BUSINESS, README) — recoverable from memory + this guide.
  9. CI config — nice to have, not blocking.