RECOVERY.md - Full Project Recovery Guide

Creator-only document. Do not ship to buyers.

Version: 1.6
Last updated: April 28, 2026

If the project is ever lost, this guide plus the source ZIP is enough to rebuild it 100%.


1. What's in the Project

project-root/
├── README.md
├── BUSINESS.md                  # Creator only
├── TECHNICAL.md                 # Creator only
├── DECISIONS.md                 # Creator only - locked criteria, rationale, GUI framework decision
├── USER-GUIDE.md                # Ships to buyers
├── RECOVERY.md                  # Creator only (this file)
│
├── scripts/                     # The 9 .py source files (CLI entry points)
│   ├── 01_deduplicator.py       # Working
│   ├── 02_text_cleaner.py
│   ├── 03_format_standardizer.py
│   ├── 04_missing_value_handler.py
│   ├── 05_column_mapper_enforcer.py
│   ├── 06_outlier_detector.py
│   ├── 07_multi_file_merger.py
│   ├── 08_validator_reporter.py
│   └── 09_master_orchestrator.py
│
├── src/
│   ├── core/                    # Shared business logic - both CLI and GUI call into this
│   ├── cli.py                   # Typer CLI front-end
│   └── gui/                     # Streamlit GUI front-end
│       ├── app.py               # Streamlit entry point
│       ├── pages/               # One Streamlit page per script in the bundle
│       └── components.py        # Shared widgets
│
├── samples/
│   ├── messy_sales.csv
│   └── bank_export.xlsx
│
├── demo/
│   └── streamlit_app.py         # Constrained version for Streamlit Community Cloud
│
├── build/
│   ├── pyinstaller.spec         # Cross-platform build spec (handles GUI launcher + CLI binaries)
│   ├── launcher.py              # Starts local Streamlit server, opens default browser
│   ├── windows/
│   │   └── installer.iss        # Inno Setup wrapper
│   ├── macos/
│   │   ├── entitlements.plist
│   │   └── dmg_settings.py
│   └── linux/
│       └── AppImage/            # AppImage build assets
│
├── ci/
│   └── build.yml                # GitHub Actions cross-platform build
│
├── tests/
│
└── requirements.txt
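
A recovered copy can be checked against the tree above with a short script. This is a minimal sketch: the path list is transcribed from the tree, while the names `EXPECTED_PATHS` and `missing_paths` are illustrative and not part of the project.

```python
from pathlib import Path

# Transcribed from the project tree above. Not exhaustive: the guide does not
# list the contents of src/core/, src/gui/pages/, or tests/.
EXPECTED_PATHS = [
    "README.md", "BUSINESS.md", "TECHNICAL.md", "DECISIONS.md",
    "USER-GUIDE.md", "RECOVERY.md", "requirements.txt",
    "scripts/01_deduplicator.py", "scripts/02_text_cleaner.py",
    "scripts/03_format_standardizer.py", "scripts/04_missing_value_handler.py",
    "scripts/05_column_mapper_enforcer.py", "scripts/06_outlier_detector.py",
    "scripts/07_multi_file_merger.py", "scripts/08_validator_reporter.py",
    "scripts/09_master_orchestrator.py",
    "src/core", "src/cli.py", "src/gui/app.py", "src/gui/pages",
    "src/gui/components.py",
    "samples/messy_sales.csv", "samples/bank_export.xlsx",
    "demo/streamlit_app.py",
    "build/pyinstaller.spec", "build/launcher.py",
    "build/windows/installer.iss", "build/macos/entitlements.plist",
    "build/macos/dmg_settings.py", "build/linux/AppImage",
    "ci/build.yml", "tests",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root, in listed order."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]
```

Calling `missing_paths(".")` from the project root returns an empty list when every listed file and directory is present.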

2. Rebuild Steps

From a complete ZIP backup

  1. Unzip into a clean directory.
  2. Push to a GitHub repository.
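  3. The CI pipeline (ci/build.yml) builds Windows, macOS, and Linux artifacts on tagged releases. GitHub Actions only runs workflows stored under .github/workflows/, so copy the file there if the backup does not already contain it at that path.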
  4. Connect the repo to Streamlit Community Cloud and point it at demo/streamlit_app.py to redeploy the hosted demo.
  5. For local builds: see Section 3.
  6. Done.
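
Step 1 can be scripted so the restore fails loudly rather than mixing old and new files. A minimal sketch using only the standard library; the function name `restore_from_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def restore_from_zip(zip_path, dest):
    """Extract a backup ZIP into dest and return the restored file paths.

    Refuses to run if dest already contains anything, since step 1 calls
    for a clean directory.
    """
    dest = Path(dest)
    if dest.exists() and any(dest.iterdir()):
        raise FileExistsError(f"{dest} is not empty")
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in backup: {bad}")
        zf.extractall(dest)
    # Relative POSIX-style paths make the result easy to compare across OSes.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```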

From documentation only (worst case)

  1. Read DECISIONS.md to understand why the project is what it is. Section 4c locks the GUI framework as Streamlit; Section 4b locks the UX standards. These are non-negotiable.
  2. Read TECHNICAL.md Sections 2-3 for the build pipeline architecture, including the Streamlit launcher pattern in Section 3.4.
  3. Read BUSINESS.md for product strategy, which bundles to build, and the hosted demo as a marketing asset.
  4. Recreate scripts using the spec in USER-GUIDE.md Section 2 (script table), TECHNICAL.md Section 7 (per-bundle technical notes), TECHNICAL.md Section 9 (boundary between scripts 04 and 06 - do not relitigate this), and TECHNICAL.md Section 10 (per-script functional requirements; Section 10.1 is the v1 launch target for the deduplicator).
  5. Set up the cross-platform build pipeline (Section 3 below).
  6. Recreate installer configs per TECHNICAL.md Section 3.
  7. Build the constrained demo/streamlit_app.py for hosted deployment. Constraints: row limit, watermark, sample data only or strict file-size cap.
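
The constraints in step 7 can be sketched as one function applied to every upload before processing. The limits, the watermark text, and the function name below are all hypothetical placeholders; the real values belong in DECISIONS.md or BUSINESS.md, not here.

```python
import csv
import io

MAX_DEMO_ROWS = 500           # hypothetical row limit
MAX_UPLOAD_BYTES = 1_000_000  # hypothetical file-size cap
WATERMARK = "DEMO OUTPUT - limited preview"  # hypothetical watermark text


def constrain_demo_csv(raw, max_rows=MAX_DEMO_ROWS, max_bytes=MAX_UPLOAD_BYTES):
    """Apply the demo constraints to uploaded CSV bytes.

    Returns (csv_text, truncated): the watermarked output limited to
    max_rows data rows, and whether any rows were dropped.
    """
    if len(raw) > max_bytes:
        raise ValueError(f"upload is {len(raw)} bytes; demo cap is {max_bytes}")
    rows = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
    header, body = rows[:1], rows[1:]
    truncated = len(body) > max_rows
    out = io.StringIO()
    # The watermark is written as a leading comment line in this sketch.
    out.write(f"# {WATERMARK}\n")
    csv.writer(out, lineterminator="\n").writerows(header + body[:max_rows])
    return out.getvalue(), truncated
```

In a Streamlit page this would sit between the file uploader and the call into src/core/, so the hosted demo never processes more than the capped input.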

3. Local Build Setup (per platform)

All platforms (common)

  • Install Python 3.11+.
  • pip install -r requirements.txt pyinstaller
  • Verify Streamlit app runs locally: streamlit run src/gui/app.py
  • Verify CLI runs locally: python -m src.cli --help
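
Before running the two verification commands, a quick preflight can confirm the interpreter version and that the key packages installed. A minimal sketch; the module list is an assumption based on the tools this guide names (Streamlit, PyInstaller, Typer), and `preflight` is a hypothetical helper.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 11)
# Import names assumed from the tools named in this guide.
REQUIRED_MODULES = ["streamlit", "PyInstaller", "typer"]


def preflight(version_info=None, modules=REQUIRED_MODULES):
    """Return a list of problems; an empty list means the environment looks ready."""
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found; 3.11+ required"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```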

Windows

  • Install Inno Setup: https://jrsoftware.org/isinfo.php
  • Build: pyinstaller build/pyinstaller.spec
  • Wrap in installer: open build/windows/installer.iss in Inno Setup, compile.

macOS

  • Install Xcode command line tools: xcode-select --install
  • Enroll in Apple Developer Program ($99/yr). Allow 1-2 weeks for first-time enrollment.
  • Generate Developer ID Application certificate, install in Keychain.
  • Generate app-specific password for notarytool.
  • Build: pyinstaller build/pyinstaller.spec
  • Sign: codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/BundleName.app
  • Package as DMG.
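  • Notarize: xcrun notarytool submit BundleName.dmg --keychain-profile "[profile]" --wait (notarytool needs credentials; store the Apple ID, team ID, and app-specific password once with xcrun notarytool store-credentials "[profile]")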
  • Staple: xcrun stapler staple BundleName.dmg
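
The sign, notarize, and staple commands above can be assembled in one place so they run in a fixed order. This is a sketch under assumptions: the function names are hypothetical, the notarization credentials are assumed to live in a keychain profile created with `xcrun notarytool store-credentials`, and the DMG packaging step is left out because this guide does not name the tool used for it.

```python
import shlex
import subprocess


def macos_release_commands(app_path, dmg_path, identity, keychain_profile):
    """Return the sign, notarize, and staple commands as argument lists."""
    return [
        ["codesign", "--deep", "--force", "--options", "runtime",
         "--sign", identity, app_path],
        # The DMG must be built between signing and notarizing (not shown).
        ["xcrun", "notarytool", "submit", dmg_path,
         "--keychain-profile", keychain_profile, "--wait"],
        ["xcrun", "stapler", "staple", dmg_path],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
```

Running with `dry_run=True` first prints the exact commands, which makes it easy to confirm the identity string and paths before anything is signed.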

Linux

  • Install AppImage tooling: download appimagetool from https://appimage.github.io
  • Build: pyinstaller build/pyinstaller.spec
  • Wrap as AppImage using appimagetool per the assets in build/linux/AppImage/.

Streamlit + PyInstaller specific notes

  • A custom PyInstaller hook (hook-streamlit.py) is required to bundle Streamlit's data files correctly.
  • Hidden imports must include streamlit, altair, pyarrow (and their submodules where PyInstaller fails to detect them).
  • The launcher script (build/launcher.py) is the actual PyInstaller entry point, not the Streamlit script directly.
  • Budget 1-3 days the first time getting the Streamlit-PyInstaller spec right; it's reusable across all subsequent bundles.
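
The launcher pattern described in these notes can be sketched as follows. This is not the project's actual build/launcher.py: the helper names, the bundled app location, the port, and the two-second browser delay are assumptions. It relies on `sys._MEIPASS`, which PyInstaller sets in a bundled app, and on Streamlit's `streamlit.web.cli` entry point.

```python
import sys
import threading
import webbrowser
from pathlib import Path

PORT = 8501  # Streamlit's default port; any free port would do


def bundle_root():
    """Return the directory holding bundled files.

    PyInstaller sets sys._MEIPASS inside a built app; when running from
    source this sketch falls back to the current directory.
    """
    meipass = getattr(sys, "_MEIPASS", None)
    return Path(meipass) if meipass else Path.cwd()


def streamlit_argv(app_path, port=PORT):
    """Build the argv that the Streamlit CLI would receive for `streamlit run`."""
    return [
        "streamlit", "run", str(app_path),
        f"--server.port={port}",
        "--server.headless=true",          # the launcher opens the browser itself
        "--global.developmentMode=false",  # needed when running from a bundle
    ]


def main():
    # Imported lazily so the helpers above work without Streamlit installed.
    from streamlit.web import cli as stcli

    # Assumes the spec bundles the GUI at src/gui/app.py inside the app.
    app = bundle_root() / "src" / "gui" / "app.py"
    # Open the browser shortly after the server starts (delay is a guess).
    threading.Timer(2.0, webbrowser.open, args=[f"http://localhost:{PORT}"]).start()
    sys.argv = streamlit_argv(app)
    sys.exit(stcli.main())
```

A real launcher would call `main()` at the bottom of the file. Keeping the server headless and opening the browser from the launcher means the URL and timing stay under the launcher's control.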

Release build via CI (GitHub Actions)

  • Push the repo to GitHub.
  • Tag a release: git tag v1.0.0 && git push --tags
  • GitHub Actions runs the matrix build and produces all three artifacts.
  • Manual step: download the artifacts from the Releases page and upload them to Gumroad / Lemon Squeezy.
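
Because the build is triggered by tags, a malformed tag name is an easy way to push a release that never builds. A small sketch for picking the next tag, assuming tags follow the vMAJOR.MINOR.PATCH shape of the v1.0.0 example; the actual trigger pattern in ci/build.yml is not reproduced in this guide, and both function names are hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (major, minor, patch) for a tag like v1.0.0, else raise ValueError."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def next_tag(tags, bump="patch"):
    """Return the tag that follows the highest existing one."""
    # Tuples compare numerically, so v1.10.0 correctly sorts above v1.2.3.
    major, minor, patch = max((parse_tag(t) for t in tags), default=(0, 0, 0))
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump!r}")
```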

Hosted demo deployment (separate from desktop build)

  • Connect GitHub repo to Streamlit Community Cloud (one-time, free).
  • Configure the deployment to point at demo/streamlit_app.py.
  • The demo updates automatically on git push to the configured branch.
  • Custom domain optional via CNAME (verify Streamlit Community Cloud current policy at recovery time).

4. External Dependencies (re-acquire if lost)

| Item | Source | Cost |
|---|---|---|
| Python | https://python.org/downloads | Free |
| PyInstaller | pip install pyinstaller | Free |
| Streamlit | pip install streamlit | Free |
| Inno Setup (Windows) | https://jrsoftware.org/isinfo.php | Free |
| Apple Developer Program (macOS signing) | https://developer.apple.com | $99/yr |
| Xcode command line tools (macOS) | xcode-select --install | Free |
| appimagetool (Linux) | https://appimage.github.io | Free |
| GitHub Actions (CI) | github.com | Free tier covers all three OS runners |
| Streamlit Community Cloud (demo hosting) | streamlit.io/cloud | Free |
| Python libraries | requirements.txt (pip install -r requirements.txt) | Free |
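
Whether the per-platform tools listed above have been re-acquired can be checked from Python. A minimal sketch: the executable names are assumptions (`ISCC` is Inno Setup's command-line compiler and is often not on PATH by default; `appimagetool` may be installed under a different file name), and `missing_tools` is a hypothetical helper.

```python
import shutil
import sys

# Executable names assumed for each sys.platform value.
TOOLS_BY_PLATFORM = {
    "win32": ["ISCC"],
    "darwin": ["xcrun", "codesign"],
    "linux": ["appimagetool"],
}


def missing_tools(platform=None, which=shutil.which):
    """Return the tools for this platform that are not found on PATH."""
    platform = sys.platform if platform is None else platform
    return [
        tool for tool in TOOLS_BY_PLATFORM.get(platform, [])
        if which(tool) is None
    ]
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use is `missing_tools()` with no arguments.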

5. Backup Recommendation

  • Primary backup: GitHub repository (private). Source is the source of truth.
  • Secondary backup: ZIP of the full project tree on cloud storage (Google Drive / Dropbox / S3).
  • Apple Developer credentials: store the certificate and app-specific password in a password manager. Losing them means regenerating both, which is an inconvenience rather than a catastrophe.
  • Streamlit Community Cloud connection: stored in Streamlit's UI as a GitHub OAuth link. Re-authorize from a new Streamlit account if lost.
  • Back up after every meaningful code or doc change.
  • Include this RECOVERY.md and DECISIONS.md in every backup. They contain the irreplaceable context.
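
The secondary ZIP backup can be produced by a script that refuses to run when the two irreplaceable documents are missing. A minimal sketch using the standard library; the function name, the archive naming scheme, and the excluded directory names are assumptions.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import Path

MUST_INCLUDE = ("RECOVERY.md", "DECISIONS.md")
# Assumed exclusions: VCS metadata, bytecode caches, PyInstaller output.
SKIP_DIRS = {".git", "__pycache__", "dist"}


def make_backup(project_root, out_dir):
    """Write a timestamped ZIP of the project tree and return its path."""
    root = Path(project_root)
    for name in MUST_INCLUDE:
        if not (root / name).is_file():
            raise FileNotFoundError(f"{name} missing; refusing to back up without it")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{root.resolve().name}-{stamp}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
                continue
            # Never add the archive to itself if out_dir sits inside the project.
            if path.resolve() == archive.resolve():
                continue
            zf.write(path, rel.as_posix())
    return archive
```

The resulting file is what gets uploaded to Google Drive, Dropbox, or S3; the upload itself is left out because it depends on the chosen provider.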

6. Recovery Priorities (if rebuilding under time pressure)

If you only have time to rebuild part of the project, this is the order:

  1. Source: src/core/ and scripts/. Without these there is no product.
  2. DECISIONS.md. Without this you will re-litigate every settled decision (especially GUI framework, dual interface, UX standards) and probably get it wrong differently.
  3. TECHNICAL.md, especially Sections 9 (04/06 boundary) and 10 (per-script functional requirements). Without these you will rebuild the deduplicator with weaker fuzzy matching than the v1 launch spec demands and ship something that loses to free Excel.
  4. Streamlit GUI source (src/gui/). The primary buyer surface; without it the product reverts to CLI-only and the buyer persona will refund.
  5. PyInstaller spec + launcher + per-OS build configs (build/). Reproducing the Streamlit-PyInstaller integration from scratch is 1-3 days of work.
  6. Apple Developer Program enrollment. 1-2 week lead time. Start this first if Mac distribution matters.
  7. Hosted demo (demo/streamlit_app.py). Important marketing asset but not blocking for desktop sales.
  8. Documentation files (USER-GUIDE, BUSINESS, README). Recoverable from memory + this guide.
  9. CI config (ci/build.yml). Nice to have, not blocking.