# RECOVERY.md - Full Project Recovery Guide > **Creator-only document. Do not ship to buyers.** **Version**: 1.6 **Last updated**: April 28, 2026 If the project is ever lost, this guide plus the source ZIP is enough to rebuild it 100%. --- ## 1. What's in the Project ``` project-root/ ├── README.md ├── BUSINESS.md # Creator only ├── TECHNICAL.md # Creator only ├── DECISIONS.md # Creator only - locked criteria, rationale, GUI framework decision ├── USER-GUIDE.md # Ships to buyers ├── RECOVERY.md # Creator only (this file) │ ├── scripts/ # The 9 .py source files (CLI entry points) │ ├── 01_deduplicator.py # Working │ ├── 02_text_cleaner.py │ ├── 03_format_standardizer.py │ ├── 04_missing_value_handler.py │ ├── 05_column_mapper_enforcer.py │ ├── 06_outlier_detector.py │ ├── 07_multi_file_merger.py │ ├── 08_validator_reporter.py │ └── 09_master_orchestrator.py │ ├── src/ │ ├── core/ # Shared business logic - both CLI and GUI call into this │ ├── cli.py # Typer CLI front-end │ └── gui/ # Streamlit GUI front-end │ ├── app.py # Streamlit entry point │ ├── pages/ # One Streamlit page per script in the bundle │ └── components.py # Shared widgets │ ├── samples/ │ ├── messy_sales.csv │ └── bank_export.xlsx │ ├── demo/ │ └── streamlit_app.py # Constrained version for Streamlit Community Cloud │ ├── build/ │ ├── pyinstaller.spec # Cross-platform build spec (handles GUI launcher + CLI binaries) │ ├── launcher.py # Starts local Streamlit server, opens default browser │ ├── windows/ │ │ └── installer.iss # Inno Setup wrapper │ ├── macos/ │ │ ├── entitlements.plist │ │ └── dmg_settings.py │ └── linux/ │ └── AppImage/ # AppImage build assets │ ├── ci/ │ └── build.yml # GitHub Actions cross-platform build │ ├── tests/ │ └── requirements.txt ``` --- ## 2. Rebuild Steps ### From a complete ZIP backup 1. Unzip into a clean directory. 2. Push to a GitHub repository. 3. The CI pipeline (`ci/build.yml`) builds Windows, macOS, and Linux artifacts on tagged releases. 4. Connect the repo to Streamlit Community Cloud and point it at `demo/streamlit_app.py` to redeploy the hosted demo. 5. For local builds: see Section 3. 6. Done. ### From documentation only (worst case) 1. Read `DECISIONS.md` to understand *why* the project is what it is. Section 4c locks the GUI framework as Streamlit; Section 4b locks the UX standards. These are non-negotiable. 2. Read `TECHNICAL.md` Sections 2-3 for the build pipeline architecture, including the Streamlit launcher pattern in Section 3.4. 3. Read `BUSINESS.md` for product strategy, which bundles to build, and the hosted demo as a marketing asset. 4. Recreate scripts using the spec in `USER-GUIDE.md` Section 2 (script table), `TECHNICAL.md` Section 7 (per-bundle technical notes), `TECHNICAL.md` Section 9 (boundary between scripts 04 and 06 - do not relitigate this), and `TECHNICAL.md` Section 10 (per-script functional requirements; Section 10.1 is the v1 launch target for the deduplicator). 5. Set up the cross-platform build pipeline (Section 3 below). 6. Recreate installer configs per `TECHNICAL.md` Section 3. 7. Build the constrained `demo/streamlit_app.py` for hosted deployment. Constraints: row limit, watermark, sample data only or strict file-size cap. --- ## 3. Local Build Setup (per platform) ### All platforms (common) - Install Python 3.11+. - `pip install -r requirements.txt pyinstaller` - Verify Streamlit app runs locally: `streamlit run src/gui/app.py` - Verify CLI runs locally: `python -m src.cli --help` ### Windows - Install Inno Setup: https://jrsoftware.org/isinfo.php - Build: `pyinstaller build/pyinstaller.spec` - Wrap in installer: open `build/windows/installer.iss` in Inno Setup, compile. ### macOS - Install Xcode command line tools: `xcode-select --install` - Enroll in Apple Developer Program ($99/yr). Allow 1-2 weeks first time. - Generate Developer ID Application certificate, install in Keychain. - Generate app-specific password for `notarytool`. - Build: `pyinstaller build/pyinstaller.spec` - Sign: `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/BundleName.app` - Package as DMG. - Notarize: `xcrun notarytool submit BundleName.dmg --wait` - Staple: `xcrun stapler staple BundleName.dmg` ### Linux - Install AppImage tooling: download `appimagetool` from https://appimage.github.io - Build: `pyinstaller build/pyinstaller.spec` - Wrap as AppImage using `appimagetool` per the assets in `build/linux/AppImage/`. ### Streamlit + PyInstaller specific notes - A custom PyInstaller hook (`hook-streamlit.py`) is required to bundle Streamlit's data files correctly. - Hidden imports must include `streamlit`, `altair`, `pyarrow` (and their submodules where PyInstaller fails to detect them). - The launcher script (`build/launcher.py`) is the actual PyInstaller entry point, not the Streamlit script directly. - Budget 1-3 days the first time getting the Streamlit-PyInstaller spec right; it's reusable across all subsequent bundles. ### CI build (recommended) - Push the repo to GitHub. - Tag a release: `git tag v1.0.0 && git push --tags` - GitHub Actions runs the matrix build, produces all three artifacts. - Manual step: download artifacts from the Releases page, upload to Gumroad / Lemon Squeezy. ### Hosted demo deployment (separate from desktop build) - Connect GitHub repo to Streamlit Community Cloud (one-time, free). - Configure the deployment to point at `demo/streamlit_app.py`. - The demo updates automatically on git push to the configured branch. - Custom domain optional via CNAME (verify Streamlit Community Cloud current policy at recovery time). --- ## 4. External Dependencies (re-acquire if lost) | Item | Source | Cost | |---|---|---| | Python | https://python.org/downloads | Free | | PyInstaller | `pip install pyinstaller` | Free | | Streamlit | `pip install streamlit` | Free | | Inno Setup (Windows) | https://jrsoftware.org/isinfo.php | Free | | Apple Developer Program (macOS signing) | https://developer.apple.com | $99/yr | | Xcode command line tools (macOS) | `xcode-select --install` | Free | | appimagetool (Linux) | https://appimage.github.io | Free | | GitHub Actions (CI) | github.com | Free tier covers all three OS runners | | Streamlit Community Cloud (demo hosting) | streamlit.io/cloud | Free | | Python libraries | See `requirements.txt`, `pip install -r requirements.txt` | Free | --- ## 5. Backup Recommendation - **Primary backup**: GitHub repository (private). Source is the source of truth. - **Secondary backup**: ZIP of the full project tree on cloud storage (Google Drive / Dropbox / S3). - **Apple Developer credentials**: store certificate + app-specific password in a password manager. Losing these requires regenerating, not catastrophic. - **Streamlit Community Cloud connection**: stored in Streamlit's UI as a GitHub OAuth link. Re-authorize from a new Streamlit account if lost. - Back up after every meaningful code or doc change. - Include this `RECOVERY.md` and `DECISIONS.md` in every backup. They contain the irreplaceable context. --- ## 6. Recovery Priorities (if rebuilding under time pressure) If you only have time to rebuild part of the project, this is the order: 1. **Source: `src/core/` and `scripts/`**. Without these there is no product. 2. **DECISIONS.md**. Without this you will re-litigate every settled decision (especially GUI framework, dual interface, UX standards) and probably get it wrong differently. 3. **TECHNICAL.md**, especially Sections 9 (04/06 boundary) and 10 (per-script functional requirements). Without these you will rebuild the deduplicator with weaker fuzzy matching than the v1 launch spec demands and ship something that loses to free Excel. 4. **Streamlit GUI source (`src/gui/`)**. The primary buyer surface; without it the product reverts to CLI-only and the buyer persona will refund. 5. **PyInstaller spec + launcher + per-OS build configs** (`build/`). Reproducing the Streamlit-PyInstaller integration from scratch is 1-3 days of work. 6. **Apple Developer Program enrollment**. 1-2 week lead time. Start this first if Mac distribution matters. 7. **Hosted demo (`demo/streamlit_app.py`)**. Important marketing asset but not blocking for desktop sales. 8. Documentation files (USER-GUIDE, BUSINESS, README). Recoverable from memory + this guide. 9. CI config (`ci/build.yml`). Nice to have, not blocking.