docs: add project documentation files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 22:02:07 +00:00
parent a23f7a9b6f
commit 0613dc420c
7 changed files with 1370 additions and 0 deletions

docs/RECOVERY.md Normal file

@@ -0,0 +1,180 @@
# RECOVERY.md - Full Project Recovery Guide
> **Creator-only document. Do not ship to buyers.**

**Version**: 1.6

**Last updated**: April 28, 2026

If the project is ever lost, this guide plus the source ZIP is enough to rebuild it completely.
---
## 1. What's in the Project
```
project-root/
├── README.md
├── BUSINESS.md # Creator only
├── TECHNICAL.md # Creator only
├── DECISIONS.md # Creator only - locked criteria, rationale, GUI framework decision
├── USER-GUIDE.md # Ships to buyers
├── RECOVERY.md # Creator only (this file)
├── scripts/ # The 9 .py source files (CLI entry points)
│ ├── 01_deduplicator.py # Working
│ ├── 02_text_cleaner.py
│ ├── 03_format_standardizer.py
│ ├── 04_missing_value_handler.py
│ ├── 05_column_mapper_enforcer.py
│ ├── 06_outlier_detector.py
│ ├── 07_multi_file_merger.py
│ ├── 08_validator_reporter.py
│ └── 09_master_orchestrator.py
├── src/
│ ├── core/ # Shared business logic - both CLI and GUI call into this
│ ├── cli.py # Typer CLI front-end
│ └── gui/ # Streamlit GUI front-end
│ ├── app.py # Streamlit entry point
│ ├── pages/ # One Streamlit page per script in the bundle
│ └── components.py # Shared widgets
├── samples/
│ ├── messy_sales.csv
│ └── bank_export.xlsx
├── demo/
│ └── streamlit_app.py # Constrained version for Streamlit Community Cloud
├── build/
│ ├── pyinstaller.spec # Cross-platform build spec (handles GUI launcher + CLI binaries)
│ ├── launcher.py # Starts local Streamlit server, opens default browser
│ ├── windows/
│ │ └── installer.iss # Inno Setup wrapper
│ ├── macos/
│ │ ├── entitlements.plist
│ │ └── dmg_settings.py
│ └── linux/
│ └── AppImage/ # AppImage build assets
├── ci/
│ └── build.yml # GitHub Actions cross-platform build
├── tests/
└── requirements.txt
```
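After unzipping a backup, it can help to confirm the tree is complete before doing anything else. The following is a minimal sketch, not part of the project: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the path list is a hand-picked subset of the tree above.

```python
from pathlib import Path

# Subset of the tree above; extend or trim to match the real backup.
EXPECTED_PATHS = [
    "README.md",
    "DECISIONS.md",
    "TECHNICAL.md",
    "RECOVERY.md",
    "requirements.txt",
    "scripts/01_deduplicator.py",
    "scripts/09_master_orchestrator.py",
    "src/core",
    "src/cli.py",
    "src/gui/app.py",
    "demo/streamlit_app.py",
    "build/pyinstaller.spec",
    "build/launcher.py",
    "ci/build.yml",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the unzipped directory returns an empty list when every listed file and folder is present.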
---
## 2. Rebuild Steps
### From a complete ZIP backup
1. Unzip into a clean directory.
2. Push to a GitHub repository.
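3. The CI pipeline (`ci/build.yml`) builds Windows, macOS, and Linux artifacts on tagged releases. GitHub Actions only runs workflow files located under `.github/workflows/`, so confirm the workflow is present there after the push.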
4. Connect the repo to Streamlit Community Cloud and point it at `demo/streamlit_app.py` to redeploy the hosted demo.
5. For local builds: see Section 3.
6. Done.
### From documentation only (worst case)
1. Read `DECISIONS.md` to understand *why* the project is what it is. Section 4c locks the GUI framework as Streamlit; Section 4b locks the UX standards. These are non-negotiable.
2. Read `TECHNICAL.md` Sections 2-3 for the build pipeline architecture, including the Streamlit launcher pattern in Section 3.4.
3. Read `BUSINESS.md` for product strategy, which bundles to build, and the hosted demo as a marketing asset.
4. Recreate the scripts from these specs:
   - `USER-GUIDE.md` Section 2 (script table)
   - `TECHNICAL.md` Section 7 (per-bundle technical notes)
   - `TECHNICAL.md` Section 9 (boundary between scripts 04 and 06; do not relitigate this)
   - `TECHNICAL.md` Section 10 (per-script functional requirements; Section 10.1 is the v1 launch target for the deduplicator)
5. Set up the cross-platform build pipeline (Section 3 below).
6. Recreate installer configs per `TECHNICAL.md` Section 3.
7. Build the constrained `demo/streamlit_app.py` for hosted deployment. Constraints: a row limit, a watermark, and either sample data only or a strict file-size cap.
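The demo constraints in step 7 could be enforced before any data reaches the cleaning logic. This is a rough sketch under stated assumptions: the limit values, `DemoLimitError`, and `load_demo_rows` are all hypothetical, since this guide names the constraints but not their values, and CSV is used only as the simplest input format.

```python
import csv
import io

# Hypothetical limits; the guide does not state the real values.
MAX_UPLOAD_BYTES = 1_000_000
MAX_ROWS = 500


class DemoLimitError(ValueError):
    """Raised when an upload exceeds the hosted demo's file-size cap."""


def load_demo_rows(raw, max_bytes=MAX_UPLOAD_BYTES, max_rows=MAX_ROWS):
    """Parse CSV bytes, rejecting oversized files and truncating long ones.

    Returns (header, rows, truncated), where truncated is True if data rows
    beyond max_rows were dropped.
    """
    if len(raw) > max_bytes:
        raise DemoLimitError(
            f"File is {len(raw)} bytes; the demo accepts at most {max_bytes}."
        )
    # utf-8-sig strips the byte-order mark that Excel adds to CSV exports.
    reader = csv.reader(io.StringIO(raw.decode("utf-8-sig")))
    header = next(reader, [])
    rows = []
    truncated = False
    for row in reader:
        if len(rows) >= max_rows:
            truncated = True
            break
        rows.append(row)
    return header, rows, truncated
```

In the Streamlit page, the `truncated` flag would be the natural trigger for a notice that the full version has no row limit.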
---
## 3. Local Build Setup (per platform)
### All platforms (common)
- Install Python 3.11+.
- `pip install -r requirements.txt pyinstaller`
- Verify Streamlit app runs locally: `streamlit run src/gui/app.py`
- Verify CLI runs locally: `python -m src.cli --help`
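The common checks above can be partly automated. A minimal sketch, assuming the three import names below are the ones `requirements.txt` and the build step actually pull in; `preflight` is a hypothetical helper, not an existing project file.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # from "Install Python 3.11+" above
# Import names (not pip names); assumed from the tools this guide mentions.
REQUIRED_MODULES = ["streamlit", "typer", "PyInstaller"]


def preflight(version_info=sys.version_info, modules=REQUIRED_MODULES):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module not importable: {name}")
    return problems
```

This only confirms the packages are importable; the two `Verify` commands above are still the real test that the app and CLI start.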
### Windows
- Install Inno Setup: https://jrsoftware.org/isinfo.php
- Build: `pyinstaller build/pyinstaller.spec`
- Wrap in installer: open `build/windows/installer.iss` in Inno Setup, compile.
### macOS
- Install Xcode command line tools: `xcode-select --install`
- Enroll in the Apple Developer Program ($99/yr). Allow 1-2 weeks for first-time enrollment.
- Generate Developer ID Application certificate, install in Keychain.
- Generate app-specific password for `notarytool`.
- Build: `pyinstaller build/pyinstaller.spec`
- Sign: `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/BundleName.app`
- Package as DMG.
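- Notarize: `xcrun notarytool submit BundleName.dmg --wait` (also pass credentials, for example `--keychain-profile <profile>`, where the profile is created once with `xcrun notarytool store-credentials`).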
- Staple: `xcrun stapler staple BundleName.dmg`
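The sign, notarize, and staple commands above can be scripted so the order is never wrong under time pressure. This is a sketch only: the helper names are hypothetical, and it assumes notarization credentials were saved beforehand as a keychain profile via `xcrun notarytool store-credentials`. DMG packaging happens between the two phases and is left out because this guide does not specify the tool.

```python
import subprocess


def sign_command(app_path, identity):
    """Build the codesign command from the Sign step above as an argv list."""
    return [
        "codesign", "--deep", "--force", "--options", "runtime",
        "--sign", identity, app_path,
    ]


def notarize_and_staple_commands(dmg_path, keychain_profile):
    """Build the notarytool and stapler commands, in the order they must run."""
    notarize = [
        "xcrun", "notarytool", "submit", dmg_path,
        "--keychain-profile", keychain_profile,  # assumed credential method
        "--wait",
    ]
    staple = ["xcrun", "stapler", "staple", dmg_path]
    return [notarize, staple]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

A typical run would call `run_all([sign_command(...)])`, package the DMG, then call `run_all(notarize_and_staple_commands(...))`. Passing a fake `runner` makes it possible to dry-run the sequence on a machine without Xcode.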
### Linux
- Install AppImage tooling: download `appimagetool` from https://appimage.github.io
- Build: `pyinstaller build/pyinstaller.spec`
- Wrap as AppImage using `appimagetool` per the assets in `build/linux/AppImage/`.
### Streamlit + PyInstaller specific notes
- A custom PyInstaller hook (`hook-streamlit.py`) is required to bundle Streamlit's data files correctly.
- Hidden imports must include `streamlit`, `altair`, `pyarrow` (and their submodules where PyInstaller fails to detect them).
- The launcher script (`build/launcher.py`) is the actual PyInstaller entry point, not the Streamlit script directly.
- Budget 1-3 days the first time getting the Streamlit-PyInstaller spec right; it's reusable across all subsequent bundles.
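For orientation when rebuilding `build/launcher.py` from nothing, the launcher pattern described above might look roughly like this. It is a sketch, not the original file: the helper names, the 2-second browser delay, and the bundled app path are assumptions, and the `streamlit.web.cli` entry point should be checked against the installed Streamlit version.

```python
import socket
import sys
import threading
import webbrowser
from pathlib import Path


def resource_path(relative):
    """Resolve a file inside a PyInstaller bundle or, failing that, the cwd."""
    # PyInstaller sets sys._MEIPASS to the unpacked bundle directory at runtime.
    base = Path(getattr(sys, "_MEIPASS", Path.cwd()))
    return base / relative


def find_free_port():
    """Ask the OS for an unused local TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def streamlit_argv(app_path, port):
    """Build the argv equivalent of `streamlit run <app>` for an in-process start."""
    return [
        "streamlit", "run", str(app_path),
        "--server.port", str(port),
        "--server.headless", "true",  # the launcher opens the browser itself
        "--global.developmentMode", "false",
    ]


def main():
    # Imported lazily so the helpers above work without Streamlit installed.
    from streamlit.web import cli as stcli

    port = find_free_port()
    sys.argv = streamlit_argv(resource_path("src/gui/app.py"), port)
    # Open the default browser shortly after startup; the delay is a guess.
    url = f"http://localhost:{port}"
    threading.Timer(2.0, webbrowser.open, args=[url]).start()
    sys.exit(stcli.main())
```

The real launcher would end by calling `main()`. Whether `src/gui/app.py` resolves inside the bundle depends on how the spec file lists its data files, which is why the spec and the launcher need to be rebuilt together.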
### CI build (recommended)
- Push the repo to GitHub.
- Tag a release: `git tag v1.0.0 && git push --tags`
- GitHub Actions runs the matrix build, produces all three artifacts.
- Manual step: download artifacts from the Releases page, upload to Gumroad / Lemon Squeezy.
### Hosted demo deployment (separate from desktop build)
- Connect GitHub repo to Streamlit Community Cloud (one-time, free).
- Configure the deployment to point at `demo/streamlit_app.py`.
- The demo updates automatically on git push to the configured branch.
- Custom domain optional via CNAME (verify Streamlit Community Cloud current policy at recovery time).
---
## 4. External Dependencies (re-acquire if lost)
| Item | Source | Cost |
|---|---|---|
| Python | https://python.org/downloads | Free |
| PyInstaller | `pip install pyinstaller` | Free |
| Streamlit | `pip install streamlit` | Free |
| Inno Setup (Windows) | https://jrsoftware.org/isinfo.php | Free |
| Apple Developer Program (macOS signing) | https://developer.apple.com | $99/yr |
| Xcode command line tools (macOS) | `xcode-select --install` | Free |
| appimagetool (Linux) | https://appimage.github.io | Free |
| GitHub Actions (CI) | github.com | Free tier includes Windows, macOS, and Linux runners (monthly minute limits apply to private repos) |
| Streamlit Community Cloud (demo hosting) | streamlit.io/cloud | Free |
| Python libraries | See `requirements.txt`, `pip install -r requirements.txt` | Free |
---
## 5. Backup Recommendation
- **Primary backup**: GitHub repository (private). The repository is the source of truth.
- **Secondary backup**: ZIP of the full project tree on cloud storage (Google Drive / Dropbox / S3).
- **Apple Developer credentials**: store the certificate and app-specific password in a password manager. Losing them means regenerating both, which is inconvenient but not catastrophic.
- **Streamlit Community Cloud connection**: stored in Streamlit's UI as a GitHub OAuth link. If it is lost, re-authorize from a new Streamlit account.
- Back up after every meaningful code or doc change.
- Include this `RECOVERY.md` and `DECISIONS.md` in every backup. They contain the irreplaceable context.
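The secondary ZIP backup can be produced by a short script that refuses to succeed unless the two irreplaceable documents are inside. A minimal sketch: `make_backup`, `MUST_INCLUDE`, and the skip list are hypothetical, and uploading to cloud storage is left as a manual step.

```python
import zipfile
from datetime import datetime
from pathlib import Path

MUST_INCLUDE = ("RECOVERY.md", "DECISIONS.md")
# Hypothetical skip list; note that build/ holds build configs and is kept.
SKIP_DIRS = {".git", "__pycache__", "dist", ".venv"}


def make_backup(project_root, dest_dir):
    """Zip the project tree into dest_dir and return the archive path."""
    root = Path(project_root).resolve()
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{root.name}-{stamp}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
                continue
            if path == archive:
                continue  # dest_dir may sit inside the project tree
            zf.write(path, rel.as_posix())
        names = set(zf.namelist())
    missing = [name for name in MUST_INCLUDE if name not in names]
    if missing:
        archive.unlink()
        raise FileNotFoundError(f"Backup is missing: {', '.join(missing)}")
    return archive
```

The timestamped file name keeps earlier backups from being overwritten, so a bad backup never replaces a good one.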
---
## 6. Recovery Priorities (if rebuilding under time pressure)
If you only have time to rebuild part of the project, this is the order:
1. **Source: `src/core/` and `scripts/`**. Without these there is no product.
2. **DECISIONS.md**. Without this you will re-litigate every settled decision (especially GUI framework, dual interface, UX standards) and probably get it wrong differently.
3. **TECHNICAL.md**, especially Sections 9 (04/06 boundary) and 10 (per-script functional requirements). Without these you will rebuild the deduplicator with weaker fuzzy matching than the v1 launch spec demands and ship something that loses to free Excel.
4. **Streamlit GUI source (`src/gui/`)**. The primary buyer surface; without it the product reverts to CLI-only and the buyer persona will refund.
5. **PyInstaller spec + launcher + per-OS build configs** (`build/`). Reproducing the Streamlit-PyInstaller integration from scratch is 1-3 days of work.
6. **Apple Developer Program enrollment**. 1-2 week lead time. Start this first if Mac distribution matters.
7. **Hosted demo (`demo/streamlit_app.py`)**. Important marketing asset but not blocking for desktop sales.
8. Documentation files (USER-GUIDE, BUSINESS, README). Recoverable from memory + this guide.
9. CI config (`ci/build.yml`). Nice to have, not blocking.