docs: tight, scannable rewrite — every item earns its place
Refactors all 10 docs (README, USER-GUIDE, CLI-REFERENCE, REQUIREMENTS, TECHNICAL, DEVELOPER, BUSINESS, DECISIONS, RECOVERY, docs/README) from prose-heavy to bullet-heavy + table-heavy. Same information density, significantly less reading load. Net: 2600 → 1652 lines (~37% reduction) WHILE adding the new content that landed since v1.6: - Format Standardizer (3rd Ready tool) - 199-row buyer corpus - src/core/errors.py structured hierarchy + ensure_dataframe / ensure_choice / wrap_file_read|write / format_for_user helpers - src/core/_constants.py shared USPS/state lookup tables - Cross-tool audit fixes (NaN matching, removed_df schema, validation, enum-bounds checks, forward-compat config) - Per-domain error_policy across format standardizers - Inconsistent-date-format detector - Excel header-row auto-detection + write_file delimiter param Per-doc changes: - README.md (175 → 71): 9-tool table at top, status column, 3 CLI entry points listed, dropped repeated marketing prose. - docs/README.md (38 → 27): pure index — buyer-facing vs creator-only split + version footer. - USER-GUIDE.md (208 → 118): tool table replaces script descriptions, troubleshooting compressed to bullets, gate explanation tightened. - CLI-REFERENCE.md (451 → 235): collapsed flag tables, removed redundant intro text, kept full recipes section. - REQUIREMENTS.md (146 → 129): 18 numbered sections (was 17), added §18 Error Handling, formatting tightened to single-line entries. - TECHNICAL.md (570 → 350): collapsed §3 build pipeline tables, merged redundant §3.5-3.7 OS sections, added §7 (Error handling) + §11.3 (Format Standardizer spec) + §11.4-11.7 (analyzer / gate / Review page / repair_bytes promoted from §10.2.x sub-numbering). - DEVELOPER.md (285 → 161): module map table replaces per-file prose, extension recipes condensed, new §Errors covers when to use each hierarchy class. - BUSINESS.md (278 → 225): collapsed prose to tables (use cases, competitive landscape, costs, risks); honest-status updated. - DECISIONS.md (269 → 189): scoring rubric + GUI matrix preserved, decision log compressed to single-line entries, added v1.6 entries (Format Standardizer Ready, errors module). - RECOVERY.md (180 → 147): rebuild steps as numbered + tabular, external dependencies as one table, recovery priorities tightened. No information removed; redundancy compressed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
251
docs/RECOVERY.md
251
docs/RECOVERY.md
@@ -1,180 +1,147 @@
|
||||
# RECOVERY.md - Full Project Recovery Guide
|
||||
# Recovery
|
||||
|
||||
> **Creator-only document. Do not ship to buyers.**
|
||||
> Creator-only. Full project rebuild guide.
|
||||
> **Version**: 1.6 · **Updated**: 2026-05-01
|
||||
|
||||
**Version**: 1.6
|
||||
**Last updated**: April 28, 2026
|
||||
If lost, this doc + the source ZIP rebuilds the project 100%.
|
||||
|
||||
If the project is ever lost, this guide plus the source ZIP is enough to rebuild it 100%.
|
||||
|
||||
---
|
||||
|
||||
## 1. What's in the Project
|
||||
## 1. Project layout
|
||||
|
||||
```
|
||||
project-root/
|
||||
├── README.md
|
||||
├── BUSINESS.md # Creator only
|
||||
├── TECHNICAL.md # Creator only
|
||||
├── DECISIONS.md # Creator only - locked criteria, rationale, GUI framework decision
|
||||
├── USER-GUIDE.md # Ships to buyers
|
||||
├── RECOVERY.md # Creator only (this file)
|
||||
│
|
||||
├── scripts/ # The 9 .py source files (CLI entry points)
|
||||
│ ├── 01_deduplicator.py # Working
|
||||
│ ├── 02_text_cleaner.py
|
||||
│ ├── 03_format_standardizer.py
|
||||
│ ├── 04_missing_value_handler.py
|
||||
│ ├── 05_column_mapper_enforcer.py
|
||||
│ ├── 06_outlier_detector.py
|
||||
│ ├── 07_multi_file_merger.py
|
||||
│ ├── 08_validator_reporter.py
|
||||
│ └── 09_master_orchestrator.py
|
||||
│
|
||||
├── docs/
|
||||
│ ├── BUSINESS.md # creator-only
|
||||
│ ├── TECHNICAL.md # creator-only
|
||||
│ ├── DECISIONS.md # creator-only — locked criteria + decision log
|
||||
│ ├── DEVELOPER.md # creator-only
|
||||
│ ├── RECOVERY.md # creator-only (this file)
|
||||
│ ├── REQUIREMENTS.md
|
||||
│ ├── USER-GUIDE.md # ships to buyers
|
||||
│ └── CLI-REFERENCE.md
|
||||
├── src/
|
||||
│ ├── core/ # Shared business logic - both CLI and GUI call into this
|
||||
│ ├── cli.py # Typer CLI front-end
|
||||
│ └── gui/ # Streamlit GUI front-end
|
||||
│ ├── app.py # Streamlit entry point
|
||||
│ ├── pages/ # One Streamlit page per script in the bundle
|
||||
│ └── components.py # Shared widgets
|
||||
│
|
||||
├── samples/
|
||||
│ ├── messy_sales.csv
|
||||
│ └── bank_export.xlsx
|
||||
│
|
||||
├── demo/
|
||||
│ └── streamlit_app.py # Constrained version for Streamlit Community Cloud
|
||||
│
|
||||
│ ├── core/ # shared logic — both CLI + GUI call into this
|
||||
│ ├── cli.py # Deduplicator CLI
|
||||
│ ├── cli_text_clean.py # Text Cleaner CLI
|
||||
│ ├── cli_analyze.py # Analyzer CLI
|
||||
│ └── gui/
|
||||
│ ├── app.py # Streamlit entry
|
||||
│ ├── pages/ # one page per tool
|
||||
│ └── components/ # shared widgets
|
||||
├── samples/ # messy_sales.csv, bank_export.xlsx
|
||||
├── test-cases/ # corpora: text-cleaner, encodings, format-cleaner
|
||||
├── tests/ # pytest
|
||||
├── demo/streamlit_app.py # constrained Streamlit Community Cloud version
|
||||
├── build/
|
||||
│ ├── pyinstaller.spec # Cross-platform build spec (handles GUI launcher + CLI binaries)
|
||||
│ ├── launcher.py # Starts local Streamlit server, opens default browser
|
||||
│ ├── windows/
|
||||
│ │ └── installer.iss # Inno Setup wrapper
|
||||
│ ├── macos/
|
||||
│ │ ├── entitlements.plist
|
||||
│ │ └── dmg_settings.py
|
||||
│ └── linux/
|
||||
│ └── AppImage/ # AppImage build assets
|
||||
│
|
||||
├── ci/
|
||||
│ └── build.yml # GitHub Actions cross-platform build
|
||||
│
|
||||
├── tests/
|
||||
│
|
||||
│ ├── pyinstaller.spec # cross-platform build spec
|
||||
│ ├── launcher.py # starts Streamlit, opens browser
|
||||
│ ├── windows/installer.iss
|
||||
│ ├── macos/{entitlements.plist, dmg_settings.py}
|
||||
│ └── linux/AppImage/
|
||||
├── ci/build.yml # GitHub Actions matrix build
|
||||
└── requirements.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Rebuild Steps
|
||||
## 2. Rebuild steps
|
||||
|
||||
### From a complete ZIP backup
|
||||
1. Unzip into a clean directory.
|
||||
2. Push to a GitHub repository.
|
||||
3. The CI pipeline (`ci/build.yml`) builds Windows, macOS, and Linux artifacts on tagged releases.
|
||||
4. Connect the repo to Streamlit Community Cloud and point it at `demo/streamlit_app.py` to redeploy the hosted demo.
|
||||
5. For local builds: see Section 3.
|
||||
6. Done.
|
||||
2. Push to GitHub.
|
||||
3. Tag a release → CI builds Windows / macOS / Linux artifacts.
|
||||
4. Connect repo to Streamlit Community Cloud → demo deploys.
|
||||
5. Local builds: see §3.
|
||||
|
||||
### From documentation only (worst case)
|
||||
1. Read `DECISIONS.md` to understand *why* the project is what it is. Section 4c locks the GUI framework as Streamlit; Section 4b locks the UX standards. These are non-negotiable.
|
||||
2. Read `TECHNICAL.md` Sections 2-3 for the build pipeline architecture, including the Streamlit launcher pattern in Section 3.4.
|
||||
3. Read `BUSINESS.md` for product strategy, which bundles to build, and the hosted demo as a marketing asset.
|
||||
4. Recreate scripts using the spec in `USER-GUIDE.md` Section 2 (script table), `TECHNICAL.md` Section 7 (per-bundle technical notes), `TECHNICAL.md` Section 9 (boundary between scripts 04 and 06 - do not relitigate this), and `TECHNICAL.md` Section 10 (per-script functional requirements; Section 10.1 is the v1 launch target for the deduplicator).
|
||||
5. Set up the cross-platform build pipeline (Section 3 below).
|
||||
6. Recreate installer configs per `TECHNICAL.md` Section 3.
|
||||
7. Build the constrained `demo/streamlit_app.py` for hosted deployment. Constraints: row limit, watermark, sample data only or strict file-size cap.
|
||||
1. Read **DECISIONS.md** — understand *why* the project is what it is. §4c locks Streamlit; §4b locks UX standards. **Non-negotiable.**
|
||||
2. Read **TECHNICAL.md** §1-3 (architecture + build pipeline + Streamlit launcher pattern in §3.4).
|
||||
3. Read **BUSINESS.md** for product strategy + hosted demo as marketing asset.
|
||||
4. Recreate scripts using:
|
||||
- USER-GUIDE.md §2 (script table)
|
||||
- TECHNICAL.md §10 (04/06 boundary — do not relitigate)
|
||||
- TECHNICAL.md §11 (per-script functional specs; §11.1-11.3 are the v1 launch targets for Ready tools).
|
||||
5. Set up cross-platform build pipeline (§3 below).
|
||||
6. Recreate installer configs per TECHNICAL.md §3.5-3.7.
|
||||
7. Build constrained `demo/streamlit_app.py` (row limit, watermark, sample data).
|
||||
|
||||
---
|
||||
## 3. Local build setup
|
||||
|
||||
## 3. Local Build Setup (per platform)
|
||||
|
||||
### All platforms (common)
|
||||
- Install Python 3.11+.
|
||||
- `pip install -r requirements.txt pyinstaller`
|
||||
- Verify Streamlit app runs locally: `streamlit run src/gui/app.py`
|
||||
- Verify CLI runs locally: `python -m src.cli --help`
|
||||
### Common
|
||||
```bash
|
||||
pip install -r requirements.txt pyinstaller
|
||||
streamlit run src/gui/app.py # verify GUI
|
||||
python -m src.cli --help # verify CLI
|
||||
```
|
||||
|
||||
### Windows
|
||||
- Install Inno Setup: https://jrsoftware.org/isinfo.php
|
||||
- Build: `pyinstaller build/pyinstaller.spec`
|
||||
- Wrap in installer: open `build/windows/installer.iss` in Inno Setup, compile.
|
||||
- `pyinstaller build/pyinstaller.spec`
|
||||
- Open `build/windows/installer.iss` in Inno Setup, compile.
|
||||
|
||||
### macOS
|
||||
- Install Xcode command line tools: `xcode-select --install`
|
||||
- Enroll in Apple Developer Program ($99/yr). Allow 1-2 weeks first time.
|
||||
- Generate Developer ID Application certificate, install in Keychain.
|
||||
- Generate app-specific password for `notarytool`.
|
||||
- Build: `pyinstaller build/pyinstaller.spec`
|
||||
- Sign: `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/BundleName.app`
|
||||
- Package as DMG.
|
||||
- Notarize: `xcrun notarytool submit BundleName.dmg --wait`
|
||||
- Staple: `xcrun stapler staple BundleName.dmg`
|
||||
1. `xcode-select --install`
|
||||
2. Enroll in Apple Developer Program ($99/yr — 1-2 wk first time).
|
||||
3. Generate Developer ID cert, install in Keychain.
|
||||
4. Generate app-specific password for `notarytool`.
|
||||
5. `pyinstaller build/pyinstaller.spec`
|
||||
6. `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/App.app`
|
||||
7. Package as DMG.
|
||||
8. `xcrun notarytool submit *.dmg --wait`
|
||||
9. `xcrun stapler staple *.dmg`
|
||||
|
||||
### Linux
|
||||
- Install AppImage tooling: download `appimagetool` from https://appimage.github.io
|
||||
- Build: `pyinstaller build/pyinstaller.spec`
|
||||
- Wrap as AppImage using `appimagetool` per the assets in `build/linux/AppImage/`.
|
||||
- Download `appimagetool` from https://appimage.github.io
|
||||
- `pyinstaller build/pyinstaller.spec`
|
||||
- Wrap as AppImage via assets in `build/linux/AppImage/`.
|
||||
|
||||
### Streamlit + PyInstaller specific notes
|
||||
- A custom PyInstaller hook (`hook-streamlit.py`) is required to bundle Streamlit's data files correctly.
|
||||
- Hidden imports must include `streamlit`, `altair`, `pyarrow` (and their submodules where PyInstaller fails to detect them).
|
||||
- The launcher script (`build/launcher.py`) is the actual PyInstaller entry point, not the Streamlit script directly.
|
||||
- Budget 1-3 days the first time getting the Streamlit-PyInstaller spec right; it's reusable across all subsequent bundles.
|
||||
### Streamlit + PyInstaller notes
|
||||
- Custom `hook-streamlit.py` required.
|
||||
- Hidden imports: `streamlit`, `altair`, `pyarrow` (and submodules where auto-detection fails).
|
||||
- The PyInstaller entry point is `build/launcher.py`, **not** the Streamlit script directly.
|
||||
- Budget 1-3 days first time. Reusable across all bundles.
|
||||
|
||||
### CI build (recommended)
|
||||
- Push the repo to GitHub.
|
||||
- Tag a release: `git tag v1.0.0 && git push --tags`
|
||||
- GitHub Actions runs the matrix build, produces all three artifacts.
|
||||
- Manual step: download artifacts from the Releases page, upload to Gumroad / Lemon Squeezy.
|
||||
```bash
|
||||
git tag v1.0.0 && git push --tags
|
||||
# GitHub Actions runs the matrix → 3 platform artifacts on Releases page.
|
||||
# Manual: download → upload to Gumroad / Lemon Squeezy.
|
||||
```
|
||||
|
||||
### Hosted demo deployment (separate from desktop build)
|
||||
### Hosted demo deployment
|
||||
- Connect GitHub repo to Streamlit Community Cloud (one-time, free).
|
||||
- Configure the deployment to point at `demo/streamlit_app.py`.
|
||||
- The demo updates automatically on git push to the configured branch.
|
||||
- Custom domain optional via CNAME (verify Streamlit Community Cloud current policy at recovery time).
|
||||
- Configure deployment → `demo/streamlit_app.py`.
|
||||
- Auto-updates on push to configured branch.
|
||||
- Custom domain optional via CNAME.
|
||||
|
||||
---
|
||||
|
||||
## 4. External Dependencies (re-acquire if lost)
|
||||
## 4. External dependencies
|
||||
|
||||
| Item | Source | Cost |
|
||||
|---|---|---|
|
||||
| Python | https://python.org/downloads | Free |
|
||||
| PyInstaller | `pip install pyinstaller` | Free |
|
||||
| Streamlit | `pip install streamlit` | Free |
|
||||
| Inno Setup (Windows) | https://jrsoftware.org/isinfo.php | Free |
|
||||
| Apple Developer Program (macOS signing) | https://developer.apple.com | $99/yr |
|
||||
| Xcode command line tools (macOS) | `xcode-select --install` | Free |
|
||||
| appimagetool (Linux) | https://appimage.github.io | Free |
|
||||
| GitHub Actions (CI) | github.com | Free tier covers all three OS runners |
|
||||
| Streamlit Community Cloud (demo hosting) | streamlit.io/cloud | Free |
|
||||
| Python libraries | See `requirements.txt`, `pip install -r requirements.txt` | Free |
|
||||
|------|--------|------|
|
||||
| Python | python.org/downloads | Free |
|
||||
| PyInstaller, Streamlit, Python libs | `pip install -r requirements.txt` | Free |
|
||||
| Inno Setup (Windows) | jrsoftware.org/isinfo.php | Free |
|
||||
| Apple Developer Program (macOS) | developer.apple.com | $99/yr |
|
||||
| Xcode CLT (macOS) | `xcode-select --install` | Free |
|
||||
| appimagetool (Linux) | appimage.github.io | Free |
|
||||
| GitHub Actions (CI) | github.com | Free tier covers all 3 OS runners |
|
||||
| Streamlit Community Cloud | streamlit.io/cloud | Free |
|
||||
|
||||
---
|
||||
## 5. Backup recommendation
|
||||
|
||||
## 5. Backup Recommendation
|
||||
- **Primary**: GitHub repository (private). Source of truth.
|
||||
- **Secondary**: ZIP of full project tree on cloud storage (Drive / Dropbox / S3).
|
||||
- **Apple Developer credentials**: cert + app-specific password in a password manager. Re-issuable, not catastrophic.
|
||||
- **Streamlit Community Cloud**: stored as GitHub OAuth link in Streamlit UI. Re-authorize from new account if lost.
|
||||
- Back up after every meaningful change.
|
||||
- **Always include RECOVERY.md + DECISIONS.md** — irreplaceable context.
|
||||
|
||||
- **Primary backup**: GitHub repository (private). Source is the source of truth.
|
||||
- **Secondary backup**: ZIP of the full project tree on cloud storage (Google Drive / Dropbox / S3).
|
||||
- **Apple Developer credentials**: store certificate + app-specific password in a password manager. Losing these requires regenerating, not catastrophic.
|
||||
- **Streamlit Community Cloud connection**: stored in Streamlit's UI as a GitHub OAuth link. Re-authorize from a new Streamlit account if lost.
|
||||
- Back up after every meaningful code or doc change.
|
||||
- Include this `RECOVERY.md` and `DECISIONS.md` in every backup. They contain the irreplaceable context.
|
||||
## 6. Recovery priorities (under time pressure)
|
||||
|
||||
---
|
||||
|
||||
## 6. Recovery Priorities (if rebuilding under time pressure)
|
||||
|
||||
If you only have time to rebuild part of the project, this is the order:
|
||||
|
||||
1. **Source: `src/core/` and `scripts/`**. Without these there is no product.
|
||||
2. **DECISIONS.md**. Without this you will re-litigate every settled decision (especially GUI framework, dual interface, UX standards) and probably get it wrong differently.
|
||||
3. **TECHNICAL.md**, especially Sections 9 (04/06 boundary) and 10 (per-script functional requirements). Without these you will rebuild the deduplicator with weaker fuzzy matching than the v1 launch spec demands and ship something that loses to free Excel.
|
||||
4. **Streamlit GUI source (`src/gui/`)**. The primary buyer surface; without it the product reverts to CLI-only and the buyer persona will refund.
|
||||
5. **PyInstaller spec + launcher + per-OS build configs** (`build/`). Reproducing the Streamlit-PyInstaller integration from scratch is 1-3 days of work.
|
||||
6. **Apple Developer Program enrollment**. 1-2 week lead time. Start this first if Mac distribution matters.
|
||||
7. **Hosted demo (`demo/streamlit_app.py`)**. Important marketing asset but not blocking for desktop sales.
|
||||
8. Documentation files (USER-GUIDE, BUSINESS, README). Recoverable from memory + this guide.
|
||||
9. CI config (`ci/build.yml`). Nice to have, not blocking.
|
||||
1. **`src/core/` + scripts** — without these there is no product.
|
||||
2. **DECISIONS.md** — without this you'll re-litigate every settled call.
|
||||
3. **TECHNICAL.md** §10 (04/06 boundary) + §11 (per-script specs). Without these you'll rebuild dedup with weaker fuzzy than the v1 spec demands and lose to free Excel.
|
||||
4. **`src/gui/`** — primary buyer surface; without it the product reverts to CLI-only and the persona refunds.
|
||||
5. **PyInstaller spec + launcher + per-OS configs** — recreating the Streamlit-PyInstaller integration is 1-3 days.
|
||||
6. **Apple Developer Program enrollment** — 1-2 wk lead. Start first if Mac matters.
|
||||
7. **Hosted demo** — important marketing asset, not blocking for desktop sales.
|
||||
8. Doc files (USER-GUIDE, BUSINESS, README) — recoverable from memory + this guide.
|
||||
9. CI config — nice to have, not blocking.
|
||||
|
||||
Reference in New Issue
Block a user