Files
datatools-dev/docs/RECOVERY.md
Michael abb720997e docs: tight, scannable rewrite — every item earns its place
Refactors all 10 docs (README, USER-GUIDE, CLI-REFERENCE, REQUIREMENTS,
TECHNICAL, DEVELOPER, BUSINESS, DECISIONS, RECOVERY, docs/README) from
prose-heavy to bullet-heavy + table-heavy. Same information density,
significantly less reading load.

Net: 2600 → 1652 lines (~37% reduction) WHILE adding the new content
that landed since v1.6:

- Format Standardizer (3rd Ready tool)
- 199-row buyer corpus
- src/core/errors.py structured hierarchy + ensure_dataframe /
  ensure_choice / wrap_file_read|write / format_for_user helpers
- src/core/_constants.py shared USPS/state lookup tables
- Cross-tool audit fixes (NaN matching, removed_df schema, validation,
  enum-bounds checks, forward-compat config)
- Per-domain error_policy across format standardizers
- Inconsistent-date-format detector
- Excel header-row auto-detection + write_file delimiter param

Per-doc changes:

- README.md (175 → 71): 9-tool table at top, status column, 3 CLI
  entry points listed, dropped repeated marketing prose.
- docs/README.md (38 → 27): pure index — buyer-facing vs creator-only
  split + version footer.
- USER-GUIDE.md (208 → 118): tool table replaces script descriptions,
  troubleshooting compressed to bullets, gate explanation tightened.
- CLI-REFERENCE.md (451 → 235): collapsed flag tables, removed
  redundant intro text, kept full recipes section.
- REQUIREMENTS.md (146 → 129): 18 numbered sections (was 17), added
  §18 Error Handling, formatting tightened to single-line entries.
- TECHNICAL.md (570 → 350): collapsed §3 build pipeline tables, merged
  redundant §3.5-3.7 OS sections, added §7 (Error handling) +
  §11.3 (Format Standardizer spec) + §11.4-11.7 (analyzer / gate /
  Review page / repair_bytes promoted from §10.2.x sub-numbering).
- DEVELOPER.md (285 → 161): module map table replaces per-file prose,
  extension recipes condensed, new §Errors covers when to use each
  hierarchy class.
- BUSINESS.md (278 → 225): collapsed prose to tables (use cases,
  competitive landscape, costs, risks); honest-status updated.
- DECISIONS.md (269 → 189): scoring rubric + GUI matrix preserved,
  decision log compressed to single-line entries, added v1.6 entries
  (Format Standardizer Ready, errors module).
- RECOVERY.md (180 → 147): rebuild steps as numbered + tabular,
  external dependencies as one table, recovery priorities tightened.

No information removed; redundancy compressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 02:49:29 +00:00

148 lines
6.4 KiB
Markdown

# Recovery
> Creator-only. Full project rebuild guide.
> **Version**: 1.6 · **Updated**: 2026-05-01
If lost, this doc + the source ZIP rebuilds the project 100%.
## 1. Project layout
```
project-root/
├── README.md
├── docs/
│ ├── BUSINESS.md # creator-only
│ ├── TECHNICAL.md # creator-only
│ ├── DECISIONS.md # creator-only — locked criteria + decision log
│ ├── DEVELOPER.md # creator-only
│ ├── RECOVERY.md # creator-only (this file)
│ ├── REQUIREMENTS.md
│ ├── USER-GUIDE.md # ships to buyers
│ └── CLI-REFERENCE.md
├── src/
│ ├── core/ # shared logic — both CLI + GUI call into this
│ ├── cli.py # Deduplicator CLI
│ ├── cli_text_clean.py # Text Cleaner CLI
│ ├── cli_analyze.py # Analyzer CLI
│ └── gui/
│ ├── app.py # Streamlit entry
│ ├── pages/ # one page per tool
│ └── components/ # shared widgets
├── samples/ # messy_sales.csv, bank_export.xlsx
├── test-cases/ # corpora: text-cleaner, encodings, format-cleaner
├── tests/ # pytest
├── demo/streamlit_app.py # constrained Streamlit Community Cloud version
├── build/
│ ├── pyinstaller.spec # cross-platform build spec
│ ├── launcher.py # starts Streamlit, opens browser
│ ├── windows/installer.iss
│ ├── macos/{entitlements.plist, dmg_settings.py}
│ └── linux/AppImage/
├── ci/build.yml # GitHub Actions matrix build
└── requirements.txt
```
## 2. Rebuild steps
### From a complete ZIP backup
1. Unzip into a clean directory.
2. Push to GitHub.
3. Tag a release → CI builds Windows / macOS / Linux artifacts.
4. Connect repo to Streamlit Community Cloud → demo deploys.
5. Local builds: see §3.
### From documentation only (worst case)
1. Read **DECISIONS.md** — understand *why* the project is what it is. §4c locks Streamlit; §4b locks UX standards. **Non-negotiable.**
2. Read **TECHNICAL.md** §1-3 (architecture + build pipeline + Streamlit launcher pattern in §3.4).
3. Read **BUSINESS.md** for product strategy + hosted demo as marketing asset.
4. Recreate scripts using:
- USER-GUIDE.md §2 (script table)
- TECHNICAL.md §10 (04/06 boundary — do not relitigate)
- TECHNICAL.md §11 (per-script functional specs; §11.1-11.3 are the v1 launch targets for Ready tools).
5. Set up cross-platform build pipeline (§3 below).
6. Recreate installer configs per TECHNICAL.md §3.5-3.7.
7. Build constrained `demo/streamlit_app.py` (row limit, watermark, sample data).
## 3. Local build setup
### Common
```bash
pip install -r requirements.txt pyinstaller
streamlit run src/gui/app.py # verify GUI
python -m src.cli --help # verify CLI
```
### Windows
- Install Inno Setup: https://jrsoftware.org/isinfo.php
- `pyinstaller build/pyinstaller.spec`
- Open `build/windows/installer.iss` in Inno Setup, compile.
### macOS
1. `xcode-select --install`
2. Enroll in Apple Developer Program ($99/yr — 1-2 wk first time).
3. Generate Developer ID cert, install in Keychain.
4. Generate app-specific password for `notarytool`.
5. `pyinstaller build/pyinstaller.spec`
6. `codesign --deep --force --options runtime --sign "Developer ID Application: [Name]" dist/App.app`
7. Package as DMG.
8. `xcrun notarytool submit *.dmg --wait`
9. `xcrun stapler staple *.dmg`
### Linux
- Download `appimagetool` from https://appimage.github.io
- `pyinstaller build/pyinstaller.spec`
- Wrap as AppImage via assets in `build/linux/AppImage/`.
### Streamlit + PyInstaller notes
- Custom `hook-streamlit.py` required.
- Hidden imports: `streamlit`, `altair`, `pyarrow` (and submodules where auto-detection fails).
- The PyInstaller entry point is `build/launcher.py`, **not** the Streamlit script directly.
- Budget 1-3 days first time. Reusable across all bundles.
### CI build (recommended)
```bash
git tag v1.0.0 && git push --tags
# GitHub Actions runs the matrix → 3 platform artifacts on Releases page.
# Manual: download → upload to Gumroad / Lemon Squeezy.
```
### Hosted demo deployment
- Connect GitHub repo to Streamlit Community Cloud (one-time, free).
- Configure deployment → `demo/streamlit_app.py`.
- Auto-updates on push to configured branch.
- Custom domain optional via CNAME.
## 4. External dependencies
| Item | Source | Cost |
|------|--------|------|
| Python | python.org/downloads | Free |
| PyInstaller, Streamlit, Python libs | `pip install -r requirements.txt` | Free |
| Inno Setup (Windows) | jrsoftware.org/isinfo.php | Free |
| Apple Developer Program (macOS) | developer.apple.com | $99/yr |
| Xcode CLT (macOS) | `xcode-select --install` | Free |
| appimagetool (Linux) | appimage.github.io | Free |
| GitHub Actions (CI) | github.com | Free tier covers all 3 OS runners |
| Streamlit Community Cloud | streamlit.io/cloud | Free |
## 5. Backup recommendation
- **Primary**: GitHub repository (private). Source of truth.
- **Secondary**: ZIP of full project tree on cloud storage (Drive / Dropbox / S3).
- **Apple Developer credentials**: cert + app-specific password in a password manager. Re-issuable, not catastrophic.
- **Streamlit Community Cloud**: stored as GitHub OAuth link in Streamlit UI. Re-authorize from new account if lost.
- Back up after every meaningful change.
- **Always include RECOVERY.md + DECISIONS.md** — irreplaceable context.
## 6. Recovery priorities (under time pressure)
1. **`src/core/` + scripts** — without these there is no product.
2. **DECISIONS.md** — without this you'll re-litigate every settled call.
3. **TECHNICAL.md** §10 (04/06 boundary) + §11 (per-script specs). Without these you'll rebuild dedup with weaker fuzzy than the v1 spec demands and lose to free Excel.
4. **`src/gui/`** — primary buyer surface; without it the product reverts to CLI-only and the persona refunds.
5. **PyInstaller spec + launcher + per-OS configs** — recreating the Streamlit-PyInstaller integration is 1-3 days.
6. **Apple Developer Program enrollment** — 1-2 wk lead. Start first if Mac matters.
7. **Hosted demo** — important marketing asset, not blocking for desktop sales.
8. Doc files (USER-GUIDE, BUSINESS, README) — recoverable from memory + this guide.
9. CI config — nice to have, not blocking.