Two coupled changes:

1. Lite tier
   - New Tier.LITE in src/license/schema.py.
   - FEATURES_BY_TIER[Tier.LITE] = {Deduplicator, Text Cleaner,
     Format Standardizer}. The three universally-useful tools that
     cover the most common bookkeeping / RevOps / Klaviyo prep
     workflows. Other six tools require Core.
   - i18n: license.tier_lite, license.feature_locked_title,
     license.feature_locked_body, license.upgrade_link,
     license.status_locked (en + es).
   - Per-tool feature gate at every GUI tool page
     (require_feature_or_render_upgrade) and every tool CLI
     (guard(feature=...)). A locked tool renders an upgrade
     prompt + Manage-license button (GUI) or exits with code 2
     (CLI).
   - Home grid: tool cards the user's tier doesn't unlock get a
     red 🔒 Locked badge in place of green Ready.

2. Trial removed
   - Activation form's "Start 1-year trial" button removed.
   - license_cli's `trial` subcommand removed.
   - activation.trial_button / activation.trial_help i18n keys
     dropped (pack parity test stays green).
   - Tier.TRIAL stays in the enum (back-compat with any
     field-tested trial licenses); LicenseManager._mint stays
     internal for tests and the seller's key generator.
   - Decision logged in DECISIONS §9b: a 1-year all-features
     trial undercuts paid Lite; paid-only keeps tier economics
     clean.

Tests (+29 net): +17 Lite-tier unit/guard tests + 13 Lite-tier
GUI tests + 1 trial-absent assertion - 2 trial CLI tests - 1
trial GUI button test. Total: 1995 → 2024.

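A minimal Python sketch of the per-tool gate described above. Only `Tier.LITE`, `Tier.TRIAL`, `FEATURES_BY_TIER`, `guard(feature=...)`, the Lite/Core split, and the CLI exit code 2 come from this commit. The string feature identifiers, the explicit `tier` parameter, and the `is_unlocked` helper are assumptions made for illustration; the real code in src/license/schema.py may look different.

```python
import sys
from enum import Enum


class Tier(Enum):
    LITE = "lite"
    CORE = "core"
    TRIAL = "trial"  # kept for back-compat with existing trial licenses


# Hypothetical feature identifiers; the real names may differ.
LITE_FEATURES = frozenset({"deduplicator", "text_cleaner", "format_standardizer"})
CORE_ONLY_FEATURES = frozenset({
    "missing_value_handler", "column_mapper", "outlier_detector",
    "multi_file_merger", "validator_reporter", "pipeline_runner",
})

FEATURES_BY_TIER = {
    Tier.LITE: LITE_FEATURES,
    Tier.CORE: LITE_FEATURES | CORE_ONLY_FEATURES,
    # Assumed: legacy trial licenses keep their all-features behavior.
    Tier.TRIAL: LITE_FEATURES | CORE_ONLY_FEATURES,
}


def is_unlocked(tier: Tier, feature: str) -> bool:
    """True when the given tier includes the feature."""
    return feature in FEATURES_BY_TIER[tier]


def guard(tier: Tier, feature: str) -> None:
    """CLI-style gate: exit with code 2 when the tier does not unlock the feature."""
    if not is_unlocked(tier, feature):
        print(f"'{feature}' requires a Core license.", file=sys.stderr)
        sys.exit(2)
```

A GUI page would call the same lookup and render the upgrade prompt instead of exiting.
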
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
151 lines
6.8 KiB
Markdown
> 🌐 **Language:** English · [Español](USER-GUIDE.es.md)

# User Guide

**Version**: 1.6 · **Updated**: 2026-05-01

## 0. First launch: activation

DataTools must be activated before any tools unlock. On first launch you'll see the **Activate** screen.

Enter your full name and email, paste the license blob from your purchase email (it starts with `DTLIC1:`), and click **Activate**. Renewal works the same way: paste the renewal blob and click **Apply renewal**.

**Tiers**:

| Tier | Tools |
|---|---|
| **Lite** | Deduplicator · Text Cleaner · Format Standardizer |
| **Core** | All 9 tools |

A Lite user opening a Core-only tool sees an "Upgrade your license" prompt. The home page also shows a 🔒 Locked badge on tool cards your tier doesn't unlock. To upgrade, paste a Core blob on the Activate page.

Every license lasts 1 year. The sidebar shows your tier and days remaining at all times; a renewal warning appears 30 days before expiry. The license file lives at `~/.datatools/license.json` (Windows: `C:\Users\<you>\.datatools\license.json`).

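To make the expiry rules concrete, here is a small Python sketch of the arithmetic. The 1-year term, the 30-day warning window, and the `~/.datatools/license.json` path come from this guide. The function names are hypothetical, a 365-day year is assumed, and the sketch takes the expiry date as a parameter instead of reading the license file, whose internal layout is not documented here.

```python
from datetime import date, timedelta
from pathlib import Path

# Path.home() resolves to C:\Users\<you> on Windows.
LICENSE_PATH = Path.home() / ".datatools" / "license.json"
LICENSE_TERM = timedelta(days=365)   # assumed: "1 year" treated as 365 days
RENEWAL_WARNING_DAYS = 30


def days_remaining(expires_on: date, today: date) -> int:
    """Whole days left before expiry, never negative."""
    return max((expires_on - today).days, 0)


def needs_renewal_warning(expires_on: date, today: date) -> bool:
    """True once the license is inside the 30-day warning window."""
    return days_remaining(expires_on, today) <= RENEWAL_WARNING_DAYS


activated = date(2026, 5, 1)
expires = activated + LICENSE_TERM   # 2027-05-01
print(LICENSE_PATH)
print(days_remaining(expires, date(2027, 4, 10)))         # 21
print(needs_renewal_warning(expires, date(2027, 4, 10)))  # True
```
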
To use the same license on a different machine, deactivate this one (Activate page → **Deactivate this device**) and re-paste your blob on the new machine.

## 1. Install

You don't need Python; the bundle is self-contained.

| OS | File | How |
|----|------|-----|
| Windows | `BundleName-Setup-1.0.exe` | Double-click installer → desktop shortcut. |
| macOS | `BundleName-1.0.dmg` | Mount, drag to Applications. Signed + notarized. |
| Linux | `BundleName-1.0.AppImage` | `chmod +x`, double-click. (`.tar.gz` fallback available.) |

Launching opens your default browser to a local page (`http://localhost:8501`).

### How the GUI works

- Runs locally on your machine. **No internet, no upload.**
- Browser is just the display surface. Closing it stops the underlying program.
- Prefer the terminal? Every tool ships with a CLI too (Section 3).

### System requirements

- Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
- Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
- ~400-500 MB free disk space.

Full numbered support matrix: [REQUIREMENTS.md](REQUIREMENTS.md).

## 2. What's included

| # | Tool | Purpose | Status |
|---|------|---------|--------|
| 01 | Deduplicator | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Text Cleaner | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Format Standardizer | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Missing Value Handler | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Column Mapper | Rename + enforce schema | Coming Soon |
| 06 | Outlier Detector | z-score, IQR, multivariate | Coming Soon |
| 07 | Multi-File Merger | Combine multiple files | Coming Soon |
| 08 | Validator & Reporter | Rules + PDF/Excel report | Coming Soon |
| 09 | Pipeline Runner | One-click multi-tool launcher | Coming Soon |

**Sample data** (`samples/`): `messy_sales.csv`, `bank_export.xlsx`.

## 3. Usage

### 3.1 GUI (recommended)

1. Launch the bundle.
2. Pick a tool from the sidebar.
3. Drop your file (or select a sample).
4. Defaults are pre-filled; click **Run** to preview.
5. Click **Save Output** to write the cleaned file.

Advanced options are tucked in expander panes. The original file is never modified.

### 3.2 CLI

```bash
deduplicator customers.csv [--apply]
text-cleaner messy.csv [--apply]
format-standardize feed.csv [--apply]
```

Get help: `deduplicator --help`. Full reference: [CLI-REFERENCE.md](CLI-REFERENCE.md).

### 3.3 Run order (when running tools manually)

If you skip the Pipeline Runner, follow this order:

1. **02 Text Cleaner** first: normalizes whitespace + special chars.
2. **03 Format Standardizer**: dates, phones, etc. need cleaned text.
3. **04 Missing Value Handler**: sentinel codes hide as numbers.
4. **05 Column Mapper**: schema before outlier stats.
5. **06 Outlier Detector**: needs clean numerics. Statistics computed over data that still contains `NaN` or sentinel values such as `-999` are skewed and unreliable.
6. **07 Multi-File Merger**, **08 Validator** as needed.
7. **01 Deduplicator** is order-flexible (normalizes internally for matching).

The Pipeline Runner enforces this automatically.

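A short Python illustration of why the Missing Value Handler runs before the Outlier Detector. The readings are made up, `-999` stands in for a typical sentinel code, and the standard-library `statistics` module is used only for demonstration; it is not a claim about how the tools compute their statistics.

```python
import statistics

# Four real readings near 100 plus one disguised null coded as -999.
raw = [102.0, 98.5, 101.2, -999.0, 99.8]

# What remains once the sentinel has been recognized and dropped.
clean = [v for v in raw if v != -999.0]

raw_mean, raw_sd = statistics.mean(raw), statistics.stdev(raw)
clean_mean, clean_sd = statistics.mean(clean), statistics.stdev(clean)

print(f"raw:   mean={raw_mean:.1f}  stdev={raw_sd:.1f}")
print(f"clean: mean={clean_mean:.1f}  stdev={clean_sd:.1f}")
```

With the sentinel left in, the mean turns negative and the standard deviation grows by two orders of magnitude, so every genuine reading looks unremarkable next to it and z-score thresholds stop being meaningful.
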
### 3.4 Language

The sidebar has a **Language / Idioma** picker. Two packs ship today:

- **English** (default)
- **Español**

Pick a language once; the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss.

**Coverage** (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs; they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot.

## 4. Review & Normalize gate

Every uploaded file is scanned before any tool sees it.

**Confidence tiers**:
- **High**: round-trip safe. One-click "Auto-fix high-confidence" applies them all.
- **Medium**: usually right, occasional false positives. Preview first.
- **Low**: heuristic. Off by default; opt in per finding.
- **Error**: blocks the gate (empty file, U+FFFD, unrepairable rows).

**Encoding override**: when the picker reports `encoding_uncertain` or you spot mojibake (`é`) or `<60>` chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) → **Re-analyze**.
|
||
|
||
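A small Python illustration of how these symptoms arise and why picking the right codepage fixes them. It uses only the standard `str.encode` / `bytes.decode` methods and shows the general mechanism, not DataTools' internal detection logic.

```python
original = "café"

# Bytes as a Western Excel export would typically write them (cp1252).
cp1252_bytes = original.encode("cp1252")

# Decoded with the wrong codec: the invalid byte becomes U+FFFD.
wrong_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# The reverse mistake: UTF-8 bytes read as cp1252 give two-character mojibake.
wrong_cp1252 = original.encode("utf-8").decode("cp1252")

# Decoding with the codepage the file was actually written in restores the text.
right = cp1252_bytes.decode("cp1252")

print(repr(wrong_utf8), repr(wrong_cp1252), repr(right))
```
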
**Advanced output**: an `⚙️` expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (`.tsv` for tab, `.csv` otherwise).

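The filename rule can be sketched in a few lines of Python. Only the "`.tsv` for tab, `.csv` otherwise" behavior comes from this guide; the function name, its signature, and the choice to keep the input's base name are assumptions for illustration.

```python
from pathlib import Path


def download_name(input_name: str, delimiter: str) -> str:
    """Return the output filename: .tsv for tab-delimited, .csv for anything else."""
    suffix = ".tsv" if delimiter == "\t" else ".csv"
    return Path(input_name).with_suffix(suffix).name


print(download_name("messy_sales.csv", "\t"))  # messy_sales.tsv
print(download_name("bank_export.xlsx", ";"))  # bank_export.csv
```
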
## 5. Output

Every run writes:
- **Cleaned file** next to the input (or wherever you specify).
- **Audit file** (per-cell changes for text/format tools, match groups for dedup).
- **Timestamped log** in `logs/`.

Original input is never modified.

## 6. Troubleshooting

- **GUI won't launch / browser doesn't open**: wait 10-15 s, then manually visit `http://localhost:8501`. Port-in-use error → close other instances.
- **Why does my browser open?** It's the local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine.
- **Windows SmartScreen**: click "More info" → "Run anyway". Standard for non-EV-signed software.
- **macOS "App is damaged"**: re-download (file likely corrupted in transit).
- **Linux AppImage won't run**: run `chmod +x file.AppImage`. Missing FUSE → `sudo apt install libfuse2` or use `.tar.gz`.
- **Slow on large file**: over ~100k rows takes longer; progress bar shows. Multi-million rows → use the CLI directly.
- **Need help**: email the address on your purchase receipt.

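If you want to confirm whether something is already listening on the port, a few lines of Python with the standard `socket` module will do it. The port number is taken from the URL in this guide; the function name is just for illustration, and running it needs a separate Python install of your own.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 8501, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print(port_in_use())
```

`True` while the app is closed suggests another instance (or another program) is holding the port.
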
## 7. License

Single-user. See `LICENSE.txt`.