datatools-dev/docs/USER-GUIDE.md
Michael 2bd94c4441 docs: document installer + portable downloads in en/es
Repo READMEs now show both download flavors side-by-side with
first-launch warnings (SmartScreen, Gatekeeper) and link to the
deeper walkthrough.

USER-GUIDE §1 rewritten from a 9-line stub into six subsections:
- §1.1 Windows: installer (5 steps) + portable (4 steps)
- §1.2 macOS:   DMG (5 steps incl. right-click-Open) + portable
- §1.3 Linux:   AppImage flow (unchanged)
- §1.4 First-launch: port selection, localhost binding, browser open
- §1.5 How the GUI works
- §1.6 System requirements

§6 Troubleshooting picks up portable-specific items: Safari unzip
quirks, antivirus quarantine on Win portable, license file location.

docs/README and Spanish mirrors updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 19:30:28 +00:00

> 🌐 **Language:** English · [Español](USER-GUIDE.es.md)
# User Guide
**Version**: 1.6 · **Updated**: 2026-05-01
## 0. First launch — activation
DataTools must be activated before any tools unlock. On first launch you'll see the **Activate** screen.
Enter your full name + email, paste the license blob from your purchase email (starts with `DTLIC1:`), and click **Activate**. Renewal works the same way — paste the renewal blob, click **Apply renewal**.
**Tiers**:
| Tier | Tools |
|---|---|
| **Lite** | Find Duplicates · Clean Text · Standardize Formats |
| **Core** | All 9 tools |
A Lite user opening a Core-only tool sees an "Upgrade your license" prompt. The home page also shows a 🔒 Locked badge on tool cards your tier doesn't unlock. To upgrade, paste a Core blob on the Activate page.
Every license lasts 1 year. The sidebar shows your tier and days remaining at all times; a renewal warning appears 30 days before expiry. The license file lives at `~/.datatools/license.json` (Windows: `C:\Users\<you>\.datatools\license.json`).
To use the same license on a different machine: deactivate this one (Activate page → **Deactivate this device**) and re-paste your blob on the new machine.
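To check from a script whether a machine already has a license file, the documented location can be resolved as below. This is a minimal sketch: `license_path` is a hypothetical helper rather than part of DataTools, and nothing is assumed about the contents of `license.json`.

```python
from pathlib import Path


def license_path(home=None):
    """Return the documented license location under a home directory.

    `license_path` is a hypothetical helper for illustration, not a
    DataTools API. With no argument it uses the current user's home.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".datatools" / "license.json"


if __name__ == "__main__":
    path = license_path()
    state = "found" if path.is_file() else "not found"
    print(f"{path}: {state}")
```

`Path.home()` resolves to `C:\Users\<you>` on Windows and to `~` on macOS and Linux, so the same code covers both documented paths.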
## 1. Install
You don't need Python and you don't need admin rights — the bundle ships its own interpreter and every dependency. Two flavors per OS, pick whichever your IT policy allows:
- **Installer** — wires up Desktop shortcut + Start Menu / Launchpad entry automatically. Recommended for most users.
- **Portable .zip** — unzip and double-click. No registry writes, runs from anywhere (Desktop, USB stick, network share). Use this if you can't run installers, want a single-folder install you can copy between machines, or are evaluating before committing to install.
Both flavors are byte-identical inside: same Python, same dependencies, same launch behavior.
### 1.1 Windows
**Option A — Installer (`DataTools-<ver>-win-setup.exe`)**
1. Download `DataTools-<ver>-win-setup.exe` from your release email or GitHub Releases.
2. Double-click the installer. On the first run Windows SmartScreen will say **"Windows protected your PC"** — click **More info** → **Run anyway**. (This warning only appears once per build until we have an EV code-signing cert.)
3. Accept the per-user install location (`%LOCALAPPDATA%\Programs\DataTools` by default — no admin prompt). Check **Create a desktop shortcut** if you want one (on by default).
4. Click **Install**, then **Finish**. The installer offers to launch DataTools immediately.
5. From now on launch from: **Start Menu → DataTools**, the **Desktop shortcut**, or just type `DataTools` into Windows Run (Win+R) / cmd.
To pin to the taskbar, launch the app once, right-click its icon in the taskbar, then **Pin to taskbar**. Windows requires this manual step — no installer is allowed to pin programmatically.
**Option B — Portable (`DataTools-<ver>-win-portable.zip`)**
1. Download `DataTools-<ver>-win-portable.zip`.
2. Right-click the .zip → **Extract All…** → pick a folder (e.g. `C:\Tools\DataTools`).
3. Open the extracted `DataTools\` folder, double-click `DataTools.exe`. SmartScreen warning fires the first time only.
4. To create your own desktop shortcut later: right-click `DataTools.exe` → **Send to → Desktop (create shortcut)**.
**Uninstall** (installer only): Settings → Apps → DataTools → Uninstall. Portable: delete the folder.
### 1.2 macOS
**Option A — Installer DMG (`DataTools-<ver>-mac.dmg`)**
1. Download `DataTools-<ver>-mac.dmg`.
2. Double-click the .dmg. A Finder window opens showing the **DataTools** icon and an **Applications** alias.
3. Drag **DataTools** onto **Applications**. Wait for the copy to finish, then eject the DMG.
4. On unsigned builds the first launch shows **"DataTools" cannot be opened because the developer cannot be verified**. Fix: right-click DataTools in /Applications → **Open** → confirm **Open** in the dialog. macOS remembers this choice — subsequent launches are clean.
5. Launch from **Launchpad**, **Spotlight** (`⌘ Space` → type "DataTools"), or **Applications** in Finder.
To keep DataTools in the Dock: launch the app, right-click its Dock icon → **Options → Keep in Dock**. macOS doesn't allow installers to pin to the Dock automatically.
**Option B — Portable (`DataTools-<ver>-mac-portable.zip`)**
1. Download `DataTools-<ver>-mac-portable.zip`. Safari auto-unzips on download; in Finder you'll see `DataTools.app` directly.
2. Move `DataTools.app` to **Applications** if you want it discoverable via Launchpad — or keep it on your Desktop, a USB stick, or a network share. The portable .app runs from anywhere.
3. Double-click `DataTools.app`. Right-click → **Open** the first time (same unsigned-build dance as the DMG).
**Uninstall**: drag `DataTools.app` to the Trash. Your data files stay where you put them — nothing else is installed.
### 1.3 Linux
`DataTools-<ver>-linux-x86_64.AppImage` is already portable — no separate zip needed.
1. Download the .AppImage.
2. `chmod +x DataTools-*.AppImage`.
3. Double-click, or run it from a terminal.
If your distro doesn't ship FUSE 2: `sudo apt install libfuse2` (Debian/Ubuntu) or equivalent.
### 1.4 What happens on first launch
The launcher (called `DataTools.exe` / `DataTools.app` / `DataTools.AppImage`) does three things, in order:
1. Picks a free TCP port on `127.0.0.1` — usually 8501, falls back through 8502, 8503, … if another app is using 8501.
2. Starts a local Streamlit server on that port. The server is **bound to localhost only**, never to your LAN.
3. Opens your default browser at `http://127.0.0.1:<port>/`. If the browser doesn't open within 5 seconds, paste that URL into your browser manually.
The launcher window stays open in the background. Closing it stops the server — the browser tab will say "this site can't be reached" the next time you click it.
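The port selection in step 1 can be pictured roughly as follows. This is a sketch of the general technique (try to bind each port in turn on localhost), not the launcher's actual code. The function name `pick_free_port` is hypothetical, and the upper bound of 8550 is taken from the Troubleshooting section.

```python
import socket


def pick_free_port(host="127.0.0.1", first=8501, last=8550):
    """Return the first port in [first, last] that can be bound on host.

    Hypothetical helper: it illustrates the fallback behaviour described
    above and is not taken from the DataTools launcher.
    """
    for port in range(first, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # another process holds this port, try the next
            return port
    raise RuntimeError(f"no free port between {first} and {last}")


if __name__ == "__main__":
    print(f"http://127.0.0.1:{pick_free_port()}/")
```

Binding to `127.0.0.1` rather than `0.0.0.0` is what keeps a server of this kind reachable only from the local machine.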
### 1.5 How the GUI works
- Runs locally on your machine. **No internet, no upload.**
- The browser is just the display surface. Closing it does NOT stop the app — close the launcher window (or quit the macOS .app from the Dock) to fully exit.
- Prefer the terminal? Every tool ships with a CLI too (Section 3).
### 1.6 System requirements
- Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
- Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
- ~400 MB free disk space (the bundle itself is ~200 MB; the rest is working scratch space for large CSVs).
Full numbered support matrix: [REQUIREMENTS.md](REQUIREMENTS.md).
## 2. What's included
| # | Tool | Purpose | Status |
|---|------|---------|--------|
| 01 | Find Duplicates | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Clean Text | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Fix Missing Values | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Map Columns | Rename + enforce schema | Coming Soon |
| 06 | Find Unusual Values | z-score, IQR, multivariate | Coming Soon |
| 07 | Combine Files | Combine multiple files | Coming Soon |
| 08 | Quality Check | Rules + PDF/Excel report | Coming Soon |
| 09 | Automated Workflows | One-click multi-tool launcher | Coming Soon |
**Sample data** (`samples/`): `messy_sales.csv`, `bank_export.xlsx`.
## 3. Usage
### 3.1 GUI (recommended)
1. Launch the bundle.
2. Pick a tool from the sidebar.
3. Drop your file (or select a sample).
4. Defaults are pre-filled — click **Run** to preview.
5. Click **Save Output** to write the cleaned file.
Advanced options are tucked in expander panes. The original file is never modified.
### 3.2 CLI
```bash
deduplicator customers.csv [--apply]
text-cleaner messy.csv [--apply]
format-standardize feed.csv [--apply]
```
Get help: `deduplicator --help`. Full reference: [CLI-REFERENCE.md](CLI-REFERENCE.md).
### 3.3 Run order (when running tools manually)
If you skip Automated Workflows, follow this order:
1. **02 Clean Text** first — normalizes whitespace + special chars.
2. **03 Standardize Formats** — dates, phones, etc. need cleaned text.
3. **04 Fix Missing Values** — sentinel codes hide as numbers.
4. **05 Map Columns** — schema before outlier stats.
5. **06 Find Unusual Values** — needs clean numerics. A single `NaN` or `-999` sentinel left in a column skews the mean and standard deviation that outlier detection depends on.
6. **07 Combine Files**, **08 Quality Check** as needed.
7. **01 Find Duplicates** is order-flexible (normalizes internally for matching).
Automated Workflows enforces this automatically.
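The reason step 5 has to come after step 4 is easy to see with a few numbers. The sketch below uses made-up readings and plain standard-library statistics; it does not reproduce how Find Unusual Values computes its scores.

```python
from statistics import mean, pstdev

# Made-up sample: four real readings plus one -999 disguised null.
readings = [10.0, 11.0, 9.0, 10.0, -999.0]
clean = [v for v in readings if v != -999.0]


def z_scores(values):
    """Return the population z-score of every value in the list."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


print(mean(readings))  # -191.8
print(mean(clean))     # 10.0
print([round(z, 2) for z in z_scores(readings)])
print([round(z, 2) for z in z_scores(clean)])
```

With the sentinel left in, the four real readings all land at a z-score of roughly 0.5 and look identical, while the sentinel itself sits near -2. Once it is removed, the spread between 9, 10, and 11 becomes visible again.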
### 3.4 Language
The sidebar has a **Language / Idioma** picker. Two packs ship today:
- **English** (default)
- **Español**
Pick a language once — the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss.
**Coverage** (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs — they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot.
## 4. Review & Normalize gate
Every uploaded file is scanned before any tool sees it.
**Confidence tiers**:
- **High** — round-trip safe. One-click "Auto-fix high-confidence" applies them all.
- **Medium** — usually right, occasional false positives. Preview first.
- **Low** — heuristic. Off by default; opt in per finding.
- **Error** — blocks the gate (empty file, U+FFFD, unrepairable rows).
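One way to picture the default handling of the four tiers is as a simple bucketing step. Everything in this sketch is an assumption made for illustration: the `Confidence` enum, the `(description, tier)` shape of a finding, and the `plan_gate` function are not DataTools APIs.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    ERROR = "error"


# Default handling per tier, as described in the list above.
DEFAULT_BUCKET = {
    Confidence.HIGH: "auto_fix",         # applied by the one-click button
    Confidence.MEDIUM: "preview_first",  # usually right, check before applying
    Confidence.LOW: "opt_in",            # off unless enabled per finding
    Confidence.ERROR: "blocking",        # stops the gate
}


def plan_gate(findings):
    """Group (description, Confidence) pairs by their default handling.

    Hypothetical data shape; returns the buckets plus a `gate_blocked` flag.
    """
    plan = {name: [] for name in DEFAULT_BUCKET.values()}
    for description, tier in findings:
        plan[DEFAULT_BUCKET[tier]].append(description)
    plan["gate_blocked"] = bool(plan["blocking"])
    return plan
```

For example, a file with one trailing-whitespace finding at High and one empty-file finding at Error would come back with `gate_blocked` set to `True`, and only the first finding would be eligible for one-click auto-fix.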
**Encoding override**: when the picker reports `encoding_uncertain` or you spot mojibake (`é`) or `<60>` chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) → **Re-analyze**.
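Both symptoms come from decoding bytes with the wrong codepage, which a few lines of Python can reproduce. The sketch only illustrates why picking the right codepage fixes the display; the function names are hypothetical and it says nothing about how DataTools detects encodings.

```python
def misread_utf8_as_cp1252(text):
    """Simulate opening a UTF-8 file with the cp1252 codepage."""
    return text.encode("utf-8").decode("cp1252")


def misread_cp1252_as_utf8(text):
    """Simulate opening a cp1252 file as UTF-8, replacing bad bytes."""
    return text.encode("cp1252").decode("utf-8", errors="replace")


def repair_mojibake(text):
    """Reverse a UTF-8-read-as-cp1252 mistake by undoing both steps."""
    return text.encode("cp1252").decode("utf-8")


word = "café"
garbled = misread_utf8_as_cp1252(word)   # "cafÃ©": two characters for one
replaced = misread_cp1252_as_utf8(word)  # "caf" followed by U+FFFD
restored = repair_mojibake(garbled)      # "café" again
```

The first case is recoverable because no bytes were lost. The second is not: once a byte has been replaced with U+FFFD the original character is gone, so the file has to be re-read with the correct codepage.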
**Advanced output**: an `⚙️` expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (`.tsv` for tab, `.csv` otherwise).
## 5. Output
Every run writes:
- **Cleaned file** next to the input (or wherever you specify).
- **Audit file** (per-cell changes for text/format tools, match groups for dedup).
- **Timestamped log** in `logs/`.
Original input is never modified.
## 6. Troubleshooting
- **GUI won't launch / browser doesn't open** — wait 10-15 s; manually visit `http://127.0.0.1:8501` (or whichever port the launcher window prints). Port-in-use error → close other instances. The launcher walks ports 8501–8550 looking for a free one, so a stale instance can shift the URL.
- **Why does my browser open?** — local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine.
- **Windows SmartScreen** — click "More info" → "Run anyway". One-time per build until we have an EV-signed cert.
- **macOS "App is damaged" / "developer cannot be verified"** — right-click the app → **Open** → confirm. If the message persists, the file was likely corrupted in transit — re-download. As a last resort: `xattr -cr /Applications/DataTools.app` clears the quarantine attribute.
- **macOS portable .zip — extracted but won't open** — Safari unzips on download by default; if you see a `__MACOSX/` folder or `._DataTools.app` file you used a different unarchiver. Re-extract with the built-in Archive Utility (right-click the .zip → **Open With → Archive Utility**) so the .app's metadata is preserved.
- **Windows portable .zip — antivirus quarantines DataTools.exe** — your AV doesn't recognize the bundle. Allowlist the extracted folder. The installer .exe trips fewer AV products because it's a known Inno Setup wrapper.
- **Linux AppImage won't run** — `chmod +x file.AppImage`. Missing FUSE → `sudo apt install libfuse2`.
- **Slow on large file** — files over ~100k rows take noticeably longer; a progress bar shows that the run is still working. For multi-million-row files, use the CLI directly.
- **Where does the app store my license / settings?** — `~/.datatools/` on macOS + Linux, `C:\Users\<you>\.datatools\` on Windows. Your input/output files stay where you put them; the app never copies them anywhere else.
- **Need help** — email the address on your purchase receipt.
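For the first item above, if the launcher window is hidden and you are not sure which port it chose, you can probe the documented range yourself. This is a minimal sketch with a hypothetical function name. It reports every localhost port in the range that accepts a TCP connection, which may include services other than DataTools.

```python
import socket


def find_listening_ports(host="127.0.0.1", first=8501, last=8550, timeout=0.2):
    """Return the ports in [first, last] that accept a TCP connection.

    Hypothetical helper for troubleshooting; not shipped with DataTools.
    """
    listening = []
    for port in range(first, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if probe.connect_ex((host, port)) == 0:
                listening.append(port)
    return listening


if __name__ == "__main__":
    for port in find_listening_ports():
        print(f"http://127.0.0.1:{port}/")
```

Try each printed URL in your browser; more than one result usually means a stale instance is still running.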
## 7. License
Single-user. See `LICENSE.txt`.