> 🌐 **Language:** English · [Español](USER-GUIDE.es.md)

# User Guide

**Version**: 1.6 · **Updated**: 2026-05-01

## 0. First launch — activation

DataTools must be activated before any tools unlock. On first launch you'll see the **Activate** screen.

Enter your full name + email, paste the license blob from your purchase email (starts with `DTLIC1:`), and click **Activate**. Renewal works the same way — paste the renewal blob, click **Apply renewal**.

**Tiers**:

| Tier | Tools |
|---|---|
| **Lite** | Find Duplicates · Clean Text · Standardize Formats |
| **Core** | All 9 tools |

A Lite user opening a Core-only tool sees an "Upgrade your license" prompt. The home page also shows a 🔒 Locked badge on tool cards your tier doesn't unlock. To upgrade, paste a Core blob on the Activate page.

Every license lasts 1 year. The sidebar shows your tier and days remaining at all times; a renewal warning appears 30 days before expiry. The license file lives at `~/.datatools/license.json` (Windows: `C:\Users\<you>\.datatools\license.json`).

To use the same license on a different machine: deactivate this one (Activate page → **Deactivate this device**) and re-paste your blob on the new machine.

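As a quick check of where the license file should sit on a given machine, the documented location can be resolved with the Python standard library. This is a minimal sketch, not DataTools code: `license_path` is a hypothetical helper name, and it assumes the home directory resolves to `C:\Users\<you>` on Windows as described above.

```python
from pathlib import Path


def license_path() -> Path:
    """Return the documented license location under the user's home directory."""
    # Path.home() is C:\Users\<you> on Windows and $HOME on macOS / Linux.
    return Path.home() / ".datatools" / "license.json"


if __name__ == "__main__":
    path = license_path()
    status = "found" if path.is_file() else "not found (not activated on this machine?)"
    print(f"{path}: {status}")
```
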
## 1. Install

You don't need Python and you don't need admin rights — the bundle ships its own interpreter and every dependency. Windows and macOS each come in two flavors (Linux ships a single AppImage, see 1.3); pick whichever your IT policy allows:

- **Installer** — wires up Desktop shortcut + Start Menu / Launchpad entry automatically. Recommended for most users.
- **Portable .zip** — unzip and double-click. No registry writes, runs from anywhere (Desktop, USB stick, network share). Use this if you can't run installers, want a single-folder install you can copy between machines, or are evaluating before committing to install.

Both flavors are byte-identical inside: same Python, same dependencies, same launch behavior.

### 1.1 Windows

**Option A — Installer (`DataTools-<ver>-win-setup.exe`)**

1. Download `DataTools-<ver>-win-setup.exe` from your release email or GitHub Releases.
2. Double-click the installer. On the first run Windows SmartScreen will say **"Windows protected your PC"** — click **More info** → **Run anyway**. (This warning only appears once per build until we have an EV code-signing cert.)
3. Accept the per-user install location (`%LOCALAPPDATA%\Programs\DataTools` by default — no admin prompt). Check **Create a desktop shortcut** if you want one (on by default).
4. Click **Install**, then **Finish**. The installer offers to launch DataTools immediately.
5. From now on launch from: **Start Menu → DataTools**, the **Desktop shortcut**, or just type `DataTools` into Windows Run (Win+R) / cmd.

To pin to the taskbar, launch the app once, right-click its icon in the taskbar, then **Pin to taskbar**. Windows requires this manual step — no installer is allowed to pin programmatically.

**Option B — Portable (`DataTools-<ver>-win-portable.zip`)**

1. Download `DataTools-<ver>-win-portable.zip`.
2. Right-click the .zip → **Extract All…** → pick a folder (e.g. `C:\Tools\DataTools`).
3. Open the extracted `DataTools\` folder, double-click `DataTools.exe`. SmartScreen warning fires the first time only.
4. To create your own desktop shortcut later: right-click `DataTools.exe` → **Send to → Desktop (create shortcut)**.

**Uninstall** (installer only): Settings → Apps → DataTools → Uninstall. Portable: delete the folder.

### 1.2 macOS

**Option A — Installer DMG (`DataTools-<ver>-mac.dmg`)**

1. Download `DataTools-<ver>-mac.dmg`.
2. Double-click the .dmg. A Finder window opens showing the **DataTools** icon and an **Applications** alias.
3. Drag **DataTools** onto **Applications**. Wait for the copy to finish, then eject the DMG.
4. On unsigned builds the first launch shows **"DataTools" cannot be opened because the developer cannot be verified**. Fix: right-click DataTools in /Applications → **Open** → confirm **Open** in the dialog. macOS remembers this choice — subsequent launches are clean.
5. Launch from **Launchpad**, **Spotlight** (`⌘ Space` → type "DataTools"), or **Applications** in Finder.

To keep DataTools in the Dock: launch the app, right-click its Dock icon → **Options → Keep in Dock**. macOS doesn't allow installers to pin to the Dock automatically.

**Option B — Portable (`DataTools-<ver>-mac-portable.zip`)**

1. Download `DataTools-<ver>-mac-portable.zip`. Safari auto-unzips on download; in Finder you'll see `DataTools.app` directly.
2. Move `DataTools.app` to **Applications** if you want it discoverable via Launchpad — or keep it on your Desktop, a USB stick, or a network share. The portable .app runs from anywhere.
3. Double-click `DataTools.app`. Right-click → **Open** the first time (same unsigned-build dance as the DMG).

**Uninstall**: drag `DataTools.app` to the Trash. Your data files stay where you put them — nothing else is installed.

### 1.3 Linux

`DataTools-<ver>-linux-x86_64.AppImage` is already portable — no separate zip needed.

1. Download the .AppImage.
2. Make it executable: `chmod +x DataTools-*.AppImage`.
3. Double-click, or run it from a terminal.

If your distro doesn't ship FUSE 2: `sudo apt install libfuse2` (Debian/Ubuntu) or equivalent.

### 1.4 What happens on first launch

The launcher (called `DataTools.exe` / `DataTools.app` / `DataTools.AppImage`) does three things, in order:

1. Picks a free TCP port on `127.0.0.1`: usually 8501, falling back through 8502, 8503, … up to 8550 if another app is using 8501.
2. Starts a local Streamlit server on that port. The server is **bound to localhost only**, never to your LAN.
3. Opens your default browser at `http://127.0.0.1:<port>/`. If the browser doesn't open within 5 seconds, paste that URL into your browser manually.

The launcher window stays open in the background. Closing it stops the server — the browser tab will say "this site can't be reached" the next time you click it.

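The port fallback in step 1 can be sketched as follows. This illustrates the general bind-and-retry technique and is not the launcher's actual code: `pick_free_port` is a hypothetical name, and the 8550 upper bound is taken from the Troubleshooting section.

```python
import socket


def pick_free_port(start: int = 8501, end: int = 8550, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, end] that can be bound on host."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))  # binding to 127.0.0.1 keeps it off the LAN
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port between {start} and {end}")


if __name__ == "__main__":
    port = pick_free_port()
    print(f"http://127.0.0.1:{port}/")
```

The socket is closed again before the function returns, so a real launcher would hand the chosen port straight to the server to keep the window for a race small.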
### 1.5 How the GUI works

- Runs locally on your machine. **No internet, no upload.**
- The browser is just the display surface. Closing it does NOT stop the app — close the launcher window (or quit the macOS .app from the Dock) to fully exit.
- Prefer the terminal? Every tool ships with a CLI too (Section 3.2).

### 1.6 System requirements

- Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
- Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
- ~400 MB free disk space (the bundle itself is ~200 MB; the rest is working scratch space for large CSVs).

Full numbered support matrix: [REQUIREMENTS.md](REQUIREMENTS.md).

## 2. What's included

| # | Tool | Purpose | Status |
|---|------|---------|--------|
| 01 | Find Duplicates | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Clean Text | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Fix Missing Values | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Map Columns | Rename + enforce schema | Coming Soon |
| 06 | Find Unusual Values | z-score, IQR, multivariate | Coming Soon |
| 07 | Combine Files | Combine multiple files | Coming Soon |
| 08 | Quality Check | Rules + PDF/Excel report | Coming Soon |
| 09 | Automated Workflows | One-click multi-tool launcher | Coming Soon |

**Sample data** (`samples/`): `messy_sales.csv`, `bank_export.xlsx`.

## 3. Usage

### 3.1 GUI (recommended)

1. Launch the bundle.
2. Pick a tool from the sidebar.
3. Drop your file (or select a sample).
4. Defaults are pre-filled — click **Run** to preview.
5. Click **Save Output** to write the cleaned file.

Advanced options are tucked in expander panes. The original file is never modified.

**In-tool Help**: every tool page has a **Help** button right of the title. Click it to open a popover with a compact how-to (When to use · Steps · Examples · Tip). Use it as a refresher mid-task: the popover closes when you click outside, and your inputs stay untouched.

**Sidebar nav**: the sidebar groups tools into sections (Analysis, Data Cleaners, Transformations, Automations). Each section header shows `+` when collapsed and `−` when expanded — click the header to toggle.

### 3.2 CLI

```bash
deduplicator customers.csv [--apply]
text-cleaner messy.csv [--apply]
format-standardize feed.csv [--apply]
```

Get help: `deduplicator --help`. Full reference: [CLI-REFERENCE.md](CLI-REFERENCE.md).

### 3.3 Run order (when running tools manually)

If you skip Automated Workflows, follow this order:

1. **02 Clean Text** first — normalizes whitespace + special chars.
2. **03 Standardize Formats** — dates, phones, etc. need cleaned text.
3. **04 Fix Missing Values** — sentinel codes hide as numbers.
4. **05 Map Columns** — schema before outlier stats.
5. **06 Find Unusual Values** — needs clean numerics. A leftover `NaN` or a sentinel such as `-999` skews the mean and standard deviation behind z-score outlier detection.
6. **07 Combine Files**, **08 Quality Check** as needed.
7. **01 Find Duplicates** is order-flexible (normalizes internally for matching).

Automated Workflows enforces this automatically.

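To see why Fix Missing Values comes before Find Unusual Values, here is a small worked example using only the Python standard library. The readings and the `z_scores` helper are made up for illustration; the statistics inside Find Unusual Values may differ in detail.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Population z-score of every value: (v - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up readings: 40 is a genuinely unusual value, -999 is a "missing" sentinel.
raw = [10, 12, 11, 13, 12, 10, 11, 13, 12, 11, 40, -999]
cleaned = [v for v in raw if v != -999]

z_raw = z_scores(raw)[raw.index(40)]
z_cleaned = z_scores(cleaned)[cleaned.index(40)]

print(f"mean with sentinel:    {mean(raw):.1f}")      # about -70.3
print(f"mean without sentinel: {mean(cleaned):.1f}")  # about 14.1
print(f"z-score of 40 with sentinel:    {z_raw:.2f}")      # roughly 0.4
print(f"z-score of 40 without sentinel: {z_cleaned:.2f}")  # roughly 3.1
```

With the sentinel still in the column, the one real outlier scores well under 1 and would slip past a typical cutoff of 3; once the sentinel is removed, the same value stands out.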
### 3.4 Language

The sidebar has a **Language / Idioma** picker. Two packs ship today:

- **English** (default)
- **Español**

Pick a language once — the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss.

**Coverage** (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs — they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot.

## 4. Review & Normalize gate
|
||
|
||
Every uploaded file is scanned before any tool sees it.
|
||
|
||
**Confidence tiers**:
|
||
- **High** — round-trip safe. One-click "Auto-fix high-confidence" applies them all.
|
||
- **Medium** — usually right, occasional false positives. Preview first.
|
||
- **Low** — heuristic. Off by default; opt in per finding.
|
||
- **Error** — blocks the gate (empty file, U+FFFD, unrepairable rows).
|
||
|
||
**Encoding override**: when the picker reports `encoding_uncertain` or you spot mojibake (`é`) or `<60>` chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) → **Re-analyze**.
|
||
|
||
**Advanced output**: an `⚙️` expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (`.tsv` for tab, `.csv` otherwise).
|
||
|
||
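The mojibake and U+FFFD symptoms mentioned above are easy to reproduce, which can help in recognizing them in your own files. A short sketch with Python's built-in codecs, using a made-up sample string; it illustrates the general encoding mismatch, not the gate's detection logic.

```python
sample = "café"

# Case 1: a UTF-8 file decoded as cp1252 produces mojibake.
utf8_bytes = sample.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # cafÃ©

# Case 2: a cp1252 file decoded as UTF-8 produces the U+FFFD replacement character.
cp1252_bytes = sample.encode("cp1252")
replaced = cp1252_bytes.decode("utf-8", errors="replace")
print(replaced)  # caf followed by U+FFFD

# Decoding with the codepage the file was actually written in restores the text.
restored = cp1252_bytes.decode("cp1252")
print(restored)  # café
```

Case 2 is the typical Western Excel export: picking cp1252 in the encoding override corresponds to the last step.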
## 5. Output

Every run writes:
- **Cleaned file** next to the input (or wherever you specify).
- **Audit file** (per-cell changes for text/format tools, match groups for dedup).
- **Timestamped log** in `logs/`.

Original input is never modified.

## 6. Troubleshooting

- **GUI won't launch / browser doesn't open** — wait 10–15 s, then manually visit `http://127.0.0.1:8501` (or whichever port the launcher window prints). Port-in-use error → close other instances. The launcher walks ports 8501–8550 looking for a free one, so a stale instance can shift the URL.
- **Why does my browser open?** — local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine.
- **Windows SmartScreen** — click "More info" → "Run anyway". One-time per build until we have an EV-signed cert.
- **macOS "App is damaged" / "developer cannot be verified"** — right-click the app → **Open** → confirm. If the message persists, the file was likely corrupted in transit — re-download. As a last resort: `xattr -cr /Applications/DataTools.app` clears the quarantine attribute.
- **macOS portable .zip — extracted but won't open** — Safari unzips on download by default; if you see a `__MACOSX/` folder or `._DataTools.app` file, you used a different unarchiver. Re-extract with the built-in Archive Utility (right-click the .zip → **Open With → Archive Utility**) so the .app's metadata is preserved.
- **Windows portable .zip — antivirus quarantines DataTools.exe** — your AV doesn't recognize the bundle. Allowlist the extracted folder. The installer .exe trips fewer AV products because it's a known Inno Setup wrapper.
- **Linux AppImage won't run** — `chmod +x file.AppImage`. Missing FUSE → `sudo apt install libfuse2`.
- **Slow on large file** — files over ~100k rows take longer; a progress bar shows status. Multi-million rows → use the CLI directly.
- **Where does the app store my license / settings?** — `~/.datatools/` on macOS + Linux, `C:\Users\<you>\.datatools\` on Windows. Your input/output files stay where you put them; the app never copies them anywhere else.
- **Need help** — email the address on your purchase receipt.

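If a stale instance has shifted the URL and the launcher window is not visible, one way to find the right port is to probe the documented range for anything that accepts a connection. A minimal sketch, assuming the 8501–8550 range above; `listening_ports` is a hypothetical helper and is not part of DataTools.

```python
import socket


def listening_ports(start: int = 8501, end: int = 8550,
                    host: str = "127.0.0.1", timeout: float = 0.2) -> list[int]:
    """Return the ports in [start, end] on host that accept a TCP connection."""
    found = []
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    for port in listening_ports():
        print(f"something is listening at http://127.0.0.1:{port}/")
```

The probe reports any local listener in the range, not only DataTools, so open each candidate URL in the browser to confirm.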
## 7. License

Single-user. See `LICENSE.txt`.