> 🌐 **Language:** English · [Español](USER-GUIDE.es.md)

# User Guide

**Version**: 1.6 · **Updated**: 2026-05-01

## 0. First launch: activation

DataTools must be activated before any tools unlock. On first launch you'll see the **Activate** screen. Enter your full name + email, paste the license blob from your purchase email (starts with `DTLIC1:`), and click **Activate**. Renewal works the same way: paste the renewal blob, click **Apply renewal**.

**Tiers**:

| Tier | Tools |
|---|---|
| **Lite** | Find Duplicates · Clean Text · Standardize Formats |
| **Core** | All 9 tools |

A Lite user opening a Core-only tool sees an "Upgrade your license" prompt. The home page also shows a 🔒 Locked badge on tool cards your tier doesn't unlock. To upgrade, paste a Core blob on the Activate page.

Every license lasts 1 year. The sidebar shows your tier and days remaining at all times; a renewal warning appears 30 days before expiry.

The license file lives at `~/.datatools/license.json` (Windows: `C:\Users\\.datatools\license.json`). To use the same license on a different machine: deactivate this one (Activate page → **Deactivate this device**) and re-paste your blob on the new machine.

## 1. Install

You don't need Python: the bundle is self-contained.

| OS | File | How |
|----|------|-----|
| Windows | `BundleName-Setup-1.0.exe` | Double-click installer → desktop shortcut. |
| macOS | `BundleName-1.0.dmg` | Mount, drag to Applications. Signed + notarized. |
| Linux | `BundleName-1.0.AppImage` | `chmod +x`, double-click. (`.tar.gz` fallback available.) |

Launching opens your default browser to a local page (`http://localhost:8501`).

### How the GUI works

- Runs locally on your machine. **No internet, no upload.**
- Browser is just the display surface. Closing it stops the underlying program.
- Prefer the terminal? Every tool ships with a CLI too (Section 3).

### System requirements

- Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
- Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
- ~400-500 MB free disk space.

Full numbered support matrix: [REQUIREMENTS.md](REQUIREMENTS.md).

## 2. What's included

| # | Tool | Purpose | Status |
|---|------|---------|--------|
| 01 | Find Duplicates | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Clean Text | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Fix Missing Values | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Map Columns | Rename + enforce schema | Coming Soon |
| 06 | Find Unusual Values | z-score, IQR, multivariate | Coming Soon |
| 07 | Combine Files | Combine multiple files | Coming Soon |
| 08 | Quality Check | Rules + PDF/Excel report | Coming Soon |
| 09 | Automated Workflows | One-click multi-tool launcher | Coming Soon |

**Sample data** (`samples/`): `messy_sales.csv`, `bank_export.xlsx`.

## 3. Usage

### 3.1 GUI (recommended)

1. Launch the bundle.
2. Pick a tool from the sidebar.
3. Drop your file (or select a sample).
4. Defaults are pre-filled; click **Run** to preview.
5. Click **Save Output** to write the cleaned file.

Advanced options are tucked in expander panes. The original file is never modified.

### 3.2 CLI

```bash
deduplicator customers.csv [--apply]
text-cleaner messy.csv [--apply]
format-standardize feed.csv [--apply]
```

Get help: `deduplicator --help`. Full reference: [CLI-REFERENCE.md](CLI-REFERENCE.md).

### 3.3 Run order (when running tools manually)

If you skip Automated Workflows, follow this order:

1. **02 Clean Text** first: normalizes whitespace + special chars.
2. **03 Standardize Formats**: dates, phones, etc. need cleaned text.
3. **04 Fix Missing Values**: sentinel codes hide as numbers.
4. **05 Map Columns**: schema before outlier stats.
5. **06 Find Unusual Values**: needs clean numerics.
Stats on data with `NaN` or `-999` are mathematically poisoned. 6. **07 Combine Files**, **08 Quality Check** as needed. 7. **01 Find Duplicates** is order-flexible (normalizes internally for matching). Automated Workflows enforces this automatically. ### 3.4 Language The sidebar has a **Language / Idioma** picker. Two packs ship today: - **English** (default) - **EspaΓ±ol** Pick a language once β€” the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss. **Coverage** (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs β€” they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot. ## 4. Review & Normalize gate Every uploaded file is scanned before any tool sees it. **Confidence tiers**: - **High** β€” round-trip safe. One-click "Auto-fix high-confidence" applies them all. - **Medium** β€” usually right, occasional false positives. Preview first. - **Low** β€” heuristic. Off by default; opt in per finding. - **Error** β€” blocks the gate (empty file, U+FFFD, unrepairable rows). **Encoding override**: when the picker reports `encoding_uncertain` or you spot mojibake (`é`) or `οΏ½` chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) β†’ **Re-analyze**. **Advanced output**: an `βš™οΈ` expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (`.tsv` for tab, `.csv` otherwise). ## 5. Output Every run writes: - **Cleaned file** next to the input (or wherever you specify). 
- **Audit file** (per-cell changes for text/format tools, match groups for dedup).
- **Timestamped log** in `logs/`.

Original input is never modified.

## 6. Troubleshooting

- **GUI won't launch / browser doesn't open**: wait 10-15 s; manually visit `http://localhost:8501`. Port-in-use error → close other instances.
- **Why does my browser open?** Local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine.
- **Windows SmartScreen**: click "More info" → "Run anyway". Standard for non-EV-signed software.
- **macOS "App is damaged"**: re-download (file likely corrupted in transit).
- **Linux AppImage won't run**: `chmod +x file.AppImage`. Missing FUSE → `sudo apt install libfuse2` or use `.tar.gz`.
- **Slow on large file**: over ~100k rows takes longer; progress bar shows. Multi-million rows → use the CLI directly.
- **Need help**: email the address on your purchase receipt.

## 7. License

Single-user. See `LICENSE.txt`.