> 🌐 **Language:** English · [Español](USER-GUIDE.es.md)

# User Guide

**Version**: 1.6 · **Updated**: 2026-05-01

## 0. First launch: activation

DataTools must be activated before any tools unlock. On first launch you'll see the **Activate** screen. Enter your full name and email, paste the license blob from your purchase email (it starts with `DTLIC1:`), and click **Activate**. Renewal works the same way: paste the renewal blob and click **Apply renewal**.

**Tiers**:

| Tier | Tools |
|---|---|
| **Lite** | Find Duplicates · Clean Text · Standardize Formats |
| **Core** | All 9 tools |

A Lite user opening a Core-only tool sees an "Upgrade your license" prompt. The home page also shows a 🔒 Locked badge on tool cards your tier doesn't unlock. To upgrade, paste a Core blob on the Activate page.

Every license lasts 1 year. The sidebar shows your tier and days remaining at all times; a renewal warning appears 30 days before expiry.

The license file lives at `~/.datatools/license.json` (Windows: `C:\Users\\.datatools\license.json`). To use the same license on a different machine, deactivate this one (Activate page → **Deactivate this device**) and re-paste your blob on the new machine.

## 1. Install

You don't need Python and you don't need admin rights: the bundle ships its own interpreter and every dependency. Each OS gets a single installer that wires up the Desktop shortcut and Start Menu / Launchpad entry automatically.

### 1.1 Windows

**Installer (`DataTools--win-setup.exe`)**

1. Download `DataTools--win-setup.exe` from your release email or GitHub Releases.
2. Double-click the installer. On the first run Windows SmartScreen will say **"Windows protected your PC"**. Click **More info** → **Run anyway**. (This warning only appears once per build until we have an EV code-signing cert.)
3. Accept the per-user install location (`%LOCALAPPDATA%\Programs\DataTools` by default; no admin prompt).
   Check **Create a desktop shortcut** if you want one (on by default).
4. Click **Install**, then **Finish**. The installer offers to launch DataTools immediately.
5. From now on, launch from **Start Menu → DataTools**, the **Desktop shortcut**, or by typing `DataTools` into Windows Run (Win+R) or cmd.

To pin to the taskbar, launch the app once, right-click its icon in the taskbar, then choose **Pin to taskbar**. Windows requires this manual step; no installer is allowed to pin programmatically.

**Uninstall**: Settings → Apps → DataTools → Uninstall.

### 1.2 macOS

**Installer DMG (`DataTools--mac.dmg`)**

1. Download `DataTools--mac.dmg`.
2. Double-click the .dmg. A Finder window opens showing the **DataTools** icon and an **Applications** alias.
3. Drag **DataTools** onto **Applications**. Wait for the copy to finish, then eject the DMG.
4. On unsigned builds the first launch shows **"DataTools" cannot be opened because the developer cannot be verified**. Fix: right-click DataTools in /Applications → **Open** → confirm **Open** in the dialog. macOS remembers this choice, so subsequent launches are clean.
5. Launch from **Launchpad**, **Spotlight** (`⌘ Space` → type "DataTools"), or **Applications** in Finder.

To keep DataTools in the Dock: launch the app, right-click its Dock icon → **Options → Keep in Dock**. macOS doesn't allow installers to pin to the Dock automatically.

**Uninstall**: drag `DataTools.app` to the Trash. Your data files stay where you put them; nothing else is installed.

### 1.3 Linux

`DataTools--linux-x86_64.AppImage` is already portable; no separate zip is needed.

1. Download the .AppImage.
2. `chmod +x DataTools-*.AppImage`.
3. Double-click, or run it from a terminal.

If your distro doesn't ship FUSE 2: `sudo apt install libfuse2` (Debian/Ubuntu) or equivalent.

### 1.4 What happens on first launch

The launcher (called `DataTools.exe` / `DataTools.app` / `DataTools.AppImage`) does three things, in order:

1. Picks a free TCP port on `127.0.0.1`: usually 8501, falling back through 8502, 8503, … if another app is using 8501.
2. Starts a local Streamlit server on that port. The server is **bound to localhost only**, never to your LAN.
3. Opens your default browser at `http://127.0.0.1:/`.

If the browser doesn't open within 5 seconds, paste that URL into your browser manually.

The launcher window stays open in the background. Closing it stops the server; the browser tab will say "this site can't be reached" the next time you click it.

### 1.5 How the GUI works

- Runs locally on your machine. **No internet, no upload.**
- The browser is just the display surface. Closing it does NOT stop the app; close the launcher window (or quit the macOS .app from the Dock) to fully exit.
- Prefer the terminal? Every tool ships with a CLI too (Section 3).

### 1.6 System requirements

- Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
- Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
- ~500 MB free disk space (the bundle itself is ~300 MB; the rest is working scratch space for large CSVs).

**OCR for scanned PDFs is bundled**: Tesseract 5.5 and the English `eng.traineddata` model ship inside every installer / portable / AppImage. The PDF Extractor's scanned-statement path works out of the box; no separate install required. (Developers running from a `pip install -r requirements.txt` checkout still need system Tesseract on `PATH`; see [DEVELOPER.md § PDF Extractor (bundled Tesseract)](DEVELOPER.md#pdf-extractor--bundled-tesseract).)

Full numbered support matrix: [REQUIREMENTS.md](REQUIREMENTS.md).

## 2. What's included

| # | Tool | Purpose | Status |
|---|------|---------|--------|
| 01 | Find Duplicates | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Clean Text | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Standardize Formats | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Fix Missing Values | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Map Columns | Rename + enforce schema | Coming Soon |
| 06 | Find Unusual Values | z-score, IQR, multivariate | Coming Soon |
| 07 | Combine Files | Combine multiple files | Coming Soon |
| 08 | Quality Check | Rules + PDF/Excel report | Coming Soon |
| 09 | Automated Workflows | One-click multi-tool launcher | Coming Soon |

**Sample data** (`samples/`): `messy_sales.csv`, `bank_export.xlsx`.

## 3. Usage

### 3.1 GUI (recommended)

1. Launch the bundle.
2. Pick a tool from the sidebar.
3. Drop your file (or select a sample).
4. Defaults are pre-filled; click **Run** to preview.
5. Click **Save Output** to write the cleaned file.

Advanced options are tucked in expander panes. The original file is never modified.

**In-tool Help**: every tool page has a **Help** button to the right of the title. Click it to open a popover with a compact how-to (When to use · Steps · Examples · Tip). Use it as a refresher mid-task: the popover closes when you click outside, and your inputs are untouched.

**Sidebar nav**: the sidebar groups tools into sections (Analysis, Data Cleaners, Transformations, Automations). Each section header shows `+` when collapsed and `−` when expanded; click the header to toggle.

### 3.2 CLI

```bash
deduplicator customers.csv [--apply]
text-cleaner messy.csv [--apply]
format-standardize feed.csv [--apply]
```

Get help: `deduplicator --help`. Full reference: [CLI-REFERENCE.md](CLI-REFERENCE.md).

### 3.3 Run order (when running tools manually)

If you skip Automated Workflows, follow this order:

1. **02 Clean Text** first: normalizes whitespace and special chars.
2. **03 Standardize Formats**: dates, phones, etc. need cleaned text.
3. **04 Fix Missing Values**: sentinel codes hide as numbers.
4. **05 Map Columns**: schema before outlier stats.
5. **06 Find Unusual Values**: needs clean numerics. Statistics computed on data that still contains `NaN` or `-999` are unreliable.
6. **07 Combine Files**, **08 Quality Check** as needed.
7. **01 Find Duplicates** is order-flexible (normalizes internally for matching).

Automated Workflows enforces this automatically.

### 3.4 Language

The sidebar has a **Language / Idioma** picker. Two packs ship today:

- **English** (default)
- **Español**

Pick a language once: the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss.

**Coverage** (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs; they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot.

## 4. Review & Normalize gate

Every uploaded file is scanned before any tool sees it.

**Confidence tiers**:

- **High**: round-trip safe. One-click "Auto-fix high-confidence" applies them all.
- **Medium**: usually right, occasional false positives. Preview first.
- **Low**: heuristic. Off by default; opt in per finding.
- **Error**: blocks the gate (empty file, U+FFFD, unrepairable rows).
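
As a rough illustration of the **Error** tier, the sketch below shows one way a scan could flag an empty file or U+FFFD replacement characters before any tool runs. It is a minimal Python sketch, not DataTools' actual implementation: the function name `classify_upload` and the `"error"` / `"ok"` return values are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, produced when bytes cannot be decoded


def classify_upload(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical gate check: return "error" or "ok" for raw file bytes."""
    if not raw.strip():
        return "error"  # empty (or whitespace-only) file

    # errors="replace" substitutes U+FFFD for every undecodable byte sequence.
    text = raw.decode(encoding, errors="replace")
    if REPLACEMENT_CHAR in text:
        return "error"  # wrong encoding guess or damaged bytes
    return "ok"


sample = "café".encode("cp1252")          # bytes as a cp1252 export would store them
print(classify_upload(sample))            # decoded as UTF-8
print(classify_upload(sample, "cp1252"))  # decoded with the matching codepage
```

Decoded as UTF-8, the sample contains U+FFFD and is flagged; decoded with the matching codepage, it passes. That is why choosing the right encoding for a file matters.
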
**Encoding override**: when the picker reports `encoding_uncertain` or you spot mojibake (`é`) or `οΏ½` chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) β†’ **Re-analyze**. **Advanced output**: an `βš™οΈ` expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (`.tsv` for tab, `.csv` otherwise). ## 5. Output Every run writes: - **Cleaned file** next to the input (or wherever you specify). - **Audit file** (per-cell changes for text/format tools, match groups for dedup). - **Timestamped log** in `logs/`. Original input is never modified. ## 6. Troubleshooting - **GUI won't launch / browser doesn't open** β€” wait 10-15 s; manually visit `http://127.0.0.1:8501` (or whichever port the launcher window prints). Port-in-use error β†’ close other instances. The launcher walks ports 8501–8550 looking for a free one, so a stale instance can shift the URL. - **Why does my browser open?** β€” local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine. - **Windows SmartScreen** β€” click "More info" β†’ "Run anyway". One-time per build until we have an EV-signed cert. - **macOS "App is damaged" / "developer cannot be verified"** β€” right-click the app β†’ **Open** β†’ confirm. If the message persists, the file was likely corrupted in transit β€” re-download. As a last resort: `xattr -cr /Applications/DataTools.app` clears the quarantine attribute. - **macOS portable .zip β€” extracted but won't open** β€” Safari unzips on download by default; if you see a `__MACOSX/` folder or `._DataTools.app` file you used a different unarchiver. Re-extract with the built-in Archive Utility (right-click the .zip β†’ **Open With β†’ Archive Utility**) so the .app's metadata is preserved. - **Windows portable .zip β€” antivirus quarantines DataTools.exe** β€” your AV doesn't recognize the bundle. 
  Allowlist the extracted folder. The installer .exe trips fewer AV products because it's a known Inno Setup wrapper.
- **Linux AppImage won't run**: `chmod +x file.AppImage`. Missing FUSE → `sudo apt install libfuse2`.
- **Slow on large file**: files over ~100k rows take longer; a progress bar shows. For multi-million rows, use the CLI directly.
- **Where does the app store my license / settings?** `~/.datatools/` on macOS and Linux, `C:\Users\\.datatools\` on Windows. Your input/output files stay where you put them; the app never copies them anywhere else.
- **Need help**: email the address on your purchase receipt.

## 7. License

Single-user. See `LICENSE.txt`.