datatools-dev/docs/USER-GUIDE.md
Michael e612c751a8 docs(license): document activation flow, tier system, dev bypass
- USER-GUIDE EN + ES gain a §0 "First launch — activation" section
  covering paid blob activation, 1-year trial, renewal, file
  location, and device-swap.
- REQUIREMENTS §17a "Licensing" — storage path, activation model,
  lifetime, tier list, dev bypass env var. Test count: 1995.
- DEVELOPER gains a "Licensing" recipe in the Extension recipes
  section: public API, feature-flag add, tier add, minting via the
  creator-only script.
- DECISIONS §9b — log the offline-HMAC choice with the threat-model
  trade-off (motivated piracy not stopped; honor-system + 30-day
  refund covers casual sharing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:54:30 +00:00


🌐 Language: English · Español

User Guide

Version: 1.6 · Updated: 2026-05-01

0. First launch — activation

DataTools must be activated before any tools unlock. On first launch you'll see the Activate screen.

| You have… | Do this |
| --- | --- |
| A paid license blob (from your purchase email) | Enter your full name + email, paste the entire blob (starts with DTLIC1:), click Activate. |
| Nothing yet, want to evaluate | Enter your name + email, click Start 1-year trial. The app self-issues a 1-year trial license; no payment required. |

Renewal works the same way: paste the renewal blob, click Apply renewal. The expiry resets to one year from the renewal date.

The license file lives at ~/.datatools/license.json (Windows: C:\Users\<you>\.datatools\license.json). The sidebar shows your tier and days remaining at all times. A renewal warning appears 30 days before expiry.
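
As an illustration, the location above can be resolved the same way on every OS with Python's pathlib. This is a sketch, not DataTools code: the function name is hypothetical, and nothing is assumed about the file's contents, only its documented location.

```python
from pathlib import Path


def license_path(home=None):
    """Return the documented license location under the given home directory."""
    # Path.home() resolves to C:\Users\<you> on Windows and to ~ elsewhere.
    base = Path(home) if home is not None else Path.home()
    return base / ".datatools" / "license.json"


path = license_path()
print(path, "(found)" if path.is_file() else "(not found: activate first)")
```

If the file is reported as not found after activation, that is worth mentioning when you contact support.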

If you ever want to use the same license on a different machine, deactivate this one (Activate page → Deactivate this device) and re-paste your blob on the new machine.

1. Install

You don't need Python — the bundle is self-contained.

| OS | File | How |
| --- | --- | --- |
| Windows | BundleName-Setup-1.0.exe | Double-click installer → desktop shortcut. |
| macOS | BundleName-1.0.dmg | Mount, drag to Applications. Signed + notarized. |
| Linux | BundleName-1.0.AppImage | chmod +x, double-click. (.tar.gz fallback available.) |

Launching opens your default browser to a local page (http://localhost:8501).

How the GUI works

  • Runs locally on your machine. No internet, no upload.
  • Browser is just the display surface. Closing it stops the underlying program.
  • Prefer the terminal? Every tool ships with a CLI too (Section 3).

System requirements

  • Windows 10/11 (64-bit), macOS 11+, modern Linux (2020+).
  • Modern browser (Chrome, Edge, Firefox, Safari, last 3 years).
  • ~400-500 MB free disk space.

Full numbered support matrix: REQUIREMENTS.md.

2. What's included

| # | Tool | Purpose | Status |
| --- | --- | --- | --- |
| 01 | Deduplicator | Exact + fuzzy match, 5 normalizers, audit | Ready |
| 02 | Text Cleaner | Whitespace, smart chars, BOM, line endings, case ops | Ready |
| 03 | Format Standardizer | Dates / phones / emails / addresses / names / currencies / booleans | Ready |
| 04 | Missing Value Handler | Disguised nulls, imputation, drop-by-threshold | Coming Soon |
| 05 | Column Mapper | Rename + enforce schema | Coming Soon |
| 06 | Outlier Detector | z-score, IQR, multivariate | Coming Soon |
| 07 | Multi-File Merger | Combine multiple files | Coming Soon |
| 08 | Validator & Reporter | Rules + PDF/Excel report | Coming Soon |
| 09 | Pipeline Runner | One-click multi-tool launcher | Coming Soon |

Sample data (samples/): messy_sales.csv, bank_export.xlsx.

3. Usage

3.1 GUI

  1. Launch the bundle.
  2. Pick a tool from the sidebar.
  3. Drop your file (or select a sample).
  4. Defaults are pre-filled — click Run to preview.
  5. Click Save Output to write the cleaned file.

Advanced options are tucked in expander panes. The original file is never modified.

3.2 CLI

deduplicator       customers.csv [--apply]
text-cleaner       messy.csv     [--apply]
format-standardize feed.csv      [--apply]

Get help: deduplicator --help. Full reference: CLI-REFERENCE.md.
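
For scripted runs, the commands above can be assembled from Python. The sketch below only builds the argument list. The helper name is hypothetical, and it assumes (the guide does not state this explicitly) that omitting --apply leaves the run in preview mode.

```python
import shlex
import shutil


def build_command(tool, input_file, apply=False):
    """Build an argv list for one of the CLI tools shown above."""
    argv = [tool, input_file]
    if apply:
        argv.append("--apply")  # flag taken from the usage lines above
    return argv


cmd = build_command("deduplicator", "customers.csv", apply=True)
print(shlex.join(cmd))
# shutil.which reports whether the tool is actually on PATH before you run it.
print("on PATH:", shutil.which(cmd[0]) is not None)
```

Once the tool is on PATH, the list can be handed to subprocess.run(cmd, check=True).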

3.3 Run order (when running tools manually)

If you skip the Pipeline Runner, follow this order:

  1. 02 Text Cleaner first — normalizes whitespace + special chars.
  2. 03 Format Standardizer — dates, phones, etc. need cleaned text.
  3. 04 Missing Value Handler — sentinel codes such as -999 pass as real numbers until they are converted to nulls.
  4. 05 Column Mapper — schema before outlier stats.
  5. 06 Outlier Detector — needs clean numerics. Statistics computed over NaN or sentinel values such as -999 are skewed or undefined.
  6. 07 Multi-File Merger, 08 Validator as needed.
  7. 01 Deduplicator is order-flexible (normalizes internally for matching).

The Pipeline Runner enforces this automatically.
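
To see why the Outlier Detector has to come after the Missing Value Handler, consider a column where a missing reading was exported as -999. A small Python illustration with made-up numbers:

```python
from statistics import mean

readings = [10, 12, 11, -999, 13]  # -999 is a sentinel for "missing", not a measurement

# Treating the sentinel as data drags the mean far below every real value.
poisoned = mean(readings)

# Dropping the sentinel first gives the mean of the four real readings.
clean = mean(x for x in readings if x != -999)

print(poisoned, clean)  # -190.6 11.5
```

Any z-score or IQR threshold derived from the first figure would be measured against a centre that no real reading is near.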

3.4 Language

The sidebar has a Language / Idioma picker. Two packs ship today:

  • English (default)
  • Español

Pick a language once — the choice persists for the session and the picker is visible from every page. Switch any time; the page re-renders in place with no data loss.

Coverage (v1.6): home page, tool cards, the upload + analysis panel, the findings list, the Review & Normalize gate prompt, the sidebar picker, and the shutdown screen. Per-tool page bodies (advanced-option labels, column-mapper prompts, dedup review labels) are tracked for future packs — they currently render in English in both modes. If a string you'd expect to switch doesn't, that's a missing pack key, not a bug in the picker; email support with a screenshot.
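
The behaviour described above, where an untranslated string renders in English, can be pictured as a dictionary lookup with a fallback. This is a hypothetical sketch: the pack layout and key names are invented for illustration and are not DataTools internals.

```python
# Invented example packs; "dedup.review" is deliberately absent from "es".
PACKS = {
    "en": {"home.title": "Home", "dedup.review": "Review matches"},
    "es": {"home.title": "Inicio"},
}


def translate(key, lang):
    """Look up a UI string, falling back to English when the pack lacks the key."""
    return PACKS.get(lang, {}).get(key, PACKS["en"][key])


print(translate("home.title", "es"))    # Inicio
print(translate("dedup.review", "es"))  # Review matches
```

Under this model the picker works correctly in both calls; the second one simply has no Spanish entry to return.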

4. Review & Normalize gate

Every uploaded file is scanned before any tool sees it.

Confidence tiers:

  • High — round-trip safe. One-click "Auto-fix high-confidence" applies them all.
  • Medium — usually right, occasional false positives. Preview first.
  • Low — heuristic. Off by default; opt in per finding.
  • Error — blocks the gate (empty file, U+FFFD, unrepairable rows).
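
One way to picture the tier rules is as a small policy over a list of findings. The sketch below is illustrative only: the Finding type, the function names, and the sample findings are hypothetical, and the logic simply restates the bullets above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    ERROR = "error"


@dataclass
class Finding:
    description: str
    tier: Tier


def gate_blocked(findings):
    """Any Error-tier finding blocks the gate."""
    return any(f.tier is Tier.ERROR for f in findings)


def auto_fixable(findings):
    """The one-click auto-fix covers High-confidence findings only."""
    return [f for f in findings if f.tier is Tier.HIGH]


findings = [
    Finding("trailing whitespace", Tier.HIGH),
    Finding("ambiguous date format", Tier.MEDIUM),
    Finding("possible header row in data", Tier.LOW),
]
print([f.description for f in auto_fixable(findings)], gate_blocked(findings))
```

Medium and Low findings are left out of the auto-fix list on purpose: they need a preview or an explicit opt-in.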

Encoding override: when the picker reports encoding_uncertain or you spot mojibake (é) or <EFBFBD> chars, choose the right codepage at the top of the page (cp1252 for Western Excel, KOI8-R for older Russian, Big5 for traditional Chinese, …) → Re-analyze.
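
Both symptoms come from decoding bytes with the wrong codec, which plain Python can reproduce. The word used here is just an example:

```python
text = "café"

# UTF-8 bytes read as cp1252 produce mojibake: the two-byte "é" turns into
# two Western characters.
mojibake = text.encode("utf-8").decode("cp1252")

# cp1252 bytes read as UTF-8 are invalid; with errors="replace" the bad byte
# becomes U+FFFD, the replacement character.
replaced = text.encode("cp1252").decode("utf-8", errors="replace")

# Decoding with the codec the file was actually written in recovers the text.
recovered = text.encode("cp1252").decode("cp1252")

print(mojibake, replaced, recovered)
```

Choosing the correct codepage corresponds to the third case. Once a character has been replaced with U+FFFD and saved, the original byte is gone, so the fix has to be applied to the original file.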

Advanced output: an ⚙️ expander on the download lets you tune encoding, delimiter, and line terminator. The download filename auto-adjusts (.tsv for tab, .csv otherwise).
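
The filename rule above is simple enough to sketch with Python's standard csv module. The function names are hypothetical, and the defaults shown (comma delimiter, "\n" terminator) are assumptions for this example, not documented DataTools defaults.

```python
import csv
import io


def output_name(stem, delimiter):
    """Use .tsv for tab-delimited output and .csv for everything else."""
    return f"{stem}.tsv" if delimiter == "\t" else f"{stem}.csv"


def render(rows, delimiter=",", lineterminator="\n"):
    """Serialize rows with the chosen delimiter and line terminator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator=lineterminator)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["id", "name"], ["1", "Ana"]]
print(output_name("cleaned", "\t"))
# Encoding is applied last, when the text is turned into bytes.
print(render(rows, delimiter="\t", lineterminator="\r\n").encode("utf-8"))
```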

5. Output

Every run writes:

  • Cleaned file next to the input (or wherever you specify).
  • Audit file (per-cell changes for text/format tools, match groups for dedup).
  • Timestamped log in logs/.

Original input is never modified.

6. Troubleshooting

  • GUI won't launch / browser doesn't open — wait 10-15 s; manually visit http://localhost:8501. Port-in-use error → close other instances.
  • Why does my browser open? — local web app pattern (same as Jupyter, RStudio). Nothing leaves your machine.
  • Windows SmartScreen — click "More info" → "Run anyway". Standard for non-EV-signed software.
  • macOS "App is damaged" — re-download (file likely corrupted in transit).
  • Linux AppImage won't run — chmod +x file.AppImage. Missing FUSE → sudo apt install libfuse2 or use .tar.gz.
  • Slow on large files — anything over ~100k rows takes longer; a progress bar shows status. Multi-million rows → use the CLI directly.
  • Need help — email the address on your purchase receipt.
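
For the port-in-use case, a quick standard-library check shows whether something is already listening on the documented port. This is a generic diagnostic sketch, not a DataTools utility; the function name is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


print("8501 in use:", port_in_use(8501))
```

A True result with no DataTools window open suggests another instance, or another program, is holding the port.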

7. License

Single-user. See LICENSE.txt.