test: single-command runner, cross-platform automation, fixture auto-discovery

Adds a top-level test infrastructure layer that addresses four needs at
once: a single command to run anything, cross-platform automation,
install/e2e sanity checks, and zero-config pickup of new fixtures dropped
into test-cases/.

Top-level runner: run_tests.py

  python run_tests.py                 # everything (default)
  python run_tests.py --tool dedup    # one tool's tests
  python run_tests.py --unit          # category scopes
  python run_tests.py --e2e           # end-to-end CLI
  python run_tests.py --install       # import / dependency sanity
  python run_tests.py --fixtures      # corpus + dropped-file sweep
  python run_tests.py --coverage      # term-missing report
  python run_tests.py --quick         # skip @pytest.mark.slow

  Tools: analyze, cli, config, dedup, io, normalizers, text_clean.

Cross-platform: tox.ini

  Envs for py310-py313 plus install / e2e / fixtures / coverage / lint.
  Forces UTF-8 (PYTHONUTF8=1, PYTHONIOENCODING=utf-8) so identical
  fixture bytes parse the same on Linux/macOS/Windows.

Shared config: pytest.ini

  testpaths, python_files conventions, custom markers (slow, e2e,
  install, fixture_sweep), and warning filters that fail on our own
  DeprecationWarnings while tolerating third-party ones.

New test layers

  tests/test_install.py: required deps import; project modules import;
  src.core public API surface; CLI --help exits 0; streamlit app.py
  parses as valid Python; run_tests.py --help works.

  tests/test_e2e.py: CLI roundtrips. cli_analyze table + JSON,
  cli_text_clean --apply writes a real file with NBSPs and smart quotes
  folded, the dedup CLI removes duplicates, run_tests.py self-tests.

  tests/test_fixtures_sweep.py: parametrizes over every CSV/TSV/XLSX
  inside test-cases/ (excluding text-cleaner-corpus/, which has its own
  suite). Each fixture must load through repair_bytes, run analyze()
  cleanly, and survive clean_dataframe() with row/column counts
  unchanged and an idempotent result. Drop a CSV in and re-run; no test
  code changes are needed.
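  The run_tests.py flags listed under "Top-level runner" presumably
  translate into pytest arguments. Below is a minimal sketch of one way
  to do that. It assumes the pytest.ini markers (e2e, install,
  fixture_sweep, slow) select the category scopes, that --unit means
  "none of those markers", that --tool filters with pytest's -k keyword
  match, and that coverage comes from the pytest-cov plugin.
  build_pytest_args is a hypothetical name, not the actual run_tests.py
  code.

```python
import argparse

# Hypothetical flag-to-pytest mapping. Marker names match the custom
# markers in pytest.ini; the --unit expression, the -k tool filter, and
# the pytest-cov flags are assumptions.
TOOLS = ("analyze", "cli", "config", "dedup", "io", "normalizers", "text_clean")
SCOPE_MARKERS = {
    "unit": "not e2e and not install and not fixture_sweep",  # assumed meaning
    "e2e": "e2e",
    "install": "install",
    "fixtures": "fixture_sweep",
}


def build_pytest_args(argv=None):
    """Translate run_tests.py-style flags into a pytest argument list."""
    parser = argparse.ArgumentParser(prog="run_tests.py")
    parser.add_argument("--tool", choices=TOOLS)
    scope = parser.add_mutually_exclusive_group()  # one category scope at a time
    for name in SCOPE_MARKERS:
        scope.add_argument(f"--{name}", action="store_true")
    parser.add_argument("--coverage", action="store_true")
    parser.add_argument("--quick", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    opts = parser.parse_args(argv)

    # Combine the selected scope (if any) with the --quick exclusion.
    terms = [f"({expr})" for name, expr in SCOPE_MARKERS.items() if getattr(opts, name)]
    if opts.quick:
        terms.append("not slow")

    args = []
    if terms:
        args += ["-m", " and ".join(terms)]
    if opts.tool:
        args += ["-k", opts.tool]  # keyword match on test/module names (assumed)
    if opts.coverage:
        args += ["--cov=src", "--cov-report=term-missing"]  # needs pytest-cov
    if opts.verbose:
        args.append("-v")
    return args  # an empty list means "run everything"
```

  The list would then go to pytest.main(), which accepts an argument
  list and returns an exit code, e.g.
  sys.exit(pytest.main(build_pytest_args())).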
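  The pytest.ini warning policy (fail on our own DeprecationWarnings,
  tolerate third-party ones) can be reproduced with the standard
  warnings module. This sketch assumes first-party code lives under the
  src package, as the src.core reference suggests; pytest's
  filterwarnings ini option expresses the same two rules declaratively.
  deprecation_outcome is a hypothetical helper for illustration only.

```python
import warnings

# Assumption: first-party modules are "src" or "src.<anything>".
FIRST_PARTY = r"src(\.|$)"


def deprecation_outcome(module_name: str) -> str:
    """Return 'error' or 'ignored' for a DeprecationWarning from module_name."""
    with warnings.catch_warnings():
        # Tolerate deprecations by default (third-party code) ...
        warnings.simplefilter("ignore", DeprecationWarning)
        # ... but escalate ours. filterwarnings() prepends its rule, so it
        # is checked before the blanket "ignore" above.
        warnings.filterwarnings("error", category=DeprecationWarning, module=FIRST_PARTY)
        try:
            warnings.warn_explicit(
                "deprecated call", DeprecationWarning, "example.py", 1, module=module_name
            )
        except DeprecationWarning:
            return "error"
    return "ignored"
```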
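  The zero-config pickup in tests/test_fixtures_sweep.py comes down to a
  file walk feeding pytest.mark.parametrize. A minimal sketch, assuming
  a recursive search under test-cases/ with case-insensitive extension
  matching; discover_fixtures is a hypothetical name, not the sweep's
  actual helper.

```python
from pathlib import Path

# Extensions and the excluded directory come from the commit message;
# the function itself is a hypothetical illustration.
FIXTURE_SUFFIXES = {".csv", ".tsv", ".xlsx"}
EXCLUDED_DIRS = {"text-cleaner-corpus"}


def discover_fixtures(root):
    """Return every CSV/TSV/XLSX under root, skipping excluded subtrees."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in FIXTURE_SUFFIXES:
            continue
        # Skip files whose directory path (relative to root) passes
        # through an excluded directory.
        if EXCLUDED_DIRS.intersection(path.relative_to(root).parts[:-1]):
            continue
        found.append(path)
    # Sort on the POSIX-style relative path so test IDs are stable across
    # platforms and runs.
    return sorted(found, key=lambda p: p.relative_to(root).as_posix())
```

  In the sweep module the list would feed something like
  @pytest.mark.parametrize("path", discover_fixtures("test-cases"),
  ids=lambda p: p.name), so a newly dropped file appears as a new test
  ID on the next run.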
  tests/test_gap_coverage.py: closes audit gaps, covering the
  clean_headers=False toggle, repair_bytes with tab/semicolon
  delimiters, a combined BOM+NUL+smart-quote fix scenario, analyze()
  over an XLSX path, sample_rows larger than the DataFrame, a mid-cell
  BOM, and findings_by_tool edge cases, plus a strict xfail documenting
  the known §4.17 numeric/phone whitespace heuristic gap.

Test count

  Before: 288 passed + 1 xfailed
  After:  475 passed + 2 xfailed

  The second xfail is the documented collapse_whitespace gap on
  phone-shaped cells; spec §4.17 calls for a heuristic that hasn't been
  implemented yet.

Functional gaps surfaced (not fixed in this commit):

- Text cleaner: collapse_whitespace runs unconditionally on every
  string cell, including phone/numeric/date-shaped ones. Spec §4.17
  requires a skip heuristic. Captured as a strict xfail so the gap
  stays visible.
- io.read_file does not run pre-parse repair; only analyze() and direct
  callers of read_csv_repaired() get it. CLI tool pages and the dedup
  CLI miss this safety net.
- Analyzer has no mixed_line_endings or near_duplicate_rows detector;
  both are planned but require additional plumbing.
- GUI tool pages each have their own uploader instead of picking up the
  home-page upload through session_state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:01:06 +00:00
parent a8943f29eb
commit 4687cf87b4
7 changed files with 897 additions and 0 deletions
--- a/tox.ini
+++ b/tox.ini
@@ -0,0 +1,67 @@
+; Cross-platform test automation for DataTools.
+;
+; Drives the pytest suite under multiple Python versions on Linux, macOS,
+; and Windows. Use:
+;
+;   tox                    # default envlist (py310-py313, install, e2e)
+;   tox -e py312           # one Python version
+;   tox -e e2e             # CLI smoke tests
+;   tox -e install         # import / dependency sanity
+;   tox -e lint            # static checks (mypy / ruff if installed)
+;   tox -e coverage        # full suite with coverage report
+;
+; Adding a new fixture: drop the CSV/XLSX into test-cases/ and re-run.
+; tests/test_fixtures_sweep.py picks new files up automatically.
+
+[tox]
+envlist = py310, py311, py312, py313, install, e2e
+skip_missing_interpreters = true
+isolated_build = false
+
+[testenv]
+description = Run the full pytest suite under {envname}.
+deps =
+    -r requirements.txt
+    -r requirements-dev.txt
+commands =
+    python run_tests.py {posargs}
+passenv =
+    HOME
+    USER
+    LANG
+    LC_ALL
+    PATH
+setenv =
+    PYTHONIOENCODING = utf-8
+    PYTHONUTF8 = 1
+
+[testenv:install]
+description = Verify imports and CLI entry points work after a fresh install.
+commands =
+    python run_tests.py --install -v
+
+[testenv:e2e]
+description = End-to-end CLI smoke tests against real fixtures.
+commands =
+    python run_tests.py --e2e -v
+
+[testenv:fixtures]
+description = Sweep test-cases/ for any newly-dropped fixtures.
+commands =
+    python run_tests.py --fixtures -v
+
+[testenv:coverage]
+description = Full suite with coverage report.
+commands =
+    python run_tests.py --coverage
+
+[testenv:lint]
+description = Static checks (run only if the optional tools are installed).
+deps =
+    -r requirements.txt
+    ruff>=0.5; python_version >= "3.10"
+    mypy>=1.10; python_version >= "3.10"
+allowlist_externals = sh
+commands =
+    sh -c "if command -v ruff >/dev/null 2>&1; then ruff check src/ tests/; else echo 'ruff not installed; skipping'; fi"
+    sh -c "if command -v mypy >/dev/null 2>&1; then mypy src/; else echo 'mypy not installed; skipping'; fi"