test: single-command runner, cross-platform automation, fixture auto-discovery

Adds a top-level test infrastructure layer addressing four needs at once:
a single command to run anything, cross-platform automation, install/e2e
sanity, and zero-config pickup of new fixtures dropped into test-cases/.

Top-level runner: run_tests.py

    python run_tests.py                 # everything (default)
    python run_tests.py --tool dedup    # one tool's tests
    python run_tests.py --unit          # category scopes
    python run_tests.py --e2e           # end-to-end CLI
    python run_tests.py --install       # import / dependency sanity
    python run_tests.py --fixtures      # corpus + dropped-file sweep
    python run_tests.py --coverage      # term-missing report
    python run_tests.py --quick         # skip @pytest.mark.slow

Tools: analyze, cli, config, dedup, io, normalizers, text_clean.
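
For readers who have not opened run_tests.py, the flag-to-pytest mapping
listed above could look roughly like the sketch below. It is not the
runner's actual code: build_pytest_args, the use of -k for --tool, and the
--cov=src target are assumptions made for illustration. The marker names
(slow, e2e, install) are the ones this commit registers in pytest.ini, and
--cov / --cov-report=term-missing are pytest-cov options.

```python
import argparse


def build_pytest_args(argv):
    """Translate runner-style flags into a pytest argument list (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="run_tests.py")
    parser.add_argument("--tool", help="restrict the run to one tool, e.g. dedup")
    parser.add_argument("--e2e", action="store_true", help="end-to-end CLI tests")
    parser.add_argument("--install", action="store_true", help="import sanity tests")
    parser.add_argument("--quick", action="store_true", help="skip slow tests")
    parser.add_argument("--coverage", action="store_true", help="coverage report")
    opts = parser.parse_args(argv)

    args = []
    if opts.tool:
        # Assumption: test names contain the tool name, so -k can select them.
        args += ["-k", opts.tool]

    # Combine the selected category markers into one -m expression.
    selected = [name for name in ("e2e", "install") if getattr(opts, name)]
    marker_expr = " or ".join(selected)
    if opts.quick:
        marker_expr = f"({marker_expr}) and not slow" if marker_expr else "not slow"
    if marker_expr:
        args += ["-m", marker_expr]

    if opts.coverage:
        # Assumption: the package under test lives in src/.
        args += ["--cov=src", "--cov-report=term-missing"]
    return args


if __name__ == "__main__":
    print(build_pytest_args(["--tool", "dedup", "--quick"]))
```

A real runner would hand the resulting list to pytest.main() or a
subprocess call; the sketch stops at building the list so it can be read
and tested without pytest installed.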
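
Cross-platform: tox.ini

Envs for py310-py313 plus install / e2e / fixtures / coverage / lint.
A bare `tox` runs the default envlist (py310-py313, install, e2e); the
fixtures, coverage, and lint envs run on request via `tox -e <env>`.
Forces UTF-8 (PYTHONUTF8=1, PYTHONIOENCODING=utf-8) so identical fixture
bytes parse the same on Linux/macOS/Windows.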
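
Why the forced UTF-8 matters: the same fixture bytes decode to different
text depending on the codec, and without UTF-8 mode Python may fall back
to a locale codec such as cp1252 on Windows. The snippet below is a
standalone illustration using only the standard library; it does not
touch the project's own loaders.

```python
import sys

# UTF-8 bytes for "café", as they would sit in a fixture file.
raw = "caf\u00e9".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # what every platform sees with PYTHONUTF8=1
as_cp1252 = raw.decode("cp1252")   # what a cp1252 locale default would produce

print(repr(as_utf8))    # 'café'
print(repr(as_cp1252))  # 'cafÃ©'

# sys.flags.utf8_mode is 1 when the interpreter runs with PYTHONUTF8=1 (or -X utf8).
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```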

Shared config: pytest.ini

testpaths, python_files conventions, custom markers (slow, e2e, install,
fixture_sweep), warning filters that fail on our own DeprecationWarnings
while tolerating third-party ones.
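
The "fail on ours, tolerate theirs" warning policy can be illustrated with
the standard warnings module. This is a sketch of the idea, not the
contents of pytest.ini: the classify helper is hypothetical, and the
assumption that project modules live under a package named src is taken
from the src.core reference elsewhere in this message.

```python
import warnings


def classify(module_name):
    """Return 'error' if a DeprecationWarning from module_name is escalated, else 'ignored'."""
    with warnings.catch_warnings():
        # Baseline: tolerate DeprecationWarnings from anywhere (third-party code).
        warnings.simplefilter("ignore", DeprecationWarning)
        # Added last, so it is checked first: escalate warnings raised from
        # the project's own package (assumed to be named "src").
        warnings.filterwarnings(
            "error", category=DeprecationWarning, module=r"src(\.|$)"
        )
        try:
            warnings.warn_explicit(
                "deprecated call",
                DeprecationWarning,
                filename="<demo>",
                lineno=1,
                module=module_name,
            )
        except DeprecationWarning:
            return "error"
        return "ignored"


if __name__ == "__main__":
    print(classify("src.core"))     # error
    print(classify("pandas.core"))  # ignored
```

The filter order matters: each new filter is inserted at the front of the
list, so the narrower project-only rule has to be added after the blanket
ignore.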

New test layers

tests/test_install.py: required deps import; project modules import;
src.core public API surface; CLI --help exits 0; streamlit app.py
parses as valid Python; run_tests.py --help works.

tests/test_e2e.py: CLI roundtrips: cli_analyze table + JSON, cli_text_clean
--apply writes a real file with NBSP/smart-quote folded, dedup CLI
removes duplicates, run_tests.py self-tests.

tests/test_fixtures_sweep.py: parametrizes over every CSV/TSV/XLSX
inside test-cases/ (excluding text-cleaner-corpus/, which has its own
suite). Each fixture must: load through repair_bytes, run analyze()
cleanly, and survive clean_dataframe() with row/col counts unchanged
plus idempotency. Drop a CSV in, re-run: no test code changes needed.
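
The zero-config pickup comes down to discovering files at collection time
instead of listing them by hand. A minimal sketch of that discovery step
follows, using only pathlib. The discover_fixtures helper and its
constants are illustrative names, not the ones in
tests/test_fixtures_sweep.py.

```python
from pathlib import Path

FIXTURE_SUFFIXES = {".csv", ".tsv", ".xlsx"}
EXCLUDED_DIRS = {"text-cleaner-corpus"}  # has its own suite


def discover_fixtures(root):
    """Return every CSV/TSV/XLSX under root, skipping excluded directories."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in FIXTURE_SUFFIXES:
            continue
        # Skip files that sit anywhere below an excluded directory.
        parent_parts = set(path.relative_to(root).parts[:-1])
        if parent_parts & EXCLUDED_DIRS:
            continue
        found.append(path)
    # Sorted so test IDs stay stable between runs and platforms.
    return sorted(found)


if __name__ == "__main__":
    for fixture in discover_fixtures("test-cases"):
        print(fixture)
```

In a test module the result would typically feed
pytest.mark.parametrize("fixture_path", discover_fixtures("test-cases"), ids=str),
so each dropped file shows up as its own test case.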

tests/test_gap_coverage.py: closes audit gaps: clean_headers=False
toggle, repair_bytes with tab/semicolon delimiters, BOM+NUL+smart-quote
combined-fix scenario, analyze() over an XLSX path, sample_rows
larger than the DataFrame, mid-cell BOM, findings_by_tool edges, plus
a strict xfail documenting the known §4.17 numeric/phone whitespace
heuristic gap.

Test count

    Before: 288 passed + 1 xfailed
    After:  475 passed + 2 xfailed (the second xfail is the documented
            collapse_whitespace gap on phone-shaped cells; spec §4.17 calls
            for a heuristic that hasn't been implemented yet).

Functional gaps surfaced (not fixed in this commit):

- Text cleaner: collapse_whitespace runs unconditionally on every string
  cell, including phone/numeric/date-shaped ones. Spec §4.17 requires a
  skip heuristic. Captured as strict xfail so the gap stays visible.
- io.read_file does not run pre-parse repair; only analyze() and direct
  callers of read_csv_repaired() get it. CLI tool pages and the dedup
  CLI miss the safety net.
- Analyzer has no mixed_line_endings detector or near_duplicate_rows
  detector; both planned but require additional plumbing.
- GUI tool pages each have their own uploader instead of picking up the
  home-page upload through session_state.
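
To make the first gap concrete, here is one possible shape for a skip
heuristic. It is purely illustrative: spec §4.17 is not reproduced in
this commit, so the character class, the collapse_unless_shaped name,
and the choice to leave shaped cells completely untouched are all
assumptions rather than the specified behaviour.

```python
import re

# Assumption: a "shaped" cell contains only digits, whitespace, and the
# punctuation commonly seen in phone numbers, plain numbers, and dates.
_SHAPED = re.compile(r"^[\s+\-().\d/:,]+$")


def collapse_whitespace(cell):
    """Unconditional behaviour: squeeze whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", cell).strip()


def collapse_unless_shaped(cell):
    """Hypothetical variant that leaves phone/numeric/date-shaped cells alone."""
    if any(ch.isdigit() for ch in cell) and _SHAPED.match(cell):
        return cell
    return collapse_whitespace(cell)


if __name__ == "__main__":
    for sample in ("hello   world ", "(555)  123  4567", "Room  12"):
        print(repr(sample), "->", repr(collapse_unless_shaped(sample)))
```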
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
67
tox.ini
Normal file
@@ -0,0 +1,67 @@
; Cross-platform test automation for DataTools.
;
; Drives the pytest suite under multiple Python versions on Linux, macOS,
; and Windows. Use:
;
;     tox                 # every env in envlist (see [tox] below)
;     tox -e py312        # one Python version
;     tox -e e2e          # CLI smoke tests
;     tox -e install      # import / dependency sanity
;     tox -e lint         # static checks (mypy / ruff if installed)
;     tox -e coverage     # full suite with coverage report
;
; Adding a new fixture: drop the CSV/TSV/XLSX into test-cases/ and re-run.
; tests/test_fixtures_sweep.py picks new files up automatically.

[tox]
envlist = py310, py311, py312, py313, install, e2e
skip_missing_interpreters = true
isolated_build = false

[testenv]
description = Run the full pytest suite under {envname}.
deps =
    -r requirements.txt
    -r requirements-dev.txt
commands =
    python run_tests.py {posargs}
passenv =
    HOME
    USER
    LANG
    LC_ALL
    PATH
setenv =
    PYTHONIOENCODING = utf-8
    PYTHONUTF8 = 1

[testenv:install]
description = Verify imports and CLI entry points work after a fresh install.
commands =
    python run_tests.py --install -v

[testenv:e2e]
description = End-to-end CLI smoke tests against real fixtures.
commands =
    python run_tests.py --e2e -v

[testenv:fixtures]
description = Sweep test-cases/ for any newly-dropped fixtures.
commands =
    python run_tests.py --fixtures -v

[testenv:coverage]
description = Full suite with coverage report.
commands =
    python run_tests.py --coverage

[testenv:lint]
description = Static checks (run only if the optional tools are installed).
deps =
    -r requirements.txt
    ruff>=0.5; python_version >= "3.10"
    mypy>=1.10; python_version >= "3.10"
allowlist_externals = sh
; Use if/else rather than `a && b || c`, so a failing check is reported as a
; failure instead of falling through to the "not installed" message.
commands =
    sh -c "if command -v ruff >/dev/null; then ruff check src/ tests/; else echo 'ruff not installed; skipping'; fi"
    sh -c "if command -v mypy >/dev/null; then mypy src/; else echo 'mypy not installed; skipping'; fi"