Build a corpus of 35 deliberately broken files (empty bytes, NUL
bytes, mojibake, UTF-16 without BOM, mismatched columns, unescaped
quotes, corrupt zip, etc.) and pin the analyzer's stability contract
against them.
Files land in ``test-cases/junk-corpus/test_data/``. The generator
``make_junk_corpus.py`` produces them deterministically, with one
exception: a single random sample comes from ``secrets.token_bytes``.
Its committed bytes stay stable across regenerations because the byte
stream is captured at commit time. The README documents the categories
and how to add new shapes.
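One way to keep a random sample stable across regenerations is to write it only when it is missing. The sketch below illustrates that idea and is not the actual ``make_junk_corpus.py``. The three file names come from the corpus, but their payloads, the ``random_bytes.csv`` name, the ``size`` default, and the ``write_corpus`` helper are assumptions.

```python
import secrets
from pathlib import Path

# Hypothetical subset of shapes; payloads are assumed, not copied from the repo.
FIXED_SHAPES = {
    "empty.csv": b"",
    "only_bom.csv": b"\xef\xbb\xbf",   # UTF-8 BOM and nothing else
    "only_nul.csv": b"\x00" * 16,      # assumed length
}


def write_corpus(out_dir: Path, random_name: str = "random_bytes.csv",
                 size: int = 2048) -> None:
    """Write fixed shapes every run; write the random sample only once."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, payload in FIXED_SHAPES.items():
        (out_dir / name).write_bytes(payload)

    random_path = out_dir / random_name
    if not random_path.exists():
        # Not reproducible from a seed, so the committed bytes are the source
        # of truth and later runs must leave them untouched.
        random_path.write_bytes(secrets.token_bytes(size))
```

Running ``write_corpus`` twice against the same directory leaves the random sample byte-for-byte identical, which is the property the committed corpus relies on.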
``tests/test_junk_corpus.py`` parametrizes over every file in the
corpus and asserts:
1. ``_run_analysis_on_upload`` never raises: exceptions must be
   caught and surfaced as a synthetic ``Finding`` with
   severity="error". This was the user-reported crash for
   13_non_latin_scripts.csv that the previous fix in ae9d4a2
   defensively wrapped; the corpus now stops the regression
   from re-landing on a different shape.
2. Every Finding in the result list is well-formed (string id,
   valid severity, non-empty description).
3. A high-risk subset (empty.csv, only_bom.csv, only_nul.csv,
   corrupt_xlsx.xlsx) MUST surface at least one error-level
   Finding; otherwise the GUI would render "no issues found"
   for a structurally broken file.
4. Error-level Finding descriptions are at least 20 chars so the
   UI banner gives the user something to act on.
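The four assertions above can be sketched as a pair of helpers. This is a minimal illustration, not the code in ``tests/test_junk_corpus.py``: the ``Finding`` dataclass, ``run_safely``, ``contract_violations``, and the severity vocabulary beyond "error" are all assumed stand-ins.

```python
from dataclasses import dataclass

VALID_SEVERITIES = {"info", "warning", "error"}  # assumed; only "error" is confirmed
MIN_ERROR_DESCRIPTION_CHARS = 20


@dataclass
class Finding:
    # Hypothetical stand-in for the project's Finding type.
    id: str
    severity: str
    description: str


def run_safely(analyze, payload):
    """Call analyze(payload); turn any exception into a synthetic error Finding."""
    try:
        return list(analyze(payload))
    except Exception as exc:
        return [Finding(
            id="analysis-crashed",
            severity="error",
            description=f"Analysis failed with {type(exc).__name__}: {exc}",
        )]


def contract_violations(findings, must_error=False):
    """Return a list of human-readable contract violations (empty means OK)."""
    problems = []
    for f in findings:
        if not isinstance(f.id, str):
            problems.append(f"non-string id: {f.id!r}")
        if f.severity not in VALID_SEVERITIES:
            problems.append(f"invalid severity: {f.severity!r}")
        if not f.description.strip():
            problems.append(f"empty description on {f.id!r}")
        elif (f.severity == "error"
              and len(f.description) < MIN_ERROR_DESCRIPTION_CHARS):
            problems.append(f"error description too short on {f.id!r}")
    if must_error and not any(f.severity == "error" for f in findings):
        problems.append("high-risk file produced no error-level Finding")
    return problems
```

A parametrized test would call ``contract_violations`` once per corpus file and pass ``must_error=True`` only for the high-risk subset.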
Also exclude ``junk-corpus`` from ``tests/test_fixtures_sweep.py``
since that sweep is happy-path (round-trip the text cleaner) and
fights with files designed to break it. The contract is enforced
by the dedicated junk-corpus test, not the sweep.
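A directory-level exclusion of this kind might look like the following sketch. The ``EXCLUDED_DIRS`` constant and the ``sweep_candidates`` helper are hypothetical names, and the real ``tests/test_fixtures_sweep.py`` may collect its fixtures differently.

```python
from pathlib import Path

# Directories whose files are designed to break the happy-path sweep.
EXCLUDED_DIRS = {"junk-corpus"}


def sweep_candidates(root: Path) -> list[Path]:
    """Return every fixture file under root that is not in an excluded directory."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and not EXCLUDED_DIRS.intersection(p.relative_to(root).parts)
    )
```

Matching on path components rather than on a string prefix keeps the exclusion working even if the corpus directory is nested deeper later.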
Runtime: 12 s for the junk-corpus tests, 30 s for the full
project suite (was 19 s without these). 2118 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>