datatools-dev/test-cases/junk-corpus/test_data/massive_columns.csv
Michael 696996c119 test(junk-corpus): pathological-input stress suite for the analyzer
Build a corpus of 35 deliberately-broken files (empty bytes, NUL
bytes, mojibake, UTF-16 without BOM, mismatched columns, unescaped
quotes, corrupt zip, etc.) and pin the analyzer's stability contract
against them.

Files land in ``test-cases/junk-corpus/test_data/``. The generator
``make_junk_corpus.py`` produces them deterministically, with one
exception: a single random sample uses ``secrets.token_bytes``. Its
committed bytes stay stable across regenerations because the byte
stream was captured at commit time. The README documents the
categories and how to add new shapes.
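
A minimal sketch of how such a generator could be laid out, not the
actual ``make_junk_corpus.py``. Only ``empty.csv``, ``only_bom.csv``
and ``only_nul.csv`` are names taken from this commit; the other file
names, the payloads, ``write_corpus`` and the "skip the random file if
it already exists" rule are assumptions made for illustration.

```python
import secrets
from pathlib import Path

# Hypothetical subset of shapes; the real corpus has 35 files and its
# exact payloads are not shown in this commit message.
DETERMINISTIC_SHAPES = {
    "empty.csv": b"",
    "only_bom.csv": b"\xef\xbb\xbf",                      # UTF-8 BOM, nothing else
    "only_nul.csv": b"\x00" * 64,                         # NUL bytes only
    "utf16_no_bom.csv": "id,name\n1,Zoë\n".encode("utf-16-le"),
    "mismatched_columns.csv": b"a,b,c\n1,2\n3,4,5,6\n",
    "unescaped_quotes.csv": b'id,note\n1,"he said "hi"\n',
}


def write_corpus(out_dir: Path, include_random: bool = False) -> list[Path]:
    """Write every shape into out_dir and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in sorted(DETERMINISTIC_SHAPES.items()):
        path = out_dir / name
        path.write_bytes(payload)
        written.append(path)
    if include_random:
        # Assumed policy: never overwrite an existing random sample, so
        # the bytes captured on the first run remain the reference.
        path = out_dir / "random_bytes.bin"
        if not path.exists():
            path.write_bytes(secrets.token_bytes(256))
        written.append(path)
    return written
```

Running ``write_corpus`` twice against the same directory leaves every
file byte-identical under these assumptions, including the random one.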

``tests/test_junk_corpus.py`` parametrizes over every file in the
corpus and asserts:

1. ``_run_analysis_on_upload`` never raises: exceptions must be
   caught and surfaced as a synthetic ``Finding`` with
   ``severity="error"``. This covers the user-reported crash on
   ``13_non_latin_scripts.csv`` that the previous fix in ae9d4a2
   defensively wrapped; the corpus now keeps that regression
   from re-landing on a different input shape.
2. Every Finding in the result list is well-formed (string id,
   valid severity, non-empty description).
3. A high-risk subset (``empty.csv``, ``only_bom.csv``,
   ``only_nul.csv``, ``corrupt_xlsx.xlsx``) MUST surface at least
   one error-level ``Finding``; otherwise the GUI would render
   "no issues found" for a structurally broken file.
4. Error-level Finding descriptions are at least 20 chars so the
   UI banner gives the user something to act on.
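
The four assertions above can be sketched as a pair of helpers. This
is an illustration, not the code in ``tests/test_junk_corpus.py``:
the ``Finding`` dataclass, ``run_safely``, ``contract_violations``
and the set of valid severities are hypothetical stand-ins. Only the
"error" severity, the four high-risk file names and the 20-character
minimum come from this commit.

```python
from dataclasses import dataclass
from typing import Callable

VALID_SEVERITIES = {"info", "warning", "error"}  # assumed; only "error" is named
MIN_ERROR_DESCRIPTION = 20
MUST_ERROR = {"empty.csv", "only_bom.csv", "only_nul.csv", "corrupt_xlsx.xlsx"}


@dataclass
class Finding:  # hypothetical stand-in for the project's Finding type
    id: str
    severity: str
    description: str


def run_safely(analyze: Callable[[bytes], list], payload: bytes) -> list:
    """Assertion 1: never raise; turn a crash into a synthetic error Finding."""
    try:
        return analyze(payload)
    except Exception as exc:
        return [Finding(
            id="analysis-crashed",
            severity="error",
            description=f"Analysis failed with {type(exc).__name__}: {exc}",
        )]


def contract_violations(filename: str, findings: list) -> list:
    """Assertions 2-4: return human-readable violations, empty if none."""
    problems = []
    for f in findings:
        if not isinstance(f.id, str):
            problems.append(f"{filename}: non-string id")
        if f.severity not in VALID_SEVERITIES:
            problems.append(f"{filename}: invalid severity {f.severity!r}")
        if not f.description:
            problems.append(f"{filename}: empty description")
        elif f.severity == "error" and len(f.description) < MIN_ERROR_DESCRIPTION:
            problems.append(f"{filename}: error description too short")
    if filename in MUST_ERROR and not any(f.severity == "error" for f in findings):
        problems.append(f"{filename}: expected at least one error-level Finding")
    return problems
```

In a pytest suite, a test parametrized over the corpus directory could
call both helpers per file and assert that the returned list is empty.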

Also exclude ``junk-corpus`` from ``tests/test_fixtures_sweep.py``
since that sweep is happy-path (round-trip the text cleaner) and
fights with files designed to break it. The contract is enforced
by the dedicated junk-corpus test, not the sweep.
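
One way the exclusion could look, as a sketch only: the function
``sweep_fixtures`` and the ``EXCLUDED_DIRS`` constant are hypothetical,
and the real ``tests/test_fixtures_sweep.py`` may collect its files
differently.

```python
from pathlib import Path

# Directory names the happy-path sweep should skip (assumed constant).
EXCLUDED_DIRS = {"junk-corpus"}


def sweep_fixtures(root: Path) -> list[Path]:
    """Collect fixture files under root, skipping excluded directories."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and EXCLUDED_DIRS.isdisjoint(p.relative_to(root).parts)
    )
```

Matching on path components rather than on a substring avoids
accidentally skipping a file whose name merely contains "junk-corpus".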

Runtime: 12 s for the junk-corpus tests, 30 s for the full
project suite (was 19 s without these). 2118 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:35:22 +00:00

3.3 KiB

1c0c1c2c3c4c5c6c7c8c9c10c11c12c13c14c15c16c17c18c19c20c21c22c23c24c25c26c27c28c29c30c31c32c33c34c35c36c37c38c39c40c41c42c43c44c45c46c47c48c49c50c51c52c53c54c55c56c57c58c59c60c61c62c63c64c65c66c67c68c69c70c71c72c73c74c75c76c77c78c79c80c81c82c83c84c85c86c87c88c89c90c91c92c93c94c95c96c97c98c99c100c101c102c103c104c105c106c107c108c109c110c111c112c113c114c115c116c117c118c119c120c121c122c123c124c125c126c127c128c129c130c131c132c133c134c135c136c137c138c139c140c141c142c143c144c145c146c147c148c149c150c151c152c153c154c155c156c157c158c159c160c161c162c163c164c165c166c167c168c169c170c171c172c173c174c175c176c177c178c179c180c181c182c183c184c185c186c187c188c189c190c191c192c193c194c195c196c197c198c199c200c201c202c203c204c205c206c207c208c209c210c211c212c213c214c215c216c217c218c219c220c221c222c223c224c225c226c227c228c229c230c231c232c233c234c235c236c237c238c239c240c241c242c243c244c245c246c247c248c249c250c251c252c253c254c255c256c257c258c259c260c261c262c263c264c265c266c267c268c269c270c271c272c273c274c275c276c277c278c279c280c281c282c283c284c285c286c287c288c289c290c291c292c293c294c295c296c297c298c299c300c301c302c303c304c305c306c307c308c309c310c311c312c313c314c315c316c317c318c319c320c321c322c323c324c325c326c327c328c329c330c331c332c333c334c335c336c337c338c339c340c341c342c343c344c345c346c347c348c349c350c351c352c353c354c355c356c357c358c359c360c361c362c363c364c365c366c367c368c369c370c371c372c373c374c375c376c377c378c379c380c381c382c383c384c385c386c387c388c389c390c391c392c393c394c395c396c397c398c399c400c401c402c403c404c405c406c407c408c409c410c411c412c413c414c415c416c417c418c419c420c421c422c423c424c425c426c427c428c429c430c431c432c433c434c435c436c437c438c439c440c441c442c443c444c445c446c447c448c449c450c451c452c453c454c455c456c457c458c459c460c461c462c463c464c465c466c467c468c469c470c471c472c473c474c475c476c477c478c479c480c481c482c483c484c485c486c487c488c489c490c491c492c493c494c495c496c497c498c499
2xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
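
The preview above is the rendered table with its delimiters lost: a
header row ``c0`` through ``c499`` and one data row of ``x`` cells. A
sketch of how a file of this shape could be produced follows. The
comma delimiter, the trailing newline on each line and the function
name ``massive_columns_csv`` are assumptions, since the flattened view
does not show them; the resulting size does agree with the 3.3 KiB
reported for the file.

```python
def massive_columns_csv(n_columns: int = 500) -> bytes:
    """Build a two-line CSV: header c0..c{n-1} and one row of 'x' cells."""
    header = ",".join(f"c{i}" for i in range(n_columns))
    row = ",".join("x" for _ in range(n_columns))
    # Assumed layout: comma-separated, "\n" line endings, ASCII only.
    return f"{header}\n{row}\n".encode("ascii")


payload = massive_columns_csv()
print(len(payload))  # 3390 bytes, roughly 3.3 KiB
```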