# Missing Value Handler: corpus

Acceptance fixtures for `src/core/missing.py`. Each `.csv` under `test_data/` is paired with assertions in `tests/test_missing_corpus.py`. To add a new case, drop a CSV here and add a parametrize entry to the runner.

## Use cases (target client profiles)

| File | Buyer profile | Strategy under test |
|------|---------------|---------------------|
| `uc01_shopify_export.csv` | SMB / Shopify operator | `detect-only` |
| `uc02_marketing_audience.csv` | Marketing / RevOps analyst | `safe-fill` |
| `uc03_consultant_intake.csv` | Analyst / consultant | `drop-incomplete` + threshold |

## Edge cases

| File | What it stresses |
|------|------------------|
| `ec01_all_nan_column.csv` | column 100% missing; fill must skip it, `drop_col` must catch it at the threshold |
| `ec02_no_missing.csv` | clean file; must be a no-op |
| `ec03_zero_is_not_missing.csv` | numeric `0`, boolean `false`, and string `"0"` must NOT be treated as missing |
| `ec04_excel_errors.csv` | `#N/A`, `#NULL!`, `#VALUE!` Excel error sentinels |
| `ec05_unicode_whitespace.csv` | cells containing only NBSP, tab, or ideographic-space characters must be treated as whitespace |
| `ec06_mixed_dtypes.csv` | mixed numeric/string values in one column; fill degrades gracefully to mode |
| `ec07_real_data_with_padding.csv` | leading/trailing whitespace around real data must NOT cause the value to be dropped |
| `ec08_single_row.csv` | one-row file; every operation must still work |
| `ec09_single_column.csv` | one-column file with a header-only line plus sentinels |
| `ec10_all_sentinel_variants.csv` | every `DEFAULT_SENTINELS` entry exercised in one file |
| `ec11_constant_per_column.csv` | `column_fill_values` differs per column |
| `ec12_drop_threshold_boundary.csv` | boundary values for `row_drop_threshold` (0.5, 0.99, 1.0) |
| `ec13_ffill_leading_nan.csv` | a leading run of NaNs survives ffill (no values fabricated) |
| `ec14_interpolate_fallback.csv` | numeric-only strategy applied to a string column triggers the fallback |
| `ec15_headers_only.csv` | empty body (headers only); must not crash |
| `ec16_idempotent_apply.csv` | running `handle_missing` twice yields the same DataFrame |
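Adding a case means touching two places: this directory and the runner's parametrize list. If either step is skipped, a fixture can silently go untested. Below is a minimal, stdlib-only sketch of a completeness check. It assumes the runner keeps its cases in a list of records keyed by filename. `CorpusCase`, `CASES`, and `unregistered_fixtures` are hypothetical names, not taken from `tests/test_missing_corpus.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CorpusCase:
    # Hypothetical shape of one parametrize entry; the real runner's
    # fields may differ.
    filename: str
    strategy: str | None = None  # e.g. "detect-only"; None for edge cases


# Hypothetical excerpt of the runner's case list (two rows from the tables).
CASES = [
    CorpusCase("uc01_shopify_export.csv", "detect-only"),
    CorpusCase("ec02_no_missing.csv"),
]


def unregistered_fixtures(
    data_dir: Path, cases: list[CorpusCase]
) -> tuple[set[str], set[str]]:
    """Return (CSVs on disk with no case entry, case entries with no CSV)."""
    on_disk = {p.name for p in data_dir.glob("*.csv")}
    registered = {c.filename for c in cases}
    return on_disk - registered, registered - on_disk
```

In pytest, the same list could drive the runner via `@pytest.mark.parametrize("case", CASES, ids=lambda c: c.filename)`, so each fixture reports under its own filename. A test asserting that both returned sets are empty would then fail whenever a CSV is added without a matching entry.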
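Together, `ec03`, `ec04`, `ec05`, and `ec07` describe a cell-level contract for what counts as missing. The sketch below illustrates that contract for raw string cells. It is not the predicate in `src/core/missing.py`. `EXAMPLE_SENTINELS` is a hypothetical stand-in, since the real contents of `DEFAULT_SENTINELS` are not listed in this README.

```python
# Hypothetical sentinel set for illustration only; the real DEFAULT_SENTINELS
# in src/core/missing.py may contain different entries.
EXAMPLE_SENTINELS = frozenset({"", "NA", "N/A", "NULL", "#N/A", "#NULL!", "#VALUE!"})


def is_missing_cell(raw: str) -> bool:
    # str.strip() with no arguments removes all Unicode whitespace
    # (anything for which str.isspace() is true), which covers tab,
    # NBSP (U+00A0), and ideographic space (U+3000): the ec05 case.
    stripped = raw.strip()
    # Only exact sentinel matches count as missing, so "0" and "false"
    # (ec03) and padded real values such as "  Alice  " (ec07) do not.
    return stripped in EXAMPLE_SENTINELS
```

The predicate only classifies a cell and never rewrites it, which is one way to keep the padding from `ec07` intact in the output.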