datatools-dev/test-cases/column-mapper-corpus/README.md

# Column Mapper — corpus

Acceptance fixtures for `src/core/column_mapper.py`. Each `.csv` under
`test_data/` is paired with assertions in
`tests/test_column_mapper_corpus.py`.

## Use cases (target client profiles)

| File | Buyer profile | Tested behaviour |
|------|---------------|------------------|
| `uc01_crm_import.csv` + `schemas/uc01_crm_target.json`         | Sales ops admin importing leads into Salesforce / HubSpot | Schema enforcement: rename via aliases, coerce types, drop extras, add `owner` default. |
| `uc02_vendor_{a,b,c}.csv` + `schemas/uc02_canonical.json`      | Operator unifying vendor exports                          | Multi-source unification: each vendor uses different headers; auto-inference resolves them all. |
| `uc03_type_coercion.csv` + `schemas/uc03_types.json`           | Analyst quick-fixing a mistyped CSV                       | Mixed-type coercion with documented per-column failure counts (bad rows survive as NaN). |

## Edge cases

| File | Stresses |
|------|----------|
| `ec01_duplicate_target.csv`     | Mapping two source columns to the same target → InputValidationError. |
| `ec02_unicode_columns.csv`      | Non-ASCII column names (Japanese) survive rename and coerce. |
| `ec03_whitespace_headers.csv`   | Leading/trailing whitespace in headers still fuzzy-matches the schema. |
| `ec04_no_match.csv`             | No source column scores above threshold → empty mapping, fallback unmapped strategy fires. |
| `ec05_required_missing.csv`     | Required target field has no source column → InputValidationError unless `enforce_required=False`. |