Closes the §4.17 spec gap that test_gap_coverage.py was tracking via xfail:
collapse_whitespace must NOT touch cells whose shape carries meaningful
internal whitespace.
Adds _looks_structured(s) — returns True when s matches:
- numeric (currency optional, thousand-grouping by , . or single space)
- date (ISO/slash/dot separator, or 'Mon DD YYYY' / 'DD Mon YYYY')
- phone (digits + parens/dots/dashes/+/spaces, >= 7 digits, no letters)
The pipeline uses a new _smart_collapse_whitespace wrapper that defers to
collapse_whitespace only when _looks_structured returns False. The raw
collapse_whitespace function is unchanged so direct callers and existing
unit tests remain valid.
Five new positive tests replace the xfail:
- "(555) 123-4567" preserved (phone, double space inside)
- "1 234" preserved (European thousands)
- "2024-01-15" preserved (ISO date)
- "Jan 15 2024" preserved (textual date)
- "hello world" still collapsed to "hello world" (free-text negative case)
Conservative on purpose: a false negative just collapses (existing
behavior); a false positive leaves intentional double spaces in prose.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>