feat(gui): visualize leading/trailing whitespace in analyzer findings

The analyzer's "Run Analysis" panel rendered sample cells via st.dataframe,
which (a) silently collapses leading/trailing ASCII whitespace and (b)
renders NBSP/ZWSP/control chars as nothing. Users couldn't see the exact
pollution the findings were reporting.

visualize_hidden_html gains a mark_outer_whitespace=True option that
wraps each leading and trailing ASCII space/tab in its own badge with a
"SP LEAD" / "SP TRAIL" tooltip. The badges are per-character so the
user can count exactly how much padding the cleaner will strip.
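
A minimal sketch of the per-character badge idea, for illustration only.
It is not the code in src/core/text_clean.py: the function name, the
"hc-badge" class, the glyphs and the "TAB LEAD" label are assumptions;
only "SP LEAD" / "SP TRAIL" and the all-whitespace-is-leading rule come
from this commit.

```python
from html import escape

# Assumed short names for the two ASCII characters that get badges.
_LABELS = {" ": "SP", "\t": "TAB"}


def _badge(ch: str, side: str) -> str:
    """Wrap one whitespace character in its own tooltip badge."""
    label = f"{_LABELS[ch]} {side}"
    glyph = "·" if ch == " " else "→"  # hypothetical visible stand-ins
    return f'<span class="hc-badge" title="{label}">{glyph}</span>'


def mark_outer_whitespace_sketch(text):
    """Return HTML with one badge per leading/trailing space or tab."""
    if text is None:
        return ""
    # Length of the leading run of spaces/tabs.
    lead = len(text) - len(text.lstrip(" \t"))
    # The trailing run is measured on what is left after the leading run,
    # so an all-whitespace string counts entirely as leading.
    rest = text[lead:]
    trail = len(rest) - len(rest.rstrip(" \t"))
    core = rest[: len(rest) - trail]
    parts = [_badge(c, "LEAD") for c in text[:lead]]
    parts.append(escape(core))  # inner text is only HTML-escaped
    parts.extend(_badge(c, "TRAIL") for c in rest[len(rest) - trail:])
    return "".join(parts)
```

Because each character gets its own badge, counting tooltips in the
output gives the exact amount of padding on each side.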

components.render_findings_panel now:
  - injects hidden_char_css() once at the top of the panel
  - replaces st.dataframe(samples) with a custom HTML table
  - renders the value column with mark_outer_whitespace=True
  - applies white-space: pre-wrap on value cells so any internal ASCII
    whitespace also stays visible (browsers collapse runs by default)
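
The table replacement can be pictured roughly as below. This is a hedged
sketch, not the body of components.render_findings_panel: the function
name, the (label, value) row shape and the render_value hook are
hypothetical, and in the real panel the resulting string would presumably
be handed to Streamlit as raw HTML.

```python
from html import escape


def samples_table_html_sketch(rows, render_value=escape):
    """Build a plain HTML table from (label, raw_value) pairs.

    render_value must return HTML-safe markup for a raw value; the
    default just escapes it. A badge-producing renderer could be
    passed instead.
    """
    body = []
    for label, value in rows:
        body.append(
            "<tr>"
            f"<td>{escape(str(label))}</td>"
            # pre-wrap stops the browser from collapsing runs of
            # internal whitespace inside the value cell.
            f'<td style="white-space: pre-wrap">{render_value(value)}</td>'
            "</tr>"
        )
    return "<table><tbody>" + "".join(body) + "</tbody></table>"
```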

Four new tests cover: leading+trailing badge counts, default-off
behaviour, leading tab badge, all-whitespace string treated entirely
as leading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:21:39 +00:00
parent e12615357d
commit 1049c033cb
3 changed files with 126 additions and 6 deletions

@@ -539,3 +539,28 @@ class TestVisualizeHidden:
        from src.core.text_clean import visualize_hidden_text, visualize_hidden_html
        assert visualize_hidden_text(None) is None  # type: ignore[arg-type]
        assert visualize_hidden_html(None) == ""

    def test_html_marks_leading_trailing_ascii_space(self):
        from src.core.text_clean import visualize_hidden_html
        out = visualize_hidden_html("  Alice  ", mark_outer_whitespace=True)
        # Two leading and two trailing space badges
        assert out.count("SP LEAD") == 2
        assert out.count("SP TRAIL") == 2
        # Inner "Alice" untouched
        assert "Alice" in out

    def test_html_default_does_not_mark_outer_ascii_space(self):
        from src.core.text_clean import visualize_hidden_html
        out = visualize_hidden_html(" Alice ")
        assert "SP LEAD" not in out and "SP TRAIL" not in out

    def test_html_marks_leading_tab(self):
        from src.core.text_clean import visualize_hidden_html
        out = visualize_hidden_html("\tAlice", mark_outer_whitespace=True)
        assert "TAB" in out  # tab gets a badge

    def test_html_only_whitespace_string_marked_as_leading(self):
        from src.core.text_clean import visualize_hidden_html
        out = visualize_hidden_html("   ", mark_outer_whitespace=True)
        # All three chars treated as leading; trailing run is empty.
        assert out.count("SP LEAD") == 3