fix(text-cleaner): hoist show_hidden + stress-test all tool pages

Reported crash: clicking "Clean Text" with mojibake.csv (a junk-corpus
file that the cleaner ran on without changing any cells) blew up the
results render with

    NameError: name 'show_hidden' is not defined

at the cleaned-preview block. ``show_hidden`` was defined inside
``if result.cells_changed:`` and referenced unconditionally below.

Fix on the page itself: hoist the ``show_hidden = st.toggle(...)``
declaration out of the conditional so it's always in scope for the
downstream cleaned-preview render. One toggle now drives both the
Examples table (which only renders when there are changes) AND the
cleaned preview (which always renders).
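
A minimal sketch of the scoping bug and the hoist, without Streamlit. The two source strings and ``run_page`` are hypothetical stand-ins, not the real page; ``exec`` is used only because a Streamlit page runs top to bottom as a script, so an unbound name surfaces as a plain ``NameError``.

```python
# Stand-in for the old page: the flag is bound only when something changed.
BUGGY_PAGE = """
if cells_changed:
    show_hidden = True  # stand-in for st.toggle(...)
preview = "badges" if show_hidden else "plain"  # referenced unconditionally
"""

# Stand-in for the fixed page: the flag is bound before the conditional.
FIXED_PAGE = """
show_hidden = True  # stand-in for st.toggle(...), hoisted
if cells_changed:
    examples = "rendered"  # the Examples table would render here
preview = "badges" if show_hidden else "plain"
"""


def run_page(source: str, cells_changed: int) -> str:
    """Execute a page-like script at module level and return its preview mode."""
    namespace = {"cells_changed": cells_changed}
    exec(source, namespace)
    return namespace["preview"]


if __name__ == "__main__":
    print(run_page(FIXED_PAGE, cells_changed=0))
    try:
        run_page(BUGGY_PAGE, cells_changed=0)
    except NameError as exc:
        print(f"NameError: {exc}")
```

The buggy variant only fails on no-op runs, which is why ordinary inputs never exposed it.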

Generalized regression net: ``tests/test_junk_corpus_tool_pages.py``.
For nine representative junk files (empty, only_nul, mojibake,
invalid_utf8, utf16_le_no_bom, mismatched_columns, all_nulls,
corrupt_xlsx, single_column) and every Ready/Coming-Soon tool page,
the test:

1. Stashes the junk bytes as the home upload via session_state.
2. Runs the page through AppTest, asserts ``app.exception`` is empty.
3. If the page exposes a deterministic primary-action button label,
   clicks it and asserts no exception on the post-click render.
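
For illustration, a few of the junk inputs named above could be built in memory like this. The file names follow the list in this message, but the byte contents are guesses, not the actual corpus files.

```python
import codecs

# Hypothetical payloads: names come from the commit message, bytes are assumptions.
JUNK_FILES: dict[str, bytes] = {
    "empty.csv": b"",
    "only_nul.csv": b"\x00" * 64,
    # UTF-8 text misread as Latin-1 and re-encoded: the classic mojibake round trip.
    "mojibake.csv": "name\ncafé\n".encode("utf-8").decode("latin-1").encode("utf-8"),
    # 0xC3 opens a two-byte sequence, but 0x28 is not a valid continuation byte.
    "invalid_utf8.csv": b"name\n\xc3\x28\n",
    # The "utf-16-le" codec never writes a byte-order mark.
    "utf16_le_no_bom.csv": "a,b\n1,2\n".encode("utf-16-le"),
    "mismatched_columns.csv": b"a,b,c\n1,2\n3,4,5,6\n",
}


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the payload is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for name, payload in JUNK_FILES.items():
        print(f"{name}: {len(payload)} bytes, valid UTF-8: {decodes_as_utf8(payload)}")
```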

Pages that catch a bad file at read time and short-circuit via
``st.error`` + ``st.stop`` are skipped in the primary-action half,
as intended (the button isn't rendered). A genuine crash shows up as
``app.exception`` carrying a Python traceback, which is exactly what
the user reported and exactly what we now catch.
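
The per-page check described above might look roughly like the sketch below. It is not the code in ``tests/test_junk_corpus_tool_pages.py``: the function names and the ``home_upload_bytes`` session-state key are hypothetical, and it returns a status string where a real pytest test would more likely call ``pytest.skip``.

```python
from typing import Iterable, Optional


def find_button(buttons: Iterable, label: str):
    """Return the first button whose label matches, or None if it was not rendered."""
    for button in buttons:
        if getattr(button, "label", None) == label:
            return button
    return None


def smoke_test_page(
    page_path: str, junk_bytes: bytes, button_label: Optional[str] = None
) -> str:
    """Run one tool page against one junk payload; return "clicked" or "skipped"."""
    # Imported lazily so find_button stays usable without Streamlit installed.
    from streamlit.testing.v1 import AppTest

    app = AppTest.from_file(page_path)
    # Hypothetical key: the real session_state key for the home upload is not shown.
    app.session_state["home_upload_bytes"] = junk_bytes
    app.run()
    assert not app.exception, f"{page_path} crashed on first render"

    button = find_button(app.button, button_label) if button_label else None
    if button is None:
        # The page short-circuited (st.error + st.stop) or has no primary action.
        return "skipped"

    button.click().run()
    assert not app.exception, f"{page_path} crashed after clicking {button_label!r}"
    return "clicked"
```

Looking the button up by label rather than by index keeps the skip path explicit: a page that stops early simply has no matching button, so nothing is clicked.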

162 tests collected, 102 passed, 60 skipped, in 4 s.
Full suite: 2220 passed, 91 skipped, in 35 s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -256,6 +256,21 @@ m2.metric("Cells changed", result.cells_changed)
 m3.metric("% changed", f"{pct:.1f}%")
 m4.metric("Columns processed", len(result.columns_processed))
 
+# Single toggle drives both the Examples table AND the Cleaned preview.
+# Defined OUTSIDE the ``if result.cells_changed`` block so the
+# downstream cleaned-preview render below always has the variable in
+# scope, even on no-op runs (junk files / minimal preset that produces
+# zero changes — previously triggered ``NameError: show_hidden``).
+show_hidden = st.toggle(
+    "Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)",
+    value=True,
+    help=(
+        "Highlights characters the cleaner is removing or replacing. "
+        "Hover any badge to see the codepoint and label."
+    ),
+    key="textclean_show_hidden",
+)
+
 if result.cells_changed:
     counts = result.changes["column"].value_counts()
     st.markdown("**Changes by column**")
@@ -265,15 +280,6 @@ if result.cells_changed:
     )
 
     st.markdown("**Examples (first 25 changes)**")
-    show_hidden = st.toggle(
-        "Show hidden characters (NBSP, ZWSP, smart quotes, control chars…)",
-        value=True,
-        help=(
-            "Highlights characters the cleaner is removing or replacing. "
-            "Hover any badge to see the codepoint and label."
-        ),
-        key="textclean_show_hidden",
-    )
     examples = result.changes.head(25).copy()
     examples["row"] = examples["row"] + 1
     if show_hidden: