feat(tools): unified post-run UX across all Ready tool pages

Apply the Clean Text page's post-run UX pattern to every other Ready
tool page (Find Duplicates, Standardize Formats, Fix Missing Values,
Map Columns, Automated Workflows) for consistency and ease of use.

Per page:

1. Preview wrapped in ``st.expander(f"Preview: {filename}",
   expanded=not _has_result)``. Open before a result exists, folded
   afterwards.

2. Options / configuration controls wrapped in
   ``st.expander("Options", expanded=not _has_result)``. Inner
   sub-expanders preserved (Streamlit 1.36+ supports nesting).

3. After the primary action stashes the result, set a one-shot
   ``_<tool>_scroll_to_results`` flag in session state and call
   ``st.rerun()`` so the preview + options expanders see the new
   state on the next pass and collapse themselves.

4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed
   immediately before the Results subheader.

5. End-of-page: pop the scroll flag and inject a tiny
   ``streamlit.components.v1.html`` iframe whose ``<script>`` calls
   ``scrollIntoView`` on the parent document's anchor. One-shot, so
   unrelated reruns (toggling Show-hidden, etc.) don't yank the
   viewport.

6. Download buttons hardened against the multi-button Streamlit
   footgun: byte buffers pre-computed outside the column scopes,
   explicit unique ``key="<tool>_dl_<purpose>"`` per button,
   ``use_container_width=True``, and previously-conditional buttons
   now render unconditionally with ``disabled=True`` + a help
   tooltip when the underlying data is empty so layout stays steady.
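
A minimal sketch of the flow in steps 1-3, 5 and 6, under stated
assumptions: a plain dict stands in for ``st.session_state``, and
``render_pass``, ``download_kwargs`` and the ``tool`` prefix are
hypothetical names used only for illustration, not code from the
changed pages. It shows why the extra rerun is needed (the expander
state is computed before the click is handled) and why popping the
flag makes the scroll one-shot.

```python
# Hypothetical stand-in for one script pass. A plain dict plays the role
# of st.session_state; "tool" is a placeholder tool prefix.
def render_pass(state: dict, clicked: bool) -> dict:
    has_result = state.get("tool_result") is not None
    view = {
        "expanded": not has_result,  # steps 1-2: preview/options expanders
        "rerun": False,
        "scrolled": False,
    }
    if clicked:
        # Step 3: stash the result, set the one-shot flag, request a rerun.
        state["tool_result"] = "stub result"
        state["_tool_scroll_to_results"] = True
        view["rerun"] = True  # stands in for st.rerun(), which ends the pass
        return view
    # Step 5: pop() consumes the flag, so only the next pass scrolls.
    if state.pop("_tool_scroll_to_results", False):
        view["scrolled"] = True
    return view


def download_kwargs(tool: str, purpose: str, data: bytes, empty: bool) -> dict:
    # Step 6: explicit unique key, full-width button, and a disabled state
    # instead of conditional rendering when there is nothing to download.
    return {
        "data": data,
        "key": f"{tool}_dl_{purpose}",
        "use_container_width": True,
        "disabled": empty,
        "help": "Nothing to download." if empty else None,
    }


state: dict = {}
first = render_pass(state, clicked=False)  # expanders open, no result yet
click = render_pass(state, clicked=True)   # still open on this pass
after = render_pass(state, clicked=False)  # folded, scroll fires once
later = render_pass(state, clicked=False)  # unrelated rerun, no scroll
audit = download_kwargs("colmap", "audit", b"{}", empty=True)
```

On the ``click`` pass the expanders were already drawn open, so only
the following pass sees the stored result and folds them.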

Per-page judgment calls (already noted in agent reports):

- Find Duplicates: sheet picker and delimiter selector kept OUTSIDE
  expanders (the user still needs to see them when a file fails to
  parse).
- Fix Missing Values: missingness profile wrapped INSIDE the Options
  expander together with Strategy — the Results section already
  shows a before/after missingness comparison that supersedes the
  static input profile.
- Map Columns: all three subsections (Target schema, Strategy,
  Mapping) wrapped under one outer Options expander, matching the
  Text Cleaner pattern.
- Automated Workflows: inner "Recommended tool order" expander stays
  nested inside the outer Options wrap; Run button stays outside
  Options so the user can re-run after tweaking the (collapsed)
  editor.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:04:37 +00:00
parent d1aaf3c2b9
commit 6415be8bf4
5 changed files with 1250 additions and 879 deletions


@@ -88,224 +88,240 @@ except Exception as e:
     )
     st.stop()
-st.subheader(f"Preview: {uploaded.name}")
-st.caption(f"{len(df)} rows, {len(df.columns)} columns")
-st.dataframe(df.head(10), use_container_width=True)
+# Collapse the input preview once the user has clicked Apply Column
+# Mapping so the Results section below is the primary visual focus.
+# The user can re-expand the expander to re-inspect the source rows.
+_has_result = st.session_state.get("colmap_result") is not None
+with st.expander(f"Preview: {uploaded.name}", expanded=not _has_result):
+    st.caption(f"{len(df)} rows, {len(df.columns)} columns")
+    st.dataframe(df.head(10), use_container_width=True)
 st.divider()
 # ---------------------------------------------------------------------------
-# Schema input
+# Options (Target schema + Strategy + Mapping)
 # ---------------------------------------------------------------------------
+#
+# Wrapped in an outer expander whose default state mirrors the preview
+# expander above: open before a result exists, folded once the user has
+# clicked Apply Column Mapping. The Mapping editor is the heart of the
+# tool, but per the Text Cleaner pattern we still collapse everything
+# post-run — the user can re-expand to tweak any of the three sections.
-st.subheader("Target schema")
+with st.expander("Options", expanded=not _has_result):
+    # -----------------------------------------------------------------------
+    # Schema input
+    # -----------------------------------------------------------------------
-schema_mode = st.radio(
-    "How would you like to define the target schema?",
-    [
-        "Build interactively (start from current columns)",
-        "Upload schema JSON",
-        "Skip (rename / coerce only — no schema)",
-    ],
-    index=0,
-    help=(
-        "An interactive build is fastest for one-off cleanup. Upload a JSON "
-        "when you have a fixed contract (a CRM import format, db schema). "
-        "Skip when you only want to rename or coerce specific columns."
-    ),
-)
+    st.subheader("Target schema")
-schema: TargetSchema | None = None
-if schema_mode.startswith("Upload"):
-    schema_file = st.file_uploader(
-        "Schema JSON",
-        type=["json"],
-        key="colmap_schema_upload",
-        help='Format: {"fields": [{"name": "email", "dtype": "string", "required": true, "aliases": ["EmailAddr"]}, ...]}',
+    schema_mode = st.radio(
+        "How would you like to define the target schema?",
+        [
+            "Build interactively (start from current columns)",
+            "Upload schema JSON",
+            "Skip (rename / coerce only — no schema)",
+        ],
+        index=0,
+        help=(
+            "An interactive build is fastest for one-off cleanup. Upload a JSON "
+            "when you have a fixed contract (a CRM import format, db schema). "
+            "Skip when you only want to rename or coerce specific columns."
+        ),
     )
-    if schema_file is not None:
-        try:
-            schema = TargetSchema.from_dict(json.loads(schema_file.getvalue()))
-            st.success(f"Loaded {len(schema.fields)} target field(s).")
-        except Exception as e:
-            from src.core.errors import format_for_user
-            st.error(f"**Could not parse schema**\n\n```\n{format_for_user(e)}\n```")
-elif schema_mode.startswith("Build"):
-    st.caption(
-        "Edit the table to define your target schema. Add rows for fields the "
-        "input doesn't have yet (with a default), or remove rows for columns "
-        "you want to drop."
+    schema: TargetSchema | None = None
+    if schema_mode.startswith("Upload"):
+        schema_file = st.file_uploader(
+            "Schema JSON",
+            type=["json"],
+            key="colmap_schema_upload",
+            help='Format: {"fields": [{"name": "email", "dtype": "string", "required": true, "aliases": ["EmailAddr"]}, ...]}',
+        )
+        if schema_file is not None:
+            try:
+                schema = TargetSchema.from_dict(json.loads(schema_file.getvalue()))
+                st.success(f"Loaded {len(schema.fields)} target field(s).")
+            except Exception as e:
+                from src.core.errors import format_for_user
+                st.error(f"**Could not parse schema**\n\n```\n{format_for_user(e)}\n```")
+    elif schema_mode.startswith("Build"):
+        st.caption(
+            "Edit the table to define your target schema. Add rows for fields the "
+            "input doesn't have yet (with a default), or remove rows for columns "
+            "you want to drop."
+        )
+        initial = pd.DataFrame({
+            "name": list(df.columns),
+            "dtype": ["auto"] * len(df.columns),
+            "required": [False] * len(df.columns),
+            "default": [""] * len(df.columns),
+            "aliases": [""] * len(df.columns),
+        })
+        edited = st.data_editor(
+            initial,
+            use_container_width=True,
+            num_rows="dynamic",
+            column_config={
+                "name": st.column_config.TextColumn("Target name"),
+                "dtype": st.column_config.SelectboxColumn(
+                    "Type",
+                    options=[
+                        "auto", "string", "integer", "float",
+                        "boolean", "date", "datetime", "category",
+                    ],
+                ),
+                "required": st.column_config.CheckboxColumn("Required"),
+                "default": st.column_config.TextColumn("Default (for added cols)"),
+                "aliases": st.column_config.TextColumn(
+                    "Aliases (comma-sep, helps fuzzy-match)",
+                ),
+            },
+            key="colmap_schema_editor",
+        )
+        fields: list[TargetField] = []
+        for _, row in edited.iterrows():
+            name = str(row.get("name", "")).strip()
+            if not name:
+                continue
+            aliases = [
+                a.strip() for a in str(row.get("aliases", "") or "").split(",")
+                if a.strip()
+            ]
+            default_raw = row.get("default")
+            default_val = (
+                default_raw if (default_raw not in (None, "", float("nan")))
+                else None
+            )
+            try:
+                if isinstance(default_val, float) and pd.isna(default_val):
+                    default_val = None
+            except TypeError:
+                pass
+            fields.append(TargetField(
+                name=name,
+                dtype=str(row.get("dtype", "auto")),  # type: ignore[arg-type]
+                required=bool(row.get("required", False)),
+                aliases=aliases,
+                default=default_val,
+            ))
+        if fields:
+            schema = TargetSchema(fields=fields)
+    st.divider()
+    # -----------------------------------------------------------------------
+    # Strategy
+    # -----------------------------------------------------------------------
+    st.subheader("Strategy")
+    preset_label = st.radio(
+        "Preset",
+        [
+            "rename-only (just rename, leave types alone, keep extras)",
+            "lenient-schema (rename + coerce + reorder, keep extras)",
+            "strict-schema (rename + coerce + reorder, drop extras)",
+        ],
+        index=0,
     )
-    initial = pd.DataFrame({
-        "name": list(df.columns),
-        "dtype": ["auto"] * len(df.columns),
-        "required": [False] * len(df.columns),
-        "default": [""] * len(df.columns),
-        "aliases": [""] * len(df.columns),
-    })
-    edited = st.data_editor(
-        initial,
-        use_container_width=True,
-        num_rows="dynamic",
-        column_config={
-            "name": st.column_config.TextColumn("Target name"),
-            "dtype": st.column_config.SelectboxColumn(
-                "Type",
-                options=[
-                    "auto", "string", "integer", "float",
-                    "boolean", "date", "datetime", "category",
-                ],
-            ),
-            "required": st.column_config.CheckboxColumn("Required"),
-            "default": st.column_config.TextColumn("Default (for added cols)"),
-            "aliases": st.column_config.TextColumn(
-                "Aliases (comma-sep, helps fuzzy-match)",
-            ),
-        },
-        key="colmap_schema_editor",
-    )
-    fields: list[TargetField] = []
-    for _, row in edited.iterrows():
-        name = str(row.get("name", "")).strip()
-        if not name:
-            continue
-        aliases = [
-            a.strip() for a in str(row.get("aliases", "") or "").split(",")
-            if a.strip()
-        ]
-        default_raw = row.get("default")
-        default_val = (
-            default_raw if (default_raw not in (None, "", float("nan")))
-            else None
+    preset_key = preset_label.split(" ", 1)[0]
+    options = MapOptions.from_preset(preset_key)
+    options.schema = schema
+    with st.expander("Advanced options"):
+        col_a, col_b = st.columns(2)
+        with col_a:
+            options.unmapped = st.selectbox(  # type: ignore[assignment]
+                "Unmapped source columns",
+                ["keep", "drop", "error"],
+                index=["keep", "drop", "error"].index(options.unmapped),
+            )
+            options.coerce_types = st.checkbox(
+                "Coerce types per schema", value=options.coerce_types,
+            )
+            options.reorder_to_schema = st.checkbox(
+                "Reorder to schema order", value=options.reorder_to_schema,
+            )
+        with col_b:
+            options.auto_infer = st.checkbox(
+                "Auto-infer mapping (fuzzy match)", value=options.auto_infer,
+            )
+            options.fuzzy_threshold = st.slider(
+                "Fuzzy match threshold", 0.0, 1.0, options.fuzzy_threshold, 0.05,
+            )
+            options.enforce_required = st.checkbox(
+                "Enforce required fields", value=options.enforce_required,
+            )
+    # -----------------------------------------------------------------------
+    # Mapping editor — show inferred and let user override
+    # -----------------------------------------------------------------------
+    st.subheader("Mapping")
+    if schema is None:
+        st.caption(
+            "No schema — define explicit renames below (left blank means keep "
+            "the source name)."
         )
-        try:
-            if isinstance(default_val, float) and pd.isna(default_val):
-                default_val = None
-        except TypeError:
-            pass
-        fields.append(TargetField(
-            name=name,
-            dtype=str(row.get("dtype", "auto")),  # type: ignore[arg-type]
-            required=bool(row.get("required", False)),
-            aliases=aliases,
-            default=default_val,
-        ))
-    if fields:
-        schema = TargetSchema(fields=fields)
-st.divider()
-# ---------------------------------------------------------------------------
-# Strategy
-# ---------------------------------------------------------------------------
-st.subheader("Strategy")
-preset_label = st.radio(
-    "Preset",
-    [
-        "rename-only (just rename, leave types alone, keep extras)",
-        "lenient-schema (rename + coerce + reorder, keep extras)",
-        "strict-schema (rename + coerce + reorder, drop extras)",
-    ],
-    index=0,
-)
-preset_key = preset_label.split(" ", 1)[0]
-options = MapOptions.from_preset(preset_key)
-options.schema = schema
-with st.expander("Advanced options"):
-    col_a, col_b = st.columns(2)
-    with col_a:
-        options.unmapped = st.selectbox(  # type: ignore[assignment]
-            "Unmapped source columns",
-            ["keep", "drop", "error"],
-            index=["keep", "drop", "error"].index(options.unmapped),
+        rename_initial = pd.DataFrame({
+            "source": list(df.columns),
+            "target": list(df.columns),
+        })
+        rename_edited = st.data_editor(
+            rename_initial,
+            use_container_width=True,
+            column_config={
+                "source": st.column_config.TextColumn("Source", disabled=True),
+                "target": st.column_config.TextColumn("Target"),
+            },
+            hide_index=True,
+            key="colmap_rename_only_editor",
         )
-        options.coerce_types = st.checkbox(
-            "Coerce types per schema", value=options.coerce_types,
+        explicit_mapping: dict[str, str] = {}
+        for _, row in rename_edited.iterrows():
+            src = str(row["source"])
+            tgt = str(row["target"]).strip()
+            if tgt and tgt != src:
+                explicit_mapping[src] = tgt
+        options.mapping = explicit_mapping
+    else:
+        inferred = (
+            infer_mapping(df, schema, threshold=options.fuzzy_threshold)
+            if options.auto_infer else {}
         )
-        options.reorder_to_schema = st.checkbox(
-            "Reorder to schema order", value=options.reorder_to_schema,
+        target_options = ["(unmapped)"] + schema.field_names()
+        map_initial = pd.DataFrame({
+            "source": list(df.columns),
+            "target": [inferred.get(c, "(unmapped)") for c in df.columns],
+            "auto": [c in inferred for c in df.columns],
+        })
+        map_edited = st.data_editor(
+            map_initial,
+            use_container_width=True,
+            column_config={
+                "source": st.column_config.TextColumn("Source", disabled=True),
+                "target": st.column_config.SelectboxColumn(
+                    "Target", options=target_options,
+                ),
+                "auto": st.column_config.CheckboxColumn("Auto-suggested", disabled=True),
+            },
+            hide_index=True,
+            key="colmap_schema_mapping_editor",
         )
-    with col_b:
-        options.auto_infer = st.checkbox(
-            "Auto-infer mapping (fuzzy match)", value=options.auto_infer,
-        )
-        options.fuzzy_threshold = st.slider(
-            "Fuzzy match threshold", 0.0, 1.0, options.fuzzy_threshold, 0.05,
-        )
-        options.enforce_required = st.checkbox(
-            "Enforce required fields", value=options.enforce_required,
-        )
-# ---------------------------------------------------------------------------
-# Mapping editor — show inferred and let user override
-# ---------------------------------------------------------------------------
-st.subheader("Mapping")
-if schema is None:
-    st.caption(
-        "No schema — define explicit renames below (left blank means keep "
-        "the source name)."
-    )
-    rename_initial = pd.DataFrame({
-        "source": list(df.columns),
-        "target": list(df.columns),
-    })
-    rename_edited = st.data_editor(
-        rename_initial,
-        use_container_width=True,
-        column_config={
-            "source": st.column_config.TextColumn("Source", disabled=True),
-            "target": st.column_config.TextColumn("Target"),
-        },
-        hide_index=True,
-        key="colmap_rename_only_editor",
-    )
-    explicit_mapping: dict[str, str] = {}
-    for _, row in rename_edited.iterrows():
-        src = str(row["source"])
-        tgt = str(row["target"]).strip()
-        if tgt and tgt != src:
-            explicit_mapping[src] = tgt
-    options.mapping = explicit_mapping
-else:
-    inferred = (
-        infer_mapping(df, schema, threshold=options.fuzzy_threshold)
-        if options.auto_infer else {}
-    )
-    target_options = ["(unmapped)"] + schema.field_names()
-    map_initial = pd.DataFrame({
-        "source": list(df.columns),
-        "target": [inferred.get(c, "(unmapped)") for c in df.columns],
-        "auto": [c in inferred for c in df.columns],
-    })
-    map_edited = st.data_editor(
-        map_initial,
-        use_container_width=True,
-        column_config={
-            "source": st.column_config.TextColumn("Source", disabled=True),
-            "target": st.column_config.SelectboxColumn(
-                "Target", options=target_options,
-            ),
-            "auto": st.column_config.CheckboxColumn("Auto-suggested", disabled=True),
-        },
-        hide_index=True,
-        key="colmap_schema_mapping_editor",
-    )
-    explicit_mapping = {}
-    for _, row in map_edited.iterrows():
-        src = str(row["source"])
-        tgt = str(row["target"])
-        if tgt and tgt != "(unmapped)":
-            explicit_mapping[src] = tgt
-    options.mapping = explicit_mapping
-    # Disable auto-infer for the actual run since the editor already shows
-    # the user's resolved choices (they can manually re-select to add).
-    options.auto_infer = False
+        explicit_mapping = {}
+        for _, row in map_edited.iterrows():
+            src = str(row["source"])
+            tgt = str(row["target"])
+            if tgt and tgt != "(unmapped)":
+                explicit_mapping[src] = tgt
+        options.mapping = explicit_mapping
+        # Disable auto-infer for the actual run since the editor already shows
+        # the user's resolved choices (they can manually re-select to add).
+        options.auto_infer = False
 # ---------------------------------------------------------------------------
 # Run
@@ -324,6 +340,12 @@ if st.button("Apply Column Mapping", type="primary", use_container_width=True):
     st.session_state["colmap_result"] = result
     st.session_state["colmap_input_name"] = uploaded.name
     st.session_state["colmap_options"] = options.to_dict()
+    # One-shot flag picked up on the next pass to scroll the parent
+    # document to the Results anchor (see scroll snippet below).
+    st.session_state["_colmap_scroll_to_results"] = True
+    # Force a second rerun so the preview and options expanders see
+    # the new result on the NEXT script pass and collapse themselves.
+    st.rerun()
 result = st.session_state.get("colmap_result")
 if result is None:
@@ -334,6 +356,16 @@ if result is None:
 # Results
 # ---------------------------------------------------------------------------
+# Anchor target for the auto-scroll snippet at the end of this block.
+# A bare ``<div id="...">`` survives Streamlit's HTML sanitizer (only
+# ``<script>`` is stripped), and a 1px-tall div doesn't visually shift
+# anything. Placed before the subheader so the scrolled-to viewport
+# starts a few pixels above the section heading rather than below it.
+st.markdown(
+    '<div id="colmap-results-anchor" style="height:1px"></div>',
+    unsafe_allow_html=True,
+)
 st.subheader("Results")
 m1, m2, m3, m4 = st.columns(4)
@@ -371,46 +403,90 @@ st.dataframe(result.mapped_df.head(10), use_container_width=True)
 # ---------------------------------------------------------------------------
 # Downloads
 # ---------------------------------------------------------------------------
+#
+# All three byte buffers are prepared up front (outside the columns) so
+# each ``st.download_button`` sees stable ``data`` across reruns and an
+# explicit ``key`` — without those, Streamlit auto-derived widget IDs
+# can collide for multiple download_buttons in adjacent columns and
+# only the first one actually fires on click.
 st.divider()
 stem = Path(st.session_state.get("colmap_input_name", "input")).stem
+mapped_bytes = result.mapped_df.to_csv(index=False).encode("utf-8-sig")
+audit_bytes = json.dumps({
+    "mapping": result.mapping,
+    "inferred_pairs": result.inferred_pairs,
+    "columns_renamed": result.columns_renamed,
+    "columns_dropped": result.columns_dropped,
+    "columns_added": result.columns_added,
+    "coercion_failures": result.coercion_failures,
+    "unmapped_kept": result.unmapped_kept,
+    "missing_required_targets": result.missing_required_targets,
+}, indent=2, default=str).encode("utf-8")
+config_bytes = json.dumps(
+    st.session_state.get("colmap_options", {}), indent=2, default=str,
+).encode("utf-8")
+_no_mapping = not result.mapping
 dl_a, dl_b, dl_c = st.columns(3)
 with dl_a:
-    mapped_bytes = result.mapped_df.to_csv(index=False).encode("utf-8-sig")
     st.download_button(
         "Download mapped CSV",
         data=mapped_bytes,
         file_name=f"{stem}_mapped.csv",
         mime="text/csv",
+        key="colmap_dl_mapped",
+        use_container_width=True,
     )
 with dl_b:
-    audit_bytes = json.dumps({
-        "mapping": result.mapping,
-        "inferred_pairs": result.inferred_pairs,
-        "columns_renamed": result.columns_renamed,
-        "columns_dropped": result.columns_dropped,
-        "columns_added": result.columns_added,
-        "coercion_failures": result.coercion_failures,
-        "unmapped_kept": result.unmapped_kept,
-        "missing_required_targets": result.missing_required_targets,
-    }, indent=2, default=str).encode("utf-8")
     st.download_button(
         "Download mapping audit",
         data=audit_bytes,
         file_name=f"{stem}_mapping.json",
         mime="application/json",
+        key="colmap_dl_audit",
+        disabled=_no_mapping,
+        help="No mapping was applied." if _no_mapping else None,
+        use_container_width=True,
     )
 with dl_c:
-    config_bytes = json.dumps(
-        st.session_state.get("colmap_options", {}), indent=2, default=str,
-    ).encode("utf-8")
     st.download_button(
         "Download config JSON",
         data=config_bytes,
         file_name="column_map_config.json",
         mime="application/json",
+        key="colmap_dl_config",
+        use_container_width=True,
     )
 st.divider()
 st.caption("Runs locally. Your data never leaves this computer. | DataTools v3.0")
+# ---------------------------------------------------------------------------
+# Post-run auto-scroll
+# ---------------------------------------------------------------------------
+#
+# When the user clicks Apply Column Mapping, the preview + options
+# collapse but Streamlit by itself doesn't scroll — the Results section
+# is at the bottom of a tall script so the user has to find it. Inject
+# a tiny component-html iframe that calls ``scrollIntoView`` on the
+# parent's Results anchor. Streamlit's main page is same-origin with
+# component iframes so ``window.parent.document`` access is allowed.
+#
+# The flag is one-shot (``pop`` removes it) so re-renders triggered by
+# unrelated widgets in the Results section don't yank the viewport back
+# to the top of Results.
+if st.session_state.pop("_colmap_scroll_to_results", False):
+    from streamlit.components.v1 import html as _components_html
+    _components_html(
+        """
+        <script>
+        const doc = window.parent.document;
+        const target = doc.getElementById('colmap-results-anchor');
+        if (target) target.scrollIntoView({behavior: 'smooth', block: 'start'});
+        </script>
+        """,
+        height=0,
+    )