feat(tools): unified post-run UX across all Ready tool pages
Apply the Clean Text page's post-run UX pattern to every other Ready
tool page (Find Duplicates, Standardize Formats, Fix Missing Values,
Map Columns, Automated Workflows) for consistency and ease of use.
Per page:
1. Preview wrapped in ``st.expander(f"Preview: {filename}",
expanded=not _has_result)``. Open before a result exists, folded
afterwards.
2. Options / configuration controls wrapped in
``st.expander("Options", expanded=not _has_result)``. Inner
sub-expanders preserved (Streamlit 1.36+ supports nesting).
3. After the primary action stashes the result, set a one-shot
``_<tool>_scroll_to_results`` flag in session state and call
``st.rerun()`` so the preview + options expanders see the new
state on the next pass and collapse themselves.
4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed
immediately before the Results subheader.
5. End-of-page: pop the scroll flag and inject a tiny
``streamlit.components.v1.html`` iframe whose ``<script>`` calls
``scrollIntoView`` on the parent document's anchor. One-shot, so
unrelated reruns (toggling Show-hidden, etc.) don't yank the
viewport.
6. Download buttons hardened against the multi-button Streamlit
footgun: byte buffers pre-computed outside the column scopes,
explicit unique ``key="<tool>_dl_<purpose>"`` per button,
``use_container_width=True``, and previously-conditional buttons
now render unconditionally with ``disabled=True`` + a help
tooltip when the underlying data is empty so layout stays steady.
Per-page judgment calls (already noted in agent reports):
- Find Duplicates: sheet picker and delimiter selector kept OUTSIDE
expanders (the user still needs to see them when a file fails to
parse).
- Fix Missing Values: missingness profile wrapped INSIDE the Options
expander together with Strategy — the Results section already
shows a before/after missingness comparison that supersedes the
static input profile.
- Map Columns: all three subsections (Target schema, Strategy,
Mapping) wrapped under one outer Options expander, matching the
Text Cleaner pattern.
- Automated Workflows: inner "Recommended tool order" expander stays
nested inside the outer Options wrap; Run button stays outside
Options so the user can re-run after tweaking the (collapsed)
editor.
2008 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -89,139 +89,149 @@ except Exception as e:
|
||||
)
|
||||
st.stop()
|
||||
|
||||
st.subheader(f"Preview: {uploaded.name}")
|
||||
st.caption(f"{len(df)} rows, {len(df.columns)} columns")
|
||||
st.dataframe(df.head(10), use_container_width=True)
|
||||
# Collapse the input preview and pipeline editor once the user has clicked
|
||||
# Run Pipeline so the Results section below is the primary visual focus.
|
||||
# The user can re-expand either expander to re-inspect or adjust.
|
||||
_has_result = st.session_state.get("pipeline_result") is not None
|
||||
|
||||
with st.expander(f"Preview: {uploaded.name}", expanded=not _has_result):
|
||||
st.caption(f"{len(df)} rows, {len(df.columns)} columns")
|
||||
st.dataframe(df.head(10), use_container_width=True)
|
||||
|
||||
st.divider()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pipeline builder
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# Wrapped in an outer expander whose default state mirrors the preview
|
||||
# expander above: open before a result exists, folded once the user has
|
||||
# clicked Run Pipeline. The pipeline editor is this page's "Options"
|
||||
# section — structurally analogous to Text Cleaner's options block.
|
||||
|
||||
st.subheader("Pipeline")
|
||||
|
||||
mode = st.radio(
|
||||
"How would you like to define the pipeline?",
|
||||
[
|
||||
"Use the recommended default (text-clean → format → missing → dedup)",
|
||||
"Build interactively",
|
||||
"Upload a saved pipeline JSON",
|
||||
],
|
||||
index=0,
|
||||
)
|
||||
|
||||
if "pipeline_rows" not in st.session_state:
|
||||
default = recommended_pipeline()
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in default.steps
|
||||
])
|
||||
|
||||
if mode.startswith("Use the recommended"):
|
||||
default = recommended_pipeline()
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in default.steps
|
||||
])
|
||||
elif mode.startswith("Upload"):
|
||||
pipeline_file = st.file_uploader(
|
||||
"Pipeline JSON", type=["json"], key="pipeline_upload",
|
||||
with st.expander("Options", expanded=not _has_result):
|
||||
mode = st.radio(
|
||||
"How would you like to define the pipeline?",
|
||||
[
|
||||
"Use the recommended default (text-clean → format → missing → dedup)",
|
||||
"Build interactively",
|
||||
"Upload a saved pipeline JSON",
|
||||
],
|
||||
index=0,
|
||||
)
|
||||
if pipeline_file is not None:
|
||||
|
||||
if "pipeline_rows" not in st.session_state:
|
||||
default = recommended_pipeline()
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in default.steps
|
||||
])
|
||||
|
||||
if mode.startswith("Use the recommended"):
|
||||
default = recommended_pipeline()
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in default.steps
|
||||
])
|
||||
elif mode.startswith("Upload"):
|
||||
pipeline_file = st.file_uploader(
|
||||
"Pipeline JSON", type=["json"], key="pipeline_upload",
|
||||
)
|
||||
if pipeline_file is not None:
|
||||
try:
|
||||
data = json.loads(pipeline_file.getvalue())
|
||||
uploaded_pipe = Pipeline.from_dict(data)
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in uploaded_pipe.steps
|
||||
])
|
||||
st.success(f"Loaded {len(uploaded_pipe.steps)} step(s).")
|
||||
except Exception as e:
|
||||
from src.core.errors import format_for_user
|
||||
st.error(f"**Could not parse pipeline**\n\n```\n{format_for_user(e)}\n```")
|
||||
|
||||
st.caption(
|
||||
"Edit the table to add, remove, reorder (drag the row index), enable, "
|
||||
"or configure each step. Tool order is recommended, not enforced — "
|
||||
"violations surface as warnings below the table."
|
||||
)
|
||||
edited = st.data_editor(
|
||||
st.session_state["pipeline_rows"],
|
||||
use_container_width=True,
|
||||
num_rows="dynamic",
|
||||
column_config={
|
||||
"tool": st.column_config.SelectboxColumn(
|
||||
"Tool", options=TOOL_NAMES, required=True,
|
||||
),
|
||||
"enabled": st.column_config.CheckboxColumn("Enabled"),
|
||||
"options_json": st.column_config.TextColumn(
|
||||
"Options (JSON)",
|
||||
help='e.g. {"column_types": {"phone": "phone"}}',
|
||||
),
|
||||
},
|
||||
key="pipeline_editor",
|
||||
)
|
||||
st.session_state["pipeline_rows"] = edited
|
||||
|
||||
# Build a Pipeline object from the editor state.
|
||||
steps_list: list[Step] = []
|
||||
parse_errors: list[str] = []
|
||||
for i, row in edited.iterrows():
|
||||
tool = row.get("tool")
|
||||
if not tool or pd.isna(tool):
|
||||
continue
|
||||
raw_opts = row.get("options_json") or "{}"
|
||||
if pd.isna(raw_opts):
|
||||
raw_opts = "{}"
|
||||
try:
|
||||
data = json.loads(pipeline_file.getvalue())
|
||||
uploaded_pipe = Pipeline.from_dict(data)
|
||||
st.session_state["pipeline_rows"] = pd.DataFrame([
|
||||
{
|
||||
"tool": s.tool, "enabled": s.enabled,
|
||||
"options_json": json.dumps(s.options),
|
||||
}
|
||||
for s in uploaded_pipe.steps
|
||||
])
|
||||
st.success(f"Loaded {len(uploaded_pipe.steps)} step(s).")
|
||||
opts = json.loads(raw_opts) if isinstance(raw_opts, str) else dict(raw_opts)
|
||||
if not isinstance(opts, dict):
|
||||
raise ValueError("options must be a JSON object")
|
||||
except Exception as e:
|
||||
from src.core.errors import format_for_user
|
||||
st.error(f"**Could not parse pipeline**\n\n```\n{format_for_user(e)}\n```")
|
||||
parse_errors.append(f"Step {i + 1}: {e}")
|
||||
continue
|
||||
try:
|
||||
steps_list.append(Step(
|
||||
tool=str(tool),
|
||||
options=opts,
|
||||
enabled=bool(row.get("enabled", True)),
|
||||
))
|
||||
except Exception as e:
|
||||
parse_errors.append(f"Step {i + 1}: {e}")
|
||||
|
||||
st.caption(
|
||||
"Edit the table to add, remove, reorder (drag the row index), enable, "
|
||||
"or configure each step. Tool order is recommended, not enforced — "
|
||||
"violations surface as warnings below the table."
|
||||
)
|
||||
edited = st.data_editor(
|
||||
st.session_state["pipeline_rows"],
|
||||
use_container_width=True,
|
||||
num_rows="dynamic",
|
||||
column_config={
|
||||
"tool": st.column_config.SelectboxColumn(
|
||||
"Tool", options=TOOL_NAMES, required=True,
|
||||
),
|
||||
"enabled": st.column_config.CheckboxColumn("Enabled"),
|
||||
"options_json": st.column_config.TextColumn(
|
||||
"Options (JSON)",
|
||||
help='e.g. {"column_types": {"phone": "phone"}}',
|
||||
),
|
||||
},
|
||||
key="pipeline_editor",
|
||||
)
|
||||
st.session_state["pipeline_rows"] = edited
|
||||
if parse_errors:
|
||||
for err in parse_errors:
|
||||
st.error(err)
|
||||
|
||||
# Build a Pipeline object from the editor state.
|
||||
steps_list: list[Step] = []
|
||||
parse_errors: list[str] = []
|
||||
for i, row in edited.iterrows():
|
||||
tool = row.get("tool")
|
||||
if not tool or pd.isna(tool):
|
||||
continue
|
||||
raw_opts = row.get("options_json") or "{}"
|
||||
if pd.isna(raw_opts):
|
||||
raw_opts = "{}"
|
||||
try:
|
||||
opts = json.loads(raw_opts) if isinstance(raw_opts, str) else dict(raw_opts)
|
||||
if not isinstance(opts, dict):
|
||||
raise ValueError("options must be a JSON object")
|
||||
except Exception as e:
|
||||
parse_errors.append(f"Step {i + 1}: {e}")
|
||||
continue
|
||||
try:
|
||||
steps_list.append(Step(
|
||||
tool=str(tool),
|
||||
options=opts,
|
||||
enabled=bool(row.get("enabled", True)),
|
||||
))
|
||||
except Exception as e:
|
||||
parse_errors.append(f"Step {i + 1}: {e}")
|
||||
current_pipeline = Pipeline(steps=steps_list) if steps_list else None
|
||||
|
||||
if parse_errors:
|
||||
for err in parse_errors:
|
||||
st.error(err)
|
||||
if current_pipeline is not None:
|
||||
warnings = validate_pipeline(current_pipeline)
|
||||
if warnings:
|
||||
st.warning(
|
||||
"Pipeline is out of recommended order:\n\n"
|
||||
+ "\n".join(f"- {w}" for w in warnings)
|
||||
+ "\n\nThe pipeline will still run — these are recommendations only."
|
||||
)
|
||||
|
||||
current_pipeline = Pipeline(steps=steps_list) if steps_list else None
|
||||
|
||||
if current_pipeline is not None:
|
||||
warnings = validate_pipeline(current_pipeline)
|
||||
if warnings:
|
||||
st.warning(
|
||||
"Pipeline is out of recommended order:\n\n"
|
||||
+ "\n".join(f"- {w}" for w in warnings)
|
||||
+ "\n\nThe pipeline will still run — these are recommendations only."
|
||||
with st.expander("Recommended tool order — why each step belongs where it does"):
|
||||
st.markdown(
|
||||
"\n".join(
|
||||
f"- **{e}** before **{l}** — {why}"
|
||||
for e, l, why in SOFT_DEPENDENCIES
|
||||
)
|
||||
)
|
||||
|
||||
with st.expander("Recommended tool order — why each step belongs where it does"):
|
||||
st.markdown(
|
||||
"\n".join(
|
||||
f"- **{e}** before **{l}** — {why}"
|
||||
for e, l, why in SOFT_DEPENDENCIES
|
||||
)
|
||||
)
|
||||
|
||||
st.divider()
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -274,6 +284,14 @@ if st.button(
|
||||
progress.progress(1.0, text="Done")
|
||||
st.session_state["pipeline_result"] = result
|
||||
st.session_state["pipeline_input_name"] = uploaded.name
|
||||
# One-shot flag picked up on the next pass to scroll the parent
|
||||
# document to the Results anchor (see scroll snippet at end of file).
|
||||
st.session_state["_pipeline_scroll_to_results"] = True
|
||||
# Force a second rerun so the preview and options expanders see
|
||||
# the new result on the NEXT script pass and collapse themselves.
|
||||
# Without this they stay expanded until the user touches any
|
||||
# other widget.
|
||||
st.rerun()
|
||||
|
||||
result = st.session_state.get("pipeline_result")
|
||||
if result is None:
|
||||
@@ -287,6 +305,16 @@ if result is None:
|
||||
# Results
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Anchor target for the auto-scroll snippet at the end of this block.
|
||||
# A bare ``<div id="...">`` survives Streamlit's HTML sanitizer (only
|
||||
# ``<script>`` is stripped), and a 1px-tall div doesn't visually shift
|
||||
# anything. Placed before the subheader so the scrolled-to viewport
|
||||
# starts a few pixels above the section heading rather than below it.
|
||||
st.markdown(
|
||||
'<div id="pipeline-results-anchor" style="height:1px"></div>',
|
||||
unsafe_allow_html=True,
|
||||
)
|
||||
|
||||
st.subheader("Results")
|
||||
|
||||
m1, m2, m3, m4 = st.columns(4)
|
||||
@@ -318,56 +346,105 @@ st.dataframe(result.final_df.head(10), use_container_width=True)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Downloads
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# All three byte buffers are prepared up front (outside the columns) so
|
||||
# each ``st.download_button`` sees stable ``data`` across reruns and an
|
||||
# explicit ``key`` — without those, Streamlit auto-derived widget IDs
|
||||
# can collide for multiple download_buttons in adjacent columns and
|
||||
# only the first one actually fires on click. The pipeline-JSON button
|
||||
# now renders unconditionally (disabled when no pipeline is defined)
|
||||
# so the layout stays steady.
|
||||
|
||||
st.divider()
|
||||
stem = Path(st.session_state.get("pipeline_input_name", "input")).stem
|
||||
|
||||
cleaned_bytes = result.final_df.to_csv(index=False).encode("utf-8-sig")
|
||||
pipeline_bytes = json.dumps(
|
||||
current_pipeline.to_dict() if current_pipeline else {"steps": []},
|
||||
indent=2, default=str,
|
||||
).encode("utf-8")
|
||||
audit_bytes = json.dumps({
|
||||
"warnings": result.warnings,
|
||||
"initial_rows": result.initial_rows,
|
||||
"final_rows": result.final_rows,
|
||||
"total_elapsed_seconds": result.total_elapsed,
|
||||
"steps": [
|
||||
{
|
||||
"tool": sr.step.tool,
|
||||
"name": sr.step.display_name(),
|
||||
"enabled": sr.step.enabled,
|
||||
"skipped": sr.skipped,
|
||||
"elapsed_seconds": sr.elapsed_seconds,
|
||||
"summary": sr.summary,
|
||||
"error": sr.error,
|
||||
}
|
||||
for sr in result.step_results
|
||||
],
|
||||
}, indent=2, default=str).encode("utf-8")
|
||||
|
||||
_pipeline_empty = current_pipeline is None or not current_pipeline.steps
|
||||
|
||||
dl_a, dl_b, dl_c = st.columns(3)
|
||||
with dl_a:
|
||||
bytes_csv = result.final_df.to_csv(index=False).encode("utf-8-sig")
|
||||
st.download_button(
|
||||
"Download cleaned CSV",
|
||||
data=bytes_csv,
|
||||
data=cleaned_bytes,
|
||||
file_name=f"{stem}_pipeline.csv",
|
||||
mime="text/csv",
|
||||
key="pipeline_dl_cleaned",
|
||||
use_container_width=True,
|
||||
)
|
||||
with dl_b:
|
||||
pipeline_bytes = json.dumps(
|
||||
current_pipeline.to_dict() if current_pipeline else {"steps": []},
|
||||
indent=2, default=str,
|
||||
).encode("utf-8")
|
||||
st.download_button(
|
||||
"Download pipeline JSON",
|
||||
data=pipeline_bytes,
|
||||
file_name="pipeline.json",
|
||||
mime="application/json",
|
||||
help="Save this and pass --pipeline pipeline.json to the CLI to re-run on next week's file.",
|
||||
key="pipeline_dl_pipeline",
|
||||
disabled=_pipeline_empty,
|
||||
help=(
|
||||
"No pipeline defined."
|
||||
if _pipeline_empty
|
||||
else "Save this and pass --pipeline pipeline.json to the CLI to re-run on next week's file."
|
||||
),
|
||||
use_container_width=True,
|
||||
)
|
||||
with dl_c:
|
||||
audit_bytes = json.dumps({
|
||||
"warnings": result.warnings,
|
||||
"initial_rows": result.initial_rows,
|
||||
"final_rows": result.final_rows,
|
||||
"total_elapsed_seconds": result.total_elapsed,
|
||||
"steps": [
|
||||
{
|
||||
"tool": sr.step.tool,
|
||||
"name": sr.step.display_name(),
|
||||
"enabled": sr.step.enabled,
|
||||
"skipped": sr.skipped,
|
||||
"elapsed_seconds": sr.elapsed_seconds,
|
||||
"summary": sr.summary,
|
||||
"error": sr.error,
|
||||
}
|
||||
for sr in result.step_results
|
||||
],
|
||||
}, indent=2, default=str).encode("utf-8")
|
||||
st.download_button(
|
||||
"Download run audit",
|
||||
data=audit_bytes,
|
||||
file_name=f"{stem}_pipeline_audit.json",
|
||||
mime="application/json",
|
||||
key="pipeline_dl_audit",
|
||||
use_container_width=True,
|
||||
)
|
||||
|
||||
st.divider()
|
||||
st.caption("Runs locally. Your data never leaves this computer. | DataTools v3.0")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Post-run auto-scroll
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# When the user clicks Run Pipeline, the preview + options collapse but
|
||||
# Streamlit by itself doesn't scroll — the Results section is at the
|
||||
# bottom of a tall script so the user has to find it. Inject a tiny
|
||||
# component-html iframe that calls ``scrollIntoView`` on the parent's
|
||||
# Results anchor. Streamlit's main page is same-origin with component
|
||||
# iframes so ``window.parent.document`` access is allowed.
|
||||
#
|
||||
# The flag is one-shot (``pop`` removes it) so re-renders triggered by
|
||||
# unrelated widgets in the Results section don't yank the viewport
|
||||
# back to the top of Results.
|
||||
if st.session_state.pop("_pipeline_scroll_to_results", False):
|
||||
from streamlit.components.v1 import html as _components_html
|
||||
_components_html(
|
||||
"""
|
||||
<script>
|
||||
const doc = window.parent.document;
|
||||
const target = doc.getElementById('pipeline-results-anchor');
|
||||
if (target) target.scrollIntoView({behavior: 'smooth', block: 'start'});
|
||||
</script>
|
||||
""",
|
||||
height=0,
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user