feat(tools): unified post-run UX across all Ready tool pages

Apply the Clean Text page's post-run UX pattern to every other Ready tool page
(Find Duplicates, Standardize Formats, Fix Missing Values, Map Columns,
Automated Workflows) for consistency and ease of use.

Per page:

1. Preview wrapped in
   ``st.expander(f"Preview: {filename}", expanded=not _has_result)``.
   Open before a result exists, folded afterwards.
2. Options / configuration controls wrapped in
   ``st.expander("Options", expanded=not _has_result)``.
   Inner sub-expanders preserved (Streamlit 1.36+ supports nesting).
3. After the primary action stashes the result, set a one-shot
   ``_<tool>_scroll_to_results`` flag in session state and call ``st.rerun()``
   so the preview + options expanders see the new state on the next pass and
   collapse themselves.
4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed immediately
   before the Results subheader.
5. End-of-page: pop the scroll flag and inject a tiny
   ``streamlit.components.v1.html`` iframe whose ``<script>`` calls
   ``scrollIntoView`` on the parent document's anchor. One-shot, so unrelated
   reruns (toggling Show-hidden, etc.) don't yank the viewport.
6. Download buttons hardened against the multi-button Streamlit footgun:
   byte buffers pre-computed outside the column scopes, explicit unique
   ``key="<tool>_dl_<purpose>"`` per button, ``use_container_width=True``,
   and previously-conditional buttons now render unconditionally with
   ``disabled=True`` + a help tooltip when the underlying data is empty so
   layout stays steady.

Per-page judgment calls (already noted in agent reports):

- Find Duplicates: sheet picker and delimiter selector kept OUTSIDE expanders
  (the user still needs to see them when a file fails to parse).
- Fix Missing Values: missingness profile wrapped INSIDE the Options expander
  together with Strategy; the Results section already shows a before/after
  missingness comparison that supersedes the static input profile.
- Map Columns: all three subsections (Target schema, Strategy, Mapping)
  wrapped under one outer Options expander, matching the Text Cleaner pattern.
- Automated Workflows: inner "Recommended tool order" expander stays nested
  inside the outer Options wrap; Run button stays outside Options so the user
  can re-run after tweaking the (collapsed) editor.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
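
Illustrative sketch (not part of the commit): the state flow behind the expander collapse and the one-shot scroll flag can be modelled without Streamlit. A plain ``dict`` stands in for ``st.session_state`` here, and the helper names ``on_primary_action`` and ``render_pass`` are hypothetical; the real pages inline this logic and call ``st.rerun()`` after arming the flag.

```python
"""Model of the post-run state flow; a dict stands in for st.session_state."""


def on_primary_action(state: dict, tool: str, result: object) -> None:
    """Mimic the button handler: stash the result and arm the scroll flag."""
    state[f"{tool}_result"] = result
    state[f"_{tool}_scroll_to_results"] = True
    # The real page would call st.rerun() at this point.


def render_pass(state: dict, tool: str) -> dict:
    """Mimic one top-to-bottom script pass and report what it would render."""
    has_result = state.get(f"{tool}_result") is not None
    return {
        # Both expanders share the same default: open until a result exists.
        "preview_expanded": not has_result,
        "options_expanded": not has_result,
        # pop() consumes the flag, so only the first pass after a run scrolls.
        "scroll": state.pop(f"_{tool}_scroll_to_results", False),
    }


state: dict = {}
before = render_pass(state, "pipeline")
on_primary_action(state, "pipeline", result={"rows": 3})
after_run = render_pass(state, "pipeline")
unrelated_rerun = render_pass(state, "pipeline")
```

In this model the pass right after the run collapses both expanders and requests a scroll, while any later pass keeps them collapsed and leaves the viewport alone.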
2026-05-16 21:04:37 +00:00
parent d1aaf3c2b9
commit 6415be8bf4
5 changed files with 1250 additions and 879 deletions
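
Illustrative sketch (not part of the commit): the download-button hardening described in the commit message amounts to building a consistent set of keyword arguments per button. The helper ``download_button_kwargs`` below is hypothetical; the keyword names it returns (``data``, ``key``, ``disabled``, ``help``, ``use_container_width``) are the ones the diff passes to ``st.download_button``.

```python
from typing import Optional


def download_button_kwargs(
    tool: str,
    purpose: str,
    data: bytes,
    *,
    empty: bool = False,
    help_ready: Optional[str] = None,
    help_empty: str = "Nothing to download.",
) -> dict:
    """Build keyword arguments for one download button.

    The key follows the ``<tool>_dl_<purpose>`` convention so that several
    buttons on one page never share a widget identity. The button is always
    rendered; when there is nothing to download it is disabled and the help
    text explains why.
    """
    return {
        "data": data,
        "key": f"{tool}_dl_{purpose}",
        "disabled": empty,
        "help": help_empty if empty else help_ready,
        "use_container_width": True,
    }


# Byte buffers are prepared first, outside any column context.
cleaned_bytes = "a,b\n1,2\n".encode("utf-8-sig")
pipeline_bytes = b'{"steps": []}'

cleaned_kwargs = download_button_kwargs("pipeline", "cleaned", cleaned_bytes)
pipeline_kwargs = download_button_kwargs(
    "pipeline", "pipeline", pipeline_bytes,
    empty=True, help_empty="No pipeline defined.",
)
```

On a page, each dict would be unpacked into the call inside its column, for example ``st.download_button("Download cleaned CSV", file_name=..., mime="text/csv", **cleaned_kwargs)``.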
--- a/src/gui/pages/9_Pipeline_Runner.py
+++ b/src/gui/pages/9_Pipeline_Runner.py
@@ -89,139 +89,149 @@ except Exception as e:
    )
    st.stop()

-st.subheader(f"Preview: {uploaded.name}")
-st.caption(f"{len(df)} rows, {len(df.columns)} columns")
-st.dataframe(df.head(10), use_container_width=True)
+# Collapse the input preview and pipeline editor once the user has clicked
+# Run Pipeline so the Results section below is the primary visual focus.
+# The user can re-expand either expander to re-inspect or adjust.
+_has_result = st.session_state.get("pipeline_result") is not None
+
+with st.expander(f"Preview: {uploaded.name}", expanded=not _has_result):
+    st.caption(f"{len(df)} rows, {len(df.columns)} columns")
+    st.dataframe(df.head(10), use_container_width=True)
+
 st.divider()


 # ---------------------------------------------------------------------------
 # Pipeline builder
 # ---------------------------------------------------------------------------
+#
+# Wrapped in an outer expander whose default state mirrors the preview
+# expander above: open before a result exists, folded once the user has
+# clicked Run Pipeline. The pipeline editor is this page's "Options"
+# section — structurally analogous to Text Cleaner's options block.

-st.subheader("Pipeline")
-
-mode = st.radio(
-    "How would you like to define the pipeline?",
-    [
-        "Use the recommended default (text-clean → format → missing → dedup)",
-        "Build interactively",
-        "Upload a saved pipeline JSON",
-    ],
-    index=0,
-)
-
-if "pipeline_rows" not in st.session_state:
-    default = recommended_pipeline()
-    st.session_state["pipeline_rows"] = pd.DataFrame([
-        {
-            "tool": s.tool, "enabled": s.enabled,
-            "options_json": json.dumps(s.options),
-        }
-        for s in default.steps
-    ])
-
-if mode.startswith("Use the recommended"):
-    default = recommended_pipeline()
-    st.session_state["pipeline_rows"] = pd.DataFrame([
-        {
-            "tool": s.tool, "enabled": s.enabled,
-            "options_json": json.dumps(s.options),
-        }
-        for s in default.steps
-    ])
-elif mode.startswith("Upload"):
-    pipeline_file = st.file_uploader(
-        "Pipeline JSON", type=["json"], key="pipeline_upload",
+with st.expander("Options", expanded=not _has_result):
+    mode = st.radio(
+        "How would you like to define the pipeline?",
+        [
+            "Use the recommended default (text-clean → format → missing → dedup)",
+            "Build interactively",
+            "Upload a saved pipeline JSON",
+        ],
+        index=0,
    )
-    if pipeline_file is not None:
+
+    if "pipeline_rows" not in st.session_state:
+        default = recommended_pipeline()
+        st.session_state["pipeline_rows"] = pd.DataFrame([
+            {
+                "tool": s.tool, "enabled": s.enabled,
+                "options_json": json.dumps(s.options),
+            }
+            for s in default.steps
+        ])
+
+    if mode.startswith("Use the recommended"):
+        default = recommended_pipeline()
+        st.session_state["pipeline_rows"] = pd.DataFrame([
+            {
+                "tool": s.tool, "enabled": s.enabled,
+                "options_json": json.dumps(s.options),
+            }
+            for s in default.steps
+        ])
+    elif mode.startswith("Upload"):
+        pipeline_file = st.file_uploader(
+            "Pipeline JSON", type=["json"], key="pipeline_upload",
+        )
+        if pipeline_file is not None:
+            try:
+                data = json.loads(pipeline_file.getvalue())
+                uploaded_pipe = Pipeline.from_dict(data)
+                st.session_state["pipeline_rows"] = pd.DataFrame([
+                    {
+                        "tool": s.tool, "enabled": s.enabled,
+                        "options_json": json.dumps(s.options),
+                    }
+                    for s in uploaded_pipe.steps
+                ])
+                st.success(f"Loaded {len(uploaded_pipe.steps)} step(s).")
+            except Exception as e:
+                from src.core.errors import format_for_user
+                st.error(f"**Could not parse pipeline**\n\n```\n{format_for_user(e)}\n```")
+
+    st.caption(
+        "Edit the table to add, remove, reorder (drag the row index), enable, "
+        "or configure each step. Tool order is recommended, not enforced — "
+        "violations surface as warnings below the table."
+    )
+    edited = st.data_editor(
+        st.session_state["pipeline_rows"],
+        use_container_width=True,
+        num_rows="dynamic",
+        column_config={
+            "tool": st.column_config.SelectboxColumn(
+                "Tool", options=TOOL_NAMES, required=True,
+            ),
+            "enabled": st.column_config.CheckboxColumn("Enabled"),
+            "options_json": st.column_config.TextColumn(
+                "Options (JSON)",
+                help='e.g. {"column_types": {"phone": "phone"}}',
+            ),
+        },
+        key="pipeline_editor",
+    )
+    st.session_state["pipeline_rows"] = edited
+
+    # Build a Pipeline object from the editor state.
+    steps_list: list[Step] = []
+    parse_errors: list[str] = []
+    for i, row in edited.iterrows():
+        tool = row.get("tool")
+        if not tool or pd.isna(tool):
+            continue
+        raw_opts = row.get("options_json") or "{}"
+        if pd.isna(raw_opts):
+            raw_opts = "{}"
        try:
-            data = json.loads(pipeline_file.getvalue())
-            uploaded_pipe = Pipeline.from_dict(data)
-            st.session_state["pipeline_rows"] = pd.DataFrame([
-                {
-                    "tool": s.tool, "enabled": s.enabled,
-                    "options_json": json.dumps(s.options),
-                }
-                for s in uploaded_pipe.steps
-            ])
-            st.success(f"Loaded {len(uploaded_pipe.steps)} step(s).")
+            opts = json.loads(raw_opts) if isinstance(raw_opts, str) else dict(raw_opts)
+            if not isinstance(opts, dict):
+                raise ValueError("options must be a JSON object")
        except Exception as e:
-            from src.core.errors import format_for_user
-            st.error(f"**Could not parse pipeline**\n\n```\n{format_for_user(e)}\n```")
+            parse_errors.append(f"Step {i + 1}: {e}")
+            continue
+        try:
+            steps_list.append(Step(
+                tool=str(tool),
+                options=opts,
+                enabled=bool(row.get("enabled", True)),
+            ))
+        except Exception as e:
+            parse_errors.append(f"Step {i + 1}: {e}")

-st.caption(
-    "Edit the table to add, remove, reorder (drag the row index), enable, "
-    "or configure each step. Tool order is recommended, not enforced — "
-    "violations surface as warnings below the table."
-)
-edited = st.data_editor(
-    st.session_state["pipeline_rows"],
-    use_container_width=True,
-    num_rows="dynamic",
-    column_config={
-        "tool": st.column_config.SelectboxColumn(
-            "Tool", options=TOOL_NAMES, required=True,
-        ),
-        "enabled": st.column_config.CheckboxColumn("Enabled"),
-        "options_json": st.column_config.TextColumn(
-            "Options (JSON)",
-            help='e.g. {"column_types": {"phone": "phone"}}',
-        ),
-    },
-    key="pipeline_editor",
-)
-st.session_state["pipeline_rows"] = edited
+    if parse_errors:
+        for err in parse_errors:
+            st.error(err)

-# Build a Pipeline object from the editor state.
-steps_list: list[Step] = []
-parse_errors: list[str] = []
-for i, row in edited.iterrows():
-    tool = row.get("tool")
-    if not tool or pd.isna(tool):
-        continue
-    raw_opts = row.get("options_json") or "{}"
-    if pd.isna(raw_opts):
-        raw_opts = "{}"
-    try:
-        opts = json.loads(raw_opts) if isinstance(raw_opts, str) else dict(raw_opts)
-        if not isinstance(opts, dict):
-            raise ValueError("options must be a JSON object")
-    except Exception as e:
-        parse_errors.append(f"Step {i + 1}: {e}")
-        continue
-    try:
-        steps_list.append(Step(
-            tool=str(tool),
-            options=opts,
-            enabled=bool(row.get("enabled", True)),
-        ))
-    except Exception as e:
-        parse_errors.append(f"Step {i + 1}: {e}")
+    current_pipeline = Pipeline(steps=steps_list) if steps_list else None

-if parse_errors:
-    for err in parse_errors:
-        st.error(err)
+    if current_pipeline is not None:
+        warnings = validate_pipeline(current_pipeline)
+        if warnings:
+            st.warning(
+                "Pipeline is out of recommended order:\n\n"
+                + "\n".join(f"- {w}" for w in warnings)
+                + "\n\nThe pipeline will still run — these are recommendations only."
+            )

-current_pipeline = Pipeline(steps=steps_list) if steps_list else None
-
-if current_pipeline is not None:
-    warnings = validate_pipeline(current_pipeline)
-    if warnings:
-        st.warning(
-            "Pipeline is out of recommended order:\n\n"
-            + "\n".join(f"- {w}" for w in warnings)
-            + "\n\nThe pipeline will still run — these are recommendations only."
+    with st.expander("Recommended tool order — why each step belongs where it does"):
+        st.markdown(
+            "\n".join(
+                f"- **{e}** before **{l}** — {why}"
+                for e, l, why in SOFT_DEPENDENCIES
+            )
        )

-with st.expander("Recommended tool order — why each step belongs where it does"):
-    st.markdown(
-        "\n".join(
-            f"- **{e}** before **{l}** — {why}"
-            for e, l, why in SOFT_DEPENDENCIES
-        )
-    )
-
 st.divider()

 # ---------------------------------------------------------------------------
@@ -274,6 +284,14 @@ if st.button(
    progress.progress(1.0, text="Done")
    st.session_state["pipeline_result"] = result
    st.session_state["pipeline_input_name"] = uploaded.name
+    # One-shot flag picked up on the next pass to scroll the parent
+    # document to the Results anchor (see scroll snippet at end of file).
+    st.session_state["_pipeline_scroll_to_results"] = True
+    # Force a second rerun so the preview and options expanders see
+    # the new result on the NEXT script pass and collapse themselves.
+    # Without this they stay expanded until the user touches any
+    # other widget.
+    st.rerun()

 result = st.session_state.get("pipeline_result")
 if result is None:
@@ -287,6 +305,16 @@ if result is None:
 # Results
 # ---------------------------------------------------------------------------

+# Anchor target for the auto-scroll snippet at the end of this block.
+# A bare ``<div id="...">`` survives Streamlit's HTML sanitizer (only
+# ``<script>`` is stripped), and a 1px-tall div doesn't visually shift
+# anything. Placed before the subheader so the scrolled-to viewport
+# starts a few pixels above the section heading rather than below it.
+st.markdown(
+    '<div id="pipeline-results-anchor" style="height:1px"></div>',
+    unsafe_allow_html=True,
+)
+
 st.subheader("Results")

 m1, m2, m3, m4 = st.columns(4)
@@ -318,56 +346,105 @@ st.dataframe(result.final_df.head(10), use_container_width=True)
 # ---------------------------------------------------------------------------
 # Downloads
 # ---------------------------------------------------------------------------
+#
+# All three byte buffers are prepared up front (outside the columns) so
+# each ``st.download_button`` sees stable ``data`` across reruns and an
+# explicit ``key`` — without those, Streamlit auto-derived widget IDs
+# can collide for multiple download_buttons in adjacent columns and
+# only the first one actually fires on click. The pipeline-JSON button
+# now renders unconditionally (disabled when no pipeline is defined)
+# so the layout stays steady.

 st.divider()
 stem = Path(st.session_state.get("pipeline_input_name", "input")).stem

+cleaned_bytes = result.final_df.to_csv(index=False).encode("utf-8-sig")
+pipeline_bytes = json.dumps(
+    current_pipeline.to_dict() if current_pipeline else {"steps": []},
+    indent=2, default=str,
+).encode("utf-8")
+audit_bytes = json.dumps({
+    "warnings": result.warnings,
+    "initial_rows": result.initial_rows,
+    "final_rows": result.final_rows,
+    "total_elapsed_seconds": result.total_elapsed,
+    "steps": [
+        {
+            "tool": sr.step.tool,
+            "name": sr.step.display_name(),
+            "enabled": sr.step.enabled,
+            "skipped": sr.skipped,
+            "elapsed_seconds": sr.elapsed_seconds,
+            "summary": sr.summary,
+            "error": sr.error,
+        }
+        for sr in result.step_results
+    ],
+}, indent=2, default=str).encode("utf-8")
+
+_pipeline_empty = current_pipeline is None or not current_pipeline.steps
+
 dl_a, dl_b, dl_c = st.columns(3)
 with dl_a:
-    bytes_csv = result.final_df.to_csv(index=False).encode("utf-8-sig")
    st.download_button(
        "Download cleaned CSV",
-        data=bytes_csv,
+        data=cleaned_bytes,
        file_name=f"{stem}_pipeline.csv",
        mime="text/csv",
+        key="pipeline_dl_cleaned",
+        use_container_width=True,
    )
 with dl_b:
-    pipeline_bytes = json.dumps(
-        current_pipeline.to_dict() if current_pipeline else {"steps": []},
-        indent=2, default=str,
-    ).encode("utf-8")
    st.download_button(
        "Download pipeline JSON",
        data=pipeline_bytes,
        file_name="pipeline.json",
        mime="application/json",
-        help="Save this and pass --pipeline pipeline.json to the CLI to re-run on next week's file.",
+        key="pipeline_dl_pipeline",
+        disabled=_pipeline_empty,
+        help=(
+            "No pipeline defined."
+            if _pipeline_empty
+            else "Save this and pass --pipeline pipeline.json to the CLI to re-run on next week's file."
+        ),
+        use_container_width=True,
    )
 with dl_c:
-    audit_bytes = json.dumps({
-        "warnings": result.warnings,
-        "initial_rows": result.initial_rows,
-        "final_rows": result.final_rows,
-        "total_elapsed_seconds": result.total_elapsed,
-        "steps": [
-            {
-                "tool": sr.step.tool,
-                "name": sr.step.display_name(),
-                "enabled": sr.step.enabled,
-                "skipped": sr.skipped,
-                "elapsed_seconds": sr.elapsed_seconds,
-                "summary": sr.summary,
-                "error": sr.error,
-            }
-            for sr in result.step_results
-        ],
-    }, indent=2, default=str).encode("utf-8")
    st.download_button(
        "Download run audit",
        data=audit_bytes,
        file_name=f"{stem}_pipeline_audit.json",
        mime="application/json",
+        key="pipeline_dl_audit",
+        use_container_width=True,
    )

 st.divider()
 st.caption("Runs locally. Your data never leaves this computer. | DataTools v3.0")
+
+# ---------------------------------------------------------------------------
+# Post-run auto-scroll
+# ---------------------------------------------------------------------------
+#
+# When the user clicks Run Pipeline, the preview + options collapse but
+# Streamlit by itself doesn't scroll — the Results section is at the
+# bottom of a tall script so the user has to find it. Inject a tiny
+# component-html iframe that calls ``scrollIntoView`` on the parent's
+# Results anchor. Streamlit's main page is same-origin with component
+# iframes so ``window.parent.document`` access is allowed.
+#
+# The flag is one-shot (``pop`` removes it) so re-renders triggered by
+# unrelated widgets in the Results section don't yank the viewport
+# back to the top of Results.
+if st.session_state.pop("_pipeline_scroll_to_results", False):
+    from streamlit.components.v1 import html as _components_html
+    _components_html(
+        """
+        <script>
+          const doc = window.parent.document;
+          const target = doc.getElementById('pipeline-results-anchor');
+          if (target) target.scrollIntoView({behavior: 'smooth', block: 'start'});
+        </script>
+        """,
+        height=0,
+    )