feat(tools): unified post-run UX across all Ready tool pages

Apply the Clean Text page's post-run UX pattern to every other Ready tool
page (Find Duplicates, Standardize Formats, Fix Missing Values, Map Columns,
Automated Workflows) for consistency and ease of use.

Per page:

1. Preview wrapped in
   ``st.expander(f"Preview: {filename}", expanded=not _has_result)``.
   Open before a result exists, folded afterwards.
2. Options / configuration controls wrapped in
   ``st.expander("Options", expanded=not _has_result)``. Inner
   sub-expanders preserved (Streamlit 1.36+ supports nesting).
3. After the primary action stashes the result, set a one-shot
   ``_<tool>_scroll_to_results`` flag in session state and call
   ``st.rerun()`` so the preview + options expanders see the new state on
   the next pass and collapse themselves.
4. ``<div id="<tool>-results-anchor" style="height:1px">`` placed
   immediately before the Results subheader.
5. End-of-page: pop the scroll flag and inject a tiny
   ``streamlit.components.v1.html`` iframe whose ``<script>`` calls
   ``scrollIntoView`` on the parent document's anchor. One-shot, so
   unrelated reruns (toggling Show-hidden, etc.) don't yank the viewport.
6. Download buttons hardened against the multi-button Streamlit footgun:
   byte buffers pre-computed outside the column scopes, explicit unique
   ``key="<tool>_dl_<purpose>"`` per button, ``use_container_width=True``,
   and previously-conditional buttons now render unconditionally with
   ``disabled=True`` + a help tooltip when the underlying data is empty so
   layout stays steady.

Per-page judgment calls (already noted in agent reports):

- Find Duplicates: sheet picker and delimiter selector kept OUTSIDE
  expanders (the user still needs to see them when a file fails to parse).
- Fix Missing Values: missingness profile wrapped INSIDE the Options
  expander together with Strategy; the Results section already shows a
  before/after missingness comparison that supersedes the static input
  profile.
- Map Columns: all three subsections (Target schema, Strategy, Mapping)
  wrapped under one outer Options expander, matching the Text Cleaner
  pattern.
- Automated Workflows: inner "Recommended tool order" expander stays nested
  inside the outer Options wrap; Run button stays outside Options so the
  user can re-run after tweaking the (collapsed) editor.

2008 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
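
The flag lifecycle in steps 3 and 5 can be followed with a small simulation. This is a minimal sketch, not code from the commit: a plain ``dict`` stands in for ``st.session_state`` so it runs without Streamlit, and ``script_pass`` is a hypothetical helper that models one top-to-bottom run of the Fix Missing Values page.

```python
# Simulation of the one-shot scroll flag. Assumption: a plain dict stands in
# for st.session_state, and script_pass() is a hypothetical stand-in for one
# full execution of the page script.

def script_pass(state: dict, clicked_run: bool) -> dict:
    """Model one script pass and report what the user would see."""
    # The expander default is computed near the top of the script, before
    # the Run button is evaluated.
    expanded = state.get("missing_result") is None

    if clicked_run:
        state["missing_result"] = "result"  # placeholder for the real result
        state["_missing_scroll_to_results"] = True
        # The real page calls st.rerun() here, which ends this pass early.
        return {"expanded": expanded, "scrolled": False, "rerun": True}

    # End of page: pop() returns the flag and removes it in one step.
    scrolled = state.pop("_missing_scroll_to_results", False)
    return {"expanded": expanded, "scrolled": scrolled, "rerun": False}


state: dict = {}
first = script_pass(state, clicked_run=True)    # user clicks the Run button
second = script_pass(state, clicked_run=False)  # pass triggered by the rerun
third = script_pass(state, clicked_run=False)   # unrelated widget interaction
```

In this model the expanders are still open on the click pass, collapse on the rerun pass (where the scroll also happens once), and stay collapsed without scrolling on any later pass.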
2026-05-16 21:04:37 +00:00
parent d1aaf3c2b9
commit 6415be8bf4
5 changed files with 1250 additions and 879 deletions
--- a/src/gui/pages/4_Missing_Values.py
+++ b/src/gui/pages/4_Missing_Values.py
@@ -95,175 +95,186 @@ except Exception as e:
    )
    st.stop()

-st.subheader(f"Preview: {uploaded.name}")
-st.caption(f"{len(df)} rows, {len(df.columns)} columns")
-st.dataframe(df.head(10), use_container_width=True)
+# Collapse the input preview + options once the user has clicked
+# Handle Missing Values so the Results section below is the primary
+# visual focus. The user can re-expand to re-inspect the source rows
+# or tweak strategy and rerun.
+_has_result = st.session_state.get("missing_result") is not None
+
+with st.expander(f"Preview: {uploaded.name}", expanded=not _has_result):
+    st.caption(f"{len(df)} rows, {len(df.columns)} columns")
+    st.dataframe(df.head(10), use_container_width=True)

 st.divider()

 # ---------------------------------------------------------------------------
-# Initial profile (read-only)
+# Options (Missingness profile + Strategy)
 # ---------------------------------------------------------------------------
+#
+# Wrapped in an outer expander whose default state mirrors the preview
+# expander above: open before a result exists, folded once the user has
+# clicked Handle Missing Values. The Missingness profile lives inside
+# this expander too — after a run the Results section shows a richer
+# before-vs-after comparison that supersedes the static input profile,
+# so keeping it tucked away with the controls cleanly pushes Results
+# to the top of the visible area.

-st.subheader("Missingness profile")
+with st.expander("Options", expanded=not _has_result):
+    st.subheader("Missingness profile")

-initial_profile = profile_missing(df, MissingOptions())
-prof_df = initial_profile.to_dataframe()
+    initial_profile = profile_missing(df, MissingOptions())
+    prof_df = initial_profile.to_dataframe()

-m1, m2, m3, m4 = st.columns(4)
-m1.metric("Rows", initial_profile.rows_total)
-m2.metric("Cells missing", initial_profile.cells_missing)
-m3.metric("% cells missing", f"{initial_profile.cells_missing_pct:.1f}%")
-m4.metric("Complete rows", initial_profile.rows_complete)
+    m1, m2, m3, m4 = st.columns(4)
+    m1.metric("Rows", initial_profile.rows_total)
+    m2.metric("Cells missing", initial_profile.cells_missing)
+    m3.metric("% cells missing", f"{initial_profile.cells_missing_pct:.1f}%")
+    m4.metric("Complete rows", initial_profile.rows_complete)

-st.dataframe(prof_df, use_container_width=True, hide_index=True)
+    st.dataframe(prof_df, use_container_width=True, hide_index=True)

-if initial_profile.cells_missing == 0:
-    st.success("No missing values or disguised nulls detected. Nothing to handle.")
+    if initial_profile.cells_missing == 0:
+        st.success("No missing values or disguised nulls detected. Nothing to handle.")

-st.divider()
+    st.divider()

-# ---------------------------------------------------------------------------
-# Options
-# ---------------------------------------------------------------------------
+    st.subheader("Strategy")

-st.subheader("Strategy")
+    preset_label = st.radio(
+        "Preset",
+        [
+            "detect-only (standardize sentinels to NaN, no fill or drop)",
+            "safe-fill (numeric → median, categorical → mode)",
+            "drop-incomplete (drop any row with missing)",
+        ],
+        index=0,
+        help=(
+            "detect-only: replace 'N/A', '-', 'NULL', etc. with real NaN, then stop. "
+            "safe-fill: also fill — numeric columns with median, others with mode. "
+            "drop-incomplete: also drop every row that has any missing cell."
+        ),
+    )
+    preset_key = preset_label.split(" ", 1)[0]
+    options = MissingOptions.from_preset(preset_key)

-preset_label = st.radio(
-    "Preset",
-    [
-        "detect-only (standardize sentinels to NaN, no fill or drop)",
-        "safe-fill (numeric → median, categorical → mode)",
-        "drop-incomplete (drop any row with missing)",
-    ],
-    index=0,
-    help=(
-        "detect-only: replace 'N/A', '-', 'NULL', etc. with real NaN, then stop. "
-        "safe-fill: also fill — numeric columns with median, others with mode. "
-        "drop-incomplete: also drop every row that has any missing cell."
-    ),
-)
-preset_key = preset_label.split(" ", 1)[0]
-options = MissingOptions.from_preset(preset_key)
+    with st.expander("Advanced options"):
+        col_a, col_b = st.columns(2)

-with st.expander("Advanced options"):
-    col_a, col_b = st.columns(2)
-
-    with col_a:
-        st.markdown("**Detection**")
-        options.standardize_sentinels = st.checkbox(
-            "Standardize disguised nulls to NaN",
-            value=options.standardize_sentinels,
-            help="Replace 'N/A', '-', 'NULL', whitespace-only cells, etc. with real NaN.",
-        )
-        sentinels_text = st.text_input(
-            "Sentinel values (comma-separated)",
-            value=", ".join(options.sentinels),
-            disabled=not options.standardize_sentinels,
-            help="Matched case-insensitively after stripping whitespace.",
-        )
-        options.sentinels = [
-            s.strip() for s in sentinels_text.split(",") if s.strip()
-        ]
-
-    with col_b:
-        st.markdown("**Strategy override**")
-        strat_options = [
-            "(use preset)",
-            "none", "drop_row", "drop_col", "drop_both",
-            "mean", "median", "mode", "constant",
-            "ffill", "bfill", "interpolate",
-        ]
-        strat_choice = st.selectbox(
-            "Global strategy",
-            strat_options,
-            index=0,
-            help=(
-                "drop_row / drop_col use the thresholds below. "
-                "mean / median / interpolate are numeric only — non-numeric "
-                "columns fall back to the categorical strategy."
-            ),
-        )
-        if strat_choice != "(use preset)":
-            options.strategy = strat_choice  # type: ignore[assignment]
-
-        cat_strat = st.selectbox(
-            "Categorical fallback (for non-numeric columns)",
-            ["mode", "constant", "ffill", "bfill", "none"],
-            index=0,
-        )
-        options.categorical_strategy = cat_strat  # type: ignore[assignment]
-
-        if options.strategy == "constant" or cat_strat == "constant":
-            fill_val = st.text_input(
-                "Constant fill value",
-                value="",
-                help="Used when strategy = constant. Leave blank to fill with empty string.",
+        with col_a:
+            st.markdown("**Detection**")
+            options.standardize_sentinels = st.checkbox(
+                "Standardize disguised nulls to NaN",
+                value=options.standardize_sentinels,
+                help="Replace 'N/A', '-', 'NULL', whitespace-only cells, etc. with real NaN.",
            )
-            options.fill_value = fill_val
+            sentinels_text = st.text_input(
+                "Sentinel values (comma-separated)",
+                value=", ".join(options.sentinels),
+                disabled=not options.standardize_sentinels,
+                help="Matched case-insensitively after stripping whitespace.",
+            )
+            options.sentinels = [
+                s.strip() for s in sentinels_text.split(",") if s.strip()
+            ]

-    st.markdown("**Drop thresholds**")
-    col_c, col_d = st.columns(2)
-    with col_c:
-        options.row_drop_threshold = st.slider(
-            "Row drop threshold (drop rows with ≥ this fraction missing across selected cols)",
-            0.0, 1.0, options.row_drop_threshold, 0.05,
-        )
-    with col_d:
-        options.col_drop_threshold = st.slider(
-            "Column drop threshold (drop columns with ≥ this fraction missing)",
-            0.0, 1.0, options.col_drop_threshold, 0.05,
-        )
-
-    st.markdown("**Scope**")
-    selected_cols = st.multiselect(
-        "Columns to handle (default: all)",
-        options=list(df.columns),
-        default=list(df.columns),
-    )
-    skip_cols = st.multiselect(
-        "Columns to skip",
-        options=list(df.columns),
-        default=[],
-    )
-    options.columns = selected_cols if selected_cols else None
-    options.skip_columns = list(skip_cols)
-
-    st.markdown("**Per-column strategy overrides** (optional)")
-    st.caption(
-        "Set a different strategy for specific columns. Leave any row blank to "
-        "use the global strategy."
-    )
-    per_col_overrides: dict[str, str] = {}
-    only_missing_cols = [
-        r.column for r in initial_profile.columns if r.has_missing
-    ]
-    if only_missing_cols:
-        edit_df = pd.DataFrame({
-            "column": only_missing_cols,
-            "strategy": ["" for _ in only_missing_cols],
-        })
-        edited = st.data_editor(
-            edit_df,
-            use_container_width=True,
-            hide_index=True,
-            column_config={
-                "column": st.column_config.TextColumn("Column", disabled=True),
-                "strategy": st.column_config.SelectboxColumn(
-                    "Override",
-                    options=[
-                        "", "drop_row", "drop_col",
-                        "mean", "median", "mode", "constant",
-                        "ffill", "bfill", "interpolate",
-                    ],
+        with col_b:
+            st.markdown("**Strategy override**")
+            strat_options = [
+                "(use preset)",
+                "none", "drop_row", "drop_col", "drop_both",
+                "mean", "median", "mode", "constant",
+                "ffill", "bfill", "interpolate",
+            ]
+            strat_choice = st.selectbox(
+                "Global strategy",
+                strat_options,
+                index=0,
+                help=(
+                    "drop_row / drop_col use the thresholds below. "
+                    "mean / median / interpolate are numeric only — non-numeric "
+                    "columns fall back to the categorical strategy."
                ),
-            },
-            key="missing_per_col_editor",
+            )
+            if strat_choice != "(use preset)":
+                options.strategy = strat_choice  # type: ignore[assignment]
+
+            cat_strat = st.selectbox(
+                "Categorical fallback (for non-numeric columns)",
+                ["mode", "constant", "ffill", "bfill", "none"],
+                index=0,
+            )
+            options.categorical_strategy = cat_strat  # type: ignore[assignment]
+
+            if options.strategy == "constant" or cat_strat == "constant":
+                fill_val = st.text_input(
+                    "Constant fill value",
+                    value="",
+                    help="Used when strategy = constant. Leave blank to fill with empty string.",
+                )
+                options.fill_value = fill_val
+
+        st.markdown("**Drop thresholds**")
+        col_c, col_d = st.columns(2)
+        with col_c:
+            options.row_drop_threshold = st.slider(
+                "Row drop threshold (drop rows with ≥ this fraction missing across selected cols)",
+                0.0, 1.0, options.row_drop_threshold, 0.05,
+            )
+        with col_d:
+            options.col_drop_threshold = st.slider(
+                "Column drop threshold (drop columns with ≥ this fraction missing)",
+                0.0, 1.0, options.col_drop_threshold, 0.05,
+            )
+
+        st.markdown("**Scope**")
+        selected_cols = st.multiselect(
+            "Columns to handle (default: all)",
+            options=list(df.columns),
+            default=list(df.columns),
        )
-        for _, row in edited.iterrows():
-            if row["strategy"]:
-                per_col_overrides[row["column"]] = row["strategy"]
-        options.column_strategies = per_col_overrides  # type: ignore[assignment]
+        skip_cols = st.multiselect(
+            "Columns to skip",
+            options=list(df.columns),
+            default=[],
+        )
+        options.columns = selected_cols if selected_cols else None
+        options.skip_columns = list(skip_cols)
+
+        st.markdown("**Per-column strategy overrides** (optional)")
+        st.caption(
+            "Set a different strategy for specific columns. Leave any row blank to "
+            "use the global strategy."
+        )
+        per_col_overrides: dict[str, str] = {}
+        only_missing_cols = [
+            r.column for r in initial_profile.columns if r.has_missing
+        ]
+        if only_missing_cols:
+            edit_df = pd.DataFrame({
+                "column": only_missing_cols,
+                "strategy": ["" for _ in only_missing_cols],
+            })
+            edited = st.data_editor(
+                edit_df,
+                use_container_width=True,
+                hide_index=True,
+                column_config={
+                    "column": st.column_config.TextColumn("Column", disabled=True),
+                    "strategy": st.column_config.SelectboxColumn(
+                        "Override",
+                        options=[
+                            "", "drop_row", "drop_col",
+                            "mean", "median", "mode", "constant",
+                            "ffill", "bfill", "interpolate",
+                        ],
+                    ),
+                },
+                key="missing_per_col_editor",
+            )
+            for _, row in edited.iterrows():
+                if row["strategy"]:
+                    per_col_overrides[row["column"]] = row["strategy"]
+            options.column_strategies = per_col_overrides  # type: ignore[assignment]

 # ---------------------------------------------------------------------------
 # Run
@@ -282,6 +293,14 @@ if st.button("Handle Missing Values", type="primary", use_container_width=True):
    st.session_state["missing_result"] = result
    st.session_state["missing_input_name"] = uploaded.name
    st.session_state["missing_options"] = options.to_dict()
+    # One-shot flag picked up on the next pass to scroll the parent
+    # document to the Results anchor (see scroll snippet below).
+    st.session_state["_missing_scroll_to_results"] = True
+    # Force a second rerun so the preview and options expanders see
+    # the new result on the NEXT script pass and collapse themselves.
+    # Without this they stay expanded until the user touches any
+    # other widget.
+    st.rerun()

 result = st.session_state.get("missing_result")
 if result is None:
@@ -292,6 +311,16 @@ if result is None:
 # Results
 # ---------------------------------------------------------------------------

+# Anchor target for the auto-scroll snippet at the end of this block.
+# A bare ``<div id="...">`` survives Streamlit's HTML sanitizer (only
+# ``<script>`` is stripped), and a 1px-tall div doesn't visually shift
+# anything. Placed before the subheader so the scrolled-to viewport
+# starts a few pixels above the section heading rather than below it.
+st.markdown(
+    '<div id="missing-results-anchor" style="height:1px"></div>',
+    unsafe_allow_html=True,
+)
+
 st.subheader("Results")

 m1, m2, m3, m4 = st.columns(4)
@@ -334,38 +363,85 @@ st.dataframe(result.handled_df.head(10), use_container_width=True)
 # ---------------------------------------------------------------------------
 # Downloads
 # ---------------------------------------------------------------------------
+#
+# All three byte buffers are prepared up front (outside the columns) so
+# each ``st.download_button`` receives stable ``data`` across reruns, and
+# each button gets an explicit ``key``. Without those, Streamlit's
+# auto-derived widget IDs can collide for multiple download_buttons in
+# adjacent columns, and only the first one fires on click. The
+# empty-changes case now renders a disabled button (rather than
+# vanishing) so the layout stays steady and the tooltip explains why.

 st.divider()
 stem = Path(st.session_state.get("missing_input_name", "input")).stem

+handled_bytes = result.handled_df.to_csv(index=False).encode("utf-8-sig")
+changes_bytes = (
+    result.changes.to_csv(index=False).encode("utf-8-sig")
+    if not result.changes.empty
+    else b""
+)
+config_bytes = json.dumps(
+    st.session_state.get("missing_options", {}), indent=2, default=str,
+).encode("utf-8")
+
 dl_a, dl_b, dl_c = st.columns(3)
 with dl_a:
-    handled_bytes = result.handled_df.to_csv(index=False).encode("utf-8-sig")
    st.download_button(
        "Download handled CSV",
        data=handled_bytes,
        file_name=f"{stem}_missing.csv",
        mime="text/csv",
+        key="missing_dl_handled",
+        use_container_width=True,
    )
 with dl_b:
-    if not result.changes.empty:
-        changes_bytes = result.changes.to_csv(index=False).encode("utf-8-sig")
-        st.download_button(
-            "Download changes audit",
-            data=changes_bytes,
-            file_name=f"{stem}_missing_changes.csv",
-            mime="text/csv",
-        )
+    st.download_button(
+        "Download changes audit",
+        data=changes_bytes,
+        file_name=f"{stem}_missing_changes.csv",
+        mime="text/csv",
+        key="missing_dl_changes",
+        disabled=result.changes.empty,
+        help="No changes to audit." if result.changes.empty else None,
+        use_container_width=True,
+    )
 with dl_c:
-    config_bytes = json.dumps(
-        st.session_state.get("missing_options", {}), indent=2, default=str,
-    ).encode("utf-8")
    st.download_button(
        "Download config JSON",
        data=config_bytes,
        file_name="missing_config.json",
        mime="application/json",
+        key="missing_dl_config",
+        use_container_width=True,
    )

 st.divider()
 st.caption("Runs locally. Your data never leaves this computer. | DataTools v3.0")
+
+# ---------------------------------------------------------------------------
+# Post-run auto-scroll
+# ---------------------------------------------------------------------------
+#
+# When the user clicks Handle Missing Values, the preview + options
+# collapse but Streamlit by itself doesn't scroll — the Results section
+# is at the bottom of a tall script so the user has to find it. Inject
+# a tiny component-html iframe that calls ``scrollIntoView`` on the
+# parent's Results anchor. Streamlit's main page is same-origin with
+# component iframes so ``window.parent.document`` access is allowed.
+#
+# The flag is one-shot (``pop`` removes it) so re-renders triggered by
+# unrelated widgets in the Results section don't yank the viewport
+# back to the top of Results.
+if st.session_state.pop("_missing_scroll_to_results", False):
+    from streamlit.components.v1 import html as _components_html
+    _components_html(
+        """
+        <script>
+          const doc = window.parent.document;
+          const target = doc.getElementById('missing-results-anchor');
+          if (target) target.scrollIntoView({behavior: 'smooth', block: 'start'});
+        </script>
+        """,
+        height=0,
+    )
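
Because the same anchor and scroll snippet are repeated on each tool page with only the ``<tool>`` prefix changing, the two HTML strings could be generated from that prefix. The sketch below is an illustration under that assumption; ``results_anchor_html`` and ``scroll_script_html`` are hypothetical names and do not appear in this commit.

```python
# Hypothetical helpers that build the per-tool anchor and scroll snippet.
# They only produce strings, so they can be exercised without Streamlit.

def results_anchor_html(tool: str) -> str:
    """Return the 1px anchor div placed just before the Results subheader."""
    return f'<div id="{tool}-results-anchor" style="height:1px"></div>'


def scroll_script_html(tool: str) -> str:
    """Return the script that scrolls the parent document to the anchor."""
    anchor_id = f"{tool}-results-anchor"
    return (
        "<script>\n"
        "  const doc = window.parent.document;\n"
        f"  const target = doc.getElementById('{anchor_id}');\n"
        "  if (target) target.scrollIntoView({behavior: 'smooth', block: 'start'});\n"
        "</script>"
    )


anchor = results_anchor_html("missing")
script = scroll_script_html("missing")
```

On a page, the first string would be passed to ``st.markdown(..., unsafe_allow_html=True)`` and the second to ``streamlit.components.v1.html(..., height=0)``, guarded by the popped flag, exactly as the diff above does inline.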