feat(audit): JSONL audit log for support diagnostics

New ``src/audit.py`` module records GUI actions to a per-session
JSONL file under ``~/.datatools/logs/`` (overrideable via
``DATATOOLS_AUDIT_DIR``). The file is human-readable (one JSON
object per line, each with a ``message`` field) AND trivially
machine-parseable — the support flow is "client mails the file,
we read it and explain what went wrong."

Format example::

    {"ts":"2026-05-17T05:30:00.123+00:00","level":"info","category":"session",
     "session":"a1b2c3d4","message":"Session started",
     "platform":"Windows 11","python":"3.14.0","user":"Michael Dombaugh",
     "log_file":"C:\\Users\\Michael Dombaugh\\.datatools\\logs\\datatools-...jsonl"}
    {"ts":"...","category":"upload","message":"Uploaded customers.csv",
     "filename":"customers.csv","bytes":24813}
    {"ts":"...","category":"analyze","message":"Analyzed customers.csv (3 findings)",
     "filename":"customers.csv","findings":3,"rows":120,"cols":8}
    {"ts":"...","category":"tool_run","message":"Clean Text run",
     "page":"2_Text_Cleaner"}
    {"ts":"...","category":"error","level":"error",
     "message":"analyze(weird.csv): EmptyDataError: No columns to parse",
     "filename":"weird.csv","outcome":"empty_after_repair"}

Public API:

- ``log_event(category, message, **extra)``
- ``log_session_start()`` — idempotent banner with platform info
- ``log_page_open(slug)`` — emit a ``nav`` event, deduplicated per
  Streamlit session so reruns don't spam the log
- ``log_exception(where, exc, **extra)`` — convenience wrapper
- ``audit_log_path()`` / ``audit_log_dir()`` — for the UI

Wired in at:

- ``hide_streamlit_chrome``: stamps session start, mounts a small
  "🩺  Diagnostics" expander in the sidebar with the log path and
  an "Open log folder" button so the user can grab the file to
  attach to a support email.
- Home page: ``upload`` event on every new file, ``upload`` event
  on per-file remove, ``analyze`` event with file count when
  Run-analysis fires.
- ``_run_analysis_on_upload``: ``analyze`` event with rows / cols /
  findings count per file, plus ``error`` events on every caught
  exception (empty upload, empty after repair, pandas EmptyDataError,
  generic Exception).
- Every Ready tool page (1, 2, 3, 4, 5, 9): ``tool_run`` event
  immediately after the primary action stashes its result.
- Every tool page (1-9): ``log_page_open(slug)`` on render — deduped
  via session state so we don't get one event per Streamlit rerun.

Safety:

- ``log_event`` wraps every write in try/except. A broken audit
  log must NOT crash the GUI.
- Non-JSON-serializable extras are ``str()``-coerced before writing.
- File CONTENTS are never logged. We capture filename, byte count,
  and (in the analyzer) a 12-char sha1 fingerprint of the bytes so
  the same file re-uploaded gets the same trace.
- License keys, session cookies, etc. are not logged.
- ``DATATOOLS_AUDIT_DIR`` env var lets tests redirect writes into a
  tmp dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This commit is contained in:

Michael

2026-05-17 01:36:35 +00:00

parent f0885aeb1e

commit c73d716d06

12 changed files with 373 additions and 3 deletions

									
										4

src/gui/pages/9_Pipeline_Runner.py
									
												View File
												
				@@ -36,6 +36,8 @@ from src.license import FeatureFlag

				hide_streamlit_chrome()

				render_sticky_footer()

				from src.audit import log_page_open

				log_page_open("9_Pipeline_Runner")

				require_feature_or_render_upgrade(FeatureFlag.PIPELINE_RUNNER)

				@@ -283,6 +285,8 @@ if st.button(

				    progress.progress(1.0, text="Done")

				    st.session_state["pipeline_result"] = result

				    from src.audit import log_event

				    log_event("tool_run", "Automated Workflows run", page="9_Pipeline_Runner")

				    st.session_state["pipeline_input_name"] = uploaded.name

				    # One-shot flag picked up on the next pass to scroll the parent

				    # document to the Results anchor (see scroll snippet at end of file).

feat(audit): JSONL audit log for support diagnostics

4 src/gui/pages/9_Pipeline_Runner.py Unescape Escape View File

4

src/gui/pages/9_Pipeline_Runner.py

View File