feat(audit): JSONL audit log for support diagnostics

New ``src/audit.py`` module records GUI actions to a per-session
JSONL file under ``~/.datatools/logs/`` (overridable via
``DATATOOLS_AUDIT_DIR``). The file is human-readable (one JSON
object per line, each with a ``message`` field) and trivially
machine-parseable. The support flow is "client mails the file,
we read it and explain what went wrong."

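For the support side of that flow, reading a mailed-in file could look roughly like the sketch below. It is not part of this commit: ``read_audit_log`` and ``errors_only`` are hypothetical helper names, and the only things assumed about the file are the ones stated above (one JSON object per line, with ``category`` and ``message`` fields).

```python
import json


def read_audit_log(text: str) -> list[dict]:
    """Parse JSONL text into dicts, skipping blank or corrupt lines."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated final line; keep reading the rest
        if isinstance(obj, dict):
            events.append(obj)
    return events


def errors_only(events: list[dict]) -> list[dict]:
    """Keep only the events a support engineer would look at first."""
    return [e for e in events if e.get("category") == "error"]


# Two sample lines in the shape this commit describes.
sample = "\n".join([
    '{"ts":"...","category":"upload","message":"Uploaded customers.csv",'
    '"filename":"customers.csv","bytes":24813}',
    '{"ts":"...","category":"error","level":"error",'
    '"message":"analyze(weird.csv): EmptyDataError: No columns to parse",'
    '"filename":"weird.csv","outcome":"empty_after_repair"}',
])

for event in errors_only(read_audit_log(sample)):
    print(event["message"])
```

Skipping unparseable lines rather than raising is a deliberate choice in this sketch: a log cut off mid-write should still yield every complete event before the cut.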
Format example::

    {"ts":"2026-05-17T05:30:00.123+00:00","level":"info","category":"session",
     "session":"a1b2c3d4","message":"Session started",
     "platform":"Windows 11","python":"3.14.0","user":"Michael Dombaugh",
     "log_file":"C:\\Users\\Michael Dombaugh\\.datatools\\logs\\datatools-...jsonl"}
    {"ts":"...","category":"upload","message":"Uploaded customers.csv",
     "filename":"customers.csv","bytes":24813}
    {"ts":"...","category":"analyze","message":"Analyzed customers.csv (3 findings)",
     "filename":"customers.csv","findings":3,"rows":120,"cols":8}
    {"ts":"...","category":"tool_run","message":"Clean Text run",
     "page":"2_Text_Cleaner"}
    {"ts":"...","category":"error","level":"error",
     "message":"analyze(weird.csv): EmptyDataError: No columns to parse",
     "filename":"weird.csv","outcome":"empty_after_repair"}

Public API:

- ``log_event(category, message, **extra)``
- ``log_session_start()``: idempotent banner with platform info
- ``log_page_open(slug)``: emit a ``nav`` event, deduplicated per
  Streamlit session so reruns don't spam the log
- ``log_exception(where, exc, **extra)``: convenience wrapper
- ``audit_log_path()`` / ``audit_log_dir()``: for the UI

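As an illustration of how the ``log_exception`` wrapper might sit on top of ``log_event``, here is a minimal sketch. The real body in ``src/audit.py`` is not shown in this message, so the message layout (``where: ExceptionType: text``) is inferred from the ``error`` line in the format example, and the in-memory ``events`` list is a stand-in for the JSONL file.

```python
events = []  # in-memory stand-in for the JSONL file, for illustration only


def log_event(category, message, **extra):
    """Stand-in with the signature listed above; the real one writes a JSON line."""
    events.append({"category": category, "message": message, **extra})


def log_exception(where, exc, **extra):
    """Record a caught exception as an ``error`` event (assumed behaviour)."""
    message = f"{where}: {type(exc).__name__}: {exc}"
    # Default the level to "error" but let the caller override it via extra.
    fields = {"level": "error", **extra}
    log_event("error", message, **fields)


try:
    raise ValueError("No columns to parse")
except ValueError as exc:
    log_exception("analyze(weird.csv)", exc, filename="weird.csv")

print(events[-1]["message"])
```

Merging ``level`` into one dict before the call avoids a duplicate-keyword ``TypeError`` if a caller passes its own ``level``.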
Wired in at:

- ``hide_streamlit_chrome``: stamps session start, mounts a small
  "🩺 Diagnostics" expander in the sidebar with the log path and
  an "Open log folder" button so the user can grab the file to
  attach to a support email.
- Home page: ``upload`` event on every new file, ``upload`` event
  on per-file remove, ``analyze`` event with file count when
  Run-analysis fires.
- ``_run_analysis_on_upload``: ``analyze`` event with rows / cols /
  findings count per file, plus ``error`` events on every caught
  exception (empty upload, empty after repair, pandas EmptyDataError,
  generic Exception).
- Every Ready tool page (1, 2, 3, 4, 5, 9): ``tool_run`` event
  immediately after the primary action stashes its result.
- Every tool page (1-9): ``log_page_open(slug)`` on render, deduped
  via session state so we don't get one event per Streamlit rerun.

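The per-session dedup mentioned for ``log_page_open`` can be sketched as follows. This is an assumption about the approach, not the committed code: a plain dict stands in for ``st.session_state``, the ``_audit_pages_seen`` key and the "Opened ..." message text are made up, and the real function takes only ``slug``. Whether the real code logs a page again after the user navigates away and back is not stated here; this sketch logs each slug once per session.

```python
def log_page_open(slug, session_state, emit):
    """Emit one ``nav`` event per slug per session; return True if emitted."""
    key = "_audit_pages_seen"  # hypothetical session-state key
    if key not in session_state:
        session_state[key] = set()
    if slug in session_state[key]:
        return False  # a rerun of the same page: stay quiet
    session_state[key].add(slug)
    emit("nav", f"Opened {slug}", page=slug)
    return True


# Simulate three script reruns of the same page plus one visit to another.
state = {}      # stand-in for st.session_state
emitted = []    # stand-in for the log file


def record(category, message, **extra):
    emitted.append((category, message, extra))


for slug in ["2_Text_Cleaner", "2_Text_Cleaner", "2_Text_Cleaner", "1_Home"]:
    log_page_open(slug, state, record)

print(len(emitted))  # two events: one per distinct page
```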
Safety:

- ``log_event`` wraps every write in try/except. A broken audit
  log must NOT crash the GUI.
- Non-JSON-serializable extras are ``str()``-coerced before writing.
- File CONTENTS are never logged. We capture filename, byte count,
  and (in the analyzer) a 12-char sha1 fingerprint of the bytes so
  the same file re-uploaded gets the same trace.
- License keys, session cookies, etc. are not logged.
- ``DATATOOLS_AUDIT_DIR`` env var lets tests redirect writes into a
  tmp dir.

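Taken together, the safety properties above could be realised along the lines of the sketch below. It is an illustration under stated assumptions rather than the contents of ``src/audit.py``: ``LOG_NAME``, ``fingerprint``, ``demo`` and the ``bool`` return value are hypothetical, and ``json.dumps(..., default=str)`` is one way to get the ``str()`` coercion, not necessarily the one used.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

LOG_NAME = "datatools-demo.jsonl"  # hypothetical; the real per-session name is not shown


def audit_log_dir() -> Path:
    """Honor DATATOOLS_AUDIT_DIR, else fall back to ~/.datatools/logs/."""
    override = os.environ.get("DATATOOLS_AUDIT_DIR")
    return Path(override) if override else Path.home() / ".datatools" / "logs"


def fingerprint(data: bytes) -> str:
    """12-char sha1 prefix: identifies the bytes without storing them."""
    return hashlib.sha1(data).hexdigest()[:12]


def log_event(category: str, message: str, **extra) -> bool:
    """Append one JSON line; report failure instead of raising."""
    try:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "category": category,
            "message": message,
            **extra,
        }
        directory = audit_log_dir()
        directory.mkdir(parents=True, exist_ok=True)
        # default=str coerces values json cannot serialize (Path, datetime, ...).
        line = json.dumps(record, default=str)
        with open(directory / LOG_NAME, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
        return True
    except Exception:
        return False  # a broken audit log must not take the GUI down


def demo() -> dict:
    """Write one event into a temporary directory and read it back."""
    previous = os.environ.get("DATATOOLS_AUDIT_DIR")
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["DATATOOLS_AUDIT_DIR"] = tmp
        try:
            log_event("upload", "Uploaded customers.csv",
                      filename=Path("customers.csv"),
                      sha1=fingerprint(b"id,name\n"))
            text = (Path(tmp) / LOG_NAME).read_text(encoding="utf-8")
        finally:
            # Restore the environment so the demo leaves no trace.
            if previous is None:
                os.environ.pop("DATATOOLS_AUDIT_DIR", None)
            else:
                os.environ["DATATOOLS_AUDIT_DIR"] = previous
    return json.loads(text.splitlines()[0])
```

Calling ``demo()`` returns the parsed event, with the ``Path`` extra stored as a plain string. Building the record inside the ``try`` means serialization failures are swallowed along with I/O failures.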
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -86,6 +86,7 @@ def _home_page() -> None:
         help=t("upload.uploader_help"),
     )
     if new_files:
+        from src.audit import log_event
         changed = False
         for f in new_files:
             if f.name not in home_uploads:
@@ -94,6 +95,12 @@ def _home_page() -> None:
                     "size": f.size,
                 }
                 changed = True
+                log_event(
+                    "upload",
+                    f"Uploaded {f.name}",
+                    filename=f.name,
+                    bytes=f.size,
+                )
         if changed:
             st.session_state["home_uploads"] = home_uploads

@@ -139,6 +146,12 @@ def _home_page() -> None:
                 to_remove = name

     if to_remove is not None:
+        from src.audit import log_event
+        log_event(
+            "upload",
+            f"Removed {to_remove}",
+            filename=to_remove,
+        )
         del home_uploads[to_remove]
         # Drop any findings/results tied to the removed file.
         findings_by_file_drop = st.session_state.get(
@@ -209,6 +222,12 @@ def _home_page() -> None:
         st.rerun()

     if run_clicked:
+        from src.audit import log_event
+        log_event(
+            "analyze",
+            f"Run analysis clicked on {len(pending)} file(s)",
+            files=list(pending),
+        )
         progress = st.progress(0.0, text=t("upload.scanning"))
         for i, name in enumerate(pending, start=1):
             stashed = _StashedUpload(name, home_uploads[name]["bytes"])