Ship session metrics as thin waved slices; defer the signal framework
We want ax to surface derived per-session metrics and cross-session, multi-hop graph insights (durability of a session's commits, time-to-land, fragility cascades, skill→durability efficacy, recovery→outcome, expertise leverage, …), and to make adding the next insight cheap.
The first design reached for a framework: a declarative SignalDefinition
registry (mirroring StageRegistry), a topo-ordered derive stage, COMPUTED
fields generated from the registry, a session_metrics table, three signal
kinds (session-scalar / aggregate / relation) plus a primitive layer,
a SignalContext.dep DAG so deep signals reuse cheap precomputed primitives,
and generic CLI/MCP/dashboard surfaces that auto-render any registered signal.
The pitch: adding insight #20 becomes a one-file change.
We reject building that framework first. A five-perspective design review (two alternative-architecture proposals, an adversarial roast, a YAGNI pass, and a SurrealDB-feasibility audit) converged on the same conclusion, and one finding was a genuine correctness bug. This ADR records what we debunked and why.
The freshness model was wrong, not just imperfect
The framework's headline claim was that forward-looking signals stay fresh by
being "recomputed each ingest, bounded to the ingest window, as the rows that
move them arrive." Traced against closure.ts, this is false. later_fixed_by -
the edge commit_reverted and therefore durability_ratio, file_fragility,
rework_chain, and fragility_cascade all depend on - is itself a derived edge
that closure.ts rebuilds window-bounded: it DELETEs every later_fixed_by and
re-RELATEs only among commits inside --since. Under the daemon's
ingest --since=1:
- a fix commit landing today for a three-week-old bug leaves the old feature commit outside the window, so the feature→fix edge is never recreated; and the old session that owns the durability number is not in the recompute window either. The thing that changed (new fix) and the thing that needs recomputing (old session) sit on opposite sides of the window boundary.
- worse, the blanket DELETE truncates the old edge, so the old commit flips to reverted = false and the old session's durability moves the wrong direction on new data.
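The failure can be reproduced in a toy model. The sketch below is not the code in closure.ts: the Commit shape, the fixes field, and rebuildWindowBounded are hypothetical names chosen only to mimic "delete every edge, then re-relate inside the window".

```typescript
// Toy model only; none of these names are taken from closure.ts.
interface Commit {
  id: string;
  day: number;     // day the commit landed
  fixes?: string;  // id of the commit this one repairs, if any
}

// later_fixed_by, pointing feature -> fix
type Edge = { from: string; to: string };

// Window-bounded rebuild: start from an empty edge list (the blanket DELETE),
// then re-relate only pairs where BOTH commits fall inside the window.
function rebuildWindowBounded(commits: Commit[], sinceDay: number): Edge[] {
  const inWindow = new Set(
    commits.filter((c) => c.day >= sinceDay).map((c) => c.id),
  );
  const edges: Edge[] = [];
  for (const fix of commits) {
    if (!inWindow.has(fix.id) || fix.fixes === undefined) continue;
    if (inWindow.has(fix.fixes)) {
      edges.push({ from: fix.fixes, to: fix.id });
    }
  }
  return edges;
}

// A commit counts as reverted when at least one later_fixed_by edge leaves it.
const isReverted = (edges: Edge[], id: string): boolean =>
  edges.some((e) => e.from === id);

const commits: Commit[] = [
  { id: "feature", day: 1 },
  { id: "fix-a", day: 2, fixes: "feature" },
  { id: "fix-b", day: 22, fixes: "feature" },
];

// Window covering all history: both feature -> fix edges exist.
const full = rebuildWindowBounded(commits, 0);
// Daemon-style run on day 22 with a one-day window: the old feature commit
// is outside the window, so no edge is recreated and the old one is gone.
const daemon = rebuildWindowBounded(commits, 21);

console.log(isReverted(full, "feature"));   // true
console.log(isReverted(daemon, "feature")); // false
```

In the daemon-style run the new fix makes the old commit look more durable, which is the wrong-direction move described above.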
Forward-looking is the one direction that justified going graph-native at all,
and the window-bounded model degrades exactly there while looking fresh and
authoritative. So regardless of framework-or-not, the freshness model must
change to a dirty-set recompute: when an edge changes, recompute the
sessions transitively reachable from it, and compute Layer-0 primitives
(commit_reverted) over full history, decoupled from --since.
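A minimal sketch of the dirty-set idea, under stated assumptions: edges are held in memory, "transitively reachable" is read as an undirected walk over later_fixed_by, and dirtySessions and sessionOf are hypothetical names. The real stage would run this against the database.

```typescript
// Hypothetical shapes; the real tables and edge names may differ.
type Edge = { from: string; to: string }; // later_fixed_by: feature -> fix

// Collect every session whose metrics could move when `changed` edges are
// added or removed: walk later_fixed_by in both directions from the touched
// commits, then map the visited commits to their owning sessions.
function dirtySessions(
  changed: Edge[],
  allEdges: Edge[],
  sessionOf: Map<string, string>, // commit id -> session id
): Set<string> {
  const neighbours = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbours.get(a) ?? [];
    list.push(b);
    neighbours.set(a, list);
  };
  for (const e of [...allEdges, ...changed]) {
    link(e.from, e.to);
    link(e.to, e.from);
  }

  // Breadth-first walk seeded with both endpoints of every changed edge.
  const seen = new Set<string>();
  const queue: string[] = [];
  for (const e of changed) queue.push(e.from, e.to);
  while (queue.length > 0) {
    const commit = queue.shift() as string;
    if (seen.has(commit)) continue;
    seen.add(commit);
    for (const next of neighbours.get(commit) ?? []) queue.push(next);
  }

  const sessions = new Set<string>();
  seen.forEach((commit) => {
    const session = sessionOf.get(commit);
    if (session !== undefined) sessions.add(session);
  });
  return sessions;
}

const sessionOf = new Map<string, string>([
  ["feature", "s-old"],
  ["fix-a", "s-mid"],
  ["fix-b", "s-new"],
  ["other", "s-unrelated"],
]);
const existing: Edge[] = [{ from: "feature", to: "fix-a" }];
const changed: Edge[] = [{ from: "feature", to: "fix-b" }];

const dirty = dirtySessions(changed, existing, sessionOf);
console.log(Array.from(dirty).sort()); // [ 's-mid', 's-new', 's-old' ]
```

The old session lands in the recompute set because it is reachable from the changed edge, not because it falls inside any time window.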
What we debunked
Framework-first is premature abstraction. We have ~17 named signals and
zero shipped. Every abstraction boundary - the three kinds, the cost tiers,
the dep DAG - was drawn from imagined signals. The likely outcome is paying
for the framework, then discovering at signal #4 that a kind doesn't fit or
dep needs windowing we didn't model, and refactoring it anyway. Build the
signals; let the duplication, once real, justify the registry.
COMPUTED-field codegen is the worst cost/value trade and reintroduces a
known crash class. Generating DEFINE FIELD … COMPUTED <sql> from app code
means the schema is partly generated, and retiring a signal needs REMOVE FIELD reconciliation - a drop mechanism that does not exist in this repo
(schema apply is an append-only surreal import). That is the orphan-field
NONE-crash we have already been burned by. COMPUTED is verified to traverse
graph edges per read - which is precisely the per-edge-deref hang on an N-row
listing (ax sessions metrics). We keep COMPUTED only as a possible
single-record convenience on sessions show, never a graph-traversing column
on a listing, and we do not generate fields from a registry. Derived scalars
are written by the derive stage instead.
Materialized views (DEFINE TABLE … AS SELECT) do not fit our pipeline.
Verified against the docs: table views are "not triggered when importing
data," and only the FROM table's writes trigger them. ax ingest is bulk
import, and the signals are cross-table, so views would silently never refresh.
Rejected for these signals.
EXISTS(...) is not SurrealQL. The seed formulas used it; existence is
count(->edge->table) > 0. Mechanical, but the spec's formulas did not parse
as written.
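As a sketch of that mechanical rewrite, the helper below emits the count(...) > 0 form for an outgoing edge. existsOutgoing is a hypothetical name, and commit as the target table is an assumption made for illustration.

```typescript
// Hypothetical helper: builds the count(...) > 0 existence test for an
// outgoing edge. Names are checked against a plain-identifier pattern so the
// result can be spliced into a larger query string.
function existsOutgoing(edge: string, table: string): string {
  const ident = /^[A-Za-z_][A-Za-z0-9_]*$/;
  if (!ident.test(edge) || !ident.test(table)) {
    throw new Error(`not a plain identifier: ${edge} / ${table}`);
  }
  return `count(->${edge}->${table}) > 0`;
}

// Assumed table name "commit"; later_fixed_by points feature -> fix.
const revertedExpr = existsOutgoing("later_fixed_by", "commit");
console.log(revertedExpr); // count(->later_fixed_by->commit) > 0
```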
The name signals is taken. derive-signals.ts already derives
friction/recovery/correction edges. We name this work metrics
(session_metrics, a derive-metrics stage, ax sessions metrics), which
also matches the surface users asked for.
Cross-session relation recompute must be gated. Recomputing
whole-history relations on every daemon transcript is an O(edges) walk per save
and will peg SurrealDB (the re-ingest watcher race). Relations run on
manual/deep ingest, not the --since=1 daemon path.
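One way to express the gate, as a sketch: the mode names and the DerivePlan shape are assumptions, since the text only distinguishes the daemon's --since=1 path from manual/deep ingest.

```typescript
// Hypothetical ingest modes, named after the paths described above.
type IngestMode = "daemon" | "manual" | "deep";

interface DerivePlan {
  sessionScalars: boolean;        // per-session metrics (assumed to run on every path)
  crossSessionRelations: boolean; // whole-history relation walk
}

// Cross-session relations are skipped on the daemon path and run otherwise.
function planDerive(mode: IngestMode): DerivePlan {
  return {
    sessionScalars: true,
    crossSessionRelations: mode !== "daemon",
  };
}

console.log(planDerive("daemon").crossSessionRelations); // false
console.log(planDerive("deep").crossSessionRelations);   // true
```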
Decision
Ship session metrics as thin, correctness-first waves, not a framework:
- a hand-written session_metrics table + a derive-metrics stage;
- commit_reverted computed over full history via a dirty-set, feeding durability_ratio, plus time_to_land, lines_added/removed, and one cross-session insight (fragility_cascade) as a plain query;
- ax sessions metrics surfacing them.
Every named signal remains deliverable - later waves add them as plain modules
in apps/axctl/src/metrics/. We extract a registry/DAG only after 5–6 real
signals reveal the true shape (refactor-when-it-hurts), and when we do, the
likely substrate for deep weighted traversal is SurrealDB fn:: stored
functions with {1..N} recursion idioms (verified capable, 256-hop max) rather
than hand-rolled JS DAG joins.
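To show how small a wave-1 "plain module" can be, here is a sketch of durability_ratio. The formula is an assumption (the share of a session's commits with no later_fixed_by edge), since this ADR does not pin it, and SessionCommit and durabilityRatio are hypothetical names.

```typescript
// Assumed definition: durability_ratio = non-reverted commits / all commits
// in the session. `reverted` stands for the full-history commit_reverted
// primitive.
interface SessionCommit {
  id: string;
  reverted: boolean;
}

function durabilityRatio(commits: SessionCommit[]): number | null {
  if (commits.length === 0) return null; // no commits: the metric is undefined
  const durable = commits.filter((c) => !c.reverted).length;
  return durable / commits.length;
}

const ratio = durabilityRatio([
  { id: "c1", reverted: false },
  { id: "c2", reverted: true },
  { id: "c3", reverted: false },
  { id: "c4", reverted: false },
]);
console.log(ratio); // 0.75
```

Returning null for a session with no commits keeps "no data" distinct from a ratio of 0.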
Consequences
- No registration boilerplate saved up front; adding wave-2/3 signals is a small module + a surface line, not a one-field change - acceptable at this count.
- The freshness fix (dirty-set + full-history primitives) is now a wave-1 backbone and a prerequisite for every forward-looking signal.
- We carry a small, explicit deferral: when apps/axctl/src/metrics/ starts feeling like copy-paste (~signal #6), revisit the framework - and it will be smaller and correct, because the signals taught it their shape.
- Two deep signals (fragility_cascade weight, error_recovery_efficacy causation-vs-coincidence) need their exact queries pinned before they ship; that is signal-definition work, independent of this decision.