feat(kernel): §M11 session-liveness gate + §M12 plumbing hook #30
Implements §M11 (kernel-liveness as session-occupancy) per doctrine in
kb/research/kernel/20260417-t187-kernel-proper.md and the design answers
Guido landed on ffs0#33. Plumbs §M12's AdminScopeRewrite call site as
dormant so PR 4 becomes a pure-additive diff.
Closes the three design questions from the T=171 M11/M12 implementation
plan:
Q1 — session_urn on envelope: explicit field wins, unambiguous reverse-
lookup falls back, ambiguous or absent rejects.
Q2 — heartbeat line: below. Allowlist covers sweep WF13, kernel-actor
emissions, and infrastructure ADDs (user/workstation/kernel).
SeedIfAbsent additionally bypasses liveness structurally so
bootstrap from zero state works on every fresh kernel.
Q3 — admin-scope classifier: dormant in PR 3 (returns false). PR 4
fills the logic for authority_scope=kernel MUTATEs on non-kernel
nodes + ontology-governed type touches.
Envelope surface (graph/rewrite.go):
- New optional SessionURN URN field with json tag "session_urn". Additive
and backward-compatible: clients that omit it fall through the reverse-
lookup path; clients where one actor drives multiple sessions must
set it. Docstring updated with §M11 reference.
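
The backward-compatibility claim for the new field can be illustrated with a minimal sketch. It assumes the envelope is serialised with `encoding/json` and that `URN` is string-backed; the trimmed `Envelope` struct, the `actor` tag, and the example URNs are hypothetical, and only `SessionURN` with its `session_urn` tag comes from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// URN stands in for the kernel's URN type (assumed here to be string-backed).
type URN string

// Envelope is a trimmed, hypothetical envelope. Only the SessionURN field and
// its "session_urn" tag come from the PR; Actor and its tag are illustrative.
type Envelope struct {
	Actor      URN `json:"actor"`
	SessionURN URN `json:"session_urn,omitempty"`
}

// decode parses one persisted envelope from JSON.
func decode(raw string) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal([]byte(raw), &env)
	return env, err
}

// encode serialises an envelope; omitempty drops an unset SessionURN.
func encode(env Envelope) (string, error) {
	b, err := json.Marshal(env)
	return string(b), err
}

func main() {
	// A pre-M11 log entry has no session_urn key at all.
	old, err := decode(`{"actor":"urn:example:user:sam"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(old.SessionURN == "") // the field decodes to its zero value

	// Re-encoding leaves the old wire shape unchanged.
	out, _ := encode(old)
	fmt.Println(out)

	// A client that drives several sessions sets the field explicitly.
	old.SessionURN = "urn:example:session:sam.governance"
	out, _ = encode(old)
	fmt.Println(out)
}
```

Because the zero value is omitted on encode and tolerated on decode, old logs and old clients keep their wire shape, which is the property the replay invariant relies on.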
Resolver (operad/session_context.go):
- ResolveSessionForEnvelope(state, env) returns a structured
  ResolveSessionResult with one of six Kind values:
    - ResolveSessionActorIsSession: actor's node.TypeID == "session"
    - ResolveSessionExplicit: env.SessionURN verified
    - ResolveSessionInferred: unambiguous reverse-lookup
    - ResolveSessionExplicitMismatch
    - ResolveSessionAmbiguous (Candidates populated)
    - ResolveSessionAbsent
  Called by the kernel liveness gate; exported so PR 4's §M12 pass can
  reuse the same resolution for capability walks.
- SystemInternalEnvelope(env) allowlist classifier:
    - kernel-URN actors (sweep WF13, reactive type-drift, admin-authority
      MUTATEs)
    - ADD of infrastructure types (user, workstation, kernel)
  Kept conservative; additions weaken §M11.
- AdminScopeRewrite(env, state) returns false. This is the PR 3 plumbing
  hook for §M12. PR 4 fills in authority_scope=kernel MUTATE detection and
  ontology-governed-type touches.
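
The allowlist classifier described above can be sketched roughly as follows. This is an illustration, not the code in operad/session_context.go: the `Envelope` fields, the string encoding of rewrite types, and the `kernelURNPrefix` convention for recognising kernel-URN actors are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Envelope is a hypothetical, trimmed envelope for this sketch.
type Envelope struct {
	Actor       string // URN of the emitter
	RewriteType string // "ADD", "LINK", "MUTATE" (assumed encoding)
	TypeID      string // type of the node being added (assumed field)
}

// kernelURNPrefix is an assumed naming convention, not the kernel's real one.
const kernelURNPrefix = "urn:example:kernel:"

// infrastructureTypes lists the three types named in the PR description.
var infrastructureTypes = map[string]bool{
	"user":        true,
	"workstation": true,
	"kernel":      true,
}

// systemInternalEnvelope reports whether env is exempt from the §M11 gate.
// It stays deliberately narrow: every new case weakens the gate.
func systemInternalEnvelope(env Envelope) bool {
	// Class 1: the kernel itself is the emitter (sweeps, reactive rewrites).
	if strings.HasPrefix(env.Actor, kernelURNPrefix) {
		return true
	}
	// Class 2: ADD of an infrastructure node, whoever emits it.
	return env.RewriteType == "ADD" && infrastructureTypes[env.TypeID]
}

func main() {
	cases := []Envelope{
		{Actor: "urn:example:kernel:sweep", RewriteType: "MUTATE", TypeID: "session"},
		{Actor: "urn:example:user:sam", RewriteType: "ADD", TypeID: "workstation"},
		{Actor: "urn:example:user:sam", RewriteType: "ADD", TypeID: "note"},
		{Actor: "urn:example:user:sam", RewriteType: "LINK", TypeID: "user"},
	}
	for _, c := range cases {
		fmt.Printf("%s %s %s -> %v\n", c.Actor, c.RewriteType, c.TypeID, systemInternalEnvelope(c))
	}
}
```

The last two cases mirror the test matrix listed in this commit message: a user-actor ADD of a non-infrastructure type and a user-actor LINK both fall outside the allowlist and must go through the resolver.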
Gate (kernel/liveness.go):
- Runtime.checkLiveness(env) is called from Apply and ApplyProgram BEFORE
operad validation so failure paths are short.
- Registry-less mode: no-op. Matches existing validator shape.
- Error messages cite §M11 / §M12 and name the failure mode so log
readers can trace the doctrine path.
SeedIfAbsent bypass (kernel/runtime.go):
- Apply now routes through applyWithOptions(env, applyOptions{}). Public
API unchanged.
- SeedIfAbsent calls applyWithOptions(env, applyOptions{skipLiveness: true}).
Bootstrap runs before any session exists, so requiring occupancy would
deadlock the first run. Only exception; all other emitters either hit
the allowlist or pass the resolver.
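
A minimal sketch of the unexported-options pattern described above, under simplifying assumptions: `checkLiveness` here is a placeholder that only looks for a non-empty session URN, `SeedIfAbsent` omits its real absence check, and the `Runtime` holds a plain slice instead of graph state.

```go
package main

import (
	"errors"
	"fmt"
)

// Envelope is a trimmed, hypothetical envelope.
type Envelope struct {
	Actor      string
	SessionURN string
}

// applyOptions is unexported, so external callers cannot request the bypass.
type applyOptions struct {
	skipLiveness bool
}

// Runtime is a stand-in that only records accepted envelopes.
type Runtime struct {
	applied []Envelope
}

var errNoSession = errors.New("§M11: envelope has no session context")

// checkLiveness is a placeholder for the real gate.
func (rt *Runtime) checkLiveness(env Envelope) error {
	if env.SessionURN == "" {
		return errNoSession
	}
	return nil
}

// Apply keeps its public signature and always runs the gate.
func (rt *Runtime) Apply(env Envelope) error {
	return rt.applyWithOptions(env, applyOptions{})
}

// SeedIfAbsent runs before any session exists, so it skips the gate.
// (The "if absent" check is omitted in this sketch.)
func (rt *Runtime) SeedIfAbsent(env Envelope) error {
	return rt.applyWithOptions(env, applyOptions{skipLiveness: true})
}

func (rt *Runtime) applyWithOptions(env Envelope, opts applyOptions) error {
	if !opts.skipLiveness {
		if err := rt.checkLiveness(env); err != nil {
			return err
		}
	}
	rt.applied = append(rt.applied, env)
	return nil
}

func main() {
	rt := &Runtime{}
	fmt.Println(rt.Apply(Envelope{Actor: "user:sam"}))           // rejected by the gate
	fmt.Println(rt.SeedIfAbsent(Envelope{Actor: "kernel:seed"})) // bypasses the gate
	fmt.Println(len(rt.applied))
}
```

Keeping the options struct unexported means the bypass is reachable only from inside the package, which is what makes it a structural exception rather than a caller-selectable flag.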
Replay unaffected (prospective-only invariant):
- fold.Replay does not call checkLiveness (same pattern as PR 1).
Pre-PR-3 persisted envelopes (no SessionURN field) rebuild state
identically. Test pins the invariant.
Tests (30 added):
operad/session_context_test.go — 19:
- Resolver: ActorIsSession, Explicit (OK + three mismatch variants),
Inferred, Ambiguous (Candidates populated), Absent
- Classifier: kernel/sweep actors, infrastructure ADDs, user-actor
non-infra ADD/LINK (both rejected)
- AdminScopeRewrite dormant across three representative envelopes
kernel/liveness_test.go — 11:
- Apply paths for Explicit, Inferred, Ambiguous (rejected), Absent
(rejected), ExplicitMismatch (rejected)
- Allowlist: kernel-actor and infrastructure ADD
- SeedIfAbsent bypass
- ApplyProgram atomic rejection when one envelope fails liveness
- fold.Replay invariant for pre-M11 logs
- Registry-less no-op
Build + test clean:
go build ./...
go test ./...
Stacking: PR 3 is independent of #29 per Guido's §4 plan. Merge order
within round 11: this PR → PR 4 (§M12 fills in AdminScopeRewrite +
extends classifier) → round close.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request implements the §M11 liveness gate and §M12 admin-capability hooks to enforce session occupancy requirements for kernel rewrites. It introduces a SessionURN field to the Envelope struct and adds logic to resolve session contexts, with exemptions for system-internal and bootstrap operations. Review feedback highlights critical issues including missing constant and function definitions that will prevent compilation, as well as data races and logic errors in the Runtime where liveness checks are performed without adequate locking or consideration for state changes within atomic programs.
    import (
        "moos/kernel/internal/graph"
    )
The constants hasOccupantSrcPort and isOccupantOfTgtPort are used later in this file (lines 114 and 135) but are not defined anywhere in the package. This will cause a compilation error. Please define them here to match the system's port naming convention.
    -import (
    -    "moos/kernel/internal/graph"
    -)
    +import (
    +    "moos/kernel/internal/graph"
    +)
    +
    +const (
    +    hasOccupantSrcPort  = "has-occupant"
    +    isOccupantOfTgtPort = "is-occupant-of"
    +)
    //
    // Returns nil on pass, a fmt.Errorf wrapping the failure mode on reject.
    // Error messages name the doctrine section so log readers can trace back.
    func (rt *Runtime) checkLiveness(env graph.Envelope) error {
To avoid race conditions and support ApplyProgram's evolving state, checkLiveness should accept the state as an argument instead of accessing rt.state directly. This allows the caller to provide a thread-safe snapshot or a working state.
    -func (rt *Runtime) checkLiveness(env graph.Envelope) error {
    +func (rt *Runtime) checkLiveness(state graph.GraphState, env graph.Envelope) error {
    // Step 2 — resolve session context. Pass the live state so reverse
    // has-occupant walks see the most recent seat assignments.
    res := operad.ResolveSessionForEnvelope(rt.state, env)

    if operad.AdminScopeRewrite(env, rt.state) {
        if !operad.CheckAdminCapability(rt.state, env.Actor) {
Use the state parameter instead of rt.state. Additionally, operad.CheckAdminCapability is not defined in the operad package in this PR, which will cause a compilation error. A stub must be provided even if the full implementation is deferred to PR 4.
    -if operad.AdminScopeRewrite(env, rt.state) {
    -    if !operad.CheckAdminCapability(rt.state, env.Actor) {
    +if operad.AdminScopeRewrite(env, state) {
    +    if !operad.CheckAdminCapability(state, env.Actor) {
    // session context fails fast without paying the structural-validation
    // cost. The check is registry-aware: registry-less mode passes through.
    if !opts.skipLiveness {
        if err := rt.checkLiveness(env); err != nil {
    // single session-less envelope fails the whole batch fast.
    func (rt *Runtime) ApplyProgram(envelopes []graph.Envelope) ([]graph.EvalResult, error) {
        for _, env := range envelopes {
            if err := rt.checkLiveness(env); err != nil {
There are two significant issues here: 1) Accessing rt.state inside checkLiveness without a lock is a data race. 2) Checking liveness for all envelopes against the initial state breaks programs that create a session and use it in the same batch (e.g., ADD session followed by an ADD using that session). This check must be moved into the workingState loop (under the lock) to ensure it validates against the state as it evolves sequentially.
Pull request overview
Implements §M11 session-liveness gating in the kernel (with a prospective-only invariant for replay) and adds §M12 “admin-scope” plumbing as a dormant hook, while extending the rewrite envelope surface with an optional session_urn field to support unambiguous session resolution.
Changes:
- Add `Envelope.SessionURN` (`json:"session_urn,omitempty"`) to support explicit session context on rewrites.
- Introduce pure operad helpers to resolve session context and classify system-internal envelopes; add a dormant `AdminScopeRewrite` hook.
- Integrate a new `Runtime.checkLiveness` gate into `Apply`/`ApplyProgram`, with `SeedIfAbsent` bypassing liveness for bootstrap seeding; add targeted tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/graph/rewrite.go | Adds optional SessionURN field to the envelope JSON surface and documents its semantics. |
| internal/operad/session_context.go | Implements session resolution + allowlist classifier and provides a dormant §M12 admin-scope classifier hook. |
| internal/operad/session_context_test.go | Adds unit tests covering resolver outcomes and classifier behavior (incl. dormant §M12 hook). |
| internal/kernel/liveness.go | Adds Runtime.checkLiveness implementing §M11 gate + §M12 hook integration. |
| internal/kernel/runtime.go | Wires liveness into Apply/ApplyProgram and adds internal applyWithOptions + SeedIfAbsent bypass. |
| internal/kernel/liveness_test.go | Adds tests for liveness acceptance/rejection paths, allowlist behavior, program atomic rejection, replay invariance, and registry-less mode. |
    func ResolveSessionForEnvelope(state graph.GraphState, env graph.Envelope) ResolveSessionResult {
        // Case 1 — actor is itself a session node. No hop needed.
        if actorNode, ok := state.Nodes[env.Actor]; ok && actorNode.TypeID == "session" {
            return ResolveSessionResult{Kind: ResolveSessionActorIsSession, SessionURN: env.Actor}
        }
ResolveSessionForEnvelope treats env.Actor being a session node as an automatic success (ResolveSessionActorIsSession) without verifying that the session is actually occupied (has a canonical WF19 has-occupant relation). Because kernel.checkLiveness treats this Kind as a pass, an unoccupied session node can currently emit user-space rewrites and bypass §M11’s “session-occupancy” gate. If session-actor envelopes are meant to be allowed, the resolver (or the liveness gate) should still require the session to be occupied; otherwise return Absent/ExplicitMismatch for unoccupied sessions.
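
A minimal sketch of a resolver that enforces the occupancy requirement raised in this comment. It is a simplification under stated assumptions: `State` replaces `graph.GraphState` with two maps, the has-occupant relation is flattened into `Occupants`, the principal-type check is assumed to have happened when that map was built, and "verified" for an explicit `SessionURN` is taken to mean "the named session exists and the actor occupies it".

```go
package main

import (
	"fmt"
	"sort"
)

type URN string

type Kind int

const (
	ActorIsSession Kind = iota
	Explicit
	Inferred
	ExplicitMismatch
	Ambiguous
	Absent
)

// State is a simplified stand-in for graph.GraphState.
type State struct {
	NodeTypes map[URN]string // node URN -> type id
	Occupants map[URN][]URN  // session URN -> occupying principals
}

type Envelope struct {
	Actor      URN
	SessionURN URN
}

type Result struct {
	Kind       Kind
	SessionURN URN
	Candidates []URN
}

// occupies reports whether principal holds a seat in session.
func occupies(s State, session, principal URN) bool {
	for _, p := range s.Occupants[session] {
		if p == principal {
			return true
		}
	}
	return false
}

// Resolve mirrors the six outcomes, with the occupancy check on case 1.
func Resolve(s State, env Envelope) Result {
	// Case 1: the actor is a session node. It passes only if occupied.
	if s.NodeTypes[env.Actor] == "session" {
		if len(s.Occupants[env.Actor]) > 0 {
			return Result{Kind: ActorIsSession, SessionURN: env.Actor}
		}
		return Result{Kind: Absent}
	}
	// Case 2: an explicit session_urn wins, but must check out.
	if env.SessionURN != "" {
		if s.NodeTypes[env.SessionURN] == "session" && occupies(s, env.SessionURN, env.Actor) {
			return Result{Kind: Explicit, SessionURN: env.SessionURN}
		}
		return Result{Kind: ExplicitMismatch}
	}
	// Case 3: reverse lookup over every session the actor occupies.
	var candidates []URN
	for session := range s.Occupants {
		if occupies(s, session, env.Actor) {
			candidates = append(candidates, session)
		}
	}
	sort.Slice(candidates, func(i, j int) bool { return candidates[i] < candidates[j] })
	switch len(candidates) {
	case 0:
		return Result{Kind: Absent}
	case 1:
		return Result{Kind: Inferred, SessionURN: candidates[0]}
	default:
		return Result{Kind: Ambiguous, Candidates: candidates}
	}
}

func main() {
	s := State{
		NodeTypes: map[URN]string{"session:gov": "session", "session:orphan": "session", "user:sam": "user"},
		Occupants: map[URN][]URN{"session:gov": {"user:sam"}},
	}
	fmt.Println(Resolve(s, Envelope{Actor: "session:orphan"}).Kind == Absent)
	fmt.Println(Resolve(s, Envelope{Actor: "session:gov"}).Kind == ActorIsSession)
	fmt.Println(Resolve(s, Envelope{Actor: "user:sam"}).Kind == Inferred)
}
```

Sorting the candidates is a choice made for the sketch so that an `Ambiguous` result lists sessions in a stable order; Go map iteration order is otherwise unspecified.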
    // session context fails fast without paying the structural-validation
    // cost. The check is registry-aware: registry-less mode passes through.
    if !opts.skipLiveness {
        if err := rt.checkLiveness(env); err != nil {
applyWithOptions calls rt.checkLiveness(env) before taking rt.mu but checkLiveness reads rt.state (maps + indexes). This can race with concurrent Apply/ApplyProgram calls and lead to concurrent map read and map write panics under load. Fix by taking rt.mu.RLock() around the liveness check (or have checkLiveness operate on a snapshot/cloned state captured under RLock) before proceeding to validation/locking.
    -if err := rt.checkLiveness(env); err != nil {
    +rt.mu.RLock()
    +err := rt.checkLiveness(env)
    +rt.mu.RUnlock()
    +if err != nil {
    func (rt *Runtime) ApplyProgram(envelopes []graph.Envelope) ([]graph.EvalResult, error) {
        for _, env := range envelopes {
            if err := rt.checkLiveness(env); err != nil {
                return nil, err
            }
            if err := rt.validate(env); err != nil {
                return nil, err
            }
ApplyProgram performs rt.checkLiveness(env) for each envelope before acquiring rt.mu, but checkLiveness reads rt.state maps. If any other goroutine is applying/persisting at the same time, this introduces a data race and can panic with concurrent map access. Consider taking rt.mu.RLock() for the whole preflight loop (liveness + validate) or capturing a state snapshot under RLock and running all liveness checks against that snapshot.
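
The locking discipline suggested here can be sketched as follows: hold one read lock for the whole preflight loop, release it, then take the write lock for the apply body. `sync.RWMutex` offers no upgrade from read to write lock, so the release in between is unavoidable. The `Runtime`, `Envelope`, and `checkLivenessLocked` below are simplified stand-ins, not the kernel's types.

```go
package main

import (
	"fmt"
	"sync"
)

// Envelope is a hypothetical rewrite: Actor adds NewNode of NewType.
type Envelope struct {
	Actor, NewNode, NewType string
}

// Runtime guards a plain map that stands in for rt.state.
type Runtime struct {
	mu    sync.RWMutex
	nodes map[string]string // node URN -> type id
}

// checkLivenessLocked reads state; the caller must hold rt.mu (read or write).
func (rt *Runtime) checkLivenessLocked(env Envelope) error {
	if _, ok := rt.nodes[env.Actor]; !ok {
		return fmt.Errorf("§M11: actor %q not present in state", env.Actor)
	}
	return nil
}

// preflight checks the whole batch under a single read lock, so every
// envelope is judged against the same snapshot of state.
func (rt *Runtime) preflight(envs []Envelope) error {
	rt.mu.RLock()
	defer rt.mu.RUnlock()
	for _, env := range envs {
		if err := rt.checkLivenessLocked(env); err != nil {
			return err
		}
	}
	return nil
}

// ApplyProgram rejects the batch atomically if any envelope fails preflight.
func (rt *Runtime) ApplyProgram(envs []Envelope) error {
	if err := rt.preflight(envs); err != nil {
		return err
	}
	// The read lock is already released; take the write lock to mutate.
	rt.mu.Lock()
	defer rt.mu.Unlock()
	for _, env := range envs {
		rt.nodes[env.NewNode] = env.NewType
	}
	return nil
}

// size reads the node count under the read lock.
func (rt *Runtime) size() int {
	rt.mu.RLock()
	defer rt.mu.RUnlock()
	return len(rt.nodes)
}

func main() {
	rt := &Runtime{nodes: map[string]string{"user:sam": "user"}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			env := Envelope{Actor: "user:sam", NewNode: fmt.Sprintf("note:%d", i), NewType: "note"}
			if err := rt.ApplyProgram([]Envelope{env}); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(rt.size())
}
```

Another writer can slip in between the read unlock and the write lock, so anything that must hold at apply time still has to be re-checked under the write lock. Running the sketch with `go run -race` is one way to confirm that the map is never read and written concurrently.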
Addresses all three Copilot line-level findings on #30.

(1) Unoccupied-session-as-actor bypass (session_context.go:80):
ResolveSessionForEnvelope previously returned ResolveSessionActorIsSession unconditionally whenever env.Actor was a session node. An orphan session (node exists but has no has-occupant relation) could therefore emit user-space rewrites and sail past §M11. Fixed by requiring the session itself to have at least one canonical has-occupant relation pointing at a principal, which mirrors CheckAdminCapability's hop pattern. Unoccupied session-as-actor now returns ResolveSessionAbsent; the kernel gate rejects with the same §M11 message as the general absent case. New helper sessionHasAnyOccupant reuses ResolveSessionOccupant so the principal-type check (user|agent) stays in one place.

(2) Apply: state-read race with concurrent writers (runtime.go:102):
applyWithOptions called rt.checkLiveness before acquiring rt.mu, and checkLiveness reads rt.state.Nodes / Relations / indexes. Under concurrent writes this is a data race with potential "concurrent map read and map write" panics. Wrapped the checkLiveness call in rt.mu.RLock() / RUnlock() so state reads are synchronised. The short window between RUnlock and Lock is acceptable: fold.Evaluate under the write-lock is the authoritative apply-time check for structural invariants.

(3) ApplyProgram: same race across batch preflight (runtime.go:181):
The preflight loop called checkLiveness on every envelope against rt.state without a lock. Wrapped the entire preflight loop in a single RLock / RUnlock so liveness observations across the batch are consistent with each other. The read lock is released before acquiring Lock for the apply body. No lock-upgrade is attempted (Go's sync.RWMutex does not support it), and reactive / sweep paths are unaffected because they use their own lock discipline (applyReactiveLocked is called with Lock already held).

Tests (3 added):
- operad.TestResolveSessionForEnvelope_ActorIsSession_Unoccupied: pins the resolver fix. Unoccupied session returns Absent, not ActorIsSession.
- kernel.TestApply_M11_UnoccupiedSessionAsActor_Rejected: integration test; end-to-end Apply rejects the envelope with the §M11 error.
- kernel.TestApply_M11_OccupiedSessionAsActor_Accepted: positive pair; an occupied session-as-actor still passes, covering the kernel-internal session-heartbeat / turn-count path.

    go build ./...   # clean
    go test ./...    # all packages pass

Note: go test -race skipped locally (CGO / gcc not available on this Windows host). Fix correctness argued by construction: RLock held for all state reads in checkLiveness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review fixup pushed as 6e67893. All 3 Copilot findings addressed. (1)
MSD21091969 left a comment
LGTM on 6e67893 (the fixup commit). Gemini + Copilot reviews were posted against d31251b (initial), before your 12-minute-later fixup that closed both HIGH-priority concerns. The current head addresses everything substantive.
Verification of prior concerns
| Reviewer | Flag | Status on 6e67893 |
|---|---|---|
| Gemini HIGH session_context.go:5 | "Constants not defined, compile error" | Stale. hasOccupantSrcPort / isOccupantOfTgtPort were defined by moos-kernel#29 at occupancy.go:43 and merged to master at 13:21, 15 min before PR 30 opened. Your branch picks them up via master. Verified via grep -rn hasOccupantSrcPort internal/operad/ on local master post-pull. |
| Gemini HIGH liveness.go:43/60/79 | "Data race — reads rt.state without lock" | Fixed. applyWithOptions now does rt.mu.RLock() / checkLiveness / rt.mu.RUnlock() around the check. The comment is explicit about the release-before-write-lock pattern (RWMutex no upgrade). |
| Gemini HIGH runtime.go:102/196 | "Data race + ApplyProgram against initial state" | Fixed for the race. ApplyProgram holds rt.mu.RLock() for the entire preflight loop (liveness + validate). Initial-state-vs-working-state is a design call I'll raise separately below. |
| Gemini HIGH liveness.go:79 | "CheckAdminCapability not defined" | Stale. Exists at occupancy.go:142 on master (pre-round-11, confirmed by the M11/M12 plan §1 helper table). |
| Copilot session_context.go:91 | "ActorIsSession bypass lets orphan sessions emit rewrites" | Fixed. Resolver now returns ResolveSessionAbsent when an actor-session has no canonical has-occupant. Test TestApply_M11_UnoccupiedSessionAsActor_Rejected + positive-pair TestApply_M11_OccupiedSessionAsActor_Accepted pin the invariant. Comment on ResolveSessionForEnvelope §1 is explicit about the reasoning. Nice close. |
| Copilot runtime.go:102/203 | "Data race on rt.state in pre-lock checkLiveness" | Fixed. Same RLock pattern. |
Request Gemini + Copilot to re-run on 6e67893 if you want their LGTM on record; the concerns they raised don't re-apply.
Core design — all right
- `session_urn` field (`graph/rewrite.go`): additive + `omitempty`, the shape I recommended. Backward-compatible with clients that don't set it, which covers pre-PR-30 envelopes at replay and external clients during migration.
- Structured resolver (`ResolveSessionResult` + 6-kind enum): the `Candidates` field on `Ambiguous` is a good touch; lets the error message list what the caller could set `session_urn` to. Error surface quality matters here.
- Allowlist + SeedIfAbsent bypass: kernel-URN actors, infrastructure ADDs, seed bootstrap. That's the three classes I'd have asked for. The `skipLiveness` option kept internal (not exported) is the right ergonomic call.
- `AdminScopeRewrite` plumbed dormant: returning `false` in PR 3 and letting PR 4 fill it in means PR 4 is a pure-additive surface. Keeps each PR's review scope small.
- `fold.Replay` invariant: `TestReplay_PreservesPreM11Rewrites` pins prospective-only. Same discipline as PR 1 / PR 27.
One design question — ApplyProgram preflight vs working state
Gemini's runtime.go:196 included a semantic concern beyond the race: "Checking liveness for all envelopes against the initial state breaks programs that create a session and use it in the same batch."
I think this is a non-issue under the current doctrine — here's why, and I want your read:
The resolver checks env.SessionURN or env.Actor's session context — those refer to the emitter's session (the session the rewrite runs under), not the session being created or modified. So a batch like:
seq N ADD session:foo actor=user:sam, session_urn=sam.governance
seq N+1 LINK ... on session:foo actor=user:sam, session_urn=sam.governance
Both envelopes' session context is sam.governance (exists, occupied). The newly-ADDed session:foo is the target of the rewrites, not the emitter's context. Initial-state check passes for both.
The scenario that would break:
seq N ADD session:bar actor=user:sam, session_urn=sam.governance
seq N+1 LINK ... actor=user:sam, session_urn=session:bar # <- references just-created session
That's unusual (why would a batch create a session and immediately attribute emits to it?) — typically the kernel creates the session, then in a later round agents attribute envelopes to it. But it's not impossible: atomic-pair creation of session + first occupant is exactly this shape, and the seed flow does it today via skipLiveness.
My read: keep the initial-state check. Document the constraint: "an envelope's session_urn must reference a session that exists in state at the start of the batch; same-batch session creation is a bootstrap-via-seed-only pattern, protected by skipLiveness". Add a test pinning the rejection path. That's cheaper than refactoring ApplyProgram to thread working-state through the preflight.
If you disagree — e.g. you want to support atomic session-birth+first-occupant outside the seed path — we'd need to thread workingState through preflight, but I don't see a driving use case yet. Your call.
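
The two batches discussed above can be made concrete with a small sketch of an initial-state preflight. Everything here is illustrative: `Envelope` is trimmed to three fields, `occupiedAtStart` stands in for whatever the resolver reads from state at batch start, and the URNs are the ones used in the examples.

```go
package main

import "fmt"

// Envelope is a hypothetical, trimmed envelope.
type Envelope struct {
	Op         string // "ADD" or "LINK" (assumed encoding)
	Target     string // node being created or touched
	SessionURN string // the emitter's session context
}

// preflight checks every envelope's session context against the sessions
// that were occupied when the batch started. It never looks at nodes the
// batch itself creates. It returns the index of the first failing
// envelope, or -1 if the whole batch passes.
func preflight(occupiedAtStart map[string]bool, batch []Envelope) int {
	for i, env := range batch {
		if !occupiedAtStart[env.SessionURN] {
			return i
		}
	}
	return -1
}

func main() {
	occupied := map[string]bool{"sam.governance": true}

	// Batch A: session:foo is only the target; the emitter context is
	// sam.governance for both envelopes, so the initial-state check passes.
	batchA := []Envelope{
		{Op: "ADD", Target: "session:foo", SessionURN: "sam.governance"},
		{Op: "LINK", Target: "session:foo", SessionURN: "sam.governance"},
	}
	fmt.Println(preflight(occupied, batchA))

	// Batch B: the second envelope attributes itself to the session the
	// first envelope creates, which is not in the initial state.
	batchB := []Envelope{
		{Op: "ADD", Target: "session:bar", SessionURN: "sam.governance"},
		{Op: "LINK", Target: "note:1", SessionURN: "session:bar"},
	}
	fmt.Println(preflight(occupied, batchB))
}
```

Supporting batch B outside the seed path would mean updating the occupied set after each envelope inside the loop, which is the working-state threading this review suggests deferring until there is a driving use case.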
Minor doctrine polish — non-blocking
The comment on ResolveSessionForEnvelope (the one explaining the unoccupied-session-as-actor rejection) is excellent. Suggest mirroring the reasoning into the doctrine note at kb/research/kernel/20260421-t171-m11-m12-implementation-plan.md §2: "Session-as-actor is permitted only when that session is itself occupied; mirrors §M12's hop-through-has-occupant pattern." One line in running-state's round-11 section too when round-close fires. Not blocking PR 30.
Merge path
Ready when you are. PR 4 (§M12 admin-cap fill-in) can open on top of this. Standing by for either.
— Guido, session:sam.governance
Guido flagged in #30 review that the ApplyProgram preflight checks §M11 against the state at batch start, not a working state that evolves envelope-by-envelope. Per his suggested resolution (keep the initial-state discipline, which is simpler and not batch-order-dependent, and document the constraint explicitly), this test pins the rejection path for an intra-batch session reference:

    env 1: ADD session:newborn (passes under emitter=governance)
    env 2: session_urn=session:newborn (rejected: not in initial state)

If the preflight ever starts threading a working state through, this test must be adjusted deliberately. The rejection behavior is the design, not an accident.

Doctrine note tightened at ffs0/kb/research/kernel/20260421-t171-m11-m12-implementation-plan.md §2 with:
- Session-as-actor rule (only permitted when session is itself occupied; mirrors §M12's hop pattern)
- ApplyProgram initial-state-check rule (explicit rationale for why working-state-through-preflight is NOT the design)

Test-only addition here; doctrine commit on ffs0 separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Ships §M11 (kernel-liveness as session-occupancy) per doctrine in `kb/research/kernel/20260417-t187-kernel-proper.md` and the design answers Guido landed on ffs0#33. Plumbs §M12's `AdminScopeRewrite` call site as dormant so PR 4 becomes a pure-additive diff.

Closes the three design questions from the T=171 M11/M12 implementation plan:
- `session_urn` field on envelope wins; unambiguous reverse-lookup falls back; ambiguous/absent rejects

Changes
`graph/rewrite.go`: envelope surface
New optional field `SessionURN URN` with json tag `session_urn`. Additive, backward-compatible (`omitempty`). Clients that omit it fall through the reverse-lookup path; clients where one actor drives multiple sessions MUST set it.

`operad/session_context.go`: resolvers + classifiers (new)
- `ResolveSessionForEnvelope(state, env)` returns a structured `ResolveSessionResult` with one of six kinds: `ActorIsSession`, `Explicit`, `Inferred`, `ExplicitMismatch`, `Ambiguous` (populates `Candidates`), `Absent`.
- `SystemInternalEnvelope(env)` allowlist: kernel-URN actors (sweep WF13, reactive type-drift, admin-authority MUTATEs) + `ADD` of infrastructure types (`user`, `workstation`, `kernel`).
- `AdminScopeRewrite(env, state)` returns `false`: PR 3 plumbing hook. PR 4 fills it in.

`kernel/liveness.go`: the gate (new)
`Runtime.checkLiveness(env)` called from `Apply` + `ApplyProgram` before operad validation so failure paths are short. Registry-less mode is a no-op. Error messages cite §M11 / §M12 and name the failure mode.

`kernel/runtime.go`: integration
- `Apply` routes through `applyWithOptions(env, applyOptions{})`. Public API unchanged.
- `SeedIfAbsent` calls `applyWithOptions(env, applyOptions{skipLiveness: true})`: bootstrap runs before any session exists, so requiring occupancy would deadlock the first run. The only structural bypass; all other emitters hit the allowlist or pass the resolver.

Replay unaffected (prospective-only invariant)
`fold.Replay` does not call `checkLiveness`, same pattern as PR 1. Pre-PR-3 persisted envelopes (no `SessionURN` field) rebuild state identically. `TestReplay_PreservesPreM11Rewrites` pins the invariant.

Tests (30 added)
`operad/session_context_test.go` (19):
- `ActorIsSession`, `Explicit` (OK + 3 mismatch variants), `Inferred`, `Ambiguous` (with `Candidates`), `Absent`
- `AdminScopeRewrite` dormant across three representative envelopes

`kernel/liveness_test.go` (11):
- `Explicit`, `Inferred`, `Ambiguous` (rejected), `Absent` (rejected), `ExplicitMismatch` (rejected)
- `SeedIfAbsent` bypass
- `ApplyProgram` atomic rejection when one envelope fails liveness
- `fold.Replay` invariant for pre-M11 logs

Stacking
- … `ResolveSessionOccupant`; rotation writes via `RotateSessionOccupant`). No merge conflict.
- … `AdminScopeRewrite` + extends classifier for `kernel` type ADDs per Guido's flag).
- … `SeedIfAbsent` bypasses structurally, the post-restart seed flow works without additional changes.

Doctrine references
- `kb/research/kernel/20260417-t187-kernel-proper.md` §§M11, M12
- `kb/research/kernel/20260421-t171-m11-m12-implementation-plan.md`

🤖 Generated with Claude Code