Skip to content

feat(orchestrator): multi-agent goal execution via session_policy spawn#377

Merged
dimakis merged 4 commits into
mainfrom
feat/multi-agent-orchestration
Jun 27, 2026
Merged

feat(orchestrator): multi-agent goal execution via session_policy spawn#377
dimakis merged 4 commits into
mainfrom
feat/multi-agent-orchestration

Conversation

@dimakis

@dimakis dimakis commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Summary

Enable the TaskOrchestrator to spawn dedicated headless sessions for individual tasks, unlocking multi-agent goal coordination. This is Phase 1 of 4 in the multi-agent orchestration initiative.

What changed

  • task-orchestrator.ts: Implement session_policy: 'spawn' path in tick(). When a task has sessionPolicy: 'spawn', the orchestrator creates a dedicated headless session instead of routing to the pinned session. Spawned tasks don't set activeTaskId, enabling parallel execution. Falls back to pinned session on failure.

  • index.ts: Wire spawnSession dependency — creates worktrees, registers in EventStore, starts headless session with NullTransport.

  • app.ts: Add POST /api/signals/resolve endpoint. External agents (e.g. Centaur) can resolve wait_for_signal tasks by gate metadata (type + repo + PR) without knowing task IDs. Authenticated via internal token.

  • api-schemas.ts: Add SignalResolveBody zod schema for the new endpoint.

  • task-store.ts: Add findActiveSignalTasks(gateType) query method for gate-metadata-based task lookup.

Context: Multi-Agent Orchestration Roadmap

The session_policy column and wait_for_signal stage type already existed in the schema but were unused. This PR activates them.

Phase 1 (this PR): Multi-session tick() + signal resolve endpoint
Phase 2: Centaur signal bridge — Centaur POSTs ReviewCompleted to /api/signals/resolve
Phase 3: PR Shepherd creates Task Board goals instead of ad-hoc sessions
Phase 4: Scheduled PR discovery trigger

A PR lifecycle would look like:

Goal: "Shepherd PR #360"
  ├── [agent_work, spawn]         Request Centaur review
  ├── [wait_for_signal]           Wait for review completion
  ├── [agent_work, spawn]         Address findings
  ├── [wait_for_signal, gh_ci]    Wait for CI
  └── [human_review]              Approve merge

Key files

  • server/task-orchestrator.ts — spawn path in tick()
  • server/index.ts — spawnSession wiring
  • server/app.ts — POST /api/signals/resolve
  • server/task-store.ts — findActiveSignalTasks()

Test plan

  • 7 new tests: spawn dispatch, activeTaskId semantics, fallback, signal lookup
  • All 48 existing tests pass
  • TypeScript compilation clean
  • E2E: create goal with spawn tasks, verify sessions are created
  • E2E: POST to /api/signals/resolve, verify matching tasks resolve

🤖 Generated with Claude Code

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 8 issue(s) (5 warning).

server/task-orchestrator.ts

Solid feature addition with clean separation between spawn/reuse paths; main concerns are race conditions in the async spawn-failure callback (stale this.goalId/this.pinnedClientId after stop()) and a missing human_approval case in the signal resolve endpoint.

  • 🟡 bugs (L355): In the spawn failure callback (.then handler), this.goalId! is used with a non-null assertion, but stop() could have been called between spawn initiation and callback resolution, setting goalId to null. The ! assertion would then pass a null to setTaskContext. Guard with if (!this.goalId) return; before the fallback logic. [fixable]
  • 🟡 bugs (L356): The spawn-failure fallback sets task context and sends prompt to the pinned session, but does NOT set this.activeTaskId = next.id. This means the orchestrator's status reports no active task, yet a task is being worked on by the pinned session. If another tick() fires (e.g., from a concurrent task completing), getNextExecutable won't return this task (it's already active), but the missing activeTaskId breaks the status reporting contract. Additionally, this.pinnedClientId could also be null by the time the callback fires if stop() was called. [fixable]
  • 🔵 unsafe_assumptions (L368): The recursive this.tick() after spawning is safe (bounded by tasks transitioning from pending to active), but it runs synchronously on the call stack. With many spawn-policy tasks (e.g., 50+ parallel subtasks), this creates deep recursion. Consider using queueMicrotask(() => this.tick()) or setImmediate to avoid stack overflow in large task trees. [fixable]

server/app.ts

Solid feature addition with clean separation between spawn/reuse paths; main concerns are race conditions in the async spawn-failure callback (stale this.goalId/this.pinnedClientId after stop()) and a missing human_approval case in the signal resolve endpoint.

  • 🟡 bugs (L1006): SignalResolveBody accepts type: 'human_approval' but the switch statement in the resolve endpoint has no case 'human_approval' branch. A human_approval signal will match the gate type via findActiveSignalTasks but isMatch will remain false since no case handles it — the signal will silently be ignored. Either add a matching case or remove human_approval from the schema. [fixable]
  • 🟡 missing_tests (L980): The new POST /api/signals/resolve endpoint has no integration tests. The existing /api/tasks/:id/signal endpoint also lacks tests, but adding a new gate-matching endpoint with per-type dispatch logic (centaur_review, gh_ci, gh_review) without tests is risky — especially the matching logic with optional repo/pr/pr_url fields. [fixable]
  • 🔵 style (L1008): The gate config property access pattern (gc as Record<string, unknown>).repo as string | undefined is repeated 6 times across 3 switch cases. Consider destructuring the gate config once before the switch: const { repo: taskRepo, pr: taskPr, pr_url: taskPrUrl } = gc as Record<string, unknown>. This reduces duplication and makes the matching logic easier to read. [fixable]

server/index.ts

Solid feature addition with clean separation between spawn/reuse paths; main concerns are race conditions in the async spawn-failure callback (stale this.goalId/this.pinnedClientId after stop()) and a missing human_approval case in the signal resolve endpoint.

  • 🟡 unsafe_assumptions (L240): createSessionWorktrees(wtId, BASE_REPO, config.repos) is called synchronously and can throw (e.g., git failures). The try/catch handles this, but if worktree creation fails, the function returns null and the orchestrator falls back to the pinned session — yet the task is already marked active in the orchestrator (line 342). The fallback in the orchestrator's .then(null) path sends to the pinned session but the task was spawned for isolation. Consider whether worktree failure should revert the task to pending instead. [fixable]

server/__tests__/task-orchestrator.test.ts

Solid feature addition with clean separation between spawn/reuse paths; main concerns are race conditions in the async spawn-failure callback (stale this.goalId/this.pinnedClientId after stop()) and a missing human_approval case in the signal resolve endpoint.

  • 🔵 missing_tests (L576): The spawn tests verify the happy path and fallback, but don't test the rejection path (line 359-363 in task-orchestrator.ts) where spawnSession throws — the task should revert to pending. Also missing: a test for multiple spawn tasks being dispatched in parallel (verifying the recursive tick() spawns all pending tasks in sequence). [fixable]

Comment thread server/task-orchestrator.ts Outdated
log.info('spawned session for task', { taskId: next.id, clientId });
} else {
log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
this.deps.setTaskContext(next.id, this.goalId!);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: In the spawn failure callback (.then handler), this.goalId! is used with a non-null assertion, but stop() could have been called between spawn initiation and callback resolution, setting goalId to null. The ! assertion would then pass a null to setTaskContext. Guard with if (!this.goalId) return; before the fallback logic. [fixable]

Comment thread server/task-orchestrator.ts Outdated
} else {
log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
this.deps.setTaskContext(next.id, this.goalId!);
if (this.pinnedClientId) sendToChat(this.pinnedClientId, prompt);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: The spawn-failure fallback sets task context and sends prompt to the pinned session, but does NOT set this.activeTaskId = next.id. This means the orchestrator's status reports no active task, yet a task is being worked on by the pinned session. If another tick() fires (e.g., from a concurrent task completing), getNextExecutable won't return this task (it's already active), but the missing activeTaskId breaks the status reporting contract. Additionally, this.pinnedClientId could also be null by the time the callback fires if stop() was called. [fixable]

Comment thread server/task-orchestrator.ts Outdated

// Don't set activeTaskId — spawned tasks run independently.
// Continue ticking to find more parallel work.
this.tick();

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 unsafe_assumptions: The recursive this.tick() after spawning is safe (bounded by tasks transitioning from pending to active), but it runs synchronously on the call stack. With many spawn-policy tasks (e.g., 50+ parallel subtasks), this creates deep recursion. Consider using queueMicrotask(() => this.tick()) or setImmediate to avoid stack overflow in large task trees. [fixable]

Comment thread server/app.ts
if (!gc) continue;

let isMatch = false;
switch (type) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: SignalResolveBody accepts type: 'human_approval' but the switch statement in the resolve endpoint has no case 'human_approval' branch. A human_approval signal will match the gate type via findActiveSignalTasks but isMatch will remain false since no case handles it — the signal will silently be ignored. Either add a matching case or remove human_approval from the schema. [fixable]

Comment thread server/app.ts
* External agents (e.g. Centaur) POST here after completing work —
* they don't need to know task IDs, just the gate parameters.
*/
app.post('/api/signals/resolve', (req, res) => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 missing_tests: The new POST /api/signals/resolve endpoint has no integration tests. The existing /api/tasks/:id/signal endpoint also lacks tests, but adding a new gate-matching endpoint with per-type dispatch logic (centaur_review, gh_ci, gh_review) without tests is risky — especially the matching logic with optional repo/pr/pr_url fields. [fixable]

Comment thread server/app.ts Outdated
let isMatch = false;
switch (type) {
case 'centaur_review': {
const taskPrUrl = (gc as Record<string, unknown>).pr_url as string | undefined;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: The gate config property access pattern (gc as Record<string, unknown>).repo as string | undefined is repeated 6 times across 3 switch cases. Consider destructuring the gate config once before the switch: const { repo: taskRepo, pr: taskPr, pr_url: taskPrUrl } = gc as Record<string, unknown>. This reduces duplication and makes the matching logic easier to read. [fixable]

Comment thread server/index.ts Outdated
const config = getRepoConfig();

try {
createSessionWorktrees(wtId, BASE_REPO, config.repos);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 unsafe_assumptions: createSessionWorktrees(wtId, BASE_REPO, config.repos) is called synchronously and can throw (e.g., git failures). The try/catch handles this, but if worktree creation fails, the function returns null and the orchestrator falls back to the pinned session — yet the task is already marked active in the orchestrator (line 342). The fallback in the orchestrator's .then(null) path sends to the pinned session but the task was spawned for isolation. Consider whether worktree failure should revert the task to pending instead. [fixable]


const [, prompt, goalArg] = spawnSession.mock.calls[0];
expect(goalArg).toBe(goal.id);
expect(prompt).toContain('Spawn task');

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 missing_tests: The spawn tests verify the happy path and fallback, but don't test the rejection path (line 359-363 in task-orchestrator.ts) where spawnSession throws — the task should revert to pending. Also missing: a test for multiple spawn tasks being dispatched in parallel (verifying the recursive tick() spawns all pending tasks in sequence). [fixable]

Enable the TaskOrchestrator to spawn dedicated headless sessions for
individual tasks, unlocking multi-agent goal coordination.

Changes:
- task-orchestrator: implement session_policy 'spawn' path in tick().
  Spawned tasks get independent sessions and don't block activeTaskId,
  allowing parallel execution. Falls back to pinned session on failure.
- index.ts: wire spawnSession dep — creates worktrees, registers in
  event store, starts headless session with NullTransport.
- app.ts: add POST /api/signals/resolve endpoint for external agents
  (e.g. Centaur) to resolve wait_for_signal tasks by gate metadata
  (type + repo + PR) without knowing task IDs.
- api-schemas: add SignalResolveBody zod schema.
- task-store: add findActiveSignalTasks(gateType) query method.
- Tests: 7 new tests covering spawn dispatch, activeTaskId semantics,
  fallback behavior, and signal task lookup.

Part of multi-agent orchestration initiative — Phase 1 of 4.
Next: Centaur signal bridge (Phase 2), PR Shepherd goals (Phase 3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimakis dimakis force-pushed the feat/multi-agent-orchestration branch from 615d0d3 to c000d4f Compare June 20, 2026 17:39
@dimakis

dimakis commented Jun 20, 2026

Copy link
Copy Markdown
Owner Author

@centaur-review Please review this PR.

This is Phase 1 of the multi-agent orchestration initiative — enables session_policy: 'spawn' in the TaskOrchestrator and adds /api/signals/resolve for external signal resolution.

Key areas to review:

  • task-orchestrator.ts — spawn path in tick(), parallel execution semantics
  • app.ts — new /api/signals/resolve endpoint, auth model
  • task-store.tsfindActiveSignalTasks() query
  • index.tsspawnSession wiring with NullTransport

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 7 issue(s) (4 warning).

server/task-orchestrator.ts

Sound multi-agent spawn design with correct DFS dispatch and orphan recovery, but the async .then() callbacks read this.goalId/this.pinnedClientId live — creating a race with stop() that can pass null to setTaskContext. The /api/signals/resolve endpoint needs test coverage and has an unhandled human_approval gate type.

  • 🟡 bugs (L355): Race condition: this.goalId! is read inside an async .then() callback, but stop() nullifies this.goalId (line 141). If the orchestrator is stopped before spawnSession resolves and returns null, this.goalId! evaluates to null — passed to setTaskContext as a null goalId. Capture goalId before the async call (e.g. const goalId = this.goalId;) and use the captured value in the callback, with a guard for the null case. [fixable]
  • 🟡 bugs (L356): Same race: this.pinnedClientId is read inside the async .then() callback, but stop() nullifies it (line 145). If the orchestrator stops before the spawn-failure fallback fires, the fallback silently does nothing — the task stays active with no session and no pinned client to handle it, effectively orphaned until the next tick reclaims it. Consider guarding with if (this.state !== 'running') return; at the top of both .then() callbacks. [fixable]
  • 🔵 unsafe_assumptions (L368): The recursive this.tick() call after spawning enables parallel task dispatch, but there's no depth guard. If a goal has many spawn-policy leaf tasks, tick() recurses once per task before any stack frame returns. For a goal with ~100 spawn tasks this could approach stack limits. Consider using a loop or queueMicrotask(()=> this.tick()) to avoid unbounded recursion. [fixable]

server/app.ts

Sound multi-agent spawn design with correct DFS dispatch and orphan recovery, but the async .then() callbacks read this.goalId/this.pinnedClientId live — creating a race with stop() that can pass null to setTaskContext. The /api/signals/resolve endpoint needs test coverage and has an unhandled human_approval gate type.

  • 🟡 bugs (L1027): human_approval is accepted by the SignalResolveBody schema (api-schemas.ts:188) but has no case in the switch statement (lines 1027-1043). Tasks with type: 'human_approval' will be fetched from the DB but never matched, silently returning { ok: true, matched: [] }. Either add a matching case or remove human_approval from the schema's enum to avoid misleading callers into thinking their resolve request succeeded. [fixable]
  • 🟡 missing_tests (L1001): The /api/signals/resolve endpoint has no test coverage. The gate-matching logic (centaur_review by pr_url/repo+pr, gh_ci/gh_review by repo+pr) is non-trivial and untested at the HTTP layer. At minimum, add tests for: successful match by repo+pr, match by pr_url, no-match when gate metadata doesn't align, and auth rejection. [fixable]
  • 🔵 style (L1029): Repeated (gc as Record<string, unknown>).fieldName as Type casts across the switch cases are noisy. Consider extracting a typed interface for gate config metadata (e.g. { repo?: string; pr?: number; pr_url?: string }) or a helper function to reduce the casting boilerplate. [fixable]

server/__tests__/task-orchestrator.test.ts

Sound multi-agent spawn design with correct DFS dispatch and orphan recovery, but the async .then() callbacks read this.goalId/this.pinnedClientId live — creating a race with stop() that can pass null to setTaskContext. The /api/signals/resolve endpoint needs test coverage and has an unhandled human_approval gate type.

  • 🔵 missing_tests: No test covers the spawn-failure fallback path (spawnSession returning null or rejecting). The existing tests verify the happy path and the missing-spawnSession fallback, but not the case where spawnSession is provided and fails. This is where the this.goalId! race condition lives. [fixable]

Comment thread server/task-orchestrator.ts Outdated
log.info('spawned session for task', { taskId: next.id, clientId });
} else {
log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
this.deps.setTaskContext(next.id, this.goalId!);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: Race condition: this.goalId! is read inside an async .then() callback, but stop() nullifies this.goalId (line 141). If the orchestrator is stopped before spawnSession resolves and returns null, this.goalId! evaluates to null — passed to setTaskContext as a null goalId. Capture goalId before the async call (e.g. const goalId = this.goalId;) and use the captured value in the callback, with a guard for the null case. [fixable]

Comment thread server/task-orchestrator.ts Outdated
} else {
log.error('failed to spawn session, falling back to pinned', { taskId: next.id });
this.deps.setTaskContext(next.id, this.goalId!);
if (this.pinnedClientId) sendToChat(this.pinnedClientId, prompt);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: Same race: this.pinnedClientId is read inside the async .then() callback, but stop() nullifies it (line 145). If the orchestrator stops before the spawn-failure fallback fires, the fallback silently does nothing — the task stays active with no session and no pinned client to handle it, effectively orphaned until the next tick reclaims it. Consider guarding with if (this.state !== 'running') return; at the top of both .then() callbacks. [fixable]

Comment thread server/task-orchestrator.ts Outdated

// Don't set activeTaskId — spawned tasks run independently.
// Continue ticking to find more parallel work.
this.tick();

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 unsafe_assumptions: The recursive this.tick() call after spawning enables parallel task dispatch, but there's no depth guard. If a goal has many spawn-policy leaf tasks, tick() recurses once per task before any stack frame returns. For a goal with ~100 spawn tasks this could approach stack limits. Consider using a loop or queueMicrotask(()=> this.tick()) to avoid unbounded recursion. [fixable]

Comment thread server/app.ts
if (!gc) continue;

let isMatch = false;
switch (type) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: human_approval is accepted by the SignalResolveBody schema (api-schemas.ts:188) but has no case in the switch statement (lines 1027-1043). Tasks with type: 'human_approval' will be fetched from the DB but never matched, silently returning { ok: true, matched: [] }. Either add a matching case or remove human_approval from the schema's enum to avoid misleading callers into thinking their resolve request succeeded. [fixable]

Comment thread server/app.ts
* External agents (e.g. Centaur) POST here after completing work —
* they don't need to know task IDs, just the gate parameters.
*/
app.post('/api/signals/resolve', (req, res) => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 missing_tests: The /api/signals/resolve endpoint has no test coverage. The gate-matching logic (centaur_review by pr_url/repo+pr, gh_ci/gh_review by repo+pr) is non-trivial and untested at the HTTP layer. At minimum, add tests for: successful match by repo+pr, match by pr_url, no-match when gate metadata doesn't align, and auth rejection. [fixable]

Comment thread server/app.ts Outdated
let isMatch = false;
switch (type) {
case 'centaur_review': {
const taskPrUrl = (gc as Record<string, unknown>).pr_url as string | undefined;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: Repeated (gc as Record<string, unknown>).fieldName as Type casts across the switch cases are noisy. Consider extracting a typed interface for gate config metadata (e.g. { repo?: string; pr?: number; pr_url?: string }) or a helper function to reduce the casting boilerplate. [fixable]

- Fix race condition: capture goalId/pinnedClientId before async
  boundary and compare against current state in callbacks. Handles
  stop()+start() with different goal, not just stop() alone.
- Add spawn depth guard (MAX_SPAWN_DEPTH=50) to prevent runaway
  recursive tick dispatch for goals with many spawn tasks.
- Fix fallback conflict: when spawn returns null, only claim pinned
  session if activeTaskId is null; otherwise mark task as blocked.
- Add human_approval case to /api/signals/resolve switch statement.
- Extract typed destructuring for gate config in resolve endpoint.
- Add tests: goal-restart race, pinned-session-busy fallback,
  gate matching logic for all signal types, endpoint resolve flow.

All 63 tests pass. Addresses all findings from Centaur reviews.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimakis

dimakis commented Jun 20, 2026

Copy link
Copy Markdown
Owner Author

@centaur-review Please re-review — all findings from the previous two reviews have been addressed:

  1. Race condition fixed — captured goalId/pinnedClientId as locals before async boundary; guard now checks this.goalId !== capturedGoalId (handles stop+restart, not just stop)
  2. Spawn depth guardMAX_SPAWN_DEPTH=50 prevents runaway recursive tick dispatch
  3. Fallback conflict fixed — only claims pinned session if activeTaskId is null; otherwise marks task as blocked
  4. human_approval case added — switch now handles all gate types
  5. Typed destructuring — eliminated repeated (gc as Record<string, unknown>) casts
  6. Tests added — goal-restart race, pinned-session-busy fallback, gate matching for all signal types, endpoint resolve flow (63 tests total, all passing)

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 9 issue(s) (3 critical) (4 warning).

server/index.ts

The spawn lifecycle has three critical bugs: double worktree creation with divergent IDs (leaking worktrees), an EventStore entry disconnected from the real session, and await startChat() blocking for the full session lifetime rather than returning early — making the fire-and-forget spawn pattern ineffective.

  • 🔴 bugs (L249): Double worktree creation with divergent IDs. spawnSession calls createSessionWorktrees(wtId, BASE_REPO, config.repos) from worktree.ts, creating primary + secondary worktrees. Then await startChat(...) internally calls generateWtId() (line 799 of chat.ts) to produce a DIFFERENT wtId and creates its own worktrees. The agent runs in the second set; the first set is orphaned and leaked. Either remove the createSessionWorktrees call here (let startChat handle it) or pass the pre-created worktree path via startChat's cwd option to skip its internal worktree creation. [fixable]
  • 🔴 bugs (L251): EventStore entry uses the wrong sessionId. eventStore.upsertSession({ sessionId: wtId, ... }) stores an entry keyed by the spawner's wtId, but startChat generates its own wtId internally and creates a separate EventStore entry for the actual session. The entry created here (with telosTaskId linking to the goal) is disconnected from the real session, and the real session won't have telosTaskId set. [fixable]
  • 🔴 bugs (L261): await startChat(...) blocks until runQueryLoop() finishes — i.e., the entire agent session lifecycle. This means spawnSession only returns the clientId after the agent has completed (or crashed). The .then() callback in the orchestrator (task-orchestrator.ts:361) that calls setSessionId() fires when the session is already over, making the session assignment pointless. The intent appears to be fire-and-forget spawning, but the actual behavior is 'await full session completion then return'. The startChat call should not be awaited here, or a callback like onSessionResolved should be used to get the clientId early. [fixable]

server/task-orchestrator.ts

The spawn lifecycle has three critical bugs: double worktree creation with divergent IDs (leaking worktrees), an EventStore entry disconnected from the real session, and await startChat() blocking for the full session lifetime rather than returning early — making the fire-and-forget spawn pattern ineffective.

  • 🟡 bugs (L366): Orphan detection ID format mismatch. setSessionId(next.id, clientId) stores the clientId (format headless:xxx) as the task's sessionId, but getActiveSessionIds() (index.ts:235-241) returns SDK-assigned UUIDs from the registry (via session.sessionId). These formats will never match, so if this code path executes while the task is still active, orphan detection will falsely reclaim it. Currently mitigated by timing (the callback fires after session completion), but it's a latent correctness issue. [fixable]
  • 🟡 unsafe_assumptions (L147): No spawned session cleanup on stop(). When the orchestrator stops, in-flight spawned headless sessions continue running with no way to abort them. The stop() method resets local state but doesn't track or abort spawned sessions. Consider tracking spawned clientIds and aborting them on stop, or adding an abortSpawnedSession callback to OrchestratorDeps. [fixable]

server/app.ts

The spawn lifecycle has three critical bugs: double worktree creation with divergent IDs (leaking worktrees), an EventStore entry disconnected from the real session, and await startChat() blocking for the full session lifetime rather than returning early — making the fire-and-forget spawn pattern ineffective.

  • 🟡 unsafe_assumptions (L1041): human_approval gate type matches ALL active tasks of that type without any discriminator. If multiple tasks are waiting for human approval, a single resolve signal resolves every one of them. Consider requiring a task-specific identifier (e.g., a gate_id field) to disambiguate which approval is being granted. [fixable]
  • 🟡 missing_tests (L1001): No HTTP integration test for POST /api/signals/resolve. The signal-processor unit tests mirror the matching logic, but the actual endpoint (auth guard, Zod validation, response shape, task broadcast, 503 when signalProcessor is null) is untested. Other endpoints in this file have supertest-based route tests. [fixable]

server/task-store.ts

The spawn lifecycle has three critical bugs: double worktree creation with divergent IDs (leaking worktrees), an EventStore entry disconnected from the real session, and await startChat() blocking for the full session lifetime rather than returning early — making the fire-and-forget spawn pattern ineffective.

  • 🔵 style (L508): findActiveSignalTasks fetches all active signal tasks from SQLite then filters by gate type in JavaScript via JSON.parse (in rowToTask). Since gate_config is stored as JSON text, the filter could use SQLite's json_extract(gate_config, '$.type') = ? to push filtering into the query and avoid deserializing non-matching rows. [fixable]

server/__tests__/signal-processor.test.ts

The spawn lifecycle has three critical bugs: double worktree creation with divergent IDs (leaking worktrees), an EventStore entry disconnected from the real session, and await startChat() blocking for the full session lifetime rather than returning early — making the fire-and-forget spawn pattern ineffective.

  • 🔵 style (L250): The test describe block gate matching (mirrors /api/signals/resolve logic) tests findActiveSignalTasks + manual matching in test code, but the actual matching logic lives separately in app.ts. If the endpoint logic diverges from these tests, the tests won't catch it. Consider extracting the matching logic into a shared function that both the endpoint and tests use. [fixable]

Comment thread server/index.ts Outdated
const config = getRepoConfig();

try {
createSessionWorktrees(wtId, BASE_REPO, config.repos);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bugs: Double worktree creation with divergent IDs. spawnSession calls createSessionWorktrees(wtId, BASE_REPO, config.repos) from worktree.ts, creating primary + secondary worktrees. Then await startChat(...) internally calls generateWtId() (line 799 of chat.ts) to produce a DIFFERENT wtId and creates its own worktrees. The agent runs in the second set; the first set is orphaned and leaked. Either remove the createSessionWorktrees call here (let startChat handle it) or pass the pre-created worktree path via startChat's cwd option to skip its internal worktree creation. [fixable]

Comment thread server/index.ts Outdated
try {
createSessionWorktrees(wtId, BASE_REPO, config.repos);

eventStore.upsertSession({

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bugs: EventStore entry uses the wrong sessionId. eventStore.upsertSession({ sessionId: wtId, ... }) stores an entry keyed by the spawner's wtId, but startChat generates its own wtId internally and creates a separate EventStore entry for the actual session. The entry created here (with telosTaskId linking to the goal) is disconnected from the real session, and the real session won't have telosTaskId set. [fixable]

Comment thread server/index.ts Outdated
});

const transport = new NullTransport();
await startChat(transport, clientId, prompt, {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bugs: await startChat(...) blocks until runQueryLoop() finishes — i.e., the entire agent session lifecycle. This means spawnSession only returns the clientId after the agent has completed (or crashed). The .then() callback in the orchestrator (task-orchestrator.ts:361) that calls setSessionId() fires when the session is already over, making the session assignment pointless. The intent appears to be fire-and-forget spawning, but the actual behavior is 'await full session completion then return'. The startChat call should not be awaited here, or a callback like onSessionResolved should be used to get the clientId early. [fixable]

if (this.goalId !== capturedGoalId) return;

if (clientId) {
this.deps.store.setSessionId(next.id, clientId);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: Orphan detection ID format mismatch. setSessionId(next.id, clientId) stores the clientId (format headless:xxx) as the task's sessionId, but getActiveSessionIds() (index.ts:235-241) returns SDK-assigned UUIDs from the registry (via session.sessionId). These formats will never match, so if this code path executes while the task is still active, orphan detection will falsely reclaim it. Currently mitigated by timing (the callback fires after session completion), but it's a latent correctness issue. [fixable]

@@ -141,6 +147,7 @@ export class TaskOrchestrator {
this.specMode = false;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 unsafe_assumptions: No spawned session cleanup on stop(). When the orchestrator stops, in-flight spawned headless sessions continue running with no way to abort them. The stop() method resets local state but doesn't track or abort spawned sessions. Consider tracking spawned clientIds and aborting them on stop, or adding an abortSpawnedSession callback to OrchestratorDeps. [fixable]

Comment thread server/app.ts
}
case 'human_approval': {
// human_approval signals match any active task of this gate type
isMatch = true;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 unsafe_assumptions: human_approval gate type matches ALL active tasks of that type without any discriminator. If multiple tasks are waiting for human approval, a single resolve signal resolves every one of them. Consider requiring a task-specific identifier (e.g., a gate_id field) to disambiguate which approval is being granted. [fixable]

Comment thread server/app.ts
* External agents (e.g. Centaur) POST here after completing work —
* they don't need to know task IDs, just the gate parameters.
*/
app.post('/api/signals/resolve', (req, res) => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 missing_tests: No HTTP integration test for POST /api/signals/resolve. The signal-processor unit tests mirror the matching logic, but the actual endpoint (auth guard, Zod validation, response shape, task broadcast, 503 when signalProcessor is null) is untested. Other endpoints in this file have supertest-based route tests. [fixable]

Comment thread server/task-store.ts

/** Find active wait_for_signal tasks matching a gate type. */
findActiveSignalTasks(gateType: string): Task[] {
const rows = this.getDb()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: findActiveSignalTasks fetches all active signal tasks from SQLite then filters by gate type in JavaScript via JSON.parse (in rowToTask). Since gate_config is stored as JSON text, the filter could use SQLite's json_extract(gate_config, '$.type') = ? to push filtering into the query and avoid deserializing non-matching rows. [fixable]

});
});

describe('gate matching (mirrors /api/signals/resolve logic)', () => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: The test describe block gate matching (mirrors /api/signals/resolve logic) tests findActiveSignalTasks + manual matching in test code, but the actual matching logic lives separately in app.ts. If the endpoint logic diverges from these tests, the tests won't catch it. Consider extracting the matching logic into a shared function that both the endpoint and tests use. [fixable]

@dimakis dimakis left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 7 issue(s) (1 critical) (4 warning).

server/index.ts

Critical double-worktree-creation bug in spawnSession (index.ts creates worktrees then startChat creates more with a different ID); matching logic has a potential type coercion issue; missing HTTP-level tests for the new resolve endpoint.

  • 🔴 bugs (L249): Double worktree creation. spawnSession calls createSessionWorktrees(wtId, BASE_REPO, config.repos) from worktree.ts, then calls startChat(transport, clientId, prompt, { isolation: true }). Inside _startChatInner (chat.ts:799), a new wtId is generated and createSessionWorktrees from chat.ts is called again with isolation: true — creating a second, orphaned worktree. The agent session will use the second worktree (from startChat), while the first (from index.ts) is leaked. Fix: either pass isolation: false to startChat (and pass a cwd pointing to the worktree created on line 249), or remove the explicit createSessionWorktrees call on line 249 and let startChat handle it. [fixable]
  • 🟡 bugs (L251): eventStore.upsertSession is called with sessionId: wtId, but startChat will internally generate a different wtId (chat.ts:799) and register the session under a different ID. The session metadata stored here won't match the actual session created by startChat, since startChat resolves its own sessionId asynchronously via the SDK. [fixable]

server/app.ts

Critical double-worktree-creation bug in spawnSession (index.ts creates worktrees then startChat creates more with a different ID); matching logic has a potential type coercion issue; missing HTTP-level tests for the new resolve endpoint.

  • 🟡 unsafe_assumptions (L1026): gateConfig is cast to Record<string, unknown> and destructured for repo, pr, pr_url. The pr field from the request body is a number (per Zod schema), but taskPr from the gate config is parsed from JSON and could be a number or string depending on how it was stored. The === comparison on line 1031/1036 (taskPr === pr) will fail if one is a string and the other a number. Consider using == or explicit coercion for the pr comparison. [fixable]
  • 🟡 missing_tests (L1001): The /api/signals/resolve endpoint has no HTTP-level integration test. The signal-processor test suite mirrors the matching logic in-process but doesn't test the endpoint's auth check, Zod validation, or 503 when signalProcessor is null. An app.test.ts (supertest) test would cover these layers. [fixable]

server/task-orchestrator.ts

Critical double-worktree-creation bug in spawnSession (index.ts creates worktrees then startChat creates more with a different ID); matching logic has a potential type coercion issue; missing HTTP-level tests for the new resolve endpoint.

  • 🟡 unsafe_assumptions (L345): next.sessionPolicy ?? 'reuse' is dead code — sessionPolicy always has a value from the DB (default 'auto'). Tasks created via the API without specifying sessionPolicy get 'auto', which falls through to the else (reuse) branch because 'auto' !== 'spawn'. This works but 'auto' is undocumented as a policy. If 'auto' should eventually mean something different (e.g., spawn when resources are available), the fallback logic needs updating. Consider making this explicit: policy === 'spawn' is the only check that matters.

server/task-store.ts

Critical double-worktree-creation bug in spawnSession (index.ts creates worktrees then startChat creates more with a different ID); matching logic has a potential type coercion issue; missing HTTP-level tests for the new resolve endpoint.

  • 🔵 style (L509): findActiveSignalTasks fetches all active signal tasks from SQLite, then filters by gate type in JavaScript. The gate type could be pushed into the SQL query using json_extract(gate_config, '$.type') = ? to reduce data transfer and deserialization overhead, especially if the number of active signal tasks grows. [fixable]

server/__tests__/task-orchestrator.test.ts

Critical double-worktree-creation bug in spawnSession (index.ts creates worktrees then startChat creates more with a different ID); matching logic has a potential type coercion issue; missing HTTP-level tests for the new resolve endpoint.

  • 🔵 missing_tests: No test covers the spawnSession error handler when the orchestrator is still on the same goal (lines 386-399 of task-orchestrator.ts). The existing 'blocks task when spawnSession rejects' test validates the task status change, but doesn't verify that broadcastTasks is called in the rejection handler. Also, no test covers the MAX_SPAWN_DEPTH limit being hit — the depth guard at line 405 is untested. [fixable]

Comment thread server/index.ts Outdated
const config = getRepoConfig();

try {
createSessionWorktrees(wtId, BASE_REPO, config.repos);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bugs: Double worktree creation. spawnSession calls createSessionWorktrees(wtId, BASE_REPO, config.repos) from worktree.ts, then calls startChat(transport, clientId, prompt, { isolation: true }). Inside _startChatInner (chat.ts:799), a new wtId is generated and createSessionWorktrees from chat.ts is called again with isolation: true — creating a second, orphaned worktree. The agent session will use the second worktree (from startChat), while the first (from index.ts) is leaked. Fix: either pass isolation: false to startChat (and pass a cwd pointing to the worktree created on line 249), or remove the explicit createSessionWorktrees call on line 249 and let startChat handle it. [fixable]

Comment thread server/index.ts Outdated
try {
createSessionWorktrees(wtId, BASE_REPO, config.repos);

eventStore.upsertSession({

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: eventStore.upsertSession is called with sessionId: wtId, but startChat will internally generate a different wtId (chat.ts:799) and register the session under a different ID. The session metadata stored here won't match the actual session created by startChat, since startChat resolves its own sessionId asynchronously via the SDK. [fixable]

Comment thread server/app.ts
const gc = task.gateConfig;
if (!gc) continue;

const { repo: taskRepo, pr: taskPr, pr_url: taskPrUrl } = gc as Record<string, unknown>;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 unsafe_assumptions: gateConfig is cast to Record<string, unknown> and destructured for repo, pr, pr_url. The pr field from the request body is a number (per Zod schema), but taskPr from the gate config is parsed from JSON and could be a number or string depending on how it was stored. The === comparison on line 1031/1036 (taskPr === pr) will fail if one is a string and the other a number. Consider using == or explicit coercion for the pr comparison. [fixable]

Comment thread server/app.ts
* External agents (e.g. Centaur) POST here after completing work —
* they don't need to know task IDs, just the gate parameters.
*/
app.post('/api/signals/resolve', (req, res) => {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 missing_tests: The /api/signals/resolve endpoint has no HTTP-level integration test. The signal-processor test suite mirrors the matching logic in-process but doesn't test the endpoint's auth check, Zod validation, or 503 when signalProcessor is null. An app.test.ts (supertest) test would cover these layers. [fixable]

this.deps.setTaskContext(next.id, this.goalId);
this.deps.broadcastTasks();
this.deps.broadcastStatus(this.getStatus());
const policy = next.sessionPolicy ?? 'reuse';

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 unsafe_assumptions: next.sessionPolicy ?? 'reuse' is dead code — sessionPolicy always has a value from the DB (default 'auto'). Tasks created via the API without specifying sessionPolicy get 'auto', which falls through to the else (reuse) branch because 'auto' !== 'spawn'. This works but 'auto' is undocumented as a policy. If 'auto' should eventually mean something different (e.g., spawn when resources are available), the fallback logic needs updating. Consider making this explicit: policy === 'spawn' is the only check that matters.

Comment thread server/task-store.ts
/** Find active wait_for_signal tasks matching a gate type. */
findActiveSignalTasks(gateType: string): Task[] {
const rows = this.getDb()
.prepare(

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 style: findActiveSignalTasks fetches all active signal tasks from SQLite, then filters by gate type in JavaScript. The gate type could be pushed into the SQL query using json_extract(gate_config, '$.type') = ? to reduce data transfer and deserialization overhead, especially if the number of active signal tasks grows. [fixable]

dimakis and others added 2 commits June 27, 2026 14:16
…duplicate worktrees

Fixes three critical bugs in spawnSession (Centaur review on #377):

1. Remove duplicate worktree creation — startChat handles worktrees
   internally via createSessionWorktrees. The pre-creation in
   spawnSession leaked orphaned worktree directories.
2. Remove duplicate EventStore entry — startChat persists session
   metadata via onSessionResolved callback. The pre-creation stored
   an entry keyed by a different wtId than the real session.
3. Don't await startChat — it blocks until runQueryLoop completes
   (full session lifetime). Fire-and-forget with .catch() for errors.

Also: push gate_config type filtering into SQLite via json_extract,
and document human_approval broad-match as intentional MVP design.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Centaur flagged that taskPr from JSON gate_config could be string or
number while the Zod-validated request pr is always number. Using ===
would silently fail to match. Coerce both sides via Number() for safe
comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimakis

dimakis commented Jun 27, 2026

Copy link
Copy Markdown
Owner Author

Centaur Review

Found 7 issue(s) (1 critical) (3 warning).

server/task-orchestrator.ts

The spawn dispatch mechanism is well-designed (depth guard, goal-epoch guard, microtask recursion), but the orchestrator transitions to 'paused' after dispatching all spawn tasks, which blocks all task-completion callbacks — spawned tasks complete but the orchestrator never advances to goal completion.

  • 🔴 bugs (L304): After dispatching all spawn tasks, tick() finds no more pending tasks (they're all active) and transitions to state = 'paused'. This blocks all advancement: onTaskCompleted() (line 195), onTaskBlocked() (line 203), and direct tick() calls (line 260) all guard on state === 'running'. Spawned tasks that complete via the REST API (app.ts:668) or signal resolution (index.ts:181) will be silently ignored. The goal will never reach terminal state detection. Fix: before pausing, check whether active spawned tasks exist (e.g. tasks with status='active' and sessionPolicy='spawn'); if so, stay in 'running' rather than transitioning to 'paused'. [fixable]
  • 🟡 bugs (L372): When a spawn fails and the orchestrator has already transitioned to 'paused' (no more pending tasks), the fallback code sets this.activeTaskId and calls sendToChat() — but this.state remains 'paused'. When this fallback task completes, onTaskCompleted returns early because state !== 'running'. The orchestrator is stuck. This is a consequence of the critical bug above, but the fallback path should additionally set this.state = 'running' when claiming the pinned session. [fixable]
  • 🔵 style (L376): sendToChat is imported and called directly (not via deps), making the fallback-to-pinned path untestable — the tests mock deps.setTaskContext but can't verify the prompt was sent. The reuse path (line 423) has the same pattern pre-existing, but since this PR adds a new async fallback code path, consider routing through a dep callback for testability. [fixable]

server/index.ts

The spawn dispatch mechanism is well-designed (depth guard, goal-epoch guard, microtask recursion), but the orchestrator transitions to 'paused' after dispatching all spawn tasks, which blocks all task-completion callbacks — spawned tasks complete but the orchestrator never advances to goal completion.

  • 🟡 bugs (L260): When a spawned headless session fails after startChat initially succeeds (the .catch handler), the task remains in 'active' status with a dead session. The orphan detection in tick() would normally reclaim it, but tick is blocked by the 'paused' state guard. Even after fixing the pause bug, consider calling orchestrator.onTaskBlocked(taskId) or updating the task status in this .catch handler so the orchestrator knows the session died. [fixable]

server/__tests__/task-orchestrator.test.ts

The spawn dispatch mechanism is well-designed (depth guard, goal-epoch guard, microtask recursion), but the orchestrator transitions to 'paused' after dispatching all spawn tasks, which blocks all task-completion callbacks — spawned tasks complete but the orchestrator never advances to goal completion.

  • 🟡 missing_tests: No test verifies that the orchestrator reaches goal completion after spawned tasks finish. The existing tests check initial dispatch, fallback, and error handling, but none simulate task completion (via onTaskCompleted) after spawn. This would have caught the paused-state bug — after all spawn tasks are dispatched, onTaskCompleted is a no-op because the orchestrator paused. [fixable]

server/app.ts

The spawn dispatch mechanism is well-designed (depth guard, goal-epoch guard, microtask recursion), but the orchestrator transitions to 'paused' after dispatching all spawn tasks, which blocks all task-completion callbacks — spawned tasks complete but the orchestrator never advances to goal completion.

  • 🔵 missing_tests (L1001): The new POST /api/signals/resolve endpoint has no integration test. The matching logic (pr_url vs repo+pr, human_approval broadcast) is tested in signal-processor.test.ts at the store level, but the HTTP layer (auth, validation, response shape, broadcast side effect) is untested. Other signal/task endpoints have supertest-based tests in task-routes.test.ts. [fixable]
  • 🔵 unsafe_assumptions (L1039): human_approval matches ANY active task of that gate type, which the comment acknowledges as intentionally broad for MVP. If two goals both have a pending human_approval gate, a single resolve call will approve all of them. Consider logging a warning when candidates.length > 1 for human_approval to surface the ambiguity. [fixable]

@dimakis

dimakis commented Jun 27, 2026

Copy link
Copy Markdown
Owner Author

Centaur Review

Found 6 issue(s) (1 critical) (2 warning).

server/task-orchestrator.ts

Spawn dispatching is well-implemented with good concurrency guards (depth limit, goal-change detection, fallback paths), but the completion loop is broken: spawned sessions don't receive the task MCP server (taskContext never set), and the orchestrator auto-pauses after dispatching all spawn tasks, preventing goal lifecycle completion.

  • 🔴 bugs (L347): Spawned sessions never get the task MCP server. buildTaskMcpServer() in chat.ts (line 562) returns null when session.taskContext is not set, and setTaskContext is never called for spawn-policy tasks — it's only called in the reuse path (line 417) or the null-fallback path (line 374). The agent prompt says 'Use TaskComplete when done' but the tool isn't available. Spawned tasks stay active forever with no completion path. [fixable]
  • 🟡 bugs (L304): After dispatching all spawn-policy tasks, getNextExecutable() returns null (all tasks are active, not pending), causing the orchestrator to set state = 'paused'. Both onTaskCompleted() (line 195) and tick() (line 267) guard on state !== 'running' and return early. Even if task completion were wired up, the orchestrator wouldn't detect goal completion or advance. A spawn-aware check is needed here — e.g., only pause when there are no active spawned tasks remaining. [fixable]

server/index.ts

Spawn dispatching is well-implemented with good concurrency guards (depth limit, goal-change detection, fallback paths), but the completion loop is broken: spawned sessions don't receive the task MCP server (taskContext never set), and the orchestrator auto-pauses after dispatching all spawn tasks, preventing goal lifecycle completion.

  • 🟡 unsafe_assumptions (L246): The outer try/catch is dead code. startChat is an async function — any synchronous throw inside it becomes a rejected promise, handled by the .catch() on line 260. The only statement that could throw synchronously is new NullTransport(), which is a trivial no-op constructor. This gives a false sense of error coverage and masks the fact that all errors flow through the .catch() handler (which logs but doesn't notify the orchestrator of failure — the orchestrator's .then() rejection handler is the one that marks the task blocked). [fixable]

server/app.ts

Spawn dispatching is well-implemented with good concurrency guards (depth limit, goal-change detection, fallback paths), but the completion loop is broken: spawned sessions don't receive the task MCP server (taskContext never set), and the orchestrator auto-pauses after dispatching all spawn tasks, preventing goal lifecycle completion.

  • 🔵 missing_tests (L1001): The /api/signals/resolve endpoint has no HTTP-level tests. Signal matching logic is tested via signal-processor.test.ts, but the endpoint's auth (401), initialization guard (503), validation (400), multi-match resolution, and broadcast side-effect are untested. An integration test using supertest (the project already has app.ts extracted for this purpose) would cover these code paths. [fixable]
  • 🔵 style (L1042): The human_approval match-any behavior is documented with a comment about adding a discriminator field, but the SignalResolveBody schema doesn't include an optional gate_id field. Consider adding gate_id: z.string().optional() to the schema now (even if unused) to avoid a breaking schema change later when multi-approval support is added. [fixable]

server/__tests__/task-orchestrator.test.ts

Spawn dispatching is well-implemented with good concurrency guards (depth limit, goal-change detection, fallback paths), but the completion loop is broken: spawned sessions don't receive the task MCP server (taskContext never set), and the orchestrator auto-pauses after dispatching all spawn tasks, preventing goal lifecycle completion.

  • 🔵 missing_tests: No test covers the end-to-end spawn lifecycle: dispatch → spawned task completes → orchestrator detects goal completion. The current tests verify dispatching and error handling but not the completion callback path, which is where the state='paused' bug (finding fix: auto-reconnect WebSocket and show connection state #2) would surface. [fixable]

@dimakis dimakis merged commit 32877d3 into main Jun 27, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant