Skip to content

fix(workflow): preserve council delegation reliability#516

Merged
jerryfane merged 4 commits into
jerryfane:mainfrom
gaijinjoe:upstream-council-fixes
Jun 28, 2026
Merged

fix(workflow): preserve council delegation reliability#516
jerryfane merged 4 commits into
jerryfane:mainfrom
gaijinjoe:upstream-council-fixes

Conversation

@gaijinjoe

@gaijinjoe gaijinjoe commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR upstreams the council reliability fixes currently carried in gaijinjoe/gitmoot so downstream council users can run upstream Gitmoot without reintroducing known failures.

Fixes

  • Fail closed when a review integration worktree cannot be built, instead of letting reviewers inspect stale/base code.
  • Make delegation depth and total-job bounds environment-overridable via GITMOOT_MAX_DELEGATION_DEPTH and GITMOOT_MAX_DELEGATION_TOTAL_JOBS, preserving existing defaults when unset/invalid.
  • Start Kimi jobs with a fresh session per job instead of resuming a directory-bound session that fails in per-delegation worktrees.

Verification

  • go build ./...
  • go test ./internal/db ./internal/workflow ./internal/runtime
  • go test ./internal/workflow ./internal/runtime

Notes

These changes are based on upstream main at a06b6ad and are already validated locally with the council stack.

gaijinjoe and others added 3 commits June 28, 2026 15:32
… be built (#1)

A read-only delegation (e.g. a council review) that deps on an implement leg is
supposed to run in an integration worktree with the leg branch merged in. If
integrationDepBranches resolved ZERO leg branches (the leg's branch was not
readable — e.g. transient SQLITE_BUSY contention when several repo-scoped daemons
share one ~/.gitmoot, or the branch not yet committed/visible), the delegation
silently fell through to the BASE checkout and reviewed code WITHOUT the
implemented change — producing false verdicts (observed: council reviewers split,
some judging the base while siblings judged the merged work).

Fix: when a read-only delegation declares a dep on a non-read-only (implement)
sibling but no leg branch resolves, fail closed (block + a
delegation_integration_unresolved event) instead of reviewing base. Also raise the
SQLite busy_timeout 5s -> 15s so multi-daemon write bursts drain instead of
erroring (the contention that triggered the stale read). New regression test:
TestAllocateAndEnqueueDelegationBlocksWhenImplementLegUnresolved.

Co-authored-by: gaijinjoe <crazyjerry543@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
)

MaxDelegationDepth (8) and MaxDelegationTotalJobs (64) are hard caps. A council
coordinator legitimately runs long continuation chains — 6 rounds plus up to 4 fix
cycles, and EACH round/fix-cycle continuation increments DelegationDepth — so the
default-8 depth is exhausted after ~1 fix cycle and the run dies with
'delegation depth limit of 8 reached' long before the 4-fix-cycle policy completes.

Make both bounds env-overridable (GITMOOT_MAX_DELEGATION_DEPTH /
GITMOOT_MAX_DELEGATION_TOTAL_JOBS, positive ints), defaulting to the existing safe
consts when unset/invalid. The width, wall-clock, and token/cost backstops still
bound runaway recursion, so the default is unchanged for everyone; only a daemon
that opts in (e.g. a council daemon) raises them. Test: env override + invalid
fallback.

Co-authored-by: gaijinjoe <crazyjerry543@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(kimi): start a fresh session per job (don't resume a dir-bound session)

Kimi Code sessions are DIRECTORY-SCOPED. The council-kimi seat pinned a single
runtime_ref session and resumed it with -S, but review delegations run in
per-delegation worktrees, so the resume failed every time with 'Session ... was
created under a different directory' — and since kimi is a NATIVE voter, that
failure forced a hard block, holding the gate on essentially every worktree review.

KimiAdapter.Deliver now omits -S and starts a fresh session per job; gitmoot jobs
carry full context in the prompt, so a fresh session works in any worktree and
kimi stays a native seat. RuntimeRef is kept (validated) but not used to resume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(kimi): update adapter tests for fresh-session-per-job (no -S resume)

Deliver/Health no longer pass -S <session>; assertions now expect a fresh session
command. Renamed TestKimiDeliverCommandResumesSession -> StartsFreshSession and
TestKimiHealthUsesRegisteredSession -> RunsFreshSession to match the new behavior.

---------

Co-authored-by: gaijinjoe <crazyjerry543@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jerryfane

Copy link
Copy Markdown
Owner

Review note from local PR check:

I think this PR should preserve the existing “implement leg already on base” case before merge.

integrationDepBranches currently documents/skips implement deps whose child payload branch equals the parent base branch because that work is already present on the base checkout. With the new fail-closed guard in allocateAndEnqueueDelegation, any implement dependency with zero resolved leg branches appears to become delegation_integration_unresolved, which would block the dependent read-only review.

Could you add a regression test for this case?

  • implement sibling succeeded
  • child payload Branch == parent payload Branch
  • dependent review has Deps: ["that-leg"]
  • allocation/advance does not block, and the review runs on the base/shared checkout

If that case is intentionally impossible after this change, please document why; otherwise I think the guard should distinguish “no branch because already on base” from “no branch because unresolved/missing”.

@jerryfane jerryfane merged commit 65bb46f into jerryfane:main Jun 28, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants