
Cross-harness orchestration: run Claude Code, Codex, OpenCode, Copilot, and OpenClaw as subagents #2796

Open
mcavage-docker wants to merge 24 commits into docker:main from mcavage-docker:gm/cross-harness-orchestration

@mcavage-docker

Summary

Adds cross-harness orchestration to docker-agent. Any agent in a team can now be backed by an external agent runtime (Claude Code, Codex, OpenCode, Copilot CLI, OpenClaw) instead of a model. The orchestrator delegates tasks to harness subagents via transfer_task and gets results back through docker-agent's normal event stream, TUI, and session model.

What this does

Declare a harness-backed agent in your team YAML:

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Orchestrator
    instruction: Route coding tasks to the appropriate specialist.
    sub_agents:
      - claude-coder
      - codex-coder

  claude-coder:
    description: Claude Code for complex refactors
    instruction: You are a senior software engineer.
    harness:
      type: claude-code

  codex-coder:
    description: Codex for code generation
    instruction: You are a software engineer.
    harness:
      type: codex

The orchestrator routes tasks to harness subagents. Each harness runs in its own subprocess, executes tools internally, and streams results back.

Supported harnesses

Type         Binary    Protocol
claude-code  claude    NDJSON stdout
codex        codex     JSONL stdout
opencode     opencode  NDJSON stdout
copilot      copilot   ACP (JSON-RPC stdio)
openclaw     openclaw  ACP (JSON-RPC stdio)

Key details

  • Aligned to github.com/rumpl/harness -- uses Djordje's Provider interface as the adapter contract. Upstream PRs are open for EventReasoning and --include-partial-messages streaming.
  • Token streaming -- Claude Code runs with --include-partial-messages so tokens stream live into the TUI. Text appears incrementally instead of arriving as a single block at the end.
  • Cost tracking -- the harness run cost is attached to the assistant message and picked up by TotalCost(). Codex shows -- (cost unknown) rather than $0.00.
  • Config version 10 -- adds a harness: key to AgentConfig, mutually exclusive with model:. Version 9 configs upgrade automatically.
  • Security -- the subprocess env uses an allowlist (not the full os.Environ()). ACP file operations are sandboxed to the session working directory. The harness command is validated against injection characters. Subprocesses run in isolated process groups (Setpgid: true) to prevent terminal corruption.
  • Error messages -- a missing binary shows an install hint. A bad type lists the valid values. A model + harness conflict is caught at config parse.
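
The config rules above (model and harness are mutually exclusive, unknown types list the valid values, sub_agents are rejected on harness agents) can be sketched as a single validation function. The struct shapes, the `validateAgent` name, and the error wording are assumptions for illustration; the PR's actual types in pkg/config may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// HarnessConfig and AgentConfig are trimmed, hypothetical versions of the config types.
type HarnessConfig struct {
	Type string
}

type AgentConfig struct {
	Model     string
	Harness   *HarnessConfig
	SubAgents []string
}

// supportedHarnesses mirrors the "Supported harnesses" table.
var supportedHarnesses = map[string]bool{
	"claude-code": true, "codex": true, "opencode": true, "copilot": true, "openclaw": true,
}

// validateAgent applies the harness rules; model-backed agents pass through untouched.
func validateAgent(name string, a AgentConfig) error {
	if a.Harness == nil {
		return nil
	}
	if a.Model != "" {
		return fmt.Errorf("agent %q: model and harness are mutually exclusive", name)
	}
	if len(a.SubAgents) > 0 {
		return fmt.Errorf("agent %q: sub_agents are not allowed on a harness-backed agent", name)
	}
	if !supportedHarnesses[a.Harness.Type] {
		valid := make([]string, 0, len(supportedHarnesses))
		for t := range supportedHarnesses {
			valid = append(valid, t)
		}
		sort.Strings(valid) // stable ordering for the error message
		return fmt.Errorf("agent %q: unknown harness type %q (valid: %s)",
			name, a.Harness.Type, strings.Join(valid, ", "))
	}
	return nil
}

func main() {
	bad := AgentConfig{Model: "anthropic/claude-sonnet-4-5", Harness: &HarnessConfig{Type: "codex"}}
	fmt.Println(validateAgent("codex-coder", bad))
	fmt.Println(validateAgent("codex-coder", AgentConfig{Harness: &HarnessConfig{Type: "codex"}}))
}
```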

Testing

  • 50+ unit tests across all adapters and the runtime integration
  • UAT verified: file writes, tool calls, cost tracking, error cases, config validation
  • Pre-existing test failures in pkg/config TestCheckRequiredEnvVars and pkg/teamloader TestLoadExamples/dmr are unrelated to this change

Example

examples/harness-team.yaml

Closes # (if applicable)

…s stubs

- Bump config version to 10; freeze pkg/config/v9 snapshot
- Add HarnessConfig and PermissionPolicyConfig to AgentConfig
- Add validation: model/harness mutual exclusion, supported types, sub_agents/handoffs rejection
- Add HarnessSpec, PermissionPolicy, WithHarness opt to pkg/agent
- Add Session.HarnessSession map for multi-turn resume tokens
- Add pkg/harness: HarnessAdapter, ACPAdapter, canonical 14-event discriminated union, EventSink, ToolExecutor, PermissionRequester, registry with token ownership guard
- Add pkg/harness/replay: Recorder for fixture generation
- Add teamloader harness branch: skip model/toolset resolution for harness-backed agents
- Update agent-schema.json with HarnessConfig and PermissionPolicyConfig
- pkg/runtime/harness_delegation.go: runHarnessForwarding, runHarnessCollecting,
  translateSink (canonical→runtime events), collectingSink, runAdapter/runAdapterACP
  with panic recovery (FR-NEW-10), runtimePermissionRequester, noopToolExecutor
- pkg/runtime/agent_delegation.go: branch runForwarding/runCollecting on HasHarness()
- pkg/harness/claude/: Adapter, Config, NDJSON translator, 3 test fixtures, 7 tests
- pkg/harness/harness_test.go: registry, token ownership, event types, error codes (13 tests)
- All 20 new tests pass; pre-existing failures unchanged
- pkg/harness/sandbox: path confinement (ErrEscape, symlink detection, non-existent write targets), env allowlist with sensitive key filtering
- 9 sandbox tests: traversal, symlink escape, absolute outside, non-existent file, env filtering
- examples/harness-team.yaml: cross-harness team with claude-code + codex subagents
Security (5 criticals addressed):
- Wire sandbox.Resolve into ACP ReadTextFile/WriteTextFile (path traversal prevention)
- Switch subprocess env to allowlist model (prevent credential leakage to harnesses)
- Default permission requester to deny; require auto_allow + i_understand_the_risk
- Validate Harness.Command for injection characters
- Add PermissionPolicy.Mode enum validation

Code review (5 findings addressed):
- Fix SubSessionCompleted emitted on error path (now only on success)
- Wire spec.Timeout into context.WithTimeout
- Fix temp prompt file leak (defer os.Remove)
- Add Args field to ToolCallStart; populate from Claude tool_use input
- translateSink tracks active tool args through ToolCallStart/End pair

ACP:
- Terminal stubs return errors instead of fake success (prevents false agent reasoning)

Config/schema:
- Add version 9 and 10 to schema enum
- Update TestParseExamples to allow harness-backed agents without model
- docs/configuration/agents/index.md: add harness: to schema reference, properties table, and new Harness-Backed Agents section with examples, permission policy, and known limitations
- docs/configuration/overview/index.md: update current version from 8 to 10
- CHANGELOG.md: add Unreleased section for cross-harness orchestration feature
Bug 1: panic in AgentsInfo when harness agent has empty models slice
  - agent.Model() now returns nil for harness-backed agents instead of
    calling rand.Intn(0)
  - Added len(a.models)==0 guard for non-harness agents with no models
  - Added regression test TestHarnessAgentModelReturnsNil
  - Relaxed NewLocalRuntime check to allow harness root agents
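
The Bug 1 fix can be sketched as follows. `rand.Intn(0)` panics, so the guard must run before the random pick. The `Agent` and `Model` types here are simplified stand-ins, not the definitions in pkg/agent.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Model and Agent are simplified, hypothetical stand-ins for the pkg/agent types.
type Model struct{ Name string }

type Agent struct {
	models  []*Model
	harness bool
}

func (a *Agent) HasHarness() bool { return a.harness }

// Model returns a randomly chosen configured model, or nil when the agent is
// harness-backed or has no models. Without the guard, rand.Intn(0) would panic.
func (a *Agent) Model() *Model {
	if a.HasHarness() || len(a.models) == 0 {
		return nil
	}
	return a.models[rand.Intn(len(a.models))]
}

func main() {
	harnessAgent := &Agent{harness: true}
	fmt.Println(harnessAgent.Model() == nil)

	modelAgent := &Agent{models: []*Model{{Name: "anthropic/claude-sonnet-4-5"}}}
	fmt.Println(modelAgent.Model().Name)
}
```

Callers such as AgentsInfo then need to handle a nil return rather than dereferencing the result unconditionally.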

Bug 2: harness root agent rejected by NewLocalRuntime
  - Added runHarnessRoot() path in loop.go: when RunStream resolves a
    harness-backed agent, dispatch directly through the harness path
    instead of the model loop
  - Removed the HasHarness() error guard from NewLocalRuntime

Bug 3: harness adapters not registered (init() never called)
  - Added pkg/harness/all/all.go with blank imports of all 5 adapters
  - Blank-imported pkg/harness/all from pkg/teamloader (always in binary
    dependency chain)

Also:
  - Added ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN,
    NODE_PATH to env allowlist (harnesses need their API keys to function)
  - Removed version: "10" from examples/harness-team.yaml (examples must
    not pin a version per teamloader test convention)
- Update codex exec invocation: --ask-for-approval never + --sandbox <mode>
  replaced by --dangerously-bypass-approvals-and-sandbox (codex 0.130.0+)
- Change --cd to -C for working directory flag
- Add message-based error code inference for top-level error events
  (codex error events carry detail in message, not code field)
  401/Unauthorized -> auth_failed, 429/rate_limit -> rate_limited
- Update codex tests to match new flag format
Cost tracking:
- Store harness run cost on the final assistant message (chat.Message.Cost)
  so OwnCost()/TotalCost() pick it up when parent walks sub-sessions
- Store token counts on sub-session via SetUsage() so they persist through
  SubSessionCompletedEvent -> AddSubSession
- Root harness path also attaches cost to the final message
- Verified: cost appears in token_usage event AND in persisted sub-session

AgentInfo from RunStart:
- Add Model field to harness.RunStart so adapters can surface the model name
- Claude adapter populates Model from system/init event
- Codex adapter populates Model from thread.started event
- translateSink emits AgentInfo(agentName, model) on RunStart so sidebar
  shows the harness model name immediately

Codex adapter flags (v0.130.0):
- Replace --ask-for-approval never + --sandbox <mode> with
  --dangerously-bypass-approvals-and-sandbox
- Replace --cd with -C
- Add message-based error code inference (401 -> auth_failed)
- Update tests to match new flag format
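
A rough sketch of the message-based error code inference described above. The `inferErrorCode` name, the case-insensitive matching, and the `"unknown"` fallback are assumptions; only the 401/Unauthorized and 429/rate_limit mappings come from the commit notes.

```go
package main

import (
	"fmt"
	"strings"
)

// inferErrorCode derives a canonical error code from a free-form error message,
// for events that carry detail in the message rather than in a code field.
func inferErrorCode(message string) string {
	m := strings.ToLower(message)
	switch {
	case strings.Contains(m, "401"), strings.Contains(m, "unauthorized"):
		return "auth_failed"
	case strings.Contains(m, "429"), strings.Contains(m, "rate_limit"):
		return "rate_limited"
	default:
		return "unknown" // assumption: the real fallback code is not stated in the PR
	}
}

func main() {
	fmt.Println(inferErrorCode("unexpected status 401 Unauthorized"))
	fmt.Println(inferErrorCode("rate_limit_exceeded"))
	fmt.Println(inferErrorCode("connection reset"))
}
```

Substring matching like this is brittle (a message that merely contains the digits 401 would match), so it is presumably a fallback for when the structured code field is empty.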

Adapter registration:
- Add pkg/harness/all/all.go with blank imports of all 5 adapters
- Blank-import from pkg/teamloader so adapters register in any binary

Auth:
- Add ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN,
  NODE_PATH to env allowlist in both claude and acp adapters
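
The allowlist approach can be illustrated like this. The API key names and NODE_PATH are taken from the list above; PATH and HOME, along with the `filterEnv` helper itself, are assumptions added for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedEnv lists the variables a harness subprocess may inherit.
// PATH and HOME are assumptions; the rest come from the commit notes.
var allowedEnv = map[string]bool{
	"PATH": true, "HOME": true,
	"ANTHROPIC_API_KEY": true, "OPENAI_API_KEY": true, "GEMINI_API_KEY": true,
	"GITHUB_TOKEN": true, "NODE_PATH": true,
}

// filterEnv keeps only KEY=VALUE entries whose key is on the allowlist.
func filterEnv(environ []string, allow map[string]bool) []string {
	out := make([]string, 0, len(allow))
	for _, kv := range environ {
		key, _, ok := strings.Cut(kv, "=")
		if ok && allow[key] {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	// Fixed input instead of os.Environ() so the result is deterministic.
	parent := []string{"PATH=/usr/bin", "AWS_SECRET_ACCESS_KEY=hunter2", "OPENAI_API_KEY=sk-test"}
	fmt.Println(filterEnv(parent, allowedEnv))
}
```

The filtered slice would then be assigned to `exec.Cmd.Env` in place of the default inherited environment.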

UAT verified:
- Config loads cleanly, dry-run works
- Root harness agent: writes files, produces output, cost tracked
- Subagent delegation: claude-coder and codex-coder both work
- Cost: token_usage event fires with correct cost, persisted to sub-session
- Error cases: missing binary, bad type, model+harness conflict, injection
  attempt, bad permission mode -- all give clear actionable errors
- Config validation: sub_agents on harness agent rejected
Claude Code streaming:
- Add --include-partial-messages to default invocation
- Parse stream_event wrapper, unwrap Anthropic SSE events
- Emit TextStart/TextDelta/TextEnd per content_block_delta text_delta
- Emit ReasoningStart/Delta/End per thinking_delta
- Emit ToolCallStart/ArgsDelta/End from content_block events
- Dedupe in translateAssistant: skip blocks already streamed by index
- Config opt-out: include_partial_messages: false in harness.config
- Update capabilities: TextDeltas: true, StreamingArgs: true
- Add TestTranslateStreamPartialMessages with real fixture

Codex cost display:
- Add CostUnknown bool to harness.UsageSummary
- Codex sets CostUnknown: true when cost_usd is absent
- translateSink emits cost=-1 sentinel when CostUnknown
- Sidebar: costUnknown flag + formatTotalCost() shows -- not $0.00
- Persisted session cost stays 0 (sentinel not written to store)
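
One way the unknown-cost handling could fit together is sketched below. `formatTotalCost` and the -1 sentinel are named in the notes above; the function signature, the `decodeCost` helper, and the two-decimal format are assumptions.

```go
package main

import "fmt"

// costUnknownSentinel is the value emitted in place of a cost when the
// harness did not report one.
const costUnknownSentinel = -1.0

// decodeCost splits an event cost into a storable amount and an unknown flag.
// The sentinel never reaches the store: unknown cost is persisted as 0.
func decodeCost(eventCost float64) (cost float64, unknown bool) {
	if eventCost < 0 {
		return 0, true
	}
	return eventCost, false
}

// formatTotalCost renders the sidebar cost, showing "--" when it is unknown.
func formatTotalCost(cost float64, costUnknown bool) string {
	if costUnknown {
		return "--"
	}
	return fmt.Sprintf("$%.2f", cost)
}

func main() {
	cost, unknown := decodeCost(costUnknownSentinel)
	fmt.Println(formatTotalCost(cost, unknown))

	cost, unknown = decodeCost(0.1234)
	fmt.Println(formatTotalCost(cost, unknown))
}
```

Keeping the flag separate from the amount avoids displaying a misleading $0.00 while leaving persisted totals summable.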
…corruption

Codex (and potentially Claude Code) spawn bash subprocesses to execute
shell commands. Without Setpgid, these children share docker-agent's
process group and can interact with the terminal, corrupting TUI state.
Setting SysProcAttr.Setpgid=true puts each harness subprocess in its
own process group, isolating its children from the TUI's terminal.
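
A minimal sketch of the process-group isolation described above, for Unix-like systems only (`Setpgid` is not available in `syscall.SysProcAttr` on Windows). The `newHarnessCmd` helper name is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newHarnessCmd builds a command whose process, and any children it spawns,
// run in a new process group separate from the parent's.
func newHarnessCmd(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd
}

func main() {
	// "sh" stands in for a harness binary such as codex.
	cmd := newHarnessCmd("sh", "-c", "echo hello from a separate process group")
	out, err := cmd.Output()
	fmt.Printf("%s(err=%v)\n", out, err)
}
```

Because the child becomes the leader of its own group, the whole subtree can later be signalled at once with `syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)`, which may be useful for timeout handling.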
@mcavage-docker requested a review from a team as a code owner on May 14, 2026 16:21
@docker-agent

PR Review Failed: the review agent encountered an error and could not complete the review.
