Cross-harness orchestration: run Claude Code, Codex, OpenCode, Copilot, and OpenClaw as subagents by mcavage-docker · Pull Request #2796 · docker/docker-agent

mcavage-docker · 2026-05-14T16:21:38Z

Summary

Adds cross-harness orchestration to docker-agent. Any agent in a team can now be backed by an external agent runtime (Claude Code, Codex, OpenCode, Copilot CLI, OpenClaw) instead of a model. The orchestrator delegates tasks to harness subagents via transfer_task and gets results back through docker-agent's normal event stream, TUI, and session model.

What this does

Declare a harness-backed agent in your team YAML:

agents:
  root:
    model: anthropic/claude-sonnet-4-5
    description: Orchestrator
    instruction: Route coding tasks to the appropriate specialist.
    sub_agents:
      - claude-coder
      - codex-coder

  claude-coder:
    description: Claude Code for complex refactors
    instruction: You are a senior software engineer.
    harness:
      type: claude-code

  codex-coder:
    description: Codex for code generation
    instruction: You are a software engineer.
    harness:
      type: codex

The orchestrator routes tasks to harness subagents. Each harness runs in its own subprocess, executes tools internally, and streams results back.

Supported harnesses

| Type | Binary | Protocol |
| --- | --- | --- |
| `claude-code` | `claude` | NDJSON stdout |
| `codex` | `codex` | JSONL stdout |
| `opencode` | `opencode` | NDJSON stdout |
| `copilot` | `copilot` | ACP (JSON-RPC stdio) |
| `openclaw` | `openclaw` | ACP (JSON-RPC stdio) |
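
For the NDJSON/JSONL rows, the adapter's job reduces to reading one JSON object per line from the subprocess's stdout. The following is a minimal sketch of that loop, not the adapter code from this PR: `lineEvent`, `readNDJSON`, and `collectNDJSON` are hypothetical names, and the `type`/`text` fields are an assumed shape, since each harness CLI defines its own schema.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// lineEvent is a hypothetical event shape for illustration only; the real
// harness CLIs each define their own JSON schema.
type lineEvent struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

// readNDJSON decodes one JSON object per line from r and passes each to emit.
// Blank lines are skipped; a malformed line stops the read with an error.
func readNDJSON(r io.Reader, emit func(lineEvent)) error {
	sc := bufio.NewScanner(r)
	// Raise the line limit: the Scanner default (64 KiB) is easy to exceed
	// when a single event carries a large tool result.
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev lineEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return fmt.Errorf("decode NDJSON line: %w", err)
		}
		emit(ev)
	}
	return sc.Err()
}

// collectNDJSON is a convenience wrapper that parses a whole string.
func collectNDJSON(s string) ([]lineEvent, error) {
	var events []lineEvent
	err := readNDJSON(strings.NewReader(s), func(ev lineEvent) {
		events = append(events, ev)
	})
	return events, err
}

func main() {
	stdout := `{"type":"text","text":"hello"}` + "\n\n" + `{"type":"done"}` + "\n"
	events, err := collectNDJSON(stdout)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ev := range events {
		fmt.Printf("%s %q\n", ev.Type, ev.Text)
	}
}
```

In a real adapter the reader would be the pipe returned by `exec.Cmd.StdoutPipe`, and `emit` would translate each line into the runtime's event stream.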

Key details

Aligned to github.com/rumpl/harness -- uses Djordje's Provider interface as the adapter contract. PRs open upstream for EventReasoning and --include-partial-messages streaming.
Token streaming -- Claude Code uses --include-partial-messages for live token streaming in the TUI. Text appears incrementally, not as a wall at the end.
Cost tracking -- harness run cost attached to the assistant message and picked up by TotalCost(). Codex shows -- (cost unknown) rather than $0.00.
Config version 10 -- adds harness: key to AgentConfig, mutually exclusive with model:. Version 9 configs upgrade automatically.
Security -- subprocess env uses an allowlist (not full os.Environ()). ACP file operations sandboxed to session working directory. Harness command validated against injection characters. Subprocesses run in isolated process groups (Setpgid: true) to prevent terminal corruption.
Error messages -- missing binary shows install hint. Bad type lists valid values. model + harness conflict caught at config parse.
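
The env allowlist mentioned under Security can be pictured roughly as follows. This is a sketch under assumptions, not the code in this PR: `allowedEnv` and `filterEnv` are hypothetical names, and only the API-key and `NODE_PATH` entries are taken from the commit messages; `PATH`, `HOME`, `LANG`, and `TERM` are guesses at what a baseline list would contain.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedEnv is an illustrative allowlist. The API-key and NODE_PATH entries
// are named in this PR; PATH, HOME, LANG, and TERM are assumptions.
var allowedEnv = map[string]bool{
	"PATH":              true,
	"HOME":              true,
	"LANG":              true,
	"TERM":              true,
	"ANTHROPIC_API_KEY": true,
	"OPENAI_API_KEY":    true,
	"GEMINI_API_KEY":    true,
	"GITHUB_TOKEN":      true,
	"NODE_PATH":         true,
}

// filterEnv keeps only KEY=VALUE entries whose key is on the allowlist.
// Entries without an "=" are dropped.
func filterEnv(environ []string) []string {
	out := make([]string, 0, len(environ))
	for _, kv := range environ {
		key, _, ok := strings.Cut(kv, "=")
		if ok && allowedEnv[key] {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	parent := []string{
		"PATH=/usr/bin",
		"AWS_SECRET_ACCESS_KEY=secret",
		"ANTHROPIC_API_KEY=sk-example",
	}
	fmt.Println(filterEnv(parent))
}
```

The filtered slice would be assigned to `exec.Cmd.Env`. A non-nil `Env` replaces the inherited environment entirely, which is what keeps unrelated credentials away from the harness.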

Testing

50+ unit tests across all adapters and the runtime integration
UAT verified: file writes, tool calls, cost tracking, error cases, config validation
Pre-existing test failures in pkg/config TestCheckRequiredEnvVars and pkg/teamloader TestLoadExamples/dmr are unrelated to this change

Example

examples/harness-team.yaml

Closes # (if applicable)

…s stubs

- Bump config version to 10; freeze pkg/config/v9 snapshot
- Add HarnessConfig and PermissionPolicyConfig to AgentConfig
- Add validation: model/harness mutual exclusion, supported types, sub_agents/handoffs rejection
- Add HarnessSpec, PermissionPolicy, WithHarness opt to pkg/agent
- Add Session.HarnessSession map for multi-turn resume tokens
- Add pkg/harness: HarnessAdapter, ACPAdapter, canonical 14-event discriminated union, EventSink, ToolExecutor, PermissionRequester, registry with token ownership guard
- Add pkg/harness/replay: Recorder for fixture generation
- Add teamloader harness branch: skip model/toolset resolution for harness-backed agents
- Update agent-schema.json with HarnessConfig and PermissionPolicyConfig
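
The three validation rules in this commit (model/harness mutual exclusion, supported types, sub_agents/handoffs rejection) could be sketched as below. The type names `AgentConfig` and `HarnessConfig` come from the commit message, but their fields, the `validateAgent` function, and the error wording are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// HarnessConfig and AgentConfig are reduced to the fields the rules need.
// The field names are assumptions, not the PR's definitions.
type HarnessConfig struct {
	Type string
}

type AgentConfig struct {
	Model     string
	Harness   *HarnessConfig
	SubAgents []string
	Handoffs  []string
}

// supportedHarnessTypes lists the five types from the PR description.
var supportedHarnessTypes = map[string]bool{
	"claude-code": true,
	"codex":       true,
	"opencode":    true,
	"copilot":     true,
	"openclaw":    true,
}

// validateAgent applies the harness rules; model-backed agents pass through.
func validateAgent(name string, a AgentConfig) error {
	if a.Harness == nil {
		return nil
	}
	if a.Model != "" {
		return fmt.Errorf("agent %q: model and harness are mutually exclusive", name)
	}
	if !supportedHarnessTypes[a.Harness.Type] {
		valid := make([]string, 0, len(supportedHarnessTypes))
		for t := range supportedHarnessTypes {
			valid = append(valid, t)
		}
		sort.Strings(valid) // stable order for a readable error message
		return fmt.Errorf("agent %q: unsupported harness type %q (valid: %s)",
			name, a.Harness.Type, strings.Join(valid, ", "))
	}
	if len(a.SubAgents) > 0 || len(a.Handoffs) > 0 {
		return fmt.Errorf("agent %q: harness-backed agents cannot declare sub_agents or handoffs", name)
	}
	return nil
}

func main() {
	bad := AgentConfig{Model: "anthropic/claude-sonnet-4-5", Harness: &HarnessConfig{Type: "codex"}}
	fmt.Println(validateAgent("codex-coder", bad))

	typo := AgentConfig{Harness: &HarnessConfig{Type: "claude"}}
	fmt.Println(validateAgent("claude-coder", typo))
}
```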

- pkg/runtime/harness_delegation.go: runHarnessForwarding, runHarnessCollecting, translateSink (canonical→runtime events), collectingSink, runAdapter/runAdapterACP with panic recovery (FR-NEW-10), runtimePermissionRequester, noopToolExecutor
- pkg/runtime/agent_delegation.go: branch runForwarding/runCollecting on HasHarness()
- pkg/harness/claude/: Adapter, Config, NDJSON translator, 3 test fixtures, 7 tests
- pkg/harness/harness_test.go: registry, token ownership, event types, error codes (13 tests)
- All 20 new tests pass; pre-existing failures unchanged
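
The "panic recovery" in runAdapter/runAdapterACP presumably means a panicking adapter is turned into an ordinary error instead of taking down the runtime. A minimal sketch of that pattern follows; `runSafely` is a hypothetical name, and the error wording is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// runSafely invokes run and converts a panic into a returned error, so one
// misbehaving adapter cannot crash the caller. The named result lets the
// deferred function overwrite the return value after a panic.
func runSafely(run func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("harness adapter panicked: %v", r)
		}
	}()
	return run()
}

func main() {
	fmt.Println(runSafely(func() error { panic("boom") }))
	fmt.Println(runSafely(func() error { return errors.New("plain failure") }))
	fmt.Println(runSafely(func() error { return nil }))
}
```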

- pkg/harness/sandbox: path confinement (ErrEscape, symlink detection, non-existent write targets), env allowlist with sensitive key filtering
- 9 sandbox tests: traversal, symlink escape, absolute outside, non-existent file, env filtering
- examples/harness-team.yaml: cross-harness team with claude-code + codex subagents
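
One way to implement the path confinement described here is to resolve symlinks on the deepest existing ancestor of the requested path, then check that the result still lies under the root. The sketch below illustrates that approach only. `ErrEscape` is named in the commit, but `resolve`, `within`, and the exact algorithm are assumptions and may differ from pkg/harness/sandbox.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ErrEscape is returned when a path resolves outside the sandbox root.
var ErrEscape = errors.New("path escapes sandbox root")

// within reports whether target is root itself or lies beneath it.
func within(root, target string) bool {
	rel, err := filepath.Rel(root, target)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// resolve maps p (relative to root, or absolute) to a real path under root.
func resolve(root, p string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	realRoot, err := filepath.EvalSymlinks(absRoot)
	if err != nil {
		return "", err
	}
	if !filepath.IsAbs(p) {
		p = filepath.Join(realRoot, p)
	}
	p = filepath.Clean(p)

	// Walk up to the deepest existing ancestor so that write targets that do
	// not exist yet can still be checked.
	existing, rest := p, ""
	for {
		if _, err := os.Lstat(existing); err == nil {
			break
		}
		parent := filepath.Dir(existing)
		if parent == existing {
			break
		}
		rest = filepath.Join(filepath.Base(existing), rest)
		existing = parent
	}

	// Resolving symlinks here catches links that point outside the root.
	real, err := filepath.EvalSymlinks(existing)
	if err != nil {
		return "", err
	}
	full := filepath.Join(real, rest)
	if !within(realRoot, full) {
		return "", ErrEscape
	}
	return full, nil
}

func main() {
	root, err := os.MkdirTemp("", "sandbox")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(root)

	for _, p := range []string{"notes/new.txt", "../outside.txt", "/etc/passwd"} {
		_, err := resolve(root, p)
		fmt.Printf("%-16s escape=%v\n", p, errors.Is(err, ErrEscape))
	}
}
```

A check like this is still subject to races if the directory tree changes between the check and the file operation, so it is a first line of defense, not a complete sandbox.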

Security (5 criticals addressed):
- Wire sandbox.Resolve into ACP ReadTextFile/WriteTextFile (path traversal prevention)
- Switch subprocess env to allowlist model (prevent credential leakage to harnesses)
- Default permission requester to deny; require auto_allow + i_understand_the_risk
- Validate Harness.Command for injection characters
- Add PermissionPolicy.Mode enum validation

Code review (5 findings addressed):
- Fix SubSessionCompleted emitted on error path (now only on success)
- Wire spec.Timeout into context.WithTimeout
- Fix temp prompt file leak (defer os.Remove)
- Add Args field to ToolCallStart; populate from Claude tool_use input
- translateSink tracks active tool args through ToolCallStart/End pair

ACP:
- Terminal stubs return errors instead of fake success (prevents false agent reasoning)

Config/schema:
- Add version 9 and 10 to schema enum
- Update TestParseExamples to allow harness-backed agents without model
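
Two of the security items, default-deny permissions and command validation, are simple enough to sketch. Everything below is illustrative: the `deny` mode name, the struct fields, the `shellMeta` character set, and the function names are assumptions. The commit only states that `auto_allow` must be paired with `i_understand_the_risk`, that the mode is validated as an enum, and that the command is checked for injection characters.

```go
package main

import (
	"fmt"
	"strings"
)

// PermissionPolicy holds the two knobs named in the commit. The field names
// and the "deny" mode value are assumptions.
type PermissionPolicy struct {
	Mode               string // "" or "deny" (default), or "auto_allow"
	IUnderstandTheRisk bool
}

// Validate rejects unknown modes and auto_allow without the explicit opt-in.
func (p PermissionPolicy) Validate() error {
	switch p.Mode {
	case "", "deny":
		return nil
	case "auto_allow":
		if !p.IUnderstandTheRisk {
			return fmt.Errorf("mode auto_allow requires i_understand_the_risk: true")
		}
		return nil
	default:
		return fmt.Errorf("invalid permission mode %q (valid: deny, auto_allow)", p.Mode)
	}
}

// Allows reports whether permission requests are granted automatically.
// The zero value denies.
func (p PermissionPolicy) Allows() bool {
	return p.Mode == "auto_allow" && p.IUnderstandTheRisk
}

// shellMeta is an assumed set of characters to refuse in a harness command.
const shellMeta = ";|&$`<>\n\r"

// validateCommand rejects commands containing shell metacharacters. An empty
// command is accepted here on the assumption that it means "use the default".
func validateCommand(cmd string) error {
	if i := strings.IndexAny(cmd, shellMeta); i >= 0 {
		return fmt.Errorf("harness command contains disallowed character %q", cmd[i])
	}
	return nil
}

func main() {
	fmt.Println(PermissionPolicy{}.Allows())
	fmt.Println(PermissionPolicy{Mode: "auto_allow"}.Validate())
	fmt.Println(validateCommand("claude; rm -rf /"))
}
```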

- docs/configuration/agents/index.md: add harness: to schema reference, properties table, and new Harness-Backed Agents section with examples, permission policy, and known limitations
- docs/configuration/overview/index.md: update current version from 8 to 10
- CHANGELOG.md: add Unreleased section for cross-harness orchestration feature

Bug 1: panic in AgentsInfo when harness agent has empty models slice
- agent.Model() now returns nil for harness-backed agents instead of calling rand.Intn(0)
- Added len(a.models)==0 guard for non-harness agents with no models
- Added regression test TestHarnessAgentModelReturnsNil
- Relaxed NewLocalRuntime check to allow harness root agents

Bug 2: harness root agent rejected by NewLocalRuntime
- Added runHarnessRoot() path in loop.go: when RunStream resolves a harness-backed agent, dispatch directly through the harness path instead of the model loop
- Removed the HasHarness() error guard from NewLocalRuntime

Bug 3: harness adapters not registered (init() never called)
- Added pkg/harness/all/all.go with blank imports of all 5 adapters
- Blank-imported pkg/harness/all from pkg/teamloader (always in binary dependency chain)

Also:
- Added ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, NODE_PATH to env allowlist (harnesses need their API keys to function)
- Removed version: "10" from examples/harness-team.yaml (examples must not pin a version per teamloader test convention)
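
Bug 1 follows from `math/rand`: `rand.Intn(n)` panics when `n <= 0`, so picking a random model from an empty slice crashes. The guard can be sketched as follows; the `Agent` and `Model` types here are simplified stand-ins with assumed fields, not the definitions in pkg/agent.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Model and Agent are simplified stand-ins for illustration.
type Model struct {
	Name string
}

type Agent struct {
	models  []*Model
	harness bool // true when the agent is backed by an external harness
}

// Model returns a randomly chosen model, or nil when the agent is
// harness-backed or has no models. The length guard matters because
// rand.Intn panics for a non-positive argument.
func (a *Agent) Model() *Model {
	if a.harness || len(a.models) == 0 {
		return nil
	}
	return a.models[rand.Intn(len(a.models))]
}

func main() {
	harnessAgent := &Agent{harness: true}
	fmt.Println(harnessAgent.Model() == nil)

	modelAgent := &Agent{models: []*Model{{Name: "anthropic/claude-sonnet-4-5"}}}
	fmt.Println(modelAgent.Model().Name)
}
```

Callers then need to handle a nil model, which is the price of not panicking.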

- Update codex exec invocation: --ask-for-approval never + --sandbox <mode> replaced by --dangerously-bypass-approvals-and-sandbox (codex 0.130.0+)
- Change --cd to -C for working directory flag
- Add message-based error code inference for top-level error events (codex error events carry detail in message, not code field): 401/Unauthorized -> auth_failed, 429/rate_limit -> rate_limited
- Update codex tests to match new flag format
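
The message-based inference might look roughly like this. The two mappings (401/Unauthorized to auth_failed, 429/rate_limit to rate_limited) come from the commit; the case-insensitive substring matching, the `unknown` fallback, and the name `inferErrorCode` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// inferErrorCode maps a free-form error message to an error code by
// substring match. It returns "unknown" (an assumed fallback) when nothing
// matches.
func inferErrorCode(message string) string {
	m := strings.ToLower(message)
	switch {
	case strings.Contains(m, "401"), strings.Contains(m, "unauthorized"):
		return "auth_failed"
	case strings.Contains(m, "429"), strings.Contains(m, "rate_limit"):
		return "rate_limited"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(inferErrorCode("unexpected status 401 Unauthorized"))
	fmt.Println(inferErrorCode("rate_limit exceeded, retry later"))
	fmt.Println(inferErrorCode("connection reset by peer"))
}
```

Substring matching on bare numbers is loose (a message mentioning "port 4010" would match 401), so a stricter pattern may be preferable in practice.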

Cost tracking:
- Store harness run cost on the final assistant message (chat.Message.Cost) so OwnCost()/TotalCost() pick it up when parent walks sub-sessions
- Store token counts on sub-session via SetUsage() so they persist through SubSessionCompletedEvent -> AddSubSession
- Root harness path also attaches cost to the final message
- Verified: cost appears in token_usage event AND in persisted sub-session

AgentInfo from RunStart:
- Add Model field to harness.RunStart so adapters can surface the model name
- Claude adapter populates Model from system/init event
- Codex adapter populates Model from thread.started event
- translateSink emits AgentInfo(agentName, model) on RunStart so sidebar shows the harness model name immediately

Codex adapter flags (v0.130.0):
- Replace --ask-for-approval never + --sandbox <mode> with --dangerously-bypass-approvals-and-sandbox
- Replace --cd with -C
- Add message-based error code inference (401 -> auth_failed)
- Update tests to match new flag format

Adapter registration:
- Add pkg/harness/all/all.go with blank imports of all 5 adapters
- Blank-import from pkg/teamloader so adapters register in any binary

Auth:
- Add ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, NODE_PATH to env allowlist in both claude and acp adapters

UAT verified:
- Config loads cleanly, dry-run works
- Root harness agent: writes files, produces output, cost tracked
- Subagent delegation: claude-coder and codex-coder both work
- Cost: token_usage event fires with correct cost, persisted to sub-session
- Error cases: missing binary, bad type, model+harness conflict, injection attempt, bad permission mode -- all give clear actionable errors
- Config validation: sub_agents on harness agent rejected
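
The cost-tracking design relies on the parent session summing message costs across its sub-sessions, which is why attaching the harness cost to the final assistant message is enough. A minimal sketch of that roll-up follows; `Message` and `Session` are simplified stand-ins with assumed fields, and only the method names `OwnCost` and `TotalCost` come from the commit message.

```go
package main

import "fmt"

// Message and Session are simplified stand-ins for illustration.
type Message struct {
	Role string
	Cost float64 // USD; a harness run's cost sits on its final assistant message
}

type Session struct {
	Messages    []Message
	SubSessions []*Session
}

// OwnCost sums the cost of messages directly in this session.
func (s *Session) OwnCost() float64 {
	var total float64
	for _, m := range s.Messages {
		total += m.Cost
	}
	return total
}

// TotalCost adds the cost of every nested sub-session to OwnCost.
func (s *Session) TotalCost() float64 {
	total := s.OwnCost()
	for _, sub := range s.SubSessions {
		total += sub.TotalCost()
	}
	return total
}

func main() {
	harnessRun := &Session{Messages: []Message{
		{Role: "user"},
		{Role: "assistant", Cost: 0.25}, // harness cost attached here
	}}
	root := &Session{
		Messages:    []Message{{Role: "assistant", Cost: 0.5}},
		SubSessions: []*Session{harnessRun},
	}
	fmt.Printf("own=$%.2f total=$%.2f\n", root.OwnCost(), root.TotalCost())
}
```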

…complete)

Claude Code streaming:
- Add --include-partial-messages to default invocation
- Parse stream_event wrapper, unwrap Anthropic SSE events
- Emit TextStart/TextDelta/TextEnd per content_block_delta text_delta
- Emit ReasoningStart/Delta/End per thinking_delta
- Emit ToolCallStart/ArgsDelta/End from content_block events
- Dedupe in translateAssistant: skip blocks already streamed by index
- Config opt-out: include_partial_messages: false in harness.config
- Update capabilities: TextDeltas: true, StreamingArgs: true
- Add TestTranslateStreamPartialMessages with real fixture

Codex cost display:
- Add CostUnknown bool to harness.UsageSummary
- Codex sets CostUnknown: true when cost_usd is absent
- translateSink emits cost=-1 sentinel when CostUnknown
- Sidebar: costUnknown flag + formatTotalCost() shows -- not $0.00
- Persisted session cost stays 0 (sentinel not written to store)
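
The cost-unknown flow described under "Codex cost display" can be sketched end to end. `UsageSummary.CostUnknown`, the -1 sentinel, and the `formatTotalCost` name come from the commit message; the function signatures, the `CostUSD` field, and the helpers `eventCost` and `persistedCost` are assumptions for illustration.

```go
package main

import "fmt"

// UsageSummary carries the cost reported by a harness run. CostUnknown is
// set when the harness does not report a cost at all.
type UsageSummary struct {
	CostUSD     float64
	CostUnknown bool
}

// costUnknownSentinel marks "no cost reported" on the event stream.
const costUnknownSentinel = -1.0

// eventCost is the value emitted to the UI: the sentinel when unknown.
func eventCost(u UsageSummary) float64 {
	if u.CostUnknown {
		return costUnknownSentinel
	}
	return u.CostUSD
}

// formatTotalCost renders the sentinel as "--" instead of a misleading $0.00.
func formatTotalCost(cost float64) string {
	if cost < 0 {
		return "--"
	}
	return fmt.Sprintf("$%.2f", cost)
}

// persistedCost is what gets stored: the sentinel never reaches the store.
func persistedCost(cost float64) float64 {
	if cost < 0 {
		return 0
	}
	return cost
}

func main() {
	codex := UsageSummary{CostUnknown: true}
	claude := UsageSummary{CostUSD: 0.5}
	fmt.Println(formatTotalCost(eventCost(codex)), persistedCost(eventCost(codex)))
	fmt.Println(formatTotalCost(eventCost(claude)), persistedCost(eventCost(claude)))
}
```

Keeping the sentinel out of the store means a persisted total never goes negative, at the cost of losing the distinction between "free" and "unknown" after reload.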

…corruption

Codex (and potentially Claude Code) spawn bash subprocesses to execute shell commands. Without Setpgid, these children share docker-agent's process group and can interact with the terminal, corrupting TUI state. Setting SysProcAttr.Setpgid=true puts each harness subprocess in its own process group, isolating its children from the TUI's terminal.
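
In Go this is a one-field change on the command before it starts. The sketch below shows the pattern on Unix-like systems, where `syscall.SysProcAttr` has a `Setpgid` field; `newHarnessCommand` is a hypothetical helper name, and `echo` stands in for a harness binary.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newHarnessCommand builds a command that starts in its own process group.
// Children it spawns inherit that group instead of the parent's, which keeps
// them out of the terminal's foreground process group. Unix-only.
func newHarnessCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd
}

func main() {
	// "echo" stands in for a harness binary such as codex.
	cmd := newHarnessCommand("echo", "hello from a separate process group")
	out, err := cmd.Output()
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Print(string(out))
}
```

A separate process group also makes cleanup simpler: signalling the negative PID (for example `syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)`) reaches the harness and everything it spawned.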

docker-agent · 2026-05-14T16:52:49Z

❌ PR Review Failed: the review agent encountered an error and could not complete the review.

mcavage-docker added 24 commits May 13, 2026 11:00

gm: Phase 2 -- Codex CLI harness adapter

916141f

gm: Phase 2 -- OpenCode CLI harness adapter

0c3dbd4

gm: Phase 2 -- ACP adapter (Copilot + OpenClaw)

60f442e

gm: merge Phase 2 Codex adapter

f580bc8

gm: merge Phase 2 OpenCode adapter

ae90477

gm: merge Phase 2 ACP adapter

1b89bfc

gm: partial -- add stream_event types and state to claude adapter (in…

57c3cda

…complete)

gm: align pkg/harness types to github.com/rumpl/harness

79f4a6c

gm: rewrite claude adapter to implement rumpl/harness.Provider

7aea896

gm: rewrite codex adapter to implement rumpl/harness.Provider

b3f43bb

gm: fix replay/record.go for new harness.Event type

9305a19

gm: fix copilot + openclaw adapters for new harness types

33cd48c

gm: fix harness_delegation.go for new harness types

529c48e

gm: go mod tidy after adding github.com/rumpl/harness

5fafa39

mcavage-docker requested a review from a team as a code owner May 14, 2026 16:21

mcavage-docker wants to merge 24 commits into docker:main from mcavage-docker:gm/cross-harness-orchestration
