Foundational Harness Improvements by bajajra · Pull Request #489 · mpfaffenberger/code_puppy

bajajra · 2026-06-19T12:40:14Z

Summary

This PR delivers seven harness-level improvements that strengthen Code Puppy's interaction model and reliability while preserving its plugin-first architecture:

Cache-stable prompt composition
First-class project trust and core permissions
Append-only tree-branching sessions
Explicit outer agent-loop controller
Language Server Protocol tools
Typed, observable subagent task queues
Managed background shell and agent tasks

Why this work matters

Before this PR:

repeated prompts were difficult for providers to cache efficiently;
project-local Python could be imported before an explicit trust decision;
file and shell authorization depended too heavily on optional plugin behavior;
sessions had only a linear active history;
continuation decisions were embedded in runtime control flow;
code navigation relied on text search rather than semantic symbol identity;
parallel agents lacked a typed observable queue;
long-running shell and agent work lacked a unified background-task surface.

This PR strengthens those foundations without replacing the current Pydantic AI execution layer or the existing plugin system.

1. Cache-stable prompt composition

What changed

Prompt construction is now divided into an explicit stable prefix and dynamic suffix.

The stable prefix contains authored agent instructions and stable project rules. The dynamic suffix contains context that can change independently, including callback-provided runtime context, working directory information, permission guidance, memory recall, and per-agent identity.

Anthropic-compatible payloads split these sections into separate system blocks and place cache_control on the stable block. The internal boundary marker is removed before the request reaches the provider. Legacy prompts without a boundary retain the previous whole-system caching behavior.

Dynamic fragments are cached in-process and invalidated when conversation context is explicitly cleared or compacted, when the working directory changes, or when an agent invalidates its dynamic context.
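
A minimal sketch of the stable/dynamic split for an Anthropic-compatible payload. The marker string and the helper name (`PROMPT_BOUNDARY`, `build_system_blocks`) are hypothetical, since the PR text does not show the actual identifiers. The block shape (`type`, `text`, and `cache_control` set to `{"type": "ephemeral"}`) follows Anthropic's documented prompt-caching format.

```python
from typing import Any

# Hypothetical marker; the PR does not state the real boundary string.
PROMPT_BOUNDARY = "<<CODE_PUPPY_DYNAMIC_BOUNDARY>>"


def build_system_blocks(prompt: str) -> list[dict[str, Any]]:
    """Split a composed prompt into Anthropic-style system blocks."""
    cache = {"type": "ephemeral"}
    if PROMPT_BOUNDARY not in prompt:
        # Legacy prompt without a boundary: cache the whole system prompt.
        return [{"type": "text", "text": prompt, "cache_control": cache}]

    # Splitting on the marker also removes it from the outgoing payload.
    stable, dynamic = prompt.split(PROMPT_BOUNDARY, 1)
    blocks = [{"type": "text", "text": stable.strip(), "cache_control": cache}]
    if dynamic.strip():
        # The dynamic suffix carries no cache_control, so changes to it
        # leave the cached stable prefix untouched.
        blocks.append({"type": "text", "text": dynamic.strip()})
    return blocks
```

Because the cache breakpoint sits at the end of the stable block, edits to working-directory or memory context should not change the cached prefix under this layout.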

2. First-class project trust and core permissions

What changed

Code Puppy now persists project trust decisions by canonical project path. Before importing project-local executable resources, the loader requires an explicit trust decision.

The trust boundary covers:

.code_puppy/plugins/
.code_puppy/agents/
.code_puppy/skills/

Unknown projects fail closed in non-interactive environments. Interactive sessions prompt before loading project code. Decisions are stored in ~/.code_puppy/trust.json with restricted permissions. The CODE_PUPPY_TRUST_PROJECT environment variable provides an explicit automation override.

The plugin adds three slash commands:

/trust status
/trust project
/trust revoke
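
The trust flow described above could look roughly like the sketch below. It is illustrative only: the function names, the JSON layout (canonical path mapped to a boolean), and the assumption that `CODE_PUPPY_TRUST_PROJECT=1` means "trust" are not taken from the PR, which names only the file and the environment variable.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def canonical_key(project_dir: str) -> str:
    """Resolve symlinks and relative segments so one project has one key."""
    return str(Path(project_dir).resolve())


def load_decisions(store: Path) -> dict:
    if not store.exists():
        return {}
    return json.loads(store.read_text(encoding="utf-8"))


def save_decision(store: Path, project_dir: str, trusted: bool) -> None:
    decisions = load_decisions(store)
    decisions[canonical_key(project_dir)] = trusted
    store.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions (effective on POSIX).
    fd = os.open(store, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(decisions, handle, indent=2)


def resolve_trust(
    store: Path,
    project_dir: str,
    interactive: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[bool]:
    """Return True/False for a decision, or None when a prompt is needed."""
    env = os.environ if env is None else env
    # Assumption: the value "1" enables the automation override.
    if env.get("CODE_PUPPY_TRUST_PROJECT") == "1":
        return True
    decision = load_decisions(store).get(canonical_key(project_dir))
    if decision is not None:
        return decision
    # Unknown project: prompt when interactive, otherwise fail closed.
    return None if interactive else False
```

A loader would call `resolve_trust` before importing anything under `.code_puppy/plugins/`, `.code_puppy/agents/`, or `.code_puppy/skills/`, and skip the import unless the result is `True`.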

Core permission modes are now available independently of the existing file-permission plugin:

ask: approve both shell commands and file mutations
acceptEdits: allow file mutations, continue prompting for shell commands
auto: allow both shell commands and file mutations

New installations default to ask. Existing installations retain compatibility through the legacy yolo_mode fallback when no explicit permission_mode exists.
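
The three modes reduce to a small decision table. The sketch below is an assumption-laden illustration: the enum, the function names, the action labels (`"shell"`, `"file_edit"`), and the mapping of legacy `yolo_mode=True` to `auto` are hypothetical and not confirmed by the PR text.

```python
from enum import Enum
from typing import Optional


class PermissionMode(str, Enum):
    ASK = "ask"
    ACCEPT_EDITS = "acceptEdits"
    AUTO = "auto"


def resolve_mode(
    permission_mode: Optional[str], yolo_mode: Optional[bool]
) -> PermissionMode:
    """Prefer an explicit permission_mode, then the legacy flag, then ask."""
    if permission_mode is not None:
        return PermissionMode(permission_mode)
    if yolo_mode is not None:
        # Assumption: legacy yolo_mode=True behaves like auto, False like ask.
        return PermissionMode.AUTO if yolo_mode else PermissionMode.ASK
    return PermissionMode.ASK


def needs_approval(mode: PermissionMode, action: str) -> bool:
    """action is "shell" or "file_edit" (hypothetical labels)."""
    if mode is PermissionMode.AUTO:
        return False
    if mode is PermissionMode.ACCEPT_EDITS:
        # File mutations are allowed; shell commands still prompt.
        return action == "shell"
    return True
```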

3. Append-only tree-branching sessions

What changed

The new tree-session plugin records conversation entries in append-only JSONL. Each entry has a unique ID, parent ID, serialized message payload, optional label, and integrity fingerprint. An active cursor selects the current conversation path.

The plugin synchronizes existing message history into the tree after agent runs and reconstructs active history by following parent pointers.
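
A rough sketch of the append-only log and the parent-pointer walk, assuming a hypothetical record layout and helper names. The fingerprint here is a SHA-256 over the parent ID and the serialized payload, which is one plausible choice; the PR does not say how the fingerprint is computed.

```python
import hashlib
import json
import uuid
from pathlib import Path
from typing import Optional


def append_entry(
    log: Path, parent_id: Optional[str], message: dict, label: Optional[str] = None
) -> str:
    """Append one entry to the JSONL log and return its ID."""
    entry_id = uuid.uuid4().hex
    payload = json.dumps(message, sort_keys=True)
    record = {
        "id": entry_id,
        "parent_id": parent_id,
        "message": message,
        "label": label,
        # Assumption: fingerprint = SHA-256 of parent ID plus payload.
        "fingerprint": hashlib.sha256(f"{parent_id}:{payload}".encode()).hexdigest(),
    }
    # Mode "a" only ever adds lines; existing entries are never rewritten.
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return entry_id


def active_history(log: Path, cursor: str) -> list[dict]:
    """Rebuild the active path by following parent pointers from the cursor."""
    entries = {}
    with log.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                entries[record["id"]] = record
    path = []
    current: Optional[str] = cursor
    while current is not None:
        record = entries[current]
        path.append(record["message"])
        current = record["parent_id"]
    path.reverse()  # The walk runs leaf to root; history reads root to leaf.
    return path
```

In this model an in-place fork is just a cursor move: appending a new entry whose parent is an earlier ID creates a sibling branch without touching the old one.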

Commands:

/tree renders the current branch structure
/tree label <ID> <TEXT> labels a checkpoint
/fork <ID> moves the active conversation to an earlier entry and creates an in-place branch on the next turn
/fork <ID> <NAME> extracts the path through an entry into a separate session tree

The current pickle/session mechanisms remain available; this is an additive compatibility layer rather than a destructive storage migration.

Future scope opened

Interactive tree navigator and branch switching UI
Branch diffing and comparison summaries
Merge/cherry-pick semantics between conversation branches
Branch-aware token accounting
Named checkpoints and bookmarks
Export/import of selected branches
Visual integration with the status panel
DBOS workflow IDs associated with tree entries
Automatic branch creation before high-risk operations

Important boundary

/fork <ID> <NAME> creates the separate session tree but does not automatically switch the active CLI session. The implementation preserves message paths; it is not a Git replacement and does not independently snapshot filesystem state.

4. Explicit outer agent-loop controller

What changed

Continuation logic is represented by an explicit state machine:

CREATED
RUNNING
FOLLOW_UP
COMPLETED
CANCELLED
FAILED

Actions are explicit:

STEER
HOOK_RETRY
STOP

The controller records outer model calls, queued steering continuations, and plugin-requested retries. It validates legal state transitions and centralizes the priority rule that user steering is processed before automated hook retries.

The existing safety caps remain in place: queued steering is bounded and plugin retries respect max_hook_retries.
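
The states, actions, and priority rule above can be sketched as a small controller. The transition table, the class and method names, and the `max_steering` cap are assumptions made for illustration; only the state names, the action names, and `max_hook_retries` come from the PR description.

```python
from collections import deque
from enum import Enum
from typing import Optional


class LoopState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FOLLOW_UP = "follow_up"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


class LoopAction(Enum):
    STEER = "steer"
    HOOK_RETRY = "hook_retry"
    STOP = "stop"


S = LoopState
# Assumed transition table; the PR does not list the legal edges.
LEGAL = {
    S.CREATED: {S.RUNNING, S.CANCELLED},
    S.RUNNING: {S.FOLLOW_UP, S.COMPLETED, S.CANCELLED, S.FAILED},
    S.FOLLOW_UP: {S.RUNNING, S.CANCELLED, S.FAILED},
    S.COMPLETED: set(),
    S.CANCELLED: set(),
    S.FAILED: set(),
}


class LoopController:
    def __init__(self, max_hook_retries: int = 2, max_steering: int = 8) -> None:
        self.state = S.CREATED
        self.model_calls = 0
        self.hook_retries = 0
        self.max_hook_retries = max_hook_retries
        self.max_steering = max_steering  # hypothetical bound on the queue
        self.steering: deque = deque()

    def transition(self, new_state: LoopState) -> None:
        if new_state not in LEGAL[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def start_run(self) -> None:
        self.transition(S.RUNNING)
        self.model_calls += 1  # counts outer model calls only

    def queue_steering(self, text: str) -> bool:
        if len(self.steering) >= self.max_steering:
            return False
        self.steering.append(text)
        return True

    def next_action(self, hook_wants_retry: bool) -> tuple:
        """Return (action, steering_text); user steering wins over retries."""
        if self.steering:
            return LoopAction.STEER, self.steering.popleft()
        if hook_wants_retry and self.hook_retries < self.max_hook_retries:
            self.hook_retries += 1
            return LoopAction.HOOK_RETRY, None
        return LoopAction.STOP, None
```

Keeping the priority rule in one method is what makes it testable in isolation: a test can queue steering, request a retry, and check the order without running a model.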

User improvement

This commit intentionally preserves the existing steering experience rather than introducing a new command. Users still press Ctrl+T, select now or queue, and provide steering text. The immediate benefit is reliability: continuation behavior is isolated, bounded, and independently testable rather than spread across local counters and nested runtime branches.

Future scope opened

/loop status with model calls, tool calls, retries, and state
Per-run model-call and tool-call budgets
Token, cost, and wall-clock deadlines
Pause/resume checkpoints
Durable loop restoration through DBOS
Per-tool quotas and circuit breakers
Visible continuation/retry reasons
Policy-driven stop conditions

Important boundary

Pydantic AI still owns the inner model/tool/model ReAct cycle. This controller owns the outer continuation decision after a Pydantic run. Individual tool calls are not yet counted, and the existing outer exception/cancellation machinery still handles several live failure paths.

5. Language Server Protocol tools

What changed

The LSP plugin implements an asynchronous stdio JSON-RPC client with Content-Length framing and the standard initialize/initialized/shutdown lifecycle. Servers are configured in ~/.code_puppy/lsp_servers.json, selected by file extension, started on demand, reused within a working directory, and shut down gracefully.
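
For readers unfamiliar with the wire format, here is a small synchronous sketch of Content-Length framing as defined by the LSP base protocol. The function names are hypothetical, and the plugin itself is described as asynchronous, so this shows the framing rule rather than the plugin's code.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame one JSON-RPC message with a Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def read_message(stream) -> dict:
    """Read one framed message from a binary stream such as a pipe."""
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before a complete header was read")
        if line == b"\r\n":
            break  # A blank line ends the header section.
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    body = stream.read(content_length)
    return json.loads(body.decode("utf-8"))


# The first request in the lifecycle; fields follow the LSP specification.
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}
```

After the server answers `initialize`, the client sends an `initialized` notification (no `id`), and at the end of the session a `shutdown` request followed by an `exit` notification.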

The plugin contributes five read-only semantic tools:

lsp_definition
lsp_references
lsp_hover
lsp_diagnostics
lsp_workspace_symbols

/lsp status reports configured server names. URI results are normalized into local paths, documents are opened before text-document requests, and published diagnostics are retained by document URI.

Tools are only advertised when at least one language server is configured, avoiding dead tool schemas for users who do not opt in.

User improvement

Code Puppy can now distinguish semantic symbol identity from textual name matches. This improves navigation and edit planning in repositories with aliases, overloads, inheritance, re-exports, or common symbol names.

It also reduces context usage: an exact definition, type signature, or reference list is usually smaller and more reliable than reading every grep match.

Future scope opened

Semantic rename
Code actions and quick fixes
Formatting and organize-imports operations
Call and type hierarchy
Workspace edits
Language-aware completion
Multi-root workspace support
Smarter per-language workspace-symbol routing
Diagnostic stabilization/wait policies
Automatic language-server discovery and installation guidance

Important boundary

Users must install and configure a language server, but they do not start it manually. The current transport is command-based stdio only. Results and quality depend on the selected server. The tools do not replace tests, and the current surface does not perform semantic refactors.

6. Typed, observable subagent task queues

What changed

The subagent-task plugin wraps the existing _invoke_agent_impl execution backend with typed Pydantic contracts and an observable queue.

A request includes:

agent name
prompt
optional session ID
optional model override
arbitrary metadata

Each task receives a UUID and transitions through:

queued
running
succeeded
failed
cancelled

Tools:

submit_agent_tasks
list_agent_tasks
wait_agent_tasks
cancel_agent_tasks

Batches use a bounded semaphore. max_parallel defaults to four and is clamped between one and thirty-two. Callers can wait for aggregation or request immediate task IDs and inspect the work later.

Records and structured results are persisted in ~/.code_puppy/subagent_tasks.json. Tasks found queued or running after a process restart are marked failed with an interruption reason rather than silently disappearing.
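
The concurrency limit described above might be implemented along these lines. This is a sketch under assumptions: `clamp_parallel`, `run_batch`, the in-memory record layout, and the `worker` callable are hypothetical stand-ins, and the real plugin delegates to `_invoke_agent_impl` and persists records to disk.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Optional


def clamp_parallel(requested: Optional[int] = None) -> int:
    """Default to 4 and keep the value between 1 and 32."""
    if requested is None:
        return 4
    return max(1, min(32, requested))


async def run_batch(
    prompts: list,
    worker: Callable[[str], Awaitable[str]],
    max_parallel: Optional[int] = None,
) -> dict:
    """Run one task per prompt, never exceeding the clamped limit."""
    limit = asyncio.BoundedSemaphore(clamp_parallel(max_parallel))
    records = {
        str(uuid.uuid4()): {"prompt": p, "status": "queued", "result": None}
        for p in prompts
    }

    async def run_one(task_id: str) -> None:
        record = records[task_id]
        async with limit:  # Tasks stay "queued" until a slot frees up.
            record["status"] = "running"
            try:
                record["result"] = await worker(record["prompt"])
                record["status"] = "succeeded"
            except Exception as exc:
                # A failing task is recorded; it does not abort the batch.
                record["status"] = "failed"
                record["result"] = str(exc)

    await asyncio.gather(*(run_one(task_id) for task_id in records))
    return records
```

Returning the record dictionary keyed by task ID mirrors the submit-then-inspect pattern: a caller could hand out the IDs immediately and look up status later.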

User improvement

Parallel-agent work becomes observable and controllable. The primary agent can submit independent investigations, limit concurrency, continue other work, wait for selected tasks, aggregate results, and diagnose failures from consistent records.

Future scope opened

Priority queues and dependencies/DAGs
Per-agent concurrency pools
Retry policies and backoff
Partial-result streaming
Task ownership and delegation graphs
Cost/token budgets per task
Cross-process workers
DBOS-backed resumption instead of interruption marking
UI panels for queue inspection
Typed domain-specific result schemas

Important boundary

Persistence currently preserves metadata and completed results, not active asyncio execution. A process restart marks unfinished tasks as failed. Cancellation is asynchronous, so callers may need to list or wait again to observe the terminal state.

7. Managed background shell and agent tasks

What changed

The background-task plugin adds a unified lifecycle for long-running shell commands and subagent work.

Request kinds:

shell: command, optional CWD, optional timeout
agent: agent name, prompt, optional session/model overrides

States:

queued
running
succeeded
failed
cancelled
interrupted

Tools:

start_background_task
list_background_tasks
wait_background_task
read_background_task_output
cancel_background_task

The manager returns a task ID immediately, persists metadata, writes durable output logs, captures process IDs, enforces timeouts, terminates cancelled subprocesses, and emits completion notifications through the callback/message system. Background shell starts pass through the new core permission boundary.

On startup, records that were queued or running are marked interrupted because process ownership cannot safely be assumed after restart.
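
The startup recovery step can be illustrated with a short sketch. The function name, the JSON layout (task ID mapped to a record with a `status` field), and the `reason` text are assumptions; the PR states only that unfinished records are marked interrupted.

```python
import json
from pathlib import Path

UNFINISHED = {"queued", "running"}


def mark_interrupted(store: Path) -> list:
    """Mark unfinished records as interrupted and return their task IDs."""
    if not store.exists():
        return []
    records = json.loads(store.read_text(encoding="utf-8"))
    changed = []
    for task_id, record in records.items():
        if record.get("status") in UNFINISHED:
            # The owning process is gone, so the old PID cannot be trusted.
            record["status"] = "interrupted"
            record["reason"] = "process restarted before the task finished"
            changed.append(task_id)
    if changed:
        store.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return changed
```

Terminal records (`succeeded`, `failed`, `cancelled`) are left alone, so their output logs and results stay readable after a restart.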

User improvement

The agent can start a slow build, test suite, log follower, or independent investigation without blocking the foreground conversation. Users can ask for status, retrieve output, wait later, or cancel work that is no longer relevant.

Future scope opened

DBOS-backed resumable background execution
Reattachment to surviving subprocesses
Remote workers and distributed queues
Incremental output streaming and follow mode
Dependency graphs and scheduled jobs
Resource limits for CPU, memory, and output size
Retry policies
Per-project background-task dashboards
Artifact capture and result attachments

Important boundary

Current local background tasks persist metadata and logs but do not resume execution after the Code Puppy process exits; on the next startup they are marked interrupted. This is distinct from the existing DBOS agent wrapper, which checkpoints supported agent interactions.

Compatibility and migration

Existing installations without an explicit permission_mode retain the legacy yolo_mode fallback.
New installations default to permission_mode=ask.
Existing linear/pickle session storage remains available; tree JSONL is additive.
Existing invoke_agent and invoke_agent_with_model tools remain available.
Existing steering UX remains unchanged.
LSP tools are absent unless a server configuration exists.
Background and task metadata use the existing Code Puppy state directory conventions.
Plugin files remain below the repository's 600-line limit.

Validation

Focused feature validation:

39 passed

add coding-agent parity foundations

bajajra added 8 commits June 18, 2026 21:22

feat: add cache-stable prompt composition (67c3304)
feat: enforce project trust and permissions (1ecc4be)
feat: add branching session trees (1749877)
refactor: add explicit agent loop controller (e6d3c55)
feat: add language server tools (634fc0a)
feat: add typed subagent task queue (90cae10)
feat: add managed background tasks (570325a)
Merge pull request #1 from bajajra/codex/roadmap-parity (bfebf83)


bajajra wants to merge 8 commits into mpfaffenberger:main from bajajra:main
