Foundational Harness Improvements #489

Open
bajajra wants to merge 8 commits into mpfaffenberger:main from bajajra:main
@bajajra bajajra commented Jun 19, 2026

Summary

This PR delivers seven harness-level improvements that strengthen Code Puppy's interaction model and reliability while preserving its plugin-first architecture:

  1. Cache-stable prompt composition
  2. First-class project trust and core permissions
  3. Append-only tree-branching sessions
  4. Explicit outer agent-loop controller
  5. Language Server Protocol tools
  6. Typed, observable subagent task queues
  7. Managed background shell and agent tasks

Why this work matters

Before this PR, the harness had the following gaps:

  • repeated prompts were difficult for providers to cache efficiently;
  • project-local Python could be imported before an explicit trust decision;
  • file and shell authorization depended too heavily on optional plugin behavior;
  • sessions had only a linear active history;
  • continuation decisions were embedded in runtime control flow;
  • code navigation relied on text search rather than semantic symbol identity;
  • parallel agents lacked a typed observable queue;
  • long-running shell and agent work lacked a unified background-task surface.

This PR strengthens those foundations without replacing the current Pydantic AI execution layer or the existing plugin system.


1. Cache-stable prompt composition

What changed

Prompt construction is now divided into an explicit stable prefix and dynamic suffix.

The stable prefix contains authored agent instructions and stable project rules. The dynamic suffix contains context that can change independently, including callback-provided runtime context, working directory information, permission guidance, memory recall, and per-agent identity.

Anthropic-compatible payloads split these sections into separate system blocks and place cache_control on the stable block. The internal boundary marker is removed before the request reaches the provider. Legacy prompts without a boundary retain the previous whole-system caching behavior.

Dynamic fragments are cached in-process and invalidated when conversation context is explicitly cleared or compacted, when the working directory changes, or when an agent invalidates its dynamic context.
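
The block split described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the marker string `BOUNDARY` and the function name `build_system_blocks` are hypothetical, and the legacy path assumes "whole-system caching" means a single cached block. The `cache_control` value of `{"type": "ephemeral"}` follows Anthropic's documented prompt-caching format.

```python
from typing import Any

# Hypothetical marker: the PR does not name the real internal boundary string.
BOUNDARY = "<<DYNAMIC_CONTEXT_BOUNDARY>>"


def build_system_blocks(system_prompt: str) -> list[dict[str, Any]]:
    """Split a composed prompt into Anthropic-style system blocks."""
    if BOUNDARY not in system_prompt:
        # Legacy prompt without a boundary: cache the whole system prompt.
        return [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ]

    stable, dynamic = system_prompt.split(BOUNDARY, 1)
    blocks: list[dict[str, Any]] = [
        {
            "type": "text",
            "text": stable.strip(),
            # Only the stable prefix carries the cache marker.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if dynamic.strip():
        # The dynamic suffix is sent uncached so it can change between turns.
        blocks.append({"type": "text", "text": dynamic.strip()})
    return blocks
```

Because the split consumes the marker, the boundary string never appears in the text sent to the provider.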


2. First-class project trust and core permissions

What changed

Code Puppy now persists project trust decisions by canonical project path. Before importing project-local executable resources, the loader requires an explicit trust decision.

The trust boundary covers:

  • .code_puppy/plugins/
  • .code_puppy/agents/
  • .code_puppy/skills/

Unknown projects fail closed in non-interactive environments. Interactive sessions prompt before loading project code. Decisions are stored in ~/.code_puppy/trust.json with restricted permissions. The CODE_PUPPY_TRUST_PROJECT environment variable provides an explicit automation override.
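
A rough sketch of the fail-closed trust check, under stated assumptions: the JSON layout (a flat map from canonical path to boolean), the helper names, and the rule that `CODE_PUPPY_TRUST_PROJECT=1` grants trust are all illustrative guesses, since the PR text does not specify them. The real store lives at ~/.code_puppy/trust.json; here the path is a parameter so the example can be run against a temporary file.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def canonical(project: str | Path) -> str:
    """Resolve symlinks and relative segments so one project has one key."""
    return str(Path(project).expanduser().resolve())


def load_trust(store: Path) -> dict[str, bool]:
    if not store.exists():
        return {}
    return json.loads(store.read_text(encoding="utf-8"))


def save_trust(store: Path, decisions: dict[str, bool]) -> None:
    store.parent.mkdir(parents=True, exist_ok=True)
    # Create the file owner-readable only, then re-assert the mode in case
    # the file already existed with looser permissions.
    fd = os.open(store, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(decisions, fh, indent=2)
    os.chmod(store, 0o600)


def may_load_project_code(
    project: str | Path,
    store: Path,
    *,
    interactive: bool,
    ask: Optional[Callable[[str], bool]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    env = os.environ if env is None else env
    # Assumption: the override is enabled by the literal value "1".
    if env.get("CODE_PUPPY_TRUST_PROJECT") == "1":
        return True
    key = canonical(project)
    decisions = load_trust(store)
    if key in decisions:
        return decisions[key]
    if not interactive or ask is None:
        return False  # unknown project, nobody to ask: fail closed
    decisions[key] = bool(ask(key))
    save_trust(store, decisions)
    return decisions[key]
```

A loader would call `may_load_project_code` before importing anything from the project-local directories listed above and skip them when it returns `False`.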

The accompanying plugin adds these commands:

  • /trust status
  • /trust project
  • /trust revoke

Core permission modes are now available independently of the existing file-permission plugin:

  • ask: prompt for approval before both shell commands and file mutations
  • acceptEdits: allow file mutations, continue prompting for shell commands
  • auto: allow both shell commands and file mutations

New installations default to ask. Existing installations retain compatibility through the legacy yolo_mode fallback when no explicit permission_mode exists.
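
The mode resolution and prompting rules can be illustrated with a small sketch. The three mode names and the `permission_mode` and `yolo_mode` keys come from this PR; the `resolve_mode` and `needs_prompt` helpers, the action names, and in particular the mapping of a truthy `yolo_mode` to `auto` are assumptions made for illustration.

```python
from enum import Enum
from typing import Any, Mapping


class PermissionMode(str, Enum):
    ASK = "ask"
    ACCEPT_EDITS = "acceptEdits"
    AUTO = "auto"


def resolve_mode(config: Mapping[str, Any]) -> PermissionMode:
    """An explicit permission_mode wins; otherwise fall back to yolo_mode."""
    explicit = config.get("permission_mode")
    if explicit is not None:
        return PermissionMode(explicit)
    if "yolo_mode" in config:
        # Assumed legacy mapping: yolo_mode on means nothing prompts.
        return PermissionMode.AUTO if config["yolo_mode"] else PermissionMode.ASK
    return PermissionMode.ASK  # new installations


def needs_prompt(mode: PermissionMode, action: str) -> bool:
    """action is "shell" or "file_edit" (hypothetical action names)."""
    if mode is PermissionMode.AUTO:
        return False
    if mode is PermissionMode.ACCEPT_EDITS:
        return action == "shell"
    return True
```

For example, `needs_prompt(resolve_mode({"permission_mode": "acceptEdits"}), "file_edit")` evaluates to `False`, while the same call with `"shell"` evaluates to `True`.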


3. Append-only tree-branching sessions

What changed

The new tree-session plugin records conversation entries in append-only JSONL. Each entry has a unique ID, parent ID, serialized message payload, optional label, and integrity fingerprint. An active cursor selects the current conversation path.

The plugin synchronizes existing message history into the tree after agent runs and reconstructs active history by following parent pointers.
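
To make the parent-pointer reconstruction concrete, here is a minimal sketch. The field names (`id`, `parent_id`, `payload`, `label`, `fingerprint`) and the SHA-256 fingerprint scheme are assumptions; the PR only states that each entry carries these kinds of values.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional


def _fingerprint(entry_id: str, parent_id: Optional[str], payload: Any) -> str:
    # Assumed scheme: SHA-256 over a canonical JSON encoding of the entry.
    raw = json.dumps([entry_id, parent_id, payload], sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def append_entry(path: Path, entry_id: str, parent_id: Optional[str],
                 payload: Any, label: Optional[str] = None) -> None:
    """Append one record; existing lines are never rewritten."""
    record = {
        "id": entry_id,
        "parent_id": parent_id,
        "payload": payload,
        "label": label,
        "fingerprint": _fingerprint(entry_id, parent_id, payload),
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def active_history(path: Path, cursor: str) -> list:
    """Walk parent pointers from the cursor to the root, oldest first."""
    entries = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            expected = _fingerprint(rec["id"], rec["parent_id"], rec["payload"])
            if rec["fingerprint"] != expected:
                raise ValueError(f"corrupt entry {rec['id']}")
            entries[rec["id"]] = rec

    history = []
    node: Optional[str] = cursor
    while node is not None:
        rec = entries[node]
        history.append(rec["payload"])
        node = rec["parent_id"]
    history.reverse()
    return history
```

Forking in place then amounts to moving the cursor to an earlier entry ID: the next appended entry uses that ID as its parent, and the previously active path stays in the file untouched.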

Commands:

  • /tree renders the current branch structure
  • /tree label <ID> <TEXT> labels a checkpoint
  • /fork <ID> moves the active conversation to an earlier entry and creates an in-place branch on the next turn
  • /fork <ID> <NAME> extracts the path through an entry into a separate session tree

The current pickle/session mechanisms remain available; this is an additive compatibility layer rather than a destructive storage migration.

Future scope opened

  • Interactive tree navigator and branch switching UI
  • Branch diffing and comparison summaries
  • Merge/cherry-pick semantics between conversation branches
  • Branch-aware token accounting
  • Named checkpoints and bookmarks
  • Export/import of selected branches
  • Visual integration with the status panel
  • DBOS workflow IDs associated with tree entries
  • Automatic branch creation before high-risk operations

Important boundary

/fork <ID> <NAME> creates the separate session tree but does not automatically switch the active CLI session. The implementation preserves message paths; it is not a Git replacement and does not independently snapshot filesystem state.

4. Explicit outer agent-loop controller

What changed

Continuation logic is represented by an explicit state machine:

  • CREATED
  • RUNNING
  • FOLLOW_UP
  • COMPLETED
  • CANCELLED
  • FAILED

Actions are explicit:

  • STEER
  • HOOK_RETRY
  • STOP

The controller records outer model calls, queued steering continuations, and plugin-requested retries. It validates legal state transitions and centralizes the priority rule that user steering is processed before automated hook retries.

The existing safety caps remain in place: queued steering is bounded and plugin retries respect max_hook_retries.
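
A compact sketch of such a controller is shown below. The state and action names are taken from the lists above and `max_hook_retries` is named in this PR; the transition table, the default limits, and the class and method names are assumptions chosen for illustration.

```python
from collections import deque
from enum import Enum
from typing import Optional


class LoopState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FOLLOW_UP = "follow_up"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


class Action(Enum):
    STEER = "steer"
    HOOK_RETRY = "hook_retry"
    STOP = "stop"


# Assumed transition table; terminal states have no outgoing edges.
_LEGAL = {
    LoopState.CREATED: {LoopState.RUNNING, LoopState.CANCELLED},
    LoopState.RUNNING: {LoopState.FOLLOW_UP, LoopState.COMPLETED,
                        LoopState.CANCELLED, LoopState.FAILED},
    LoopState.FOLLOW_UP: {LoopState.RUNNING, LoopState.CANCELLED,
                          LoopState.FAILED},
}


class LoopController:
    def __init__(self, max_hook_retries: int = 3, max_queued_steering: int = 8):
        self.state = LoopState.CREATED
        self.max_hook_retries = max_hook_retries
        self.max_queued_steering = max_queued_steering
        self.hook_retries = 0
        self.steering: deque = deque()

    def transition(self, new_state: LoopState) -> None:
        if new_state not in _LEGAL.get(self.state, set()):
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def queue_steering(self, text: str) -> bool:
        """Bounded queue: refuse new steering once the cap is reached."""
        if len(self.steering) >= self.max_queued_steering:
            return False
        self.steering.append(text)
        return True

    def next_action(self, hook_wants_retry: bool) -> tuple:
        """User steering always takes priority over automated hook retries."""
        if self.steering:
            return Action.STEER, self.steering.popleft()
        if hook_wants_retry and self.hook_retries < self.max_hook_retries:
            self.hook_retries += 1
            return Action.HOOK_RETRY, None
        return Action.STOP, None
```

Keeping the priority rule in one method is what makes it testable in isolation: a test can queue steering, request a hook retry, and assert the order of the returned actions without running a model.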

User improvement

This commit intentionally preserves the existing steering experience rather than introducing a new command. Users still press Ctrl+T, select now or queue, and provide steering text. The immediate benefit is reliability: continuation behavior is isolated, bounded, and independently testable rather than spread across local counters and nested runtime branches.

Future scope opened

  • /loop status with model calls, tool calls, retries, and state
  • Per-run model-call and tool-call budgets
  • Token, cost, and wall-clock deadlines
  • Pause/resume checkpoints
  • Durable loop restoration through DBOS
  • Per-tool quotas and circuit breakers
  • Visible continuation/retry reasons
  • Policy-driven stop conditions

Important boundary

Pydantic AI still owns the inner model/tool/model ReAct cycle. This controller owns the outer continuation decision after a Pydantic run. Individual tool calls are not yet counted, and the existing outer exception/cancellation machinery still handles several live failure paths.


5. Language Server Protocol tools

What changed

The LSP plugin implements an asynchronous stdio JSON-RPC client with Content-Length framing and the standard initialize/initialized/shutdown lifecycle. Servers are configured in ~/.code_puppy/lsp_servers.json, selected by file extension, started on demand, reused within a working directory, and shut down gracefully.
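
For readers unfamiliar with the wire format, the Content-Length framing used by the LSP base protocol can be sketched as below. The function names are hypothetical and this is not the plugin's client; it only shows how one JSON-RPC message is framed and how a byte stream is split back into messages.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame one JSON-RPC message with a Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # The length counts bytes of the UTF-8 body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_messages(buffer: bytes) -> tuple:
    """Return (complete messages, unconsumed bytes) from a stream buffer."""
    messages = []
    while True:
        header_end = buffer.find(b"\r\n\r\n")
        if header_end == -1:
            break  # header not fully received yet
        length = None
        for line in buffer[:header_end].split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("missing Content-Length header")
        start = header_end + 4
        if len(buffer) < start + length:
            break  # body not fully received yet
        messages.append(json.loads(buffer[start:start + length]))
        buffer = buffer[start + length:]
    return messages, buffer
```

A client would send `encode_message({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {...}})` to the server's stdin, feed stdout chunks through `decode_messages`, and keep the returned remainder for the next read.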

The plugin contributes five read-only semantic tools:

  • lsp_definition
  • lsp_references
  • lsp_hover
  • lsp_diagnostics
  • lsp_workspace_symbols

/lsp status reports configured server names. URI results are normalized into local paths, documents are opened before text-document requests, and published diagnostics are retained by document URI.

Tools are only advertised when at least one language server is configured, avoiding dead tool schemas for users who do not opt in.

User improvement

Code Puppy can now distinguish semantic symbol identity from textual name matches. This improves navigation and edit planning in repositories with aliases, overloads, inheritance, re-exports, or common symbol names.

It also reduces context usage: an exact definition, type signature, or reference list is usually smaller and more reliable than reading every grep match.

Future scope opened

  • Semantic rename
  • Code actions and quick fixes
  • Formatting and organize-imports operations
  • Call and type hierarchy
  • Workspace edits
  • Language-aware completion
  • Multi-root workspace support
  • Smarter per-language workspace-symbol routing
  • Diagnostic stabilization/wait policies
  • Automatic language-server discovery and installation guidance

Important boundary

Users must install and configure a language server, but they do not start it manually. The current transport is command-based stdio only. Results and quality depend on the selected server. The tools do not replace tests, and the current surface does not perform semantic refactors.


6. Typed, observable subagent task queues

What changed

The subagent-task plugin wraps the existing _invoke_agent_impl execution backend with typed Pydantic contracts and an observable queue.

A request includes:

  • agent name
  • prompt
  • optional session ID
  • optional model override
  • arbitrary metadata

Each task receives a UUID and transitions through:

  • queued
  • running
  • succeeded
  • failed
  • cancelled

Tools:

  • submit_agent_tasks
  • list_agent_tasks
  • wait_agent_tasks
  • cancel_agent_tasks

Batches use a bounded semaphore. max_parallel defaults to four and is clamped between one and thirty-two. Callers can wait for aggregation or request immediate task IDs and inspect the work later.
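
The concurrency limit can be illustrated with a short asyncio sketch. The default of four and the clamp to the range one through thirty-two come from this PR; `clamp_parallel`, `run_batch`, and the `worker` callable are hypothetical stand-ins for the real queue and the `_invoke_agent_impl` backend.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


def clamp_parallel(requested: Optional[int] = None) -> int:
    """Default to 4 and clamp any requested value into the range 1..32."""
    if requested is None:
        return 4
    return max(1, min(32, requested))


async def run_batch(
    prompts: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    max_parallel: Optional[int] = None,
) -> list:
    limit = asyncio.BoundedSemaphore(clamp_parallel(max_parallel))

    async def guarded(prompt: str) -> str:
        # At most `max_parallel` workers hold the semaphore at once.
        async with limit:
            return await worker(prompt)

    # return_exceptions=True keeps one failing task from hiding the others;
    # results come back in submission order.
    return await asyncio.gather(
        *(guarded(p) for p in prompts), return_exceptions=True
    )
```

Returning immediately with task IDs instead of waiting would correspond to scheduling the guarded coroutines with `asyncio.create_task` and storing the handles rather than gathering them.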

Records and structured results are persisted in ~/.code_puppy/subagent_tasks.json. Tasks found queued or running after a process restart are marked failed with an interruption reason rather than silently disappearing.
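
The restart behavior can be sketched as a startup pass over the persisted records. The file layout (a JSON object mapping task ID to a record with a `state` field), the `error` field, and the function name are assumptions; the PR specifies only that unfinished tasks are marked failed with an interruption reason.

```python
import json
from pathlib import Path

UNFINISHED = {"queued", "running"}


def recover_after_restart(
    store: Path,
    terminal_state: str = "failed",
    reason: str = "interrupted by process restart",
) -> int:
    """Mark tasks left queued or running as terminal; return how many changed."""
    if not store.exists():
        return 0
    records = json.loads(store.read_text(encoding="utf-8"))
    changed = 0
    for record in records.values():
        if record.get("state") in UNFINISHED:
            record["state"] = terminal_state
            record["error"] = reason
            changed += 1
    if changed:
        # Write to a sibling file and rename so a crash cannot truncate the store.
        tmp = store.with_name(store.name + ".tmp")
        tmp.write_text(json.dumps(records, indent=2), encoding="utf-8")
        tmp.replace(store)
    return changed
```

The `terminal_state` parameter is there because the same pattern fits the background-task plugin in section 7, which uses `interrupted` rather than `failed`.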

User improvement

Parallel-agent work becomes observable and controllable. The primary agent can submit independent investigations, limit concurrency, continue other work, wait for selected tasks, aggregate results, and diagnose failures from consistent records.

Future scope opened

  • Priority queues and dependencies/DAGs
  • Per-agent concurrency pools
  • Retry policies and backoff
  • Partial-result streaming
  • Task ownership and delegation graphs
  • Cost/token budgets per task
  • Cross-process workers
  • DBOS-backed resumption instead of interruption marking
  • UI panels for queue inspection
  • Typed domain-specific result schemas

Important boundary

Persistence currently preserves metadata and completed results, not active asyncio execution. A process restart marks unfinished tasks as failed. Cancellation is asynchronous, so callers may need to list or wait again to observe the terminal state.


7. Managed background shell and agent tasks

What changed

The background-task plugin adds a unified lifecycle for long-running shell commands and subagent work.

Request kinds:

  • shell: command, optional CWD, optional timeout
  • agent: agent name, prompt, optional session/model overrides

States:

  • queued
  • running
  • succeeded
  • failed
  • cancelled
  • interrupted

Tools:

  • start_background_task
  • list_background_tasks
  • wait_background_task
  • read_background_task_output
  • cancel_background_task

The manager returns a task ID immediately, persists metadata, writes durable output logs, captures process IDs, enforces timeouts, terminates cancelled subprocesses, and emits completion notifications through the callback/message system. Background shell starts pass through the new core permission boundary.

On startup, records that were queued or running are marked interrupted because process ownership cannot safely be assumed after restart.
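
As an illustration of the shell side of this lifecycle, the sketch below runs a command with output redirected to a log file, records the process ID, and enforces a timeout. It is a simplified stand-in, not the plugin's manager: `run_shell_task` and the record fields are hypothetical, the permission check and completion notification are omitted, and `terminate()` signals only the shell process rather than a whole process group.

```python
import asyncio
from pathlib import Path
from typing import Optional


async def run_shell_task(
    command: str,
    log_path: Path,
    timeout: Optional[float] = None,
    cwd: Optional[str] = None,
) -> dict:
    """Run one shell command, streaming stdout and stderr to a durable log."""
    with open(log_path, "wb") as log:
        proc = await asyncio.create_subprocess_shell(
            command,
            stdout=log,
            stderr=asyncio.subprocess.STDOUT,
            cwd=cwd,
        )
        record = {"pid": proc.pid, "state": "running"}
        try:
            exit_code = await asyncio.wait_for(proc.wait(), timeout)
        except asyncio.TimeoutError:
            # Timeout: stop the subprocess and record why it ended.
            proc.terminate()
            await proc.wait()
            record.update(state="failed", reason="timeout")
        else:
            record.update(
                state="succeeded" if exit_code == 0 else "failed",
                exit_code=exit_code,
            )
    return record
```

Starting this coroutine with `asyncio.create_task` and returning the task ID right away gives the non-blocking behavior described above; the log file can then be read at any time, independently of whether the task has finished.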

User improvement

The agent can start a slow build, test suite, log follower, or independent investigation without blocking the foreground conversation. Users can ask for status, retrieve output, wait later, or cancel work that is no longer relevant.

Future scope opened

  • DBOS-backed resumable background execution
  • Reattachment to surviving subprocesses
  • Remote workers and distributed queues
  • Incremental output streaming and follow mode
  • Dependency graphs and scheduled jobs
  • Resource limits for CPU, memory, and output size
  • Retry policies
  • Per-project background-task dashboards
  • Artifact capture and result attachments

Important boundary

Current local background tasks persist metadata and logs but do not resume execution after the Code Puppy process exits; on the next startup they are marked interrupted. This is distinct from the existing DBOS agent wrapper, which checkpoints supported agent interactions.


Compatibility and migration

  • Existing installations without an explicit permission_mode retain the legacy yolo_mode fallback.
  • New installations default to permission_mode=ask.
  • Existing linear/pickle session storage remains available; tree JSONL is additive.
  • Existing invoke_agent and invoke_agent_with_model tools remain available.
  • Existing steering UX remains unchanged.
  • LSP tools are absent unless a server configuration exists.
  • Background and task metadata use the existing Code Puppy state directory conventions.
  • Plugin files remain below the repository's 600-line limit.

Validation

Focused feature validation:

39 passed
