Foundational Harness Improvements #489
Open
bajajra wants to merge 8 commits into
add coding-agent parity foundations
Summary
This PR delivers seven harness-level improvements that move Code Puppy closer to coding-agent parity in interaction and reliability while preserving its plugin-first architecture.
Why this work matters
This PR strengthens those foundations without replacing the current Pydantic AI execution layer or the existing plugin system.
1. Cache-stable prompt composition
What changed
Prompt construction is now divided into an explicit stable prefix and dynamic suffix.
The stable prefix contains authored agent instructions and stable project rules. The dynamic suffix contains context that can change independently, including callback-provided runtime context, working directory information, permission guidance, memory recall, and per-agent identity.
Anthropic-compatible payloads split these sections into separate system blocks and place `cache_control` on the stable block. The internal boundary marker is removed before the request reaches the provider. Legacy prompts without a boundary retain the previous whole-system caching behavior.
Dynamic fragments are cached in-process and invalidated when conversation context is explicitly cleared or compacted, when the working directory changes, or when an agent invalidates its dynamic context.
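A minimal sketch of the prefix/suffix split described above. The marker string and the function name are hypothetical (the PR does not name them); the block shape with `cache_control` of type `ephemeral` follows the Anthropic Messages API format for system blocks.

```python
from typing import Any, Dict, List

# Hypothetical marker; the real internal boundary string is not given in this PR.
BOUNDARY = "<<DYNAMIC_CONTEXT_BOUNDARY>>"


def build_system_blocks(prompt: str) -> List[Dict[str, Any]]:
    """Split a composed prompt into Anthropic-style system blocks."""
    if BOUNDARY not in prompt:
        # Legacy prompt without a boundary: cache the whole system prompt.
        return [{"type": "text", "text": prompt,
                 "cache_control": {"type": "ephemeral"}}]

    # Splitting on the marker also removes it from the outgoing payload.
    stable, dynamic = prompt.split(BOUNDARY, 1)
    blocks: List[Dict[str, Any]] = [
        {"type": "text", "text": stable.strip(),
         "cache_control": {"type": "ephemeral"}}
    ]
    if dynamic.strip():
        # The dynamic suffix carries no cache_control, so changes to it
        # leave the cached stable prefix untouched.
        blocks.append({"type": "text", "text": dynamic.strip()})
    return blocks
```

Because only the first block carries `cache_control`, a change in working directory or memory recall alters the second block alone.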
2. First-class project trust and core permissions
What changed
Code Puppy now persists project trust decisions by canonical project path. Before importing project-local executable resources, the loader requires an explicit trust decision.
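As an illustration of keying trust decisions by canonical path, here is a hedged sketch. The flat JSON layout (path mapped to a boolean), the function names, and the owner-only file mode are assumptions, not the PR's actual implementation.

```python
import json
import os
from pathlib import Path


def canonical_key(project_dir: str) -> str:
    """Resolve symlinks and relative segments so one project has one key."""
    return os.path.realpath(project_dir)


def is_trusted(store: Path, project_dir: str) -> bool:
    """Fail closed: a missing store or a missing entry means 'not trusted'."""
    if not store.exists():
        return False
    decisions = json.loads(store.read_text())
    return decisions.get(canonical_key(project_dir)) is True


def record_trust(store: Path, project_dir: str, trusted: bool) -> None:
    """Persist an explicit decision (assumed layout: {path: bool})."""
    decisions = json.loads(store.read_text()) if store.exists() else {}
    decisions[canonical_key(project_dir)] = trusted
    store.write_text(json.dumps(decisions, indent=2))
    os.chmod(store, 0o600)  # assumed owner-only mode for the trust file
```

A loader built on this would call `is_trusted` before importing anything from the project directory and skip the import when it returns `False`.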
The trust boundary covers:
- `.code_puppy/plugins/`
- `.code_puppy/agents/`
- `.code_puppy/skills/`

Unknown projects fail closed in non-interactive environments. Interactive sessions prompt before loading project code. Decisions are stored in `~/.code_puppy/trust.json` with restricted permissions. The `CODE_PUPPY_TRUST_PROJECT` environment variable provides an explicit automation override.
The plugin adds:
- `/trust status`
- `/trust project`
- `/trust revoke`

Core permission modes are now available independently of the existing file-permission plugin:
- `ask`: prompt for approval of both shell commands and file mutations
- `acceptEdits`: allow file mutations, continue prompting for shell commands
- `auto`: allow both shell commands and file mutations

New installations default to `ask`. Existing installations retain compatibility through the legacy `yolo_mode` fallback when no explicit `permission_mode` exists.
3. Append-only tree-branching sessions
What changed
The new tree-session plugin records conversation entries in append-only JSONL. Each entry has a unique ID, parent ID, serialized message payload, optional label, and integrity fingerprint. An active cursor selects the current conversation path.
The plugin synchronizes existing message history into the tree after agent runs and reconstructs active history by following parent pointers.
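The entry shape and parent-pointer walk can be sketched as follows. This is an illustration only: the field names, the SHA-256 fingerprint scheme, and the in-memory list standing in for the JSONL file are all assumptions.

```python
import hashlib
import json
import uuid
from typing import List, Optional


def make_entry(parent_id: Optional[str], message: dict,
               label: Optional[str] = None) -> dict:
    """Build one tree entry (field names are assumed, not the plugin's)."""
    payload = json.dumps(message, sort_keys=True)
    return {
        "id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "message": message,
        "label": label,
        # Assumed fingerprint scheme: SHA-256 over parent ID plus payload.
        "fingerprint": hashlib.sha256(
            f"{parent_id}:{payload}".encode("utf-8")).hexdigest(),
    }


def append_entry(lines: List[str], entry: dict) -> None:
    """Append-only: existing lines are never rewritten or removed."""
    lines.append(json.dumps(entry))


def active_path(lines: List[str], cursor: str) -> List[dict]:
    """Follow parent pointers from the cursor to the root, then reverse."""
    by_id = {e["id"]: e for e in map(json.loads, lines)}
    path = []
    current: Optional[str] = cursor
    while current is not None:
        entry = by_id[current]
        path.append(entry["message"])
        current = entry["parent_id"]
    return path[::-1]
```

Two entries that share a parent form a branch; moving the cursor to either one selects which path `active_path` reconstructs, and nothing already written is modified.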
Commands:
- `/tree` renders the current branch structure
- `/tree label <ID> <TEXT>` labels a checkpoint
- `/fork <ID>` moves the active conversation to an earlier entry and creates an in-place branch on the next turn
- `/fork <ID> <NAME>` extracts the path through an entry into a separate session tree

The current pickle/session mechanisms remain available; this is an additive compatibility layer rather than a destructive storage migration.
Future scope opened
Important boundary
`/fork <ID> <NAME>` creates the separate session tree but does not automatically switch the active CLI session. The implementation preserves message paths; it is not a Git replacement and does not independently snapshot filesystem state.
4. Explicit outer agent-loop controller
What changed
Continuation logic is represented by an explicit state machine:
- `CREATED`
- `RUNNING`
- `FOLLOW_UP`
- `COMPLETED`
- `CANCELLED`
- `FAILED`

Actions are explicit:
- `STEER`
- `HOOK_RETRY`
- `STOP`

The controller records outer model calls, queued steering continuations, and plugin-requested retries. It validates legal state transitions and centralizes the priority rule that user steering is processed before automated hook retries.
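A rough sketch of such a controller core. The state and action names come from this description; the transition table and the function signatures are assumptions, since the PR text does not list the legal edges.

```python
from enum import Enum
from typing import List


class LoopState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FOLLOW_UP = "follow_up"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


class LoopAction(Enum):
    STEER = "steer"
    HOOK_RETRY = "hook_retry"
    STOP = "stop"


# Assumed transition table; terminal states have no outgoing edges.
LEGAL = {
    LoopState.CREATED: {LoopState.RUNNING, LoopState.CANCELLED},
    LoopState.RUNNING: {LoopState.FOLLOW_UP, LoopState.COMPLETED,
                        LoopState.CANCELLED, LoopState.FAILED},
    LoopState.FOLLOW_UP: {LoopState.RUNNING, LoopState.CANCELLED,
                          LoopState.FAILED},
}


def transition(current: LoopState, target: LoopState) -> LoopState:
    """Return the new state, rejecting edges missing from the table."""
    if target not in LEGAL.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def next_action(queued_steering: List[str], hook_wants_retry: bool,
                hook_retries: int, max_hook_retries: int) -> LoopAction:
    """User steering outranks hook retries; retries respect the cap."""
    if queued_steering:
        return LoopAction.STEER
    if hook_wants_retry and hook_retries < max_hook_retries:
        return LoopAction.HOOK_RETRY
    return LoopAction.STOP
```

Keeping the priority rule in one pure function is what makes it independently testable: `next_action(["fix the test"], True, 0, 3)` returns `LoopAction.STEER` even though a hook also requested a retry.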
The existing safety caps remain in place: queued steering is bounded and plugin retries respect `max_hook_retries`.
User improvement
This commit intentionally preserves the existing steering experience rather than introducing a new command. Users still press `Ctrl+T`, select `now` or `queue`, and provide steering text. The immediate benefit is reliability: continuation behavior is isolated, bounded, and independently testable rather than spread across local counters and nested runtime branches.
Future scope opened
- `/loop status` with model calls, tool calls, retries, and state

Important boundary
Pydantic AI still owns the inner model/tool/model ReAct cycle. This controller owns the outer continuation decision after a Pydantic run. Individual tool calls are not yet counted, and the existing outer exception/cancellation machinery still handles several live failure paths.
5. Language Server Protocol tools
What changed
The LSP plugin implements an asynchronous stdio JSON-RPC client with Content-Length framing and the standard initialize/initialized/shutdown lifecycle. Servers are configured in `~/.code_puppy/lsp_servers.json`, selected by file extension, started on demand, reused within a working directory, and shut down gracefully.
The plugin contributes five read-only semantic tools:
- `lsp_definition`
- `lsp_references`
- `lsp_hover`
- `lsp_diagnostics`
- `lsp_workspace_symbols`

`/lsp status` reports configured server names. URI results are normalized into local paths, documents are opened before text-document requests, and published diagnostics are retained by document URI.
Tools are only advertised when at least one language server is configured, avoiding dead tool schemas for users who do not opt in.
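For readers unfamiliar with the wire format, the Content-Length framing mentioned above works roughly as sketched here. The function names are illustrative and this is not the plugin's code; the header-then-blank-line-then-JSON layout is the LSP base protocol.

```python
import json
from typing import Optional, Tuple


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with a Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # The length counts bytes of the encoded body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(buffer: bytes) -> Tuple[Optional[dict], bytes]:
    """Return (message, remaining bytes); message is None if incomplete."""
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None, buffer  # header block not fully received yet
    length = None
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, buffer  # body not fully received yet
    return json.loads(rest[:length].decode("utf-8")), rest[length:]
```

A reader loop would append incoming stdio bytes to a buffer and call `decode_message` repeatedly until it returns `None`, carrying the leftover bytes into the next read.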
User improvement
Code Puppy can now distinguish semantic symbol identity from textual name matches. This improves navigation and edit planning in repositories with aliases, overloads, inheritance, re-exports, or common symbol names.
It also reduces context usage: an exact definition, type signature, or reference list is usually smaller and more reliable than reading every grep match.
Future scope opened
Important boundary
Users must install and configure a language server, but they do not start it manually. The current transport is command-based stdio only. Results and quality depend on the selected server. The tools do not replace tests, and the current surface does not perform semantic refactors.
6. Typed, observable subagent task queues
What changed
The subagent-task plugin wraps the existing `_invoke_agent_impl` execution backend with typed Pydantic contracts and an observable queue.
A request includes:
Each task receives a UUID and transitions through:
- `queued`
- `running`
- `succeeded`
- `failed`
- `cancelled`

Tools:
- `submit_agent_tasks`
- `list_agent_tasks`
- `wait_agent_tasks`
- `cancel_agent_tasks`

Batches use a bounded semaphore. `max_parallel` defaults to four and is clamped between one and thirty-two. Callers can wait for aggregation or request immediate task IDs and inspect the work later.
Records and structured results are persisted in `~/.code_puppy/subagent_tasks.json`. Tasks found queued or running after a process restart are marked failed with an interruption reason rather than silently disappearing.
User improvement
Parallel-agent work becomes observable and controllable. The primary agent can submit independent investigations, limit concurrency, continue other work, wait for selected tasks, aggregate results, and diagnose failures from consistent records.
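The concurrency limit described under "What changed" can be pictured with a short sketch. Only the default of four and the 1 to 32 clamp come from this description; `clamp_parallel`, `run_batch`, and the job signature are hypothetical names used for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


def clamp_parallel(requested: Optional[int]) -> int:
    """Default to 4 and clamp into the documented 1..32 range."""
    value = 4 if requested is None else requested
    return max(1, min(32, value))


async def run_batch(jobs: List[Callable[[], Awaitable[str]]],
                    max_parallel: Optional[int] = None) -> List[str]:
    """Run all jobs, never more than the clamped limit at once."""
    limit = asyncio.BoundedSemaphore(clamp_parallel(max_parallel))

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        async with limit:  # wait here while the limit is saturated
            return await job()

    # gather preserves submission order in its result list.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

Submitting six jobs with `max_parallel=2` runs them two at a time, and the results still come back in submission order.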
Future scope opened
Important boundary
Persistence currently preserves metadata and completed results, not active asyncio execution. A process restart marks unfinished tasks as failed. Cancellation is asynchronous, so callers may need to list or wait again to observe the terminal state.
7. Managed background shell and agent tasks
What changed
The background-task plugin adds a unified lifecycle for long-running shell commands and subagent work.
Request kinds:
- `shell`: command, optional CWD, optional timeout
- `agent`: agent name, prompt, optional session/model overrides

States:
- `queued`
- `running`
- `succeeded`
- `failed`
- `cancelled`
- `interrupted`

Tools:
- `start_background_task`
- `list_background_tasks`
- `wait_background_task`
- `read_background_task_output`
- `cancel_background_task`

The manager returns a task ID immediately, persists metadata, writes durable output logs, captures process IDs, enforces timeouts, terminates cancelled subprocesses, and emits completion notifications through the callback/message system. Background shell starts pass through the new core permission boundary.
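To make the timeout-enforcement step concrete, here is a minimal sketch using the standard library's asyncio subprocess API. It covers only the run, timeout, and terminate path; task IDs, logs, and notifications are omitted, and the function name and return shape are assumptions rather than the plugin's interface.

```python
import asyncio
from typing import List, Optional, Tuple


async def run_shell_task(argv: List[str],
                         timeout: Optional[float] = None) -> Tuple[str, str]:
    """Run a command and return (state, combined output)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # enforce the timeout by terminating the child
        await proc.wait()  # reap the process so no zombie is left behind
        return "failed", "timed out"
    state = "succeeded" if proc.returncode == 0 else "failed"
    return state, out.decode(errors="replace")
```

A manager could wrap this coroutine in `asyncio.create_task` to hand back an ID immediately and record the returned state when the task completes.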
On startup, records that were queued or running are marked `interrupted` because process ownership cannot safely be assumed after restart.
User improvement
The agent can start a slow build, test suite, log follower, or independent investigation without blocking the foreground conversation. Users can ask for status, retrieve output, wait later, or cancel work that is no longer relevant.
Future scope opened
Important boundary
Current local background tasks persist metadata and logs but do not resume execution after the Code Puppy process exits. They are marked interrupted. This is distinct from the existing DBOS agent wrapper, which checkpoints supported agent interactions.
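The startup recovery rule is simple enough to sketch directly. The record layout (a mapping of task ID to a dict with a `state` field) and the `reason` text are assumptions for illustration.

```python
from typing import Dict

# States that cannot be trusted after a restart.
UNFINISHED = {"queued", "running"}


def mark_interrupted(records: Dict[str, dict]) -> int:
    """Mark unfinished records as interrupted; return how many changed."""
    changed = 0
    for record in records.values():
        if record.get("state") in UNFINISHED:
            record["state"] = "interrupted"
            # Assumed field: a human-readable explanation for later listing.
            record["reason"] = "process exited before the task finished"
            changed += 1
    return changed
```

Terminal records such as `succeeded` or `cancelled` are left untouched, so completed output stays readable after a restart.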
Compatibility and migration
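- Existing installations without an explicit `permission_mode` retain the legacy `yolo_mode` fallback.
- New installations default to `permission_mode=ask`.
- The `invoke_agent` and `invoke_agent_with_model` tools remain available.

Validation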
Focused feature validation: