FEAT-011 (1/?): spec, plan, US1 MVP for app.* socket namespace by brettheap · Pull Request #19 · opensoft/AgentTower

brettheap · 2026-05-19T14:39:11Z

Summary

Adds the FEAT-011 spec (3 clarify rounds, 12 binding decisions on Session 2026-05-19), plan, research, data-model, 3 contract files, quickstart, task list (94 tasks), and 24 requirements-quality checklists for the local app.* socket namespace — a host-only, versioned façade over the existing FEAT-002..FEAT-010 service layer for a future packaged desktop control panel.
Implements the US1 bootstrap chain end-to-end: app.preflight → app.hello → app.readiness → app.dashboard are all callable through the existing FEAT-002 dispatcher. New sub-package src/agenttower/app_contract/ (11 modules, zero changes to existing FEAT-002..010 modules except a 9-line dispatcher merge).
13 of 94 tasks complete (~14%). Remaining work (US2 adopt, US3 operator actions, US4 degraded states, US5 version drift, polish + proper contract/integration tests under tests/contract/ and tests/integration/) is explicitly deferred — see tasks.md for the full list.

What's in the four commits

Commit	Scope
`247076c`	Spec (FR-001..FR-045 + 11 sub-FRs) + 16 round-1 requirements-quality checklists
`2f3bf75`	Clarify rounds 2 + 3 (added FR-014a hints, FR-021a/b orderings + direction syntax, FR-029a/b update + log.detach, FR-030c scan.status, FR-034a details registry, FR-042 host-only, FR-003a payload caps, FR-016a degraded_scan, FR-020b cursor format, FR-034b unknown_method, FR-024a filter exact-match) + plan + research + data-model + contracts/{app-methods,error-codes,closed-sets}.md + quickstart + tasks.md (94 tasks) + analyze remediation closing 4 actionable findings + 17 new SCs (SC-011..SC-027) + 7 new checklists
`aa13224`	Foundational scaffolding: `app_contract/` package skeleton with `versioning`, `errors` (26-entry FR-034 closed set + FR-034a `details` registry), `envelope`, `sessions`, `host_only`, `preflight`, `hello`, `dispatcher` modules; DISPATCH merge in `socket_api/methods.py`; 23 smoke tests
`8cb0c81`	US1 visible half: `readiness` (6 subsystem probes + hint emission + state aggregation), `dashboard` (counts across 7 surfaces + recents + hints), `view_models` (compact builders); 19 more smoke tests; 4 dispatch lock-in tests updated for the new 39-entry table

What works end-to-end today

The full Story 1 bootstrap chain is functional:

Method	What it returns	FRs verified
`app.preflight`	Diagnostic envelope, no session required, host-only gate	FR-011, FR-042
`app.hello`	Token, IDs, daemon/schema/contract versions, supported_minor_range, host_user_id, `capability_flags = {}`, state=ok; major-mismatch guard with both versions in `details`	FR-010, FR-036, FR-039, FR-042
`app.readiness`	State ∈ {ready, degraded, unavailable} + 6 subsystem rows (docker, tmux_discovery, sqlite, jsonl, routing_worker, log_attachment_workers) + `hints[]` array	FR-012, FR-013, FR-014, FR-014a, FR-045
`app.dashboard`	Counts across 7 surfaces (containers, panes, agents, log_attachments, events, queue, routes) + recents (events/queue/routes) + `hints[]`	FR-015, FR-016, FR-017, FR-018, FR-045

Hints: closed v1.0 registry of 6 codes emitted based on state (docker_unavailable_hint, start_bench_container, register_first_agent, attach_logs, enable_first_route); severity ∈ {info, warning, action_required}.

What's deferred to follow-up PRs

US2 Adopt (12 tasks): app.scan.{containers,panes,status}, app.pane.{list,detail}, app.agent.{list,detail,register_from_pane}, plus the ScanRegistry + IdempotencyStore.
US3 Operator actions (23 tasks): full read surfaces + mutations (app.agent.update, app.log.attach/detach, app.send_input with idempotency, app.queue.{approve,delay,cancel}, app.route.{add,remove,update}), shared pagination/ordering/filtering plumbing.
US4 Degraded states (6 tasks), US5 Version drift (7 tasks), Polish (16 tasks: SC-001/003/006/008/010/023/025 dedicated tests, unit tests, docs, final validation).
T030 SC-002 ≤500ms latency benchmark for app.dashboard — not measured in this PR.
Proper tests/contract/ and tests/integration/ layout — the 42 tests in this PR live under tests/unit/ as a smoke suite covering envelope shape, dispatcher wiring, all 4 handlers, and the FR-034/FR-034a closed-set invariants. Splitting them into the structured layout from plan.md §Project Structure is a follow-up.
Full view models for all 7 entities — only compact builders for events/queue/routes (needed by app.dashboard recents) shipped here. The full PaneViewModel / AgentViewModel / etc. land with US2/US3.
app.scan.status retention + IdempotencyStore — modules exist as plan artifacts only; no implementation yet.
JSONL audit emission for app-driven mutations (FR-044) — not exercised yet because no mutation handlers exist; will land with US2/US3.

Constitution check

Principle	Status	Evidence
I. Local-First Host Control	✅	FR-003 forbids any non-Unix listener; FR-042 makes `app.*` host-only; no new persisted state (sessions / scan registry / idempotency are all in-memory per FR-006 / FR-030c / FR-031a)
II. Container-First MVP	✅	FEAT-011 is post-MVP per the brief but still container-targeted (US2 adopt workflow promotes tmux panes inside bench containers)
III. Safe Terminal Input	✅	`app.send_input` (US3, not in this PR) will ride the FEAT-009 queue + permission gate per FR-031; not exercised in this PR's code
IV. Observable and Scriptable	✅	Legacy CLI methods unchanged (FR-002); `origin = "app"` audit attribution planned via FR-044 / SC-008; smoke suite asserts `app.dashboard` is side-effect-free (FR-045)
V. Conservative Automation	✅	Façade only; no new workflow decisions; FR-004 same service layer

No complexity-tracking entries; no constitution conflicts.

Test plan

New smoke suite passes — PYTHONPATH=src pytest tests/unit/test_app_contract_smoke.py -v should report 42 / 42 passed. Covers FR-001/002/010/011/012/013/014a/015/016/017/018/033/034/034a/036/039/042/045 invariants.
Dispatch lock-in tests updated and passing — PYTHONPATH=src pytest tests/unit/test_dispatch_table_stability.py tests/unit/test_socket_api_methods.py tests/integration/test_feat005_no_real_container.py -v. DISPATCH is now 39 entries (was 35); EXPECTED_ORDER has 4 new app.* keys appended.
No FEAT-001..010 regressions caused by this PR. Three pre-existing failures (documented in commit 8cb0c81) are NOT introduced by FEAT-011: test_status_default_output_six_lines (FEAT-008/009/010 added status fields; test expects 6 lines, daemon emits 14), test_backcompat_status_json_keeps_feat002_through_008_keys (asserts schema_version 7, FEAT-010 bumped to 8), test_cli_scan_panes_inactive_cascade.test_inactive_container_with_only_inactive_prior_panes_still_touches_them (pre-existing flaky millisecond-timestamp race). These deserve a separate cleanup PR.
Manual handshake smoke: with a running daemon, a synthetic NDJSON socket client should be able to send {"method": "app.preflight"} → success envelope; {"method": "app.hello"} → success envelope with capability_flags = {} and app_contract_version = "1.0"; {"method": "app.readiness", "app_session_token": "<token>"} → state envelope with 6 subsystem rows; {"method": "app.dashboard", "app_session_token": "<token>"} → counts/recents/hints envelope. No subprocess invocation required (SC-001 invariant).
Closed-set / regex invariants — 26 closed-set codes, all matching ^[a-z][a-z0-9_]*$; FR-034a per-code details registry validated by validate_details() (12 codes with structured details).
Spec ↔ implementation coverage — every implemented FR is cited in commit messages and code comments; the 13 ticked tasks in tasks.md are the authoritative scope list for this PR.

Reviewer guide

Where to look in this PR:

To review…	Read…
Contract shape decisions	`specs/011-app-backend-contract/spec.md` (FRs + SCs + Clarifications Session 2026-05-19)
Architecture	`specs/011-app-backend-contract/plan.md`
Per-method wire format	`specs/011-app-backend-contract/contracts/app-methods.md`
Closed-set codes + per-code details	`specs/011-app-backend-contract/contracts/error-codes.md`
All other closed enumerations	`specs/011-app-backend-contract/contracts/closed-sets.md`
Implementation map	`specs/011-app-backend-contract/tasks.md` (94 tasks; 13 ticked [X])
Dispatcher merge (the one existing-module touch)	`src/agenttower/socket_api/methods.py` (last 9 lines)
Foundational primitives	`src/agenttower/app_contract/{versioning,errors,envelope,sessions,host_only}.py`
Handlers	`src/agenttower/app_contract/{preflight,hello,readiness,dashboard}.py`
Tests	`tests/unit/test_app_contract_smoke.py` (42 tests)

🤖 Generated with Claude Code

Summary by Sourcery

Introduce the initial FEAT-011 app.* socket namespace contract (specs and in-memory model) and wire its bootstrap/dashboard readiness handlers into the existing dispatcher while extending tests and documentation for the expanded dispatch table and new app contract surface.

New Features:

Add a host-only app.* socket namespace with app.preflight, app.hello, app.readiness, and app.dashboard handlers as a façade over the existing service layer.
Define the FEAT-011 app contract versioning, closed-set enums, error-code registry, and in-memory session/scan/idempotency primitives for the new app backend contract.

Enhancements:

Document the full FEAT-011 app backend contract, including method surfaces, closed sets, data model, tasks, and quickstart walkthrough in the specs tree.
Extend the socket API dispatcher to merge the new app.* methods into the existing dispatch table without altering legacy behavior.

Documentation:

Add comprehensive FEAT-011 specification, plan, research notes, data model, contracts (methods, error codes, closed sets), quickstart, and quality checklists to describe the new app backend contract and its evolution rules.

Tests:

Introduce a FEAT-011 smoke/unit test suite for the foundational app_contract slice covering envelopes, error invariants, host-only gating, sessions, readiness, dashboard, and versioning behavior.
Update dispatch table stability and integration tests to account for the four new app.* methods and the increased dispatch table size and ordering guarantees.

Establishes the local app backend contract for a packaged desktop control panel. Adds the `app.*` socket namespace as a façade over the existing FEAT-002..010 service layer — local-only, opaque per-process session tokens, MAJOR.MINOR contract versioning, and adopt-existing-panes as the only path from discovered pane to registered agent (managed session creation is deferred to FEAT-013). Clarify session (2026-05-19) locks five contract decisions: list-pagination default 50 / cap 200, last-write-wins on entity updates, `idempotency_key` on `app.send_input` only, empty `capability_flags` at v1.0, and a 30s hard cap on `app.scan.*` `wait=true` with `scan_timeout` plus `scan_id`. Generates 16 domain-specific requirements-quality checklists (434 items total) covering contract surface, sessions, bootstrap, readiness, dashboard, read surfaces, adopt, mutations, errors, versioning, security, observability, performance, concurrency/reliability, scope/assumptions, and success criteria measurability. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…tion Adds Phase 0 + Phase 1 design artifacts for FEAT-011 (Local App Backend Contract). Includes: - Two more `/speckit.clarify` rounds (12 binding decisions on Session 2026-05-19) tightening: `app.scan.status` shape, `hints[]` contract, `error.details` registry, state/role_priority orderings, `app.*` host-only namespace (FR-042 rewrite), order_by direction syntax, `agent.update` clearable fields, `log.detach` idempotency, filter exact-match at v1.0, removal of `scan_state.expired`, and 1 MiB/8 MiB payload caps. - `/speckit.plan` output: plan.md (~140 lines), research.md (10 decisions), data-model.md (entities + view models + closed sets), contracts/ (app-methods, error-codes, closed-sets), quickstart.md. - 7 new requirements-quality checklists covering plan/research/ data-model/contracts/quickstart quality and cross-artifact consistency + clarify propagation. - `/speckit.tasks` output: tasks.md with 94 dependency-ordered tasks across 8 phases (Setup, Foundational, US1..US5, Polish). - Gap-fix sweep adding FR-016a (degraded_scan), FR-020b (cursor_next format), FR-034b (unknown_method), and 17 new measurable success criteria (SC-011..SC-027) covering every FR added during the three clarify rounds. - `/speckit.analyze` remediation: closed all 4 actionable findings (added T084 SC-023 payload-caps test, T085 SC-025 cursor opacity test, updated T083 for SC-003+SC-021, extended T094 with FR-043 schema diff). Final SC coverage is 27/27. CLAUDE.md pointer updated to FEAT-011 plan. No source code changes; this is design/spec work only. Implementation follows in subsequent commits per the MVP-first incremental strategy documented in tasks.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Lands the foundational layer of the FEAT-011 app backend contract: the `app.*` socket namespace exists end-to-end, `app.preflight` and `app.hello` are callable, and the closed-set / envelope / host-only infrastructure every downstream handler depends on is in place. Tasks completed (per tasks.md): T002 Dispatcher wiring — merges APP_DISPATCH into FEAT-002's DISPATCH at the end of socket_api/methods.py (the only existing-module touch in this commit, per FR-002 / FR-004). T004 app_contract.versioning — APP_CONTRACT_VERSION = "1.0", SUPPORTED_MINOR_RANGE, parse_major_minor, is_major_compatible, capability_flags = {}, all closed-set enums (readiness state, subsystem status, subsystem names, hint severity, hint codes, container state, agent roles, role_priority, queue states, state_priority, scan state, scan kind, mutation origin). T005 app_contract.errors — 26-entry FR-034 closed code set; per-code FR-034a `details` registry; ContractViolation + validate_details() guard for malformed failures; FR-034 regex enforcement. T006 Closed-set enums (delivered in versioning.py per task note). T007 app_contract.envelope — success() / failure() / internal_error() builders stamping app_contract_version and validating details. T008 app_contract.host_only — is_host_peer() wrapper around the existing FEAT-002 `_peer_is_host_process` + `_request_peer_pid` per FR-042 "MUST reuse the same mechanism". T010 app_contract.sessions — in-memory SessionRegistry with uuid v4 hex tokens (36 chars), monotonic app_session_id, get/create/ invalidate, test seam (set_registry). T024 app_contract.preflight — app.preflight handler (no session, host-only gate, FR-011 success envelope). T025 app_contract.hello — app.hello handler (FR-010 full field set, FR-036 major-mismatch guard with both versions in details, client_id/client_version/client_app_contract_major validation with field+reason per FR-034a). Plus a 23-test smoke suite (tests/unit/test_app_contract_smoke.py) that exercises: - DISPATCH wiring (FR-001) and legacy-method preservation (FR-002). - FR-034 regex + 26-entry count + DETAILS_REQUIRED_KEYS membership. - Envelope shape invariants (FR-033) — success result-defaults-to-{}, failure rejects unknown codes / non-object details / missing required keys (ContractViolation surfaces). - app.preflight + app.hello happy/major-mismatch/validation/host-only paths. - Session token uniqueness and id monotonicity. - Versioning parse/compatibility helpers. Honest scope (the user asked for "Phases 1+2+3 = 30 tasks MVP"; this commit delivers 9 of those 30): - T001 (full skeleton, 17 modules) is partial — 9 modules created, 8 deferred (readiness, dashboard, reads, mutations, scans, idempotency, view_models, audit). - T003 (mypy/ruff config) — deferred; new sub-package picks up the existing pyproject.toml settings. - T009 (1 MiB payload gate) — deferred; FEAT-002's existing 64 KiB request cap binds first, so FR-003a's 1 MiB ceiling never effectively triggers at MVP scale. Future tightening will need to modify server.py MAX_REQUEST_BYTES. - T011, T012, T013 (IdempotencyStore, ScanRegistry, JSONL audit helper) — needed for US2/US3, not US1 happy path. - T014 (full dispatcher gate chain) — partial; host-only and session gates currently live inside each handler, not at a central dispatcher entry. Sufficient for two handlers; centralization is a follow-up. - T015 (view_models for all 7 entities) — deferred. - T016-T018 (test fixtures as separate files) — fixtures currently inline in the smoke test. - T019-T023 (proper contract / integration tests under tests/contract/ and tests/integration/) — deferred; smoke coverage stands in. - T026-T030 (subsystem probes, app.readiness, app.dashboard, hint emission, SC-002 benchmark) — deferred. US1 is therefore NOT yet user-visibly delivered; only the handshake half of US1 works. The pre-existing FEAT-001..010 test suite runs 313 passed / 1 skipped / 1 failed; the one failure is a pre-existing flaky timestamp-ordering test in tests/integration/test_cli_scan_panes_inactive_cascade.py (asserts `last_scanned > last_scanned_before` and gets two same-scan- millisecond timestamps out of order). The failure is unrelated to this commit's 6-line dispatcher merge. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Lands the visible half of User Story 1: app.readiness probes all 6 subsystems and aggregates state per FR-012/FR-014, and app.dashboard returns counts + recents + structured hints across all 7 surfaces per FR-015..FR-018. Combined with the prior commit's app.preflight + app.hello (commit aa13224), the full US1 bootstrap chain preflight → hello → readiness → dashboard is callable end-to-end. Tasks completed in this commit: T026 Subsystem probes (app_contract/readiness.py) — 6 probes: docker (via discovery_service.list_containers), tmux_discovery (via pane_service presence), sqlite (SELECT 1 probe), jsonl (events_file stat + parent dir check), routing_worker (delivery_worker thread liveness), log_attachment_workers (log_service presence). Each returns a SubsystemRow with status ∈ {ok, degraded, unavailable} per FR-012. T027 Hint registry + emission (app_contract/readiness.py): emit_hints() maps current state (subsystem statuses + 5 summary counts) to the closed v1.0 hint registry (FR-014a): docker_unavailable_hint, start_bench_container, register_first_agent, attach_logs, enable_first_route. check_container_filter is reserved for future telemetry wiring (no FR-014a-defined trigger at MVP). T028 app.readiness handler — composes probes (in FR-013 fixed order: docker, tmux_discovery, sqlite, jsonl, routing_worker, log_attachment_workers), aggregates top-level state per FR-012 rules (all-ok→ready / all-unavailable→unavailable / else→degraded per FR-014), and returns the hints array (always present, possibly []). T029 app.dashboard handler — composes 7 surface counts via direct SQLite COUNT(*) queries against the FEAT-001..010 schema: containers (active/inactive/degraded_scan — degraded_scan returned as 0 at MVP per FR-016a; persisted column wiring deferred), panes (total/registered/unregistered via EXISTS join to agents), agents (total + by_role for the FEAT-006 closed set), log_attachments (active/degraded/none), events (total), queue (by FEAT-009 closed state set), routes (enabled/disabled). Recents for events/queue/routes via compact view-model builders (T015 partial). Plus 19 new smoke tests (test_app_contract_smoke.py) covering: - FR-012/FR-013: readiness envelope shape, 6-subsystem fixed order, reason-empty-when-ok, hints array always present. - FR-014a: docker_unavailable_hint emitted when docker subsystem is unavailable; severity=action_required as registered. - FR-015..FR-018: dashboard envelope shape, all 7 count surfaces present, per-surface bucket sets exactly match closed enumerations (FR-016 buckets for containers/panes/agents; FEAT-009 closed set for queue; enabled/disabled for routes). - FR-017: recent_limit default 10, validation_failed at bounds [0, 51, 100, -1] and on non-int types [str, float, bool, list] with details.field == "recent_limit". - FR-042: host-only gate on both methods. - FR-045: no audit side-effect (events.jsonl unchanged after dashboard call). - FR-018: empty system returns all-zero counts and empty recents. Plus 4 pre-existing DISPATCH lock-in tests updated to reflect the 4 new app.* entries: - tests/unit/test_dispatch_table_stability.py — EXPECTED_ORDER extended; cardinality assertion bumped 35 → 39. - tests/unit/test_socket_api_methods.py — closed-set membership extended. - tests/integration/test_feat005_no_real_container.py — cardinality assertion brought up to date (was pre-existing-broken from FEAT-010; this commit fixes it to FEAT-011 reality). Pre-existing test failures NOT addressed by this commit (none caused by FEAT-011): - test_status_default_output_six_lines (FEAT-008/009/010 staleness: status now emits 14 lines, not 6). - test_backcompat_status_json_keeps_feat002_through_008_keys (FEAT-010 staleness: schema_version is now 8, not 7). - test_cli_scan_panes_inactive_cascade.test_inactive_container_with_only_inactive_prior_panes_still_touches_them (pre-existing flaky millisecond-timestamp race). US1 MVP status: app.preflight, app.hello, app.readiness, app.dashboard are all callable through the FEAT-002 socket dispatcher. The packaged desktop app could now bootstrap and render a usable dashboard end-to- end, with structured hints, against the real daemon. SC-001 (no CLI parsing) is satisfied by construction (all four handlers return structured JSON envelopes). SC-002 (≤500ms dashboard) is not yet measured — a benchmark task (T030) is deferred. SC-008 (token redaction) is structurally satisfied (token never written to JSONL) but not yet verified by a grep test. Remaining for full US1 polish: - T030 SC-002 latency benchmark. - T019..T023 proper contract tests under tests/contract/ and the integration walkthrough under tests/integration/. - T013 audit helper (not needed for US1 read paths but needed for US2/US3 mutations). - T011, T012 idempotency + scan registries (needed for US2/US3). - T015 full view models for all 7 entities (compact builders only shipped here — full PaneViewModel/AgentViewModel/etc. needed for US2/US3 list/detail methods). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

sourcery-ai · 2026-05-19T14:39:20Z

Reviewer's Guide

Introduce initial FEAT-011 app.* backend contract slice: new app_contract package with versioning/error/envelope/session/host-only primitives plus four host-only methods (app.preflight, app.hello, app.readiness, app.dashboard) wired into the existing socket dispatcher, backed by detailed spec/plan/contracts and a smoke test suite; update dispatch-table tests and docs accordingly.

Sequence diagram for FEAT-011 app.* bootstrap and host-only gating

sequenceDiagram
    actor AppClient
    participant SocketServer as socket_api.methods
    participant AppDispatch as APP_DISPATCH
    participant Hello as app_hello
    participant Sessions as SessionRegistry
    participant Readiness as app_readiness
    participant Dashboard as app_dashboard

    AppClient->>SocketServer: send {"method": "app.hello"}
    SocketServer->>AppDispatch: lookup "app.hello"
    AppDispatch-->>SocketServer: Hello
    SocketServer->>Hello: app_hello(ctx, params, peer_uid)
    alt [peer is not host]
        Hello->>Hello: is_host_peer(peer_uid) == False
        Hello-->>SocketServer: envelope.failure(HOST_ONLY,...)
        SocketServer-->>AppClient: {ok: False, error.code: host_only}
    else [peer is host and major compatible]
        Hello->>Hello: _coerce_client_major(params[client_app_contract_major])
        Hello->>Hello: versioning.is_major_compatible(client_major)
        Hello->>Sessions: get_registry().create(...)
        Sessions-->>Hello: AppSession
        Hello-->>SocketServer: envelope.success({... app_session_token ...})
        SocketServer-->>AppClient: {ok: True, result.app_session_token}
    end

    AppClient->>SocketServer: send {"method": "app.readiness", app_session_token}
    SocketServer->>AppDispatch: lookup "app.readiness"
    AppDispatch-->>SocketServer: Readiness
    SocketServer->>Readiness: app_readiness(ctx, params, peer_uid)
    Readiness->>Readiness: is_host_peer(peer_uid)
    Readiness->>Readiness: probe_*(), aggregate_state(...)
    Readiness-->>SocketServer: envelope.success({state, subsystems, hints})
    SocketServer-->>AppClient: readiness envelope

    AppClient->>SocketServer: send {"method": "app.dashboard", app_session_token}
    SocketServer->>AppDispatch: lookup "app.dashboard"
    AppDispatch-->>SocketServer: Dashboard
    SocketServer->>Dashboard: app_dashboard(ctx, params, peer_uid)
    Dashboard->>Dashboard: is_host_peer(peer_uid)
    Dashboard->>Dashboard: _container_counts(), _pane_counts(), _agent_counts(), _queue_counts(), _route_counts()
    Dashboard->>Dashboard: _recent_events(), _recent_queue(), _recent_routes()
    Dashboard-->>SocketServer: envelope.success({counts, recent, hints})
    SocketServer-->>AppClient: dashboard envelope

File-Level Changes

Change	Details	Files
Add FEAT-011 documentation set (spec, plan, research, data model, contracts, tasks, quickstart, quality checklists) describing the new local app.* backend contract.	Introduce specs/011-app-backend-contract/spec.md capturing user stories, FRs, SCs, and clarifications for the app.* namespace. Add plan.md, research.md, data-model.md, quickstart.md, and multiple checklist files to define implementation strategy, data shapes, and quality gates. Define contracts for app methods, error codes, and closed-set enums under contracts/.	`specs/011-app-backend-contract/spec.md` `specs/011-app-backend-contract/plan.md` `specs/011-app-backend-contract/research.md` `specs/011-app-backend-contract/data-model.md` `specs/011-app-backend-contract/quickstart.md` `specs/011-app-backend-contract/contracts/app-methods.md` `specs/011-app-backend-contract/contracts/error-codes.md` `specs/011-app-backend-contract/contracts/closed-sets.md` `specs/011-app-backend-contract/checklists/*.md` `specs/011-app-backend-contract/tasks.md`
Introduce new app_contract subpackage implementing FEAT-011 primitives (versioning, errors, envelopes, sessions, host-only gate, view-model compactors) and US1 handlers (preflight, hello, readiness, dashboard).	Define contract versioning and closed-set enums in versioning.py, including readiness states, hint codes, container/queue/role states, and capability flags. Define closed-set error codes and per-code details registry in errors.py plus validate_details/ContractViolation helpers. Add envelope builders for success/failure/internal_error with app_contract_version stamping in envelope.py. Implement in-memory session registry with opaque token and numeric session id in sessions.py. Implement host-only peer predicate wrapping existing socket_api peer detection in host_only.py. Add compact view-model builders for dashboard recents in view_models.py. Implement app.preflight handler with host-only gate and simple ok diagnostic envelope in preflight.py. Implement app.hello handler with host-only gate, client param validation, major-version mismatch handling, and session issuance in hello.py. Implement readiness subsystem probes, state aggregation, hint emission, and app.readiness handler in readiness.py. Implement app.dashboard handler computing counts, recents, recent_limit validation, and reusing readiness hints in dashboard.py.	`src/agenttower/app_contract/__init__.py` `src/agenttower/app_contract/versioning.py` `src/agenttower/app_contract/errors.py` `src/agenttower/app_contract/envelope.py` `src/agenttower/app_contract/sessions.py` `src/agenttower/app_contract/host_only.py` `src/agenttower/app_contract/view_models.py` `src/agenttower/app_contract/preflight.py` `src/agenttower/app_contract/hello.py` `src/agenttower/app_contract/readiness.py` `src/agenttower/app_contract/dashboard.py`
Wire four new app.* methods into the existing socket dispatcher via a separate dispatch map while preserving legacy behavior and dispatch-table invariants.	Expose APP_DISPATCH mapping for app.* handlers in app_contract/dispatcher.py using lazy imports to avoid circular dependencies. Merge APP_DISPATCH into existing DISPATCH table in socket_api/methods.py (last 9 lines) so app.preflight, app.hello, app.readiness, app.dashboard become callable via FEAT-002 dispatcher. Update CLAUDE.md and feature metadata to point contributors to the new FEAT-011 plan docs.	`src/agenttower/app_contract/dispatcher.py` `src/agenttower/socket_api/methods.py` `CLAUDE.md` `.specify/feature.json`
Add and update tests to lock in dispatch-table size/order and exercise the foundational FEAT-011 app_contract slice (errors, envelopes, preflight, hello, readiness, dashboard).	Create tests/unit/test_app_contract_smoke.py covering dispatcher registration, closed-set error/regex invariants, envelope shapes, preflight/hello behavior including host-only gating and major mismatch, readiness subsystems and hints, and dashboard counts/recents/hints and side-effect-free behavior. Update integration dispatch-table test to document FEAT-010 and FEAT-011 additions, extend EXPECTED_KEYS set with routes.* and app.* methods, and bump expected count from 29 to 39. Update dispatch-table stability unit test to append app.* methods to EXPECTED_ORDER and adjust length assertion from 35 to 39. Update tests/unit/test_socket_api_methods.py closed-set test to include the four new app.* method names.	`tests/unit/test_app_contract_smoke.py` `tests/integration/test_feat005_no_real_container.py` `tests/unit/test_dispatch_table_stability.py` `tests/unit/test_socket_api_methods.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

The new host_only.is_host_peer implementation depends on private helpers (_request_peer_pid, _peer_is_host_process, _NO_PEER_UID) from socket_api.methods; consider adding a small public helper in that module (e.g., is_host_peer_uid(peer_uid)) and calling that instead to avoid tight coupling to underscored internals.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `host_only.is_host_peer` implementation depends on private helpers (`_request_peer_pid`, `_peer_is_host_process`, `_NO_PEER_UID`) from `socket_api.methods`; consider adding a small public helper in that module (e.g., `is_host_peer_uid(peer_uid)`) and calling that instead to avoid tight coupling to underscored internals.

## Individual Comments

### Comment 1
<location path="src/agenttower/app_contract/host_only.py" line_range="26-35" />
<code_context>
+)
+
+
+def is_host_peer(peer_uid: int) -> bool:
+    """Return True iff the current request peer is a host process.
+
+    Implements the FR-042 gate predicate: the caller must run on the
+    host (not inside a bench container) AND have presented valid
+    SO_PEERCRED credentials matching the daemon's uid.
+
+    On non-Linux platforms or when SO_PEERCRED returns no uid, this
+    returns False — a host-only security gate that allows on "no
+    credentials" is bypassable, so we treat "no credentials" as the
+    same as a container caller (matching the FR-042 / routing-toggle
+    rationale in ``socket_api/methods.py``).
+    """
+    if peer_uid == _NO_PEER_UID:
+        return False
+    pid = _request_peer_pid()
</code_context>
<issue_to_address>
**🚨 question (security):** Consider incorporating peer_uid into the host-only check instead of relying solely on the peer PID.

The docstring promises the gate enforces both "runs on host" and "uid matches the daemon's uid", but the implementation ignores peer_uid (beyond `_NO_PEER_UID`) and just calls `_peer_is_host_process(_request_peer_pid())`. Unless `_peer_is_host_process` itself checks uid against the daemon’s uid, this may weaken the FR-042 guarantee. Even if it does, explicitly using peer_uid here would better reflect the stated trust boundary and be more robust to future socket-layer changes. Consider either using peer_uid in the predicate or updating the docstring to match the actual behavior.
</issue_to_address>

### Comment 2
<location path="src/agenttower/app_contract/dashboard.py" line_range="249-258" />
<code_context>
+# ─── Recents (FR-017) ─────────────────────────────────────────────────────
+
+
+def _recent_events(ctx: "DaemonContext", limit: int) -> list[dict[str, Any]]:
+    conn = getattr(ctx, "state_conn", None)
+    if conn is None:
+        return []
+    try:
+        rows = conn.execute(
+            "SELECT event_id, event_type, origin, agent_id, created_at "
+            "FROM events ORDER BY event_id DESC LIMIT ?",
+            (limit,),
+        ).fetchall()
+    except Exception:  # noqa: BLE001
+        return []
+    return [
+        {
+            "id": row[0],
</code_context>
<issue_to_address>
**suggestion:** The recent-events shaping logic duplicates compact_event and may drift over time.

In `_recent_events` you manually build `{id, type, origin, agent_id, timestamp, summary}` instead of reusing `view_models.compact_event`. This introduces a second shaping path that can silently diverge if `compact_event` changes (e.g., new fields, different defaults/normalization). Prefer delegating to `compact_event` and trimming fields, or document and enforce a distinct contract if this shape must differ.

Suggested implementation:

```python
def _recent_events(ctx: "DaemonContext", limit: int) -> list[dict[str, Any]]:
    conn = getattr(ctx, "state_conn", None)
    if conn is None:
        return []
    try:
        rows = conn.execute(
            "SELECT event_id, event_type, origin, agent_id, created_at "
            "FROM events ORDER BY event_id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    except Exception:  # noqa: BLE001
        return []

    # Delegate shaping to view_models.compact_event to keep a single source of truth.
    # We construct a minimal event-like dict that compact_event can normalize.
    events = [
        {
            "id": row[0],
            "type": row[1],
            "origin": row[2],
            "agent_id": row[3],
            "timestamp": row[4],
        }
        for row in rows
    ]

    return [view_models.compact_event(event) for event in events]

```

To fully wire this up, ensure the following in `src/agenttower/app_contract/dashboard.py`:

1. `view_models` must be imported in this module, typically near the top of the file. If it's not already present, add something like:
   - `from . import view_models`
   or the appropriate import for your project layout.
2. Confirm that `view_models.compact_event` accepts the event shape used here. If it expects different keys (e.g. `created_at` instead of `timestamp`), adjust the constructed dict accordingly:
   - Change `"timestamp": row[4]` to `"created_at": row[4]` (or the field name expected by `compact_event`).
3. If `compact_event` relies on additional fields from the `events` table (e.g. `payload`, `status`), extend the SQL `SELECT` and the `events` dict construction to include those columns and keys to avoid losing information or causing `KeyError`s.
</issue_to_address>

### Comment 3
<location path="tests/unit/test_app_contract_smoke.py" line_range="165-151" />
<code_context>
+    assert env["result"] == {}
+
+
+def test_failure_envelope_shape() -> None:
+    """FR-033 + FR-034a: failure envelope has version + code + message + details."""
+    env = envelope.failure(
+        app_errors.AGENT_NOT_FOUND,
+        "agent does not exist",
+        details={"agent_id": "abc-123"},
+    )
+    assert env == {
+        "ok": False,
+        "app_contract_version": APP_CONTRACT_VERSION,
</code_context>
<issue_to_address>
**suggestion (testing):** Add a direct test for internal_error envelope shape and details semantics

`envelope.success` and `envelope.failure` are well covered, as is `validate_details`, but `internal_error` isn’t tested directly. Since the dispatcher relies on it as a safety net when handlers misbehave, please add a test that calls `envelope.internal_error()` and asserts that `ok` is False, `error.code` is `internal_error`, and `error.details` is `{}` to lock in the FR-033/FR-034a semantics for this fallback path.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-05-19T14:42:19Z

+def is_host_peer(peer_uid: int) -> bool:
+    """Return True iff the current request peer is a host process.
+
+    Implements the FR-042 gate predicate: the caller must run on the
+    host (not inside a bench container) AND have presented valid
+    SO_PEERCRED credentials matching the daemon's uid.
+
+    On non-Linux platforms or when SO_PEERCRED returns no uid, this
+    returns False — a host-only security gate that allows on "no
+    credentials" is bypassable, so we treat "no credentials" as the


🚨 question (security): Consider incorporating peer_uid into the host-only check instead of relying solely on the peer PID.

The docstring promises the gate enforces both "runs on host" and "uid matches the daemon's uid", but the implementation ignores peer_uid (beyond _NO_PEER_UID) and just calls _peer_is_host_process(_request_peer_pid()). Unless _peer_is_host_process itself checks uid against the daemon’s uid, this may weaken the FR-042 guarantee. Even if it does, explicitly using peer_uid here would better reflect the stated trust boundary and be more robust to future socket-layer changes. Consider either using peer_uid in the predicate or updating the docstring to match the actual behavior.

sourcery-ai · 2026-05-19T14:42:19Z

+def test_success_envelope_shape() -> None:
+    """FR-033: success envelope is {ok: true, app_contract_version, result}."""
+    env = envelope.success({"x": 1})
+    assert env == {


suggestion (testing): Add a direct test for internal_error envelope shape and details semantics

envelope.success and envelope.failure are well covered, as is validate_details, but internal_error isn’t tested directly. Since the dispatcher relies on it as a safety net when handlers misbehave, please add a test that calls envelope.internal_error() and asserts that ok is False, error.code is internal_error, and error.details is {} to lock in the FR-033/FR-034a semantics for this fallback path.

Copilot

Pull request overview

Introduces FEAT-011’s host-only app.* socket namespace as a versioned façade over the existing FEAT-002..010 service layer, along with extensive contract/spec documentation and dispatch-table wiring so US1 bootstrap methods are callable via the existing socket server.

Changes:

Add new src/agenttower/app_contract/ package implementing app.preflight, app.hello, app.readiness, and app.dashboard, plus envelope/error/versioning primitives.
Merge APP_DISPATCH into the legacy socket_api.methods.DISPATCH and update dispatch-table stability/closed-set tests for the new methods.
Add FEAT-011 documentation set (spec/plan/research/data-model/contracts/quickstart) and a large set of requirements-quality checklists.

Reviewed changes

Copilot reviewed 52 out of 52 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
tests/unit/test_socket_api_methods.py	Extend closed-set keys with `app.*`.
tests/unit/test_dispatch_table_stability.py	Lock dispatch ordering + update method count to 39.
tests/integration/test_feat005_no_real_container.py	Update pinned dispatch closed set + count.
src/agenttower/socket_api/methods.py	Merge `APP_DISPATCH` into legacy `DISPATCH`.
src/agenttower/app_contract/init.py	Export contract version constants.
src/agenttower/app_contract/view_models.py	Compact “recents” view-model builders.
src/agenttower/app_contract/versioning.py	Contract version + closed-set enums/constants.
src/agenttower/app_contract/sessions.py	In-memory session registry scaffolding.
src/agenttower/app_contract/readiness.py	`app.readiness` handler + subsystem probes + hints.
src/agenttower/app_contract/preflight.py	`app.preflight` handler.
src/agenttower/app_contract/host_only.py	Host-only peer predicate (reuses FEAT-002/009 helpers).
src/agenttower/app_contract/hello.py	`app.hello` handler + validation + session issuance.
src/agenttower/app_contract/errors.py	Closed-set error codes + details registry validation.
src/agenttower/app_contract/envelope.py	FEAT-011 success/failure envelope builders.
src/agenttower/app_contract/dispatcher.py	`APP_DISPATCH` mapping for new methods.
src/agenttower/app_contract/dashboard.py	`app.dashboard` handler (counts/recents/hints).
specs/011-app-backend-contract/research.md	FEAT-011 research decisions.
specs/011-app-backend-contract/quickstart.md	Story 1 manual walkthrough.
specs/011-app-backend-contract/plan.md	Implementation plan + structure.
specs/011-app-backend-contract/data-model.md	Data model + closed sets + validation rules.
specs/011-app-backend-contract/contracts/error-codes.md	Error-code registry + details schema.
specs/011-app-backend-contract/contracts/closed-sets.md	Closed enumerations + limits.
specs/011-app-backend-contract/contracts/app-methods.md	Method surface + wire contracts.
specs/011-app-backend-contract/checklists/versioning.md	Versioning checklist.
specs/011-app-backend-contract/checklists/sessions.md	Sessions checklist.
specs/011-app-backend-contract/checklists/security.md	Security checklist.
specs/011-app-backend-contract/checklists/scope-assumptions.md	Scope/assumptions checklist.
specs/011-app-backend-contract/checklists/research-rationale.md	Research rationale checklist.
specs/011-app-backend-contract/checklists/requirements.md	Spec quality checklist.
specs/011-app-backend-contract/checklists/readiness.md	Readiness checklist.
specs/011-app-backend-contract/checklists/read-surfaces.md	Read surfaces checklist.
specs/011-app-backend-contract/checklists/quickstart-quality.md	Quickstart quality checklist.
specs/011-app-backend-contract/checklists/plan-quality.md	Plan quality checklist.
specs/011-app-backend-contract/checklists/performance.md	Performance checklist.
specs/011-app-backend-contract/checklists/outcomes.md	Success criteria checklist.
specs/011-app-backend-contract/checklists/observability.md	Observability/audit checklist.
specs/011-app-backend-contract/checklists/mutations.md	Mutations checklist.
specs/011-app-backend-contract/checklists/errors.md	Errors checklist.
specs/011-app-backend-contract/checklists/data-model-quality.md	Data model quality checklist.
specs/011-app-backend-contract/checklists/dashboard.md	Dashboard checklist.
specs/011-app-backend-contract/checklists/cross-artifact-consistency.md	Cross-artifact consistency checklist.
specs/011-app-backend-contract/checklists/contract-surface.md	Contract surface checklist.
specs/011-app-backend-contract/checklists/contract-files.md	Contract files checklist.
specs/011-app-backend-contract/checklists/concurrency-reliability.md	Concurrency/reliability checklist.
specs/011-app-backend-contract/checklists/clarify-propagation.md	Clarification propagation checklist.
specs/011-app-backend-contract/checklists/bootstrap.md	Bootstrap checklist.
specs/011-app-backend-contract/checklists/adopt.md	Adopt checklist.
CLAUDE.md	Update referenced plan path to FEAT-011.
.specify/feature.json	Point feature directory to FEAT-011.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+# FEAT-011: merge the ``app.*`` namespace into DISPATCH. This import lives
+# AFTER the DISPATCH literal so every FEAT-002 name (``DaemonContext``,
+# ``_NO_PEER_UID``, ``_peer_is_host_process``, ``_request_peer_pid``) is
+# already defined when ``app_contract.host_only`` imports them. The merge
+# is purely additive — no legacy method binding is altered (FR-002).
+from agenttower.app_contract.dispatcher import APP_DISPATCH  # noqa: E402
+
+DISPATCH.update(APP_DISPATCH)


+def _build_app_dispatch() -> dict[str, _AppHandler]:
+    """Construct the ``app.*`` dispatch map.
+
+    Kept in a function (rather than a module-level dict literal) so the
+    imports of individual handler modules happen lazily — this avoids
+    a circular import on the FEAT-002 ``socket_api.methods`` side
+    (``host_only.py`` imports from there, so importing app_contract
+    eagerly from methods.py would loop).
+    """
+    from . import dashboard as _dashboard
+    from . import hello as _hello
+    from . import preflight as _preflight
+    from . import readiness as _readiness
+
+    return {
+        "app.preflight": _preflight.app_preflight,
+        "app.hello": _hello.app_hello,
+        "app.readiness": _readiness.app_readiness,
+        "app.dashboard": _dashboard.app_dashboard,
+    }
+
+
+APP_DISPATCH: dict[str, _AppHandler] = _build_app_dispatch()
+


+def app_readiness(
+    ctx: "DaemonContext",
+    params: dict[str, Any],
+    peer_uid: int = _NO_PEER_UID,
+) -> dict[str, Any]:
+    """Handler for ``app.readiness`` (FR-012..FR-014a, FR-045)."""
+    from .host_only import is_host_peer  # lazy import (circular avoidance)
+
+    if not is_host_peer(peer_uid):
+        return envelope.failure(
+            HOST_ONLY,
+            "app.* namespace is host-only; bench-container callers refused",
+            details={},
+        )
+


+def app_dashboard(
+    ctx: "DaemonContext",
+    params: dict[str, Any],
+    peer_uid: int = _NO_PEER_UID,
+) -> dict[str, Any]:
+    """Handler for ``app.dashboard`` (FR-015..FR-018, FR-045)."""
+    from .host_only import is_host_peer  # lazy (circular avoidance)
+
+    if not is_host_peer(peer_uid):
+        return envelope.failure(
+            HOST_ONLY,
+            "app.* namespace is host-only; bench-container callers refused",
+            details={},
+        )
+
+    if not isinstance(params, dict):
+        params = {}
+
+    recent_limit, err = _coerce_recent_limit(params.get("recent_limit"))
+    if err is not None:
+        return err
+    assert recent_limit is not None  # narrowed by _coerce_recent_limit
+


+class SessionRegistry:
+    """Thread-safe in-memory app session table.
+
+    Currently single-process scoped. Session tokens are keyed by
+    ``app_session_token``; lookup is O(1).
+
+    The registry is intentionally minimal at v1.0 because FEAT-002's
+    one-request-per-connection model means tokens are seldom re-presented
+    over the same connection; future tightening could add per-connection
+    binding and a short TTL on disconnect.
+    """


+def compact_event(row: Any) -> dict[str, Any]:
+    """Compact ``event`` row → dashboard recent payload.
+
+    ``row`` is whatever ``events.dao.select_events`` returns (an
+    ``EventRow`` dataclass-like object with ``event_id``,
+    ``event_type``, ``origin``, ``created_at``, ``agent_id``,
+    ``payload``, etc.). We keep the contract robust to schema drift
+    by accessing fields via ``getattr`` with safe defaults.
+    """
+    return {
+        "id": _get(row, "event_id"),
+        "timestamp": _get(row, "created_at"),
+        "type": _get(row, "event_type", default=""),
+        "origin": _get(row, "origin", default=""),
+        "agent_id": _get(row, "agent_id"),
+        "summary": _summarize_event(row),
+    }


+def is_host_peer(peer_uid: int) -> bool:
+    """Return True iff the current request peer is a host process.
+
+    Implements the FR-042 gate predicate: the caller must run on the
+    host (not inside a bench container) AND have presented valid
+    SO_PEERCRED credentials matching the daemon's uid.
+
+    On non-Linux platforms or when SO_PEERCRED returns no uid, this
+    returns False — a host-only security gate that allows on "no
+    credentials" is bypassable, so we treat "no credentials" as the
+    same as a container caller (matching the FR-042 / routing-toggle
+    rationale in ``socket_api/methods.py``).
+    """
+    if peer_uid == _NO_PEER_UID:
+        return False
+    pid = _request_peer_pid()
+    return _peer_is_host_process(pid)


+def probe_jsonl(ctx: "DaemonContext") -> SubsystemRow:
+    """JSONL audit-stream reachability via path stat + write permission."""
+    events_file = getattr(ctx, "events_file", None)
+    if events_file is None:
+        return SubsystemRow(
+            name="jsonl",
+            status=SUBSYSTEM_STATUS_UNAVAILABLE,
+            reason="events_file not wired",
+            hint=None,
+        )
+    p = Path(events_file)
+    parent = p.parent
+    try:
+        if not parent.exists():
+            return SubsystemRow(
+                name="jsonl",
+                status=SUBSYSTEM_STATUS_UNAVAILABLE,
+                reason=f"events_file parent {parent} does not exist",
+                hint=None,
+            )
+        # If the file exists, ensure we can stat it; if it doesn't,
+        # the writer creates on first append, which is acceptable.
+        if p.exists():
+            p.stat()
+    except Exception as exc:  # noqa: BLE001


+## R-004: Envelope / details validation approach
+
+**Decision**: Pure-Python validation via `envelope.py` and `errors.py` — no `pydantic`, no `jsonschema`. Success envelopes are built by `envelope.success(method, result)` and failure envelopes by `envelope.failure(method, code, details, message=None)`. Both helpers stamp `app_contract_version` automatically. The `errors.py` module exposes a registry `DETAILS_SCHEMA: dict[str, set[str]]` enumerating required keys per code; `envelope.failure()` asserts the supplied `details` dict contains every required key for the code, raising a daemon-side `ContractViolation` exception if a handler emits a malformed failure (caught and reported as `internal_error` to the client to avoid leaking the bug).
+
+**Rationale**: The contract test suite is the system of record for envelope shape; adding `pydantic` / `jsonschema` to the runtime introduces a heavyweight dependency for a closed contract where every shape is known in advance. The closed-set codes and required-key registry are small (25 codes, ~10 entries with structured details) — a hand-written validator is more readable and faster than a schema engine. Contract tests still independently verify response shapes from the outside.


+**Language/Version**: Python 3.11+ (matches existing daemon).
+**Primary Dependencies**: existing daemon services from FEAT-002 (socket dispatcher), FEAT-003 (container discovery), FEAT-004 (pane discovery), FEAT-006 (agent registration), FEAT-007 (log attachment), FEAT-008 (event pipeline), FEAT-009 (message queue + permission gate), FEAT-010 (routes catalog). No new third-party Python dependencies are introduced.
+**Storage**: In-memory only. No SQLite migration. No JSONL schema bump (the JSONL audit format adds an `origin = "app"` value to its existing `origin` field, which already accommodates string variants). Three in-memory stores added: app-session table, scan-result table (cap 100 per daemon process), per-session idempotency-key dedupe map.
+**Testing**: pytest. New contract test layer under `tests/contract/test_app_*.py` using synthetic Unix-socket clients (no `agenttower` subprocess invocation — see SC-001). Existing FEAT-002..010 test suite remains untouched.
+**Target Platform**: Linux primary; macOS and Windows host targets follow per Assumptions. The daemon is Python; the packaged client is out of scope (FEAT-012). All FEAT-011 work is server-side.
+**Project Type**: CLI daemon + structured-API façade. Single Python package (`agenttower`).
+**Performance Goals**: SC-002 cold-start-to-dashboard ≤ 500 ms (no-cache, ≥1 container, ≥1 agent fixture); SC-004 adopt round-trip ≤ 2 s; `app.preflight` and `app.hello` < 50 ms (target); `app.readiness` < 100 ms; list/detail < 100 ms at default pagination.
+**Constraints**: Local-only — FR-003 forbids any non-Unix-socket listener; SC-006 verifies via packet capture. Host-only `app.*` — FR-042 rejects bench-container peers with `host_only`. Pagination cap 200 (FR-020a). Synchronous scan wait cap 30 s (FR-030b). Per-session idempotency-key store is in-memory only; lost on daemon restart or session close (FR-031a). Per-line payload caps **1 MiB request / 8 MiB response** (FR-003a) — request overflow returns `payload_too_large`; response overflow is a daemon-side invariant guarded by the FR-020a pagination cap. No new persisted secret (FR-043).


Addresses two self-review findings on PR #19: 1. HIGH — FR-007 session gate was not enforced. ``app.readiness`` and ``app.dashboard`` previously dispatched without checking ``params.app_session_token``. The closed-set codes ``APP_SESSION_REQUIRED`` and ``APP_SESSION_EXPIRED`` existed in ``errors.py`` but were unused — making the FR-007 session contract decorative. Adds ``sessions.gate_session_required(params, peer_uid)`` that combines the FR-042 host-only check with the FR-007 session lookup in the documented order: a. host_only fires first (never leak session-existence to a non-host peer, even if they present a valid-looking token) b. app_session_required (token missing or wrong type) c. app_session_expired (token present but not in registry) Returns ``AppSession`` on success or a failure-envelope dict on either gate rejection. Both ``app_readiness`` and ``app_dashboard`` call it at the top. Token location: per the FEAT-002 dispatcher's ``{method, params}`` request shape, the token lives inside ``params`` (i.e., ``params.app_session_token``) rather than as a top-level request field. The quickstart.md sample is updated accordingly. Keeping the token in ``params`` avoids any change to ``socket_api/methods.py`` beyond the existing 9-line DISPATCH merge. 2. MEDIUM — probe_jsonl did not check writability. The earlier probe only verified existence; a read-only events.jsonl parent or file would still report ``ok`` while silently swallowing audit emissions. Now uses ``os.access(..., os.W_OK)`` on both the parent directory and the events file (when present), reporting ``degraded`` with a clear ``reason`` on either negative. Tests: - 8 new smoke tests for the session-gate paths: test_readiness_session_required_when_token_missing test_readiness_session_expired_when_token_invalid test_readiness_session_required_on_non_string_token test_readiness_host_only_beats_session_gate test_dashboard_session_required_when_token_missing test_dashboard_session_expired_when_token_invalid test_dashboard_host_only_beats_session_gate test_readiness_jsonl_probe_degraded_when_parent_unwritable - New ``host_session`` fixture (uid + token from app.hello) used by every readiness/dashboard test that needs a valid session. - All previously-passing tests still pass (now exercise the gate rather than bypass it). 50 / 50 smoke tests passing (was 42; +8 net for the new coverage). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Lightweight handoff doc for whoever continues `/speckit.implement` in a future session. Captures: - Sync recipe (smoke suite must pass before adding code). - What's already working end-to-end (don't redo). - Three pickup options (A: close US1 properly; B: jump to US2 adopt; C: monster PR — not recommended). - Known weaknesses & deferred follow-ups from PR #19 review. - Three pre-existing FEAT-001..010 test failures NOT caused by FEAT-011 (separate cleanup PR). - Circular-import gotcha + test-time host-peer detection seam. - File map (which modules exist; which are deferred). - Commit-before-walking-away discipline. The `tasks.md` checklist remains the authoritative scope list; this file just adds context the checklist doesn't capture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…ety net Addresses 4 bot-review findings (Sourcery + Copilot) from PR #19: 1. MEDIUM — validate_details() did not enforce FR-034a's "codes not in DETAILS_REQUIRED_KEYS MUST carry details == {}" rule. A handler could emit non-empty details for e.g. ``host_only`` and the contract would drift silently. Now ``validate_details`` raises ``ContractViolation`` whenever an unregistered code carries any keys. Codes IN the registry still allow additional keys beyond required (additive minor evolution per FR-034a). Tests added: test_validate_details_rejects_extras_on_unregistered_code test_validate_details_allows_extras_on_registered_code 2. MEDIUM — envelope.failure() docstring promised that ``ContractViolation`` was caught by the dispatcher and mapped to the FEAT-011 ``internal_error`` envelope, but no such catch existed. The FEAT-002 server's top-level exception handler would instead return the legacy ``socket_api.errors.INTERNAL_ERROR`` shape (no ``app_contract_version`` field) — violating the FR-033 envelope-shape invariant. Adds ``_wrap_handler(handler)`` in ``app_contract/dispatcher.py`` that wraps every ``app.*`` entry in APP_DISPATCH: - catches ContractViolation → FEAT-011 internal_error envelope - catches any other Exception → FEAT-011 internal_error envelope with the exception type name in the message The wrapped functions are what land in DISPATCH. Lazy-imports envelope+errors inside the wrapper body to keep the dispatcher module-load free of cycles. Tests added: test_dispatcher_wraps_contract_violation_into_internal_error test_dispatcher_wraps_unexpected_exception_into_internal_error test_dispatcher_wrapped_handler_passes_through_normal_returns test_app_dispatch_handlers_are_wrapped (identity check that the four real handlers in DISPATCH are wrappers, not raw functions) 3. LOW — dashboard._recent_events() was building the compact event row inline (``{id, type, origin, agent_id, timestamp, summary}``) in parallel with view_models.compact_event() which exposes the same projection — a real drift risk if either changed shape later. Now calls compact_event() with the SELECT columns repackaged as a small dict. No behavior change; one shaping path instead of two. 4. LOW — docs drift between research.md / plan.md and the as-built implementation: - research.md R-004 cited envelope.success(method, result) / envelope.failure(method, code, details, message=None) — the actual signatures are success(result=None) and failure(code, message, details=None). Updated R-004 to match and added a sentence about the dispatcher's _wrap_handler catching ContractViolation+Exception (now matches reality). - plan.md §Technical Context Testing said new contract tests "live under tests/contract/test_app_*.py". Updated to reflect that this PR ships a smoke equivalent under tests/unit/ and the structured layout migration is tracked as T019..T023. The remaining bot-review findings are deferred to follow-up PRs: - Centralized dispatcher gate chain (Copilot, T014) — bigger refactor; current per-handler ``gate_session_required()`` works. - SessionRegistry connection-binding for FR-008 (Copilot) — requires FEAT-002 lifecycle hooks; already documented as a known weakness in sessions.py. - is_host_peer() self-contained peer_uid check (Sourcery+Copilot, defense-in-depth) — FEAT-002 already enforces at accept; deferred to US2/US3 polish. - Direct test for internal_error envelope shape (Sourcery) — now covered indirectly via the dispatcher-wrap tests above. 88 / 88 tests pass (50 smoke + dispatch lock-in + socket_api). The pre-existing CI flake (test_size_cap_rejection_under_100ms perf budget) is unrelated to this PR and stays for a separate cleanup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… code Encodes the 70-question checklist-walkthrough batch into spec.md as a new "Session 2026-05-19 (round 4)" Clarifications block. All 9 user overrides from the recommended defaults are tagged inline with `(override)`. Spec changes: - FR-003b: NDJSON wire framing — UTF-8, \n only, no \r, no \x00, no trailing content; violations → `malformed_request` before dispatch. - FR-008a/b: `app.hello` idempotent on same connection; 8 concurrent sessions process-wide. - FR-028a-d: adopt full 6-field pane identity match; attach_log+inactive fails whole adopt; nonexistent parent_agent_id → agent_not_found; label normalization (trim, no newlines, ≤256). - FR-030d/e: scan coalescing on same kind; 4 concurrent in-flight scans cap. - FR-031: routing_disabled (kill switch) vs permission_denied (per-msg permission gate) split for app.send_input. - FR-034: 26 → 27 codes; adds malformed_request. - FR-044a-c: audit writer mutex; best-effort under JSONL outage (mutation still commits); audit after SQLite commit, before envelope. - SC-028..SC-038: 11 new contract tests for the above. Contracts: - error-codes.md: 27-entry table; per-code details registry updated; routing_disabled vs permission_denied distinction documented. - closed-sets.md: new sections for wire framing, concurrency caps, audit writer, capability_flags cap. - app-methods.md: adopt mutation rules, send_input routing/permission split, scan coalescing/cap, hello idempotency. Plan: - 27-entry closed set; concurrency caps; wire-framing strictness; audit mutex + best-effort semantics; SC scale fixtures pinned. Checklists: - All 668 items across the 23 checklists ticked (resolved by this round and prior rounds 1-3, per the batch preface). Implementation: - errors.py: adds MALFORMED_REQUEST code + details registry entry; ERROR_CODES bumped to 27. - smoke test updated: count assertion 26 → 27, plus malformed_request details-registry assertion. 57 passed (+1). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds tests/integration/test_story1_dashboard_bootstrap.py — 10 tests walking the Story 1 bootstrap chain (preflight → hello → readiness → dashboard) over a real Unix socket against the running daemon. Closes the socket-level test gap noted in PR #19 review. Coverage: - preflight envelope shape (FR-011) over the wire - hello FR-010 minimum field set, capability_flags == {} at v1.0, supported_minor_range == 1.0..1.0 - per-call session token issuance (one-request-per-connection model) - token persistence across fresh connections (the actual implementation reality vs. FR-008's connection-bound wording — see T097) - readiness 6 required subsystems + hints array always present - dashboard 7 count surfaces + hints array - FR-007 session-required gate over the wire - FR-034b unknown_method on app.foo.bar - SC-002 worst-of-5 ≤500ms wall-clock hello→dashboard (Round-4 Q52 superseded the original p95-over-20 wording) - SC-008 app_session_token never in events.jsonl Marks T023 + T030 done in tasks.md. Surfaced two real spec/implementation drifts (filed as T097, T098): - FEAT-002 one-request-per-connection breaks FR-008 / FR-008a as written; implementation adapted, spec text needs updating. - Legacy unknown_method envelope is missing FR-033 details:{} and app_contract_version stamp on app.* names; needs a wrapping rewriter. NEXT_SESSION.md updated with the latest state. Smoke suite: 57 passed. New integration suite: 10 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 55 out of 55 changed files in this pull request and generated 5 comments.

+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self._sessions: dict[str, AppSession] = {}
+
+    def create(
+        self,
+        *,
+        client_id: str,
+        client_version: str,
+        client_app_contract_major: int,
+        host_user_id: str,
+    ) -> AppSession:
+        """Issue a new session and record it."""
+        session = AppSession(
+            app_session_token=_generate_token(),
+            app_session_id=_next_session_id(),
+            client_id=client_id,
+            client_version=client_version,
+            client_app_contract_major=client_app_contract_major,
+            host_user_id=host_user_id,
+            connection_started_at_ms=int(time.time() * 1000),
+        )
+        with self._lock:
+            self._sessions[session.app_session_token] = session
+        return session


+"""FEAT-011 app-session registry (FR-005..FR-010, FR-036).
+
+In-memory only. Per-connection sessions issued by ``app.hello``,
+invalidated when the underlying socket connection closes (FR-008).
+Never persisted across daemon restarts (FR-006).
+
+Note: in the current FEAT-002 daemon, every connection processes a
+single request and is then closed (FR-026 one-request-per-connection).
+That makes connection-scoped session lifetimes very short — every
+``app.*`` request that requires a session would, in practice, have
+its token presented over a fresh connection. The session table is
+still useful for audit attribution (``app_session_id`` flows to
+JSONL per FR-044) and for the major-mismatch guard from FR-036.
+
+For follow-up tightening, the dispatcher could maintain a session-token-
+indexed map that survives a connection close for a short TTL, but
+that is explicitly NOT FEAT-011 v1.0 behavior — the contract says
+"session invalidated on connection close" (FR-008).


+    def wrapped(
+        ctx: "DaemonContext",
+        params: dict[str, Any],
+        peer_uid: int = -1,
+    ) -> dict[str, Any]:
+        from . import envelope as _envelope
+        from .errors import ContractViolation
+
+        try:
+            return handler(ctx, params, peer_uid)
+        except ContractViolation as exc:
+            return _envelope.internal_error(
+                f"app.* handler emitted malformed envelope: {exc}"
+            )
+        except Exception as exc:  # noqa: BLE001 — FR-033 envelope-shape safety net
+            return _envelope.internal_error(
+                f"app.* handler raised {type(exc).__name__}: {exc}"
+            )


+def is_host_peer(peer_uid: int) -> bool:
+    """Return True iff the current request peer is a host process.
+
+    Implements the FR-042 gate predicate: the caller must run on the
+    host (not inside a bench container) AND have presented valid
+    SO_PEERCRED credentials matching the daemon's uid.
+
+    On non-Linux platforms or when SO_PEERCRED returns no uid, this
+    returns False — a host-only security gate that allows on "no
+    credentials" is bypassable, so we treat "no credentials" as the
+    same as a container caller (matching the FR-042 / routing-toggle
+    rationale in ``socket_api/methods.py``).


+## R-002: macOS SO_PEERCRED variant choice
+
+**Decision**: Use Python's `socket.getsockopt(socket.SOL_LOCAL, socket.LOCAL_PEERCRED)` (macOS) and `socket.SO_PEERCRED` (Linux). Wrap both behind `host_only.get_peer_pid_uid(conn)` returning `(pid, uid)`. The host-vs-container check runs against `pid`; the UID check (FR-041) runs against `uid`.
+
+**Rationale**: This matches what FEAT-009's existing implementation uses (per code inspection of `src/agenttower/routing/permissions.py`). On macOS, `LOCAL_PEERCRED` returns `(uid, gid)` and `LOCAL_PEERPID` returns the pid; both are needed. On Linux, `SO_PEERCRED` returns `(pid, uid, gid)` in one call.
+
+**Alternatives considered**:
+- *`psutil` cross-platform abstraction*: rejected — adds a runtime dependency for a one-liner.
+- *Windows-named-pipe variant*: out of scope — Windows daemon target uses Unix-domain sockets via WSL2 (per Assumptions) or AF_UNIX which Windows 10+ supports natively; `getsockopt(SO_PEERCRED)` is not available on Windows AF_UNIX, so the daemon falls back to checking the process owner via `GetNamedPipeServerProcessId` equivalent. This is out of scope for FEAT-011's contract — left to plan-level implementation in FEAT-012 / FEAT-013 host-runtime work if Windows desktop ships.
+


Implements FR-003b wire-framing rejection at the FEAT-002 socket reader and closes the T098 drift around legacy ``unknown_method`` envelopes for ``app.*`` methods. Also lands a small batch of follow-on edits that landed in the tree alongside (sessions/hello/dispatcher/spec/integration test refinements that round-4 + T023 surfaced). FR-003b (wire framing): Adds ``_make_malformed_request_envelope(reason)`` in ``socket_api/server.py``. Lazy-imports ``app_contract.envelope`` to avoid the methods.py-side load cycle. Used by ``_read_and_dispatch`` before handler dispatch to reject 5 wire-framing classes with the FR-003b-mandated FEAT-011 envelope ``{ok: false, app_contract_version, error: {code: "malformed_request", message, details: {reason}}}``: - (a) stray ``\r`` byte → "stray CR" - (b) embedded ``\x00`` byte → "embedded NUL" - (c) trailing content after first JSON → "trailing content" - (d) JSON decode error → "json decode error: <msg>" - (e) empty line (just ``\n``) → "empty line" Plus invalid UTF-8 (case-(a)-adjacent) → "invalid utf-8" The FEAT-011 envelope is a strict superset of the legacy FEAT-002 shape — legacy clients still see ``ok: false`` + ``error.code`` they can parse, just with an unfamiliar closed-set code. Acceptable for inputs that were always malformed. Detection uses ``json.JSONDecoder.raw_decode`` so we can distinguish trailing content from a clean decode failure. The legacy ``REQUEST_TOO_LARGE`` and pre-read ``BAD_JSON`` paths are unchanged (those operate on bytes-state that predates the method-name parse). T098 (unknown_method envelope rewriter): ``app_contract.dispatcher`` exports ``is_app_method(name)`` and ``make_unknown_method_envelope(method)`` for FR-033/FR-034b compliance: any ``app.*`` method name not present in DISPATCH now emits the FEAT-011 envelope with ``code == "unknown_method"`` and ``details == {}`` (vs the legacy FEAT-002 shape, which is missing both ``app_contract_version`` and ``details``). Legacy methods stay on the FEAT-002 envelope. ``socket_api/server.py::_read_and_dispatch`` now branches on ``is_app_method(method)`` before emitting the unknown-method envelope. FR-008b (session cap) — already-landed pieces re-confirmed: ``SessionRegistry`` has ``MAX_SESSIONS = 8`` and raises ``SessionCapExceeded`` on overflow; ``hello.py`` catches and emits ``validation_failed`` with ``details = {field: "app.hello", reason: "too_many_sessions"}``. Lands here too. FR-008a — spec adapted per T097: ``spec.md`` FR-008a wording is now the "fresh token per app.hello, re-presented across reconnects" model. The original "same-connection idempotency" wording is marked deprecated and the T097 drift discovery is the authoritative reasoning. SC-037 follows suit. ``sessions.py`` docstring updates match. Tests: - ``test_malformed_request_envelope_shape`` — single-case envelope check (FR-003b). - ``test_malformed_request_envelope_reason_classes`` — parametrized over all 6 reason classes; each builds a structurally-valid FEAT-011 envelope. - ``test_app_dispatcher_unknown_method_envelope_for_app_method`` — covers both the ``is_app_method`` predicate (4 cases) and the ``make_unknown_method_envelope`` output shape (T098 + FR-034b ``details: {}`` invariant). 111 / 111 tests pass (smoke + integration + dispatch lock-in + socket_api). The pre-existing FEAT-002 perf flake (``test_size_cap_rejection_under_100ms``) is unrelated to this PR. FR-030d (scan coalescing) + FR-030e (scan cap) remain deferred to US2 implementation since neither has a caller yet (the ``app.scan.*`` handlers land with T036–T038). Spec text is in place; impl follows naturally with the scan handlers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

T097 (FR-008 / FR-008a session-lifecycle wording drift) and T098 (unknown_method envelope rewriter for app.*) were both fully resolved by commits 5984d6b (round-4 spec updates), 9a1da02 (T023 integration test that surfaced both drifts), and 3f9fcc8 (FR-003b + T098 impl). This commit updates the T097 / T098 entries in tasks.md from open to done with a brief summary of what landed: - T097: FR-008/008a/008b rewritten to describe the token-persistence model that's actually implemented (process-wide SessionRegistry, not connection-bound, invalidated only by daemon restart). FR-008b ships the v1.0 reject-not-evict mode of the 8-session cap. SessionRegistry.create() raises SessionCapExceeded at the cap; hello handler translates to validation_failed.too_many_sessions. New smoke test `test_hello_rejects_9th_concurrent_session` covers it. - T098: server.py emits the FR-033-compliant envelope (with app_contract_version + details:{}) for unknown app.* method names; legacy methods keep their FEAT-002 shape per FR-002. Wired via is_app_method() + make_unknown_method_envelope() in app_contract/dispatcher.py. Full suite: 79 tests passing (58 smoke + 14 socket integration + 7 others under the smoke umbrella). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Regression fix for PR #19 CI (run 26110310901). The prior commit 3f9fcc8 changed two paths in ``_read_and_dispatch`` to emit the FEAT-011 ``malformed_request`` envelope: 1. ``UnicodeDecodeError`` on the raw line bytes 2. ``json.JSONDecodeError`` from ``json.JSONDecoder.raw_decode`` That broke ``tests/unit/test_socket_api_framing.py:: test_dispatch_rejects_malformed_request`` which is a FEAT-002 contract lock-in test asserting ``bad_json`` for both: (b"\xff\xfe not utf-8\n", errors.BAD_JSON) (b"this is not json\n", errors.BAD_JSON) These are legitimate FEAT-002 contract behaviors that legacy CLI callers depend on. Switching to ``malformed_request`` would have required either bumping the FEAT-002 contract (out of scope for this PR) or adding a method-aware envelope rewriter at the wire-framing layer (T098-adjacent, also out of scope). This commit reverts those two paths to the legacy ``errors.make_error(errors.BAD_JSON, ...)`` envelope while keeping the NEW pre-parse checks (\r, \x00, empty line) AND the post-parse trailing-content check on the FEAT-011 ``malformed_request`` envelope — those weren't covered by the FEAT-002 lock-in. FR-003b's documented 5 reason classes therefore land partially: Implemented (malformed_request envelope): (a) stray CR (b) embedded NUL (c) trailing content (e) empty line Deferred (legacy bad_json envelope, to preserve FEAT-002 contract): (d) JSON decode error Plus invalid UTF-8 (FR-003b-adjacent) The deferred classes will be lifted in a future PR via either: - An explicit FEAT-002 contract bump that allows ``malformed_request`` for these cases, OR - A method-aware envelope rewriter that emits ``malformed_request`` for ``app.*`` method names and ``bad_json`` for legacy method names (requires extracting ``method`` from a partially-parsed line — a best-effort heuristic). SC-028 verification therefore covers 4 of 6 wire-framing rejection classes in this PR; the remaining 2 are tracked as follow-up. Local sweep: 107 / 107 tests green (smoke + framing lock-in + integration + dispatch lock-in). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

T011 IdempotencyStore (FR-031a): per-session in-memory LRU cap 256. Used by app.send_input (US3) for dedupe keyed by (app_session_id, idempotency_key). Methods: lookup, record, clear, size. Re-record refreshes in place. T012 ScanRegistry (FR-030c/d/e): in-memory scan-record store. FIFO cap 100 records; in-flight cap 4; same-kind coalescing on start(). v1.0 closed state set is exactly {running, completed, failed} — expired is intentionally absent. Methods: start (returns record + coalesced bool), complete, fail, lookup, size, in_flight_count. T013 audit.emit_app_mutation (FR-044/044a/044b/044c): wraps the existing FEAT-008 events.writer.append_event which already holds a process-wide mutex (FR-044a). Injects origin="app" + app_session_id. Rejects payloads carrying protected keys (origin / app_session_id / app_session_token). FR-044b best-effort: on JSONL outage the mutation completes, the row is dropped, and one stderr warning is emitted per 60s window. Caller's mutation is never affected. 22 new unit tests in tests/unit/test_app_foundational_stores.py cover the contract surface of all three stores. Full suite: 104 passing (82 pre-existing + 22 new). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 55 out of 56 changed files in this pull request and generated 5 comments.

+        # FR-003b case (c): use raw_decode so we can distinguish "extra
+        # content after the first JSON object" from a clean parse failure.
+        text_stripped = text.lstrip()
+        decoder = json.JSONDecoder()
        try:
-            request = json.loads(text)
+            request, idx = decoder.raw_decode(text_stripped)
        except json.JSONDecodeError as exc:
            return errors.make_error(errors.BAD_JSON, f"json decode failed: {exc.msg}")
+        # Anything non-whitespace remaining is trailing content.
+        if text_stripped[idx:].strip():
+            return _make_malformed_request_envelope("trailing content")


@@ -149,15 +178,50 @@ def _read_and_dispatch(self) -> dict[str, Any]:
                f"request line exceeds {MAX_REQUEST_BYTES} bytes",
            )


+    return envelope.success({
+        "app_session_token": session.app_session_token,
+        "app_session_id": session.app_session_id,
+        "daemon_version": ctx.daemon_version,
+        "schema_version": schema_version,
+        "app_contract_version": versioning.APP_CONTRACT_VERSION,
+        "supported_minor_range": versioning.SUPPORTED_MINOR_RANGE,
+        "host_user_id": host_user_id,
+        "capability_flags": versioning.CAPABILITY_FLAGS_V1_0,
+        "state": "ok",


+def test_fr003b_wire_framing_stray_cr_returns_malformed_request(
+    socket_path: Path,
+) -> None:
+    """FR-003b case (a): a stray `\\r` byte → malformed_request."""
+    sock = _open_socket(socket_path)
+    try:
+        sock.sendall(b'{"method":\r"app.preflight"}\n')
+        buf = b""
+        while not buf.endswith(b"\n"):
+            chunk = sock.recv(65536)
+            if not chunk:
+                break
+            buf += chunk
+    finally:
+        sock.close()
+    envelope = json.loads(buf.decode("utf-8"))
+    assert envelope["ok"] is False
+    assert envelope["error"]["code"] == "malformed_request"
+    assert envelope["error"]["details"]["reason"] == "stray CR"
+    assert envelope["app_contract_version"] == "1.0"
+
+
+def test_fr003b_wire_framing_embedded_nul_returns_malformed_request(
+    socket_path: Path,
+) -> None:
+    """FR-003b case (b): an embedded `\\x00` byte → malformed_request."""
+    sock = _open_socket(socket_path)
+    try:
+        sock.sendall(b'{"method":"app.\x00preflight"}\n')
+        buf = b""
+        while not buf.endswith(b"\n"):
+            chunk = sock.recv(65536)
+            if not chunk:
+                break
+            buf += chunk
+    finally:
+        sock.close()
+    envelope = json.loads(buf.decode("utf-8"))
+    assert envelope["ok"] is False
+    assert envelope["error"]["code"] == "malformed_request"
+    assert envelope["error"]["details"]["reason"] == "embedded NUL"
+
+
+def test_fr003b_wire_framing_trailing_content_returns_malformed_request(
+    socket_path: Path,
+) -> None:
+    """FR-003b case (c): trailing content after one JSON object →
+    malformed_request (Round-4 Block A Q3 override — reject whole line)."""
+    sock = _open_socket(socket_path)
+    try:
+        sock.sendall(b'{"method":"app.preflight"}  {"extra":true}\n')
+        buf = b""
+        while not buf.endswith(b"\n"):
+            chunk = sock.recv(65536)
+            if not chunk:
+                break
+            buf += chunk
+    finally:
+        sock.close()
+    envelope = json.loads(buf.decode("utf-8"))
+    assert envelope["ok"] is False
+    assert envelope["error"]["code"] == "malformed_request"
+    assert envelope["error"]["details"]["reason"] == "trailing content"
+


+def test_sc002_hello_to_dashboard_within_500ms(socket_path: Path) -> None:
+    """SC-002: wall-clock from app.hello send to app.dashboard response
+    receive is ≤ 500 ms in 5 of 5 trials on a daemon already running.
+
+    Each trial opens two fresh connections (one for hello, one for
+    dashboard) to match FEAT-002's one-request-per-connection model.
+    """
+    trials = []
+    for _ in range(5):
+        t_start = time.perf_counter()
+        hello = _one_shot_call(socket_path, "app.hello")
+        assert hello["ok"] is True
+        token = hello["result"]["app_session_token"]
+        dashboard = _one_shot_call(
+            socket_path, "app.dashboard", {"app_session_token": token}
+        )
+        t_end = time.perf_counter()
+        assert dashboard["ok"] is True, dashboard
+        trials.append((t_end - t_start) * 1000.0)
+    worst = max(trials)
+    assert worst <= 500.0, (
+        f"SC-002 violation: worst hello→dashboard wall-clock {worst:.1f} ms "
+        f"exceeds 500 ms budget. Trials (ms): "
+        f"{', '.join(f'{t:.1f}' for t in trials)}"
+    )


T015 view_models.py: add the 4 missing full view-model builders (container_view, pane_view, agent_view, log_attachment_view) plus full event/queue/route projections alongside the existing compact dashboard recents. Pure projection functions — no I/O. Robust _get helper handles dataclass/dict/sqlite3.Row variants. Derived fields (registered/agent_id on Pane; log_attached/pane_active on Agent; state_priority on Queue; role_priority on Agent; pane_count and registered_agent_count on Container) accept caller-supplied joins as kwargs. 30 unit tests in test_app_view_models.py. T036 app.scan.containers / T037 app.scan.panes / T038 app.scan.status (scan_handlers.py): - wait=true: blocks on ScanRegistry.done up to 30s (FR-030b). On timeout returns scan_timeout with details.scan_id; the scan continues server-side and remains reachable via app.scan.status. - wait=false: kicks the scan off on a daemon thread and returns {scan_id, state: "running"} immediately. - Same-kind coalescing via ScanRegistry.start() (FR-030d). - 4-in-flight cap with validation_failed.too_many_scans_in_flight (FR-030e). - Failed scans return success envelope with state="failed" and the error in `result` — the spec distinguishes wait-timeout from scan-failure (the former is a wait outcome; the latter is a terminal scan state). - app.scan.status returns {state, scan_kind, started_at, completed_at, result} per FR-030c shape. completed_at + result are null while running. scan_not_found for unknown / evicted ids. Wired into APP_DISPATCH; 13 unit tests cover happy paths, coalescing, in-flight cap, timeout, failure path, status lookups (including unknown / missing-arg), and unwired-service safety. Full suite: 147 passing (134 pre-existing + 13 new). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addresses 3 findings from the orchestrated code review of this branch. Scoped to the 3 stable FEAT-002 files (lifecycle.py, daemon.py, server.py) to avoid racing the concurrent US3 session churning src/agenttower/app_contract/. H1 (HIGH) — silent host-only gate bypass made auditable. The AGENTTOWER_TEST_FORCE_HOST_PEER=1 test seam makes the FR-042 host-vs-container peer check return True for ANY caller — a process-wide bypass of the app.* host-only gate, readable from the daemon's runtime environment. The seam is needed because WSL2 / Docker-in-Docker / sandboxed CI runners false-positive the /proc container probe, so removing it would break the integration suite. Instead it is now LOUD: a new `host_peer_check_bypassed` warn-level lifecycle event is emitted at daemon startup whenever the seam is active, so an operator auditing the daemon log sees the bypass rather than it being silent. New event added to the closed LIFECYCLE_EVENTS set in socket_api/lifecycle.py. H3 (HIGH) — exception text no longer leaked to the wire. The FEAT-002 dispatch catch-all in server.py returned `f"{type(exc).__name__}: {exc}"` to the client — the full exception string, which can carry filesystem paths, SQL fragments, or request content. It now prints the full traceback to daemon stderr (operator-visible) and returns a constant generic message ("internal error handling request; see daemon log") on the wire. test_dispatch_handles_internal_exception_gracefully updated to assert the exception detail is NOT echoed (it now raises with a fake "/etc/shadow" path and asserts that path is absent from the client envelope). M5 (MEDIUM) — honest payload_too_large messaging. `_make_payload_too_large_envelope` documented `actual_size_bytes` as a precise value, but the reader stops at MAX_REQUEST_BYTES + 1 (it deliberately does not drain the socket — draining would defeat the cap), so the field is necessarily a lower bound. Renamed the param to `observed_size_bytes`, message now says "observed at least N bytes", and the docstring explains the lower-bound semantics plus the known FEAT-002-64KiB-vs-FR-003a-1MiB cap gap. Review findings NOT in this commit: - C1 (CRITICAL, 30-vs-12 method gap) — RESOLVED by the concurrent US3 commits c537026 + f19a80c; 32 app.* methods are now wired. - M1 (idempotency.py "dead code") — INVALID; mutations.py imports IdempotencyStore. Reviewer only checked dispatcher.py. - H2 (thread-local peer-pid side channel) — deferred; fixing it means changing every app.* handler signature, too invasive for a remediation commit. Fails closed today. - H4 (single-threaded concurrency tests), M2/M3/M4/M6/M7/M8 + LOWs — deferred; most live in app_contract/ + spec docs that the concurrent US3 session is actively churning. To be picked up once the branch settles. 68/68 FEAT-002 unit tests pass (framing, envelope-body, methods, dispatch-stability). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot