32 commits
457d5c2
FEAT-014: spec, plan, contracts, checklists, tasks (post-analyze reme…
May 24, 2026
ada6dbc
FEAT-014: post-analyze fixes (M1, M2) + remediation-verification chec…
May 24, 2026
f1c2747
AGENTS+CLAUDE: detect "Already In devBench" so Rule 2 doesn't gate in…
May 24, 2026
f921b12
FEAT-014: fold Clarifications R1 (16 Q/A + 7 notes)
May 24, 2026
5e19dac
FEAT-014: extend T003/T004/T010/T017/T024 for post-R1 FR coverage
May 24, 2026
0ec2d5c
FEAT-014: fold Clarifications R2 (7 Q/A + 7 notes); FR-028 added; +7 …
May 24, 2026
b1ee654
FEAT-014: extend T022 for FR-012 daemon-side symmetric forward compat…
May 24, 2026
36977f9
FEAT-014: document intentionality of the 4 LOW carry-over findings (U…
May 24, 2026
dbd1f3e
FEAT-014 MVP: PaneState + AgentState buckets, v1.1 bump (T001-T009 mi…
May 24, 2026
45e8ee1
FEAT-014: link _compute_pane_state_buckets FR-019 caveat to issue #28
May 25, 2026
47ad73b
FEAT-014: fold R3 + close analyze HIGH (F-019-PB-1) + 3 MEDIUM drift …
May 25, 2026
9f0ad69
FEAT-014: fold swarm code-review M1-M5 polish
May 25, 2026
768e2ca
FEAT-014: fold post-implement analyze residue (D-DRIFT-3 + R3 wording…
May 25, 2026
13759b1
FEAT-014: tick all 13 release-gate checklists (324/324 satisfied @ 76…
May 25, 2026
1e443ee
FEAT-014 T010: skip-counter ring buffer unit tests (TDD red phase)
May 25, 2026
0513ae4
FEAT-014 T011: US2 wire-level dashboard contract tests (TDD red)
May 25, 2026
f06023c
FEAT-014: fold post-T010/T011 analyze MEDIUMs (API shape + constant c…
May 25, 2026
0593ffa
FEAT-014 T013+T014+T015: US2 production wiring (route-skip telemetry)
May 25, 2026
7794b2a
FEAT-014 T016+T021+T022: US3 recommendation tests (RED) + US4 version…
May 25, 2026
37099bd
FEAT-014: fold post-T016 analyze MEDIUMs (T019 API surface + return t…
May 25, 2026
30e8429
FEAT-014 T017+T019+T023: US3 production + US4 SC-004 v1.0-compat regr…
May 25, 2026
3c6bf34
FEAT-014 T020+T025: US3 dashboard wiring + SC-005 omnibus shape test
May 25, 2026
8900e18
FEAT-014: fold L-T020-CLOCK (refreshed_at timestamp race)
May 25, 2026
2a9347f
FEAT-014 T026: cross-reference v1.1 from FEAT-011 contract docs
May 25, 2026
6032a9e
FEAT-014 T027: quickstart walkthrough validated — all 7 steps PASS
May 25, 2026
b982a8c
Issue #27: integration tests v1.0-hardcoded version-string cleanup
May 25, 2026
4f314e5
FEAT-014 T006+T012+T018: integration scenarios over real socket
May 25, 2026
4e242b1
FEAT-014 T024: SC-006 p95 latency + degraded waiver + FR-027 budget-miss
May 25, 2026
54fe774
Merge remote-tracking branch 'origin/main' into 014-app-dashboard-ext…
May 25, 2026
0dfa407
FEAT-014: fold full-branch swarm review (B1 BLOCK + M1-M8 + LOWs)
May 25, 2026
07959b8
FEAT-014: fold PR #29 review findings (P0 CI + P1 + Sourcery polish)
May 25, 2026
72d0217
FEAT-014: fold P1 dashboard-v1_1.md FR-019 strict-eq drift (follow-up…
May 25, 2026
2 changes: 1 addition & 1 deletion .specify/feature.json
@@ -1,3 +1,3 @@
{
-"feature_directory": "specs/011-app-backend-contract"
+"feature_directory": "specs/014-app-dashboard-extensions"
}
48 changes: 48 additions & 0 deletions AGENTS.md
@@ -127,3 +127,51 @@ implementation and record the resulting issue links or IDs in the handoff.
MVP deployment is host-daemon first: `agenttowerd` runs on the host, bench
containers run thin `agenttower` clients over a mounted Unix socket, and there
is no network listener in MVP.

## Cross-Feature Spec Dir Editing

A feature PR ordinarily edits only its own `specs/<NNN>-<slug>/` directory.
The one allowed exception is **additive cross-reference breadcrumbs**: a
contract-evolution feature MAY add a small subsection to a prior feature's
`specs/<MMM>-<slug>/contracts/*.md` for the sole purpose of pointing readers
at the new evolution. Rules:

- The added subsection MUST be purely additive (a new `## App Contract
Evolution — vX.Y (FEAT-NNN)` heading or similar). It MUST NOT rewrite,
reflow, or delete any prior text in the file.
- The subsection MUST be a pointer to the evolving feature's own
`contracts/` directory, not a re-statement of the new contract content.
- If a feature would need to *modify* (not just append to) a prior feature's
spec dir, it MUST be split into two PRs: the current feature's PR
(self-contained), and a follow-up PR owned by the prior feature's lineage
that does the modification.

The canonical contract docs always live in the feature that introduced the
contract version. Prior-feature spec dirs get pointers, not duplicates.

## Detecting "Already In devBench" (Rule 2 satisfied)

When a Claude or Codex session starts INSIDE the project's devBench container
(as opposed to the host shell), the host-path rule's prescription to "run
codebase commands inside the devBench container" is already satisfied — no
separate routing step (no `docker exec`, no `devbench` wrapper) is required.
Detect "in devBench" deterministically by checking ALL of:

1. `/.dockerenv` exists — any Docker container.
2. `REMOTE_CONTAINERS=true` env var is set — VS Code devcontainer family.
3. The project workspace is mounted at `/workspace/…` — the devBench layout.

When all three signals hold, run codebase commands directly: `python3`,
`pytest`, `pip`, the project's CLI, etc. From inside the container, the
in-container shell IS the runner; the host-path concerns Rule 1 protects
against don't apply, because the container's filesystem layout is stable
across host machines, WSL configurations, and mount-point reshuffles.

When ANY of the three signals is missing, treat the current shell as a host
shell: either route the codebase command through the appropriate devBench
invocation (typical from the host: `docker exec <bench-name> …`) or stop and
ask the user how to route before defaulting to the host.

In this repo's devBench the bench is named `py-Bench`. That name is not
visible from inside the container, so do NOT try to verify it programmatically
— verify "in devBench" via the three structural signals above.
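
The three-signal check described above could be sketched in Python as follows. This is an illustrative helper, not part of the diff: the function name `in_devbench` and its injectable parameters are hypothetical, and "workspace mounted at `/workspace/…`" is approximated here by the working directory living under `/workspace`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def in_devbench(
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    dockerenv: Path = Path("/.dockerenv"),
) -> bool:
    """Return True only when all three structural signals hold."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd

    # Signal 1: the Docker marker file exists (any Docker container).
    has_dockerenv = dockerenv.exists()
    # Signal 2: REMOTE_CONTAINERS is set to the literal string "true".
    is_devcontainer = env.get("REMOTE_CONTAINERS") == "true"
    # Signal 3 (assumption): the working directory is /workspace or below it.
    under_workspace = Path("/workspace") in (cwd, *cwd.parents)

    return has_dockerenv and is_devcontainer and under_workspace


if __name__ == "__main__":
    print("direct" if in_devbench() else "host shell: route or ask")
```

The parameters exist only so the check can be exercised without a real container; a missing signal yields `False`, which matches the "treat the current shell as a host shell" fallback.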
50 changes: 49 additions & 1 deletion CLAUDE.md
@@ -1,7 +1,7 @@
<!-- SPECKIT START -->
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan:
-`specs/011-app-backend-contract/plan.md`.
+`specs/014-app-dashboard-extensions/plan.md`.
<!-- SPECKIT END -->

# AgentTower Agent Context
@@ -102,3 +102,51 @@ implementation and record the resulting issue links or IDs in the handoff.
MVP deployment is host-daemon first: `agenttowerd` runs on the host, bench
containers run thin `agenttower` clients over a mounted Unix socket, and there
is no network listener in MVP.

## Cross-Feature Spec Dir Editing

A feature PR ordinarily edits only its own `specs/<NNN>-<slug>/` directory.
The one allowed exception is **additive cross-reference breadcrumbs**: a
contract-evolution feature MAY add a small subsection to a prior feature's
`specs/<MMM>-<slug>/contracts/*.md` for the sole purpose of pointing readers
at the new evolution. Rules:

- The added subsection MUST be purely additive (a new `## App Contract
Evolution — vX.Y (FEAT-NNN)` heading or similar). It MUST NOT rewrite,
reflow, or delete any prior text in the file.
- The subsection MUST be a pointer to the evolving feature's own
`contracts/` directory, not a re-statement of the new contract content.
- If a feature would need to *modify* (not just append to) a prior feature's
spec dir, it MUST be split into two PRs: the current feature's PR
(self-contained), and a follow-up PR owned by the prior feature's lineage
that does the modification.

The canonical contract docs always live in the feature that introduced the
contract version. Prior-feature spec dirs get pointers, not duplicates.

## Detecting "Already In devBench" (Rule 2 satisfied)

When a Claude or Codex session starts INSIDE the project's devBench container
(as opposed to the host shell), the host-path rule's prescription to "run
codebase commands inside the devBench container" is already satisfied — no
separate routing step (no `docker exec`, no `devbench` wrapper) is required.
Detect "in devBench" deterministically by checking ALL of:

1. `/.dockerenv` exists — any Docker container.
2. `REMOTE_CONTAINERS=true` env var is set — VS Code devcontainer family.
3. The project workspace is mounted at `/workspace/…` — the devBench layout.

When all three signals hold, run codebase commands directly: `python3`,
`pytest`, `pip`, the project's CLI, etc. From inside the container, the
in-container shell IS the runner; the host-path concerns that Rule 1
protects against don't apply, because the container's filesystem layout is stable
across host machines, WSL configurations, and mount-point reshuffles.

When ANY of the three signals is missing, treat the current shell as a host
shell: either route the codebase command through the appropriate devBench
invocation (typical from the host: `docker exec <bench-name> …`) or stop and
ask the user how to route before defaulting to the host.

In this repo's devBench the bench is named `py-Bench`. That name is not
visible from inside the container, so do NOT try to verify it programmatically
— verify "in devBench" via the three structural signals above.
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -42,6 +42,9 @@ packages = ["src/agenttower"]
# the run (e.g., unit-only for the SonarQube workflow).
testpaths = ["tests"]
addopts = "-ra"
markers = [
"v1_1: FEAT-014 v1.1-additive assertion. Deselected by T023's SC-004 v1.0-compat regression via `pytest -m 'not v1_1'`. See tasks.md §Notes 'v1.1 marker rule'.",
]

[tool.coverage.run]
source = ["src/agenttower"]
51 changes: 51 additions & 0 deletions specs/011-app-backend-contract/contracts/app-methods.md
@@ -457,3 +457,54 @@ Poll a previously-issued scan (FR-030c).
- 13 operator mutations: `app.agent.update`, `app.log.attach`, `app.log.detach`, `app.send_input`, `app.queue.{approve,delay,cancel}` (3), `app.route.{add,remove,update}` (3), `app.scan.{containers,panes,status}` (3)

Total: 2 + 2 + 14 + 1 + 13 = **32**. All required at v1.0; `capability_flags = {}` reflects "every method is mandatory" (FR-039). The wired `DISPATCH` table is therefore 35 legacy FEAT-002..010 methods + 32 `app.*` = 67 entries (pinned by `tests/unit/test_dispatch_table_stability.py`).

---

## App Contract Evolution — v1.1 (FEAT-014)

> **Additive breadcrumb only.** This subsection is added per the
> AGENTS.md §Cross-Feature Spec Dir Editing exception. The canonical
> v1.1 contract docs live in FEAT-014's own spec dir — see the pointers
> below. Nothing above this subsection has been rewritten, reflowed, or
> deleted; FEAT-011's v1.0 contract specification is unchanged.

**FEAT-014** extends `app.dashboard` as an **additive v1.1 minor** (FR-013 /
FR-014 of FEAT-014). No new method is introduced; no v1.0 field is renamed,
retyped, or removed. Capability flags remain empty (FR-015). The bump is:

- `app_contract_version`: `"1.0"` → `"1.1"`
- `supported_minor_range.max`: `"1.0"` → `"1.1"`

**Additive fields on `app.dashboard` success result** (full per-field shape
+ types + nullability + cross-references in the FEAT-014 wire-shape doc
linked below):

- `counts.panes.by_state` — 4-key `PaneState` closed set alongside the v1.0
`{total, registered, unregistered}` panes counts.
- `counts.agents.by_state` — 5-key `AgentState` closed set alongside the
v1.0 `{total, by_role}` agents counts.
- `counts.routes.recently_skipped_count` + `counts.routes.recently_skipped_window_ms`
— process-local sliding-window count of recent FEAT-010 route-skip events
with a fixed 300_000 ms window (Clarifications Q6 of FEAT-014).
- `recommended_next_action` — server-computed operator next-step object
(7-code closed set; first-match precedence per FR-010 of FEAT-014).
- `recommended_next_action_refreshed_at` — ISO-8601 UTC ms wall clock
timestamp paired with `recommended_next_action` (paired-null on compute
failure per FR-021 of FEAT-014).
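
As a rough illustration of the sliding-window count behind `recently_skipped_count`, a process-local counter might look like the sketch below. It is not the FEAT-014 implementation (the commit log mentions a ring buffer, which this unbounded deque does not reproduce). The class name, the caller-supplied monotonic millisecond timestamps, and the choice to exclude an event exactly `window_ms` old are all assumptions.

```python
from collections import deque
from typing import Deque

WINDOW_MS = 300_000  # fixed window size quoted in the breadcrumb


class SkipWindowCounter:
    """Hypothetical process-local counter of events inside a time window."""

    def __init__(self, window_ms: int = WINDOW_MS) -> None:
        self.window_ms = window_ms
        self._stamps: Deque[int] = deque()  # timestamps in arrival order

    def record(self, now_ms: int) -> None:
        """Remember one skip event; timestamps are assumed non-decreasing."""
        self._stamps.append(now_ms)

    def count(self, now_ms: int) -> int:
        """Drop events at or before the cutoff, then count what remains."""
        cutoff = now_ms - self.window_ms
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        return len(self._stamps)


counter = SkipWindowCounter()
for stamp in (0, 100_000, 250_000):
    counter.record(stamp)
print(counter.count(300_000))  # the event at t=0 has aged out
```

Because the state lives in memory only, a restart resets the count to zero, which is consistent with the "process-local" wording.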

**Canonical FEAT-014 contract docs** (do NOT duplicate or restate here —
read the originals):

- Wire shape, per-field type + nullability + range, paired-null invariant,
failure-mode response shape, latency-budget waiver semantics:
`specs/014-app-dashboard-extensions/contracts/dashboard-v1_1.md`
- New closed-set values (`PaneState`, `AgentState`, `RecommendationCode`,
`TargetKind` v1.1 addition, `RecommendationTimestamp`, `AppContractVersion`)
with per-set Future Evolution governance:
`specs/014-app-dashboard-extensions/contracts/closed-sets-v1_1.md`

**For v1.0 clients**: every field added above is silently ignorable —
FR-012 of FEAT-014 mandates clients ignore unknown closed-set values and
unknown response fields. The v1.0 contract test suite passes unchanged
against a v1.1-advertising daemon (SC-004 of FEAT-014; the regression is
implemented in `tests/unit/test_v1_0_compat.py`).
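
To make the "silently ignorable" claim concrete, here is a minimal sketch of a v1.0-style reader that picks only the v1.0 panes keys named above and never inspects the v1.1 additions, plus a check of the paired-null invariant. The function names and the sample payload are hypothetical; the `by_state` key shown is a placeholder, not a real `PaneState` value.

```python
from typing import Any, Dict

# v1.0 panes keys as listed in the breadcrumb.
V1_0_PANES_KEYS = ("total", "registered", "unregistered")


def read_panes_counts_v1_0(result: Dict[str, Any]) -> Dict[str, int]:
    """Extract only the v1.0 panes counts; unknown keys are never touched."""
    panes = result["counts"]["panes"]
    return {key: panes[key] for key in V1_0_PANES_KEYS}


def paired_null_ok(result: Dict[str, Any]) -> bool:
    """Both recommendation fields must be null (or absent) together."""
    action = result.get("recommended_next_action")
    stamp = result.get("recommended_next_action_refreshed_at")
    return (action is None) == (stamp is None)


# Hypothetical v1.1 result after a recommendation compute failure.
sample = {
    "counts": {
        "panes": {
            "total": 3,
            "registered": 2,
            "unregistered": 1,
            "by_state": {"placeholder_state": 3},  # v1.1 field, ignored
        }
    },
    "recommended_next_action": None,
    "recommended_next_action_refreshed_at": None,
}
print(read_panes_counts_v1_0(sample), paired_null_ok(sample))
```

Selecting keys by name, rather than validating the whole object against a closed schema, is what lets the same reader work against both a v1.0 and a v1.1 daemon.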
8 changes: 8 additions & 0 deletions specs/011-app-backend-contract/contracts/closed-sets.md
@@ -380,3 +380,11 @@ Examples: `created_at`, `created_at:asc`, `created_at:desc`. Any other suffix
## Filter Operator Vocabulary at v1.0 (FR-024a)

Exact match only. No `<`, `>`, `<=`, `>=`, `~`, `LIKE`, regex, IN-list, or set-membership operators on filter fields. Time ranges use the paired `since` / `until` unix-ms integer parameters separately from filter exact match. A v1.0 filter value containing operator-like syntax → `validation_failed.details.field == "<offending field>"`.

---

## App Contract Evolution — v1.1 (FEAT-014)

> **Additive breadcrumb only** — per the AGENTS.md §Cross-Feature Spec Dir Editing exception. FEAT-011's v1.0 closed sets above are unchanged.

FEAT-014 introduces new v1.1 closed sets (`PaneState`, `AgentState`, `RecommendationCode`) and extends one existing v1.0 set (`TargetKind` gains the value `subsystem`). The canonical enumeration with per-set Future Evolution governance lives in `specs/014-app-dashboard-extensions/contracts/closed-sets-v1_1.md` — do NOT restate the values here. v1.0 clients silently ignore unknown values per FR-012 of FEAT-014.
61 changes: 61 additions & 0 deletions specs/014-app-dashboard-extensions/checklists/api.md
@@ -0,0 +1,61 @@
# API Requirements Quality Checklist: App Dashboard Extensions v1.1

**Purpose**: Audit requirements quality for the dashboard contract surface — response shape, fields, semantics, evolution rules, idempotency, and determinism.
**Created**: 2026-05-24
**Feature**: [spec.md](../spec.md)

## Contract Surface Completeness

- [X] CHK001 - Are all v1.1 additive fields enumerated by name at one canonical location in the spec? [Completeness, Spec §FR-001..§FR-009]
- [X] CHK002 - Is the response envelope structure (top-level keys around `counts`, `recommended_next_action`, `recommended_next_action_refreshed_at`) specified as either a referenced contract doc or an inline schema sketch? [Completeness, Gap]
- [X] CHK003 - Are required vs optional vs nullable distinctions made for every v1.1 field? [Completeness, Spec §FR-003, §FR-011, §FR-021]
- [X] CHK004 - Is the type of every numeric count (`integer`, signed/unsigned, bounds) specified? [Completeness, Spec §FR-003, §FR-007]
- [X] CHK005 - Are unit conventions for all duration fields stated (e.g., `_ms` suffix → milliseconds)? [Clarity, Spec §FR-007]

## Response Field Clarity

- [X] CHK006 - Are the closed-set values for `recommended_next_action.code` listed exhaustively in FR-010 and matched in the Clarifications precedence note? [Completeness, Spec §FR-010, §Clarifications]
- [X] CHK007 - Is the closed set for `target.kind` listed exhaustively, including the v1.1 addition `subsystem`? [Completeness, Spec §FR-011]
- [X] CHK008 - Is the meaning of `target.id` per `target.kind` documented (e.g., for `target.kind == subsystem` what string forms are valid)? [Ambiguity, Spec §FR-011]
- [x] CHK009 - Are `title` and `detail` distinguishable in purpose so two different writers would generate the same prose for the same condition? [Clarity, Spec §FR-011] [EDIT-applied: contracts/closed-sets-v1_1.md §Per-code title/detail Templates]

## Idempotency & Determinism

- [X] CHK010 - Is `app.dashboard` declared as a read-side, side-effect-free request? [Completeness, Gap]
- [X] CHK011 - Is "recomputed on every call" reconciled with a same-input-same-output determinism guarantee, so two concurrent callers see the same code when underlying state is unchanged? [Clarity, Spec §Clarifications Q8]
- [X] CHK012 - Is the precedence list specified as a strict total order so first-match resolution is unambiguous even for novel combinations of matching conditions? [Clarity, Spec §FR-010]
- [X] CHK013 - Is the case where multiple `target` candidates exist for a single code (e.g., multiple unadopted panes, multiple degraded subsystems) resolved by a documented selection rule? [Gap, Spec §FR-010, §FR-011]
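
CHK011 and CHK012 ask for deterministic first-match resolution over a strict total order. One way to picture that is the sketch below: rules live in an ordered list and the first predicate that holds wins, so the same state always yields the same code. The codes, predicates, and state keys are placeholders, not the FR-010 set.

```python
from typing import Callable, List, Mapping, Optional, Tuple

State = Mapping[str, int]
Rule = Tuple[str, Callable[[State], bool]]

# Hypothetical codes and predicates; list position is the precedence.
RULES: List[Rule] = [
    ("code_a", lambda s: s.get("degraded", 0) > 0),
    ("code_b", lambda s: s.get("unadopted", 0) > 0),
    ("code_c", lambda s: True),  # hypothetical catch-all
]


def first_match(state: State, rules: Optional[List[Rule]] = None) -> Optional[str]:
    """Return the code of the first rule whose predicate holds, else None."""
    for code, predicate in (RULES if rules is None else rules):
        if predicate(state):
            return code
    return None


# Both conditions hold, but the earlier rule wins.
print(first_match({"degraded": 1, "unadopted": 2}))
```

Because the function is pure and the list order is fixed, two concurrent callers that observe the same state resolve to the same code.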

## Compatibility & Evolution

- [X] CHK014 - Is the additive-minor rule explicitly stated as a requirement on both the daemon side (always-emit) and the client side (ignore-unknown)? [Completeness, Spec §FR-012, §FR-014, §Clarifications Q10]
- [X] CHK015 - Is the v1.0→v1.1 contract version bump described in terms of an advertised supported minor range, not only a new value? [Clarity, Spec §FR-013]
- [X] CHK016 - Are removal/renaming prohibitions on v1.0 fields, methods, and error codes stated as MUST NOT, not soft preferences? [Completeness, Spec §FR-014]
- [X] CHK017 - Is the absence of a new capability flag stated as a deliberate requirement, not an oversight? [Completeness, Spec §FR-015]

## Error Envelope & Exception Flow

- [X] CHK018 - Are dashboard read errors (e.g., method-level failure) specified separately from "recommendation compute failed" (which is success with nulls)? [Clarity, Spec §FR-021]
- [x] CHK019 - Is the response shape when the daemon advertises only v1.0 but is asked for v1.1 fields specified, or is it impossible by handshake construction? [Gap, Spec §FR-013] [EDIT-applied: contracts/dashboard-v1_1.md §Versioning Behavior now covers v1.0-daemon + v1.1-aware client]

## Scenario Coverage

- [X] CHK020 - Are primary-path requirements (healthy daemon, mixed state) covered by US1/US2/US3? [Coverage, Spec §US1, §US2, §US3]
- [X] CHK021 - Are alternate-path requirements (v1.0 client against v1.1 daemon) covered by US4? [Coverage, Spec §US4]
- [X] CHK022 - Are exception/error requirements (compute failure, degraded subsystem, no containers) explicitly covered in FRs and acceptance scenarios? [Coverage, Spec §FR-021, §US3]
- [X] CHK023 - Are recovery requirements (post-restart skip counter, post-restart recommendation state) covered? [Coverage, Spec §FR-008, §Clarifications Q8]
- [X] CHK024 - Are non-functional requirements (latency budget) defined for the new fields specifically, not just inherited generally from v1.0? [Coverage, Spec §SC-006]

## Documentation Quality

- [X] CHK025 - Is FR-016 specific enough to verify (which docs file, which sections, which closed-set value definitions must be added)? [Measurability, Spec §FR-016]

## Plan & Design Alignment (re-verify 2026-05-24)

- [X] CHK026 - Does dashboard-v1_1.md document every v1.1 field with type, nullability, range/format, and a cross-reference to the FR or Clarifications source? [Completeness, Contracts dashboard-v1_1.md]
- [X] CHK027 - Is the per-code `target` rule table in data-model.md §RecommendedNextAction reflected in dashboard-v1_1.md §Field-by-Field, giving a wire-test author one source of truth instead of two? [Consistency, Data Model, Contracts dashboard-v1_1.md]
- [X] CHK028 - Does dashboard-v1_1.md's "no new error codes" statement match plan.md's "no new error code" constraint? [Consistency, Contracts dashboard-v1_1.md §Error Behavior, Plan §Constraints]
- [X] CHK029 - Are the v1.0 fields shown in dashboard-v1_1.md (for context only) consistent with FEAT-011's actual `app.dashboard` v1.0 contract, so future drift wouldn't mislead an implementer? [Risk, Contracts dashboard-v1_1.md]
- [X] CHK030 - Does closed-sets-v1_1.md mark v1.1 additions (specifically `subsystem` in TargetKind) so a future v1.2 reader can see what was added and when? [Clarity, Contracts closed-sets-v1_1.md §TargetKind]
- [X] CHK031 - Is the precedence-order table in closed-sets-v1_1.md §RecommendationCode in the same numeric order (1..7) as FR-010 and the Clarifications precedence note? [Consistency, Contracts closed-sets-v1_1.md]
- [X] CHK032 - Is the determinism guarantee from Research §CC reflected in the contract docs (so two implementers cannot disagree about whether concurrent calls may diverge)? [Coverage, Research §CC, Contracts dashboard-v1_1.md]