GCWing · limityan · Apr 28, 2026 · Apr 28, 2026 · Apr 30, 2026 · May 8, 2026
diff --git a/agent-runtime-budget-governance-design.md b/agent-runtime-budget-governance-design.md
diff --git a/context-reliability-architecture.md b/context-reliability-architecture.md
diff --git a/docs/deep-review-design.md b/docs/deep-review-design.md
diff --git a/docs/deep-review-phase2-addendum.md b/docs/deep-review-phase2-addendum.md
@@ -0,0 +1,194 @@
+# Deep Review Phase 2 Addendum Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Close the semantic gaps found after comparing the latest Deep Review implementation with `docs/deep-review-design.md` and `docs/deep-review-phase2-plan.md`.
+
+**Architecture:** Keep the existing prompt-driven DeepReview orchestrator, but move the riskiest guarantees into deterministic runtime boundaries where the current documents overstate completion. The addendum does not expand Deep Review into a new scheduler architecture; it tightens cache correctness, launch backpressure, retry semantics, observability, and documentation truthfulness.
+
+**Tech Stack:** Rust core (`bitfun-core`), TypeScript React frontend (`src/web-ui`), Vitest, Cargo tests.
+
+---
+
+## Source Documents
+
+- `docs/deep-review-design.md`: original strategy-engine and reviewer-role design.
+- `docs/deep-review-phase2-plan.md`: Phase 2 implementation plan and status table.
+- This addendum only covers gaps discovered after the latest implementation. It does not replace either source document.
+
+## Current Truth Model
+
+| Area | Current code truth | Addendum stance |
+|---|---|---|
+| Conditional Frontend Reviewer | Diff/path classification and conditional manifest behavior exist | Keep as implemented; improve only if future conditional reviewers need the same manifest contract |
+| Custom review subagents | Required tooling is centralized and invalid agents are skipped with `invalid_tooling` | Keep as implemented; add regression coverage only when touching this area |
+| Dynamic concurrency | TaskTool has a hard cap safety net with structured rejection text | Keep queue/wait deferred unless hard rejection causes real launch failures |
+| Retry | Budget tracking and retry guidance exist | Keep retry prompt-guided; automatic backend redispatch remains deferred |
+| Partial timeout | Grace-period final-message capture exists | Keep limitation documented; do not claim arbitrary stream-fragment capture |
+| Incremental cache | Per-session cache struct, metadata field, `packet_id` read path, completed-reviewer write-through, and hit/miss report signals exist | Keep project-level cache deferred |
+| Token budget | File-split/max-file style guardrails exist | Defer prompt-byte enforcement until prompt-size estimation is available |
+| Shared context cache | Prompt-only | Keep deferred unless duplicate Read/GetFileDiff cost becomes a measured bottleneck |
+| Observability | User/report-facing reliability summaries exist for cache, skipped-reviewer, token-budget, partial-timeout, retry, and structured cap signals | Keep external telemetry deferred unless diagnostics require it |
+| First-run consent dialog | Dialog shows cost/time/lineup details and has missing skipped-reason locale keys | Simplify to key reminders, preserve strategy choice and skipped-reviewer warnings, and complete locale coverage |
+
+## Scope Boundaries
+
+In scope:
+
+- Correct documentation claims so they match current code.
+- Simplify first-run Deep Review consent so users see only the key launch reminders and all visible strings are localized.
+- Close incremental review cache correctness for per-session continuation.
+- Add a deterministic backpressure path for DeepReview reviewer launch caps if hard rejection proves too brittle.
+- Make retry status explicit: either implement automatic redispatch or keep the code/documentation as prompt-guided retry.
+- Add metrics/report surfaces for cap rejection, partial timeout, retry, cache hit/miss, skipped reviewers, and token-budget decisions.
+
+Out of scope:
+
+- Project-level cross-session review cache.
+- Programmatic shared Read/GetFileDiff cache.
+- Full prompt-byte budget enforcement.
+- Large diff summary generation as a mandatory pre-review step.
+- Replacing the prompt-driven DeepReview orchestrator with a full backend-owned DAG scheduler.
+
+## Risk Register
+
+| Risk | Scenario | Mitigation |
+|---|---|---|
+| Hard cap rejection causes missed reviewers | Weak orchestrator launches too many reviewers and does not recover from tool error | Add queue/wait behavior or keep rejection surfaced in report coverage notes |
+| Cache returns stale review output | Fingerprint misses strategy, roster, renamed path, or model changes | Use `packet_id` keys and invalidate on target files, reviewer roster, strategy, model, and relevant manifest changes |
+| Cache key mismatch | Writer stores by `packet_id`, reader checks `subagent_type` | Resolve packet id from Task description/run manifest before reading cache |
+| Retry loops grow token cost | Model retries broad scope repeatedly | Enforce max retries per role and require reduced scope plus `retry: true` |
+| Partial output is overtrusted | Timed-out reviewer emits incomplete analysis | Preserve `partial_timeout` and reliability signals; judge/report treats it as lower confidence |
+| Observability adds noise | Reports become too dense | Keep raw counters in metadata and show only summarized user-facing notes |
+| Privacy retention is unclear | Cached reviewer output stores sensitive code findings | Keep first closure per-session only; document retention and deletion semantics before project-level cache |
+
+## File Map
+
+Documentation:
+
+- `docs/deep-review-design.md`: source-of-truth design status and remaining work.
+- `docs/deep-review-phase2-plan.md`: reconciled Phase 2 implementation status.
+- `docs/deep-review-phase2-addendum.md`: this addendum and follow-up implementation plan.
+
+Rust core:
+
+- `src/crates/core/src/agentic/deep_review_policy.rs`: policy helpers, concurrency policy, retry budget, incremental cache helpers.
+- `src/crates/core/src/agentic/tools/implementations/task_tool.rs`: reviewer launch enforcement, cache hit lookup, retry guidance.
+- `src/crates/core/src/agentic/tools/implementations/code_review_tool.rs`: normalized reviewer packet metadata and completed-reviewer cache write-through boundary.
+- `src/crates/core/src/agentic/coordination/coordinator.rs`: subagent execution, timeout/partial-result handling, possible future queue/wait boundary.
+- `src/crates/core/src/service/session/types.rs`: session metadata fields for run manifest and cache data.
+- `src/crates/core/src/agentic/persistence/manager.rs`: session metadata persistence.
+
+Frontend:
+
+- `src/web-ui/src/shared/services/reviewTeamService.ts`: manifest construction, work packets, token budget plan, skipped reviewer reporting.
+- `src/web-ui/src/shared/services/reviewTargetClassifier.ts`: review target classification and conditional reviewer applicability.
+- `src/web-ui/src/shared/services/reviewSubagentCapabilities.ts`: custom review subagent tool contract.
+- `src/web-ui/src/flow_chat/utils/codeReviewReport.ts`: report formatting for packet, partial, cache, and reliability metadata.
+- `src/web-ui/src/flow_chat/components/DeepReviewConsentDialog.tsx`: user-facing run strategy override and launch summary.
+
+## Implementation Rounds
+
+### Round 1: Documentation Reconciliation
+
+**Goal:** Make the docs truthful before more code changes.
+
+- [x] Update `docs/deep-review-design.md` so completed/remaining work distinguishes complete runtime behavior from safety-net or prompt-guided behavior.
+- [x] Update `docs/deep-review-phase2-plan.md` so Phase B and Phase C are marked partial where the code has cap/guidance/read-path behavior but not full scheduler/retry/cache semantics.
+- [x] Add this addendum with scope boundaries, risk register, file map, and follow-up rounds.
+
+Verification:
+
+- `rg -n "PARTIAL|Safety net|write-through|queue|redispatch|prompt-guided" docs/deep-review-design.md docs/deep-review-phase2-plan.md docs/deep-review-phase2-addendum.md`
+- Expected: the three docs explicitly describe partial completion and do not claim queued dispatch, automatic retry, or cache write-through as complete.
+
+### Round 2: First-Run Consent Dialog I18n And Density
+
+**Goal:** Keep the first Deep Review confirmation dialog useful without making it feel like a full report.
+
+- [x] Replace the two cost/time fact cards and active reviewer chip list with a compact reminder area.
+- [x] Keep only these visible decisions and warnings:
+  - Deep Review can take longer and use more tokens than a standard review.
+  - The first pass is read-only.
+  - The selected run strategy can be changed before launch.
+  - Skipped reviewers are shown only when there are skipped reviewers.
+- [x] Keep strategy override behavior unchanged, including project-level persistence when the user confirms.
+- [x] Add missing locale keys for skipped reviewer reasons in `en-US`, `zh-CN`, and `zh-TW`.
+- [x] Update dialog tests so they assert compact summary behavior and no longer require active reviewer names to be shown.
+
+Verification:
+
+- `pnpm --dir src/web-ui run test:run -- src/flow_chat/components/DeepReviewConsentDialog.test.tsx`
+- Expected: the dialog still opens when skipped reviewers exist, skipped reason text is localized through locale keys, and strategy override persistence still works.
+
+### Round 3: Incremental Cache Correctness Closure
+
+**Goal:** Make per-session incremental review cache safe enough to claim as implemented.
+
+- [x] Add tests in `src/crates/core/src/agentic/deep_review_policy.rs` for cache invalidation on fingerprint mismatch and for packet keys that include split-review suffixes such as `reviewer:ReviewSecurity:group-1-of-3`.
+- [x] Add TaskTool tests in `src/crates/core/src/agentic/tools/implementations/task_tool.rs` for resolving cache keys from the Task description pattern `[packet reviewer:ReviewSecurity]` and from unique run-manifest work packets.
+- [x] Change TaskTool cache lookup from `subagent_type` to resolved `packet_id`, with no cache hit when multiple packets match and no packet id is present.
+- [x] Add a write-through boundary after `submit_code_review` normalizes reviewer packet metadata. Store only reviewers with `status=completed` and a non-empty `packet_id`.
+- [x] Preserve per-session scope only. Do not add project-level storage in this round.
+
+Verification:
+
+- `cargo test -p bitfun-core deep_review -- --nocapture`
+- Expected: cache helper tests pass, ambiguous packet lookup does not produce a false cache hit, and completed reviewer output can be written and read by the same packet id.
+
+### Round 4: Launch Backpressure And Retry Semantics
+
+**Goal:** Remove model-dependent ambiguity around concurrency cap and retry behavior without replacing the whole orchestrator.
+
+- [x] Add a focused DeepReview TaskTool test showing current cap rejection behavior for excess reviewer launches.
+- [x] Decide implementation mode:
+  - Preserved rejection for now and kept the existing structured `code`/`message` tool-error payload.
+  - Deferred bounded backpressure until real launch failures show hard rejection is too brittle.
+- [x] Keep `staggerSeconds` as a documented future scheduler field unless this round implements actual wait spacing.
+- [x] For retry, keep the current prompt-guided path unless structured partial coverage extraction is added. If automatic redispatch is added later, require `retry: true`, lower timeout, and reduced assigned scope.
+
+Verification:
+
+- `cargo test -p bitfun-core deep_review -- --nocapture`
+- `cargo test -p bitfun-core coordination -- --nocapture`
+- Expected: cap behavior is deterministic and tested; retry status is either prompt-guided and documented, or automatic redispatch is covered by tests.
+
+### Round 5: Observability, Report Surfacing, And UI Stability
+
+**Goal:** Make runtime tradeoffs visible without making the Deep Review report noisy.
+
+- [x] Add structured counters or report metadata for cache hit/miss, concurrency cap rejection/wait, partial timeout, retry guidance, skipped reviewers, and token-budget skipped extras.
+- [x] Update `src/web-ui/src/flow_chat/utils/codeReviewReport.ts` to summarize only meaningful reliability signals in exported Markdown.
+- [x] Keep dense details collapsed or metadata-only in UI components so report readability does not regress.
+- [x] Confirm all new user-facing strings are localized in `src/web-ui/src/locales/en-US/flow-chat.json`, `zh-CN`, and `zh-TW` when UI text is added.
+
+Verification:
+
+- `pnpm run lint:web`
+- `pnpm run type-check:web`
+- `pnpm --dir src/web-ui run test:run`
+- Expected: locale coverage remains complete, report formatting tests cover the new summary rows, and no layout text overflow is introduced.
+
+### Round 6: Final Verification And Documentation Freeze
+
+**Goal:** Prove code and docs still match after the addendum work.
+
+- [x] Re-read `docs/deep-review-design.md`, `docs/deep-review-phase2-plan.md`, and this addendum against the implemented code paths.
+- [x] Update only status wording if implementation choices differ from this plan.
+- [x] Run the smallest complete verification set for touched areas.
+
+Verification:
+
+- For Rust-only changes: `cargo test -p bitfun-core deep_review -- --nocapture`
+- For frontend/report changes: `pnpm run lint:web && pnpm run type-check:web && pnpm --dir src/web-ui run test:run`
+- For cross-boundary behavior: `cargo check --workspace --exclude bitfun-cli`
+
+Expected: no document claims exceed code behavior, and every intentionally deferred item remains marked deferred.
+
+## Completion Criteria
+
+- Existing docs no longer state that queued dispatch, automatic retry redispatch, cache write-through, or prompt-byte budget enforcement are complete unless code implements them.
+- Per-session incremental cache can only return data by a verified packet id and matching fingerprint.
+- Concurrency cap behavior is either user/report-visible as a skipped/rejected reviewer or is converted into bounded backpressure.
+- Retry behavior is explicitly either prompt-guided or automatic; there is no mixed wording.
+- User-facing additions remain localized and report density remains controlled.