Skip to content

test(skillopt): full-chain canary lifecycle E2E (#484)#504

Merged
jerryfane merged 2 commits into
mainfrom
test/484-canary-e2e
Jun 27, 2026
Merged

test(skillopt): full-chain canary lifecycle E2E (#484)#504
jerryfane merged 2 commits into
mainfrom
test/484-canary-e2e

Conversation

@jerryfane

Copy link
Copy Markdown
Owner

What this drives

A deterministic, NO-LLM, no-network full-chain integration E2E for the #484 canary lifecycle (internal/cli/canary_e2e_test.go). It exercises the canary lifecycle end to end in one flow:

  1. Fixture — install planner v1 (champion, current) and drive the REAL runCandidateNotify in canary mode so v2 becomes a live canary row behind the still-current champion (the prod promote path, not a hand-written CanaryPromote). Asserts the champion stays current.
  2. Routing leg — a real workflow.Mailbox{CanaryEnabled: true} with a deterministic injected CanaryRand. Each Mailbox.Enqueue resolves the template through templateSnapshot -> routeCanary; draws below the sample resolve the CANARY snapshot, draws at/above resolve the CHAMPION (asserted via the resolved commit on the persisted job payload).
  3. Attribution + rollback leg — the canary-routed job payloads are fed through the REAL canaryRegressionHarvester.Harvest() (the Promotion follow-on: canary + auto-rollback for auto-promote #484 decorator wrapping the genuine Mode A: automatic trace-harvested feedback for implement agents (verifiable outcomes → SkillOpt) #465 skillopt.OutcomeHarvester), NOT evaluate() directly. The inner harvester attributes each outcome to the canary version's OWN auto-trace run (skillopt.AutoTraceRunID(canaryID)) via the same UpsertFeedbackEvent path prod uses — the canary run is never hand-seeded. With regressing outcomes the canary is rejected, the prior champion stays the live current version, and candidate.rolled_back fires exactly once.
  4. Graduate mirror — healthy real-CI canary outcomes (via a tiny CombinedStatusReader stub) graduate the canary to current with candidate.auto_promoted once.
  5. Dedup — two racing Harvest() calls crossing min_samples emit rolled_back exactly once (the changed-bool gate over the idempotent RejectAgentTemplateVersion).

Why it fails if the integration breaks

The two seams the existing candidate_canary_test.go (calls evaluate() with hand-seeded canary feedback) and mailbox_test.go (routing in isolation) bypass are genuinely connected here: routing produces the persisted job payload whose resolved commit is fed back into the harvester. If routing stopped sending sampled traffic to the canary (or the harvest stopped attributing a canary-routed outcome to the canary version's run), the canary's auto-trace run would be empty — so assertCanaryFeedbackCount fails and the regression window holds (no rollback/graduate), failing the state and event-count assertions. Verified by a temporary mutation (forcing routeCanary to never route): the routing-split assertions fail immediately. Avoids a hollow test that would pass if routing or attribution were no-ops.

Test only

No production behavior changes. Uses only the existing injection seams (Mailbox.CanaryRand, the real harvester constructor) and a small offline CombinedStatusReader stub for the strong-positive merge band.

Relates #484.

Gate green in the worktree: go build ./..., go vet ./..., go test ./internal/cli/ ./internal/workflow/ ./internal/skillopt/ ./internal/db/, and go test -race ./internal/cli/ ./internal/workflow/.

🤖 Generated with Claude Code

jerryfane and others added 2 commits June 27, 2026 11:13
Drive the #484 canary lifecycle end to end with no LLM, no network, and
deterministic routing, closing the two integration seams the existing unit
tests bypass:

  1. Mailbox.Enqueue -> routeCanary actually sending sampled traffic to the
     canary version's snapshot (asserted via the resolved commit on the
     persisted job payload), and
  2. canaryRegressionHarvester.Harvest() wrapping the REAL #465 inner
     harvester, which attributes a canary-routed job's outcome to the canary
     version's OWN auto-trace run (skillopt.AutoTraceRunID(canaryID)) via the
     same UpsertFeedbackEvent path prod uses -- never hand-seeded.

Subtests cover rollback (regressing canary -> rejected, champion stays
current, candidate.rolled_back once), graduate (healthy canary -> current,
candidate.auto_promoted once), and the exactly-once rolled_back dedup under
two racing Harvest calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n CI (#484)

The full-chain canary E2E hand-wired both seams (Mailbox.CanaryEnabled and a
&canaryRegressionHarvester{...}), leaving the daemon's config->component glue —
canaryRoutingEnabled and daemonOutcomeHarvesterWithCanary — with no test
coverage, so a regression in that gating (forgotten wrap, wrong minSamples,
wrong gate) could ship silently despite the "full-chain" billing.

- Add the wired_through_daemon_config_assembly subtest: write a real [skillopt]
  canary config into a fixture home and drive routing via canaryRoutingEnabled
  and the harvester via daemonOutcomeHarvesterWithCanary, asserting the wrapper
  type + policy-sourced minSamples, the ON/OFF gate, and a store-observable
  rollback.
- Run internal/cli under -race in CI so the concurrent dedup subtest is actually
  exercised by the detector; document the MaxOpenConns(1) serialization that
  makes it reliable, and add a deterministic, scheduling-independent re-reject
  assertion pinning the idempotent exactly-once contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jerryfane jerryfane merged commit 7597c26 into main Jun 27, 2026
1 check passed
@jerryfane jerryfane deleted the test/484-canary-e2e branch June 27, 2026 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant