test(skillopt): full-chain canary lifecycle E2E (#484)#504
Merged
Conversation
Drive the #484 canary lifecycle end to end with no LLM, no network, and deterministic routing, closing the two integration seams the existing unit tests bypass: 1. Mailbox.Enqueue -> routeCanary actually sending sampled traffic to the canary version's snapshot (asserted via the resolved commit on the persisted job payload), and 2. canaryRegressionHarvester.Harvest() wrapping the REAL #465 inner harvester, which attributes a canary-routed job's outcome to the canary version's OWN auto-trace run (skillopt.AutoTraceRunID(canaryID)) via the same UpsertFeedbackEvent path prod uses -- never hand-seeded. Subtests cover rollback (regressing canary -> rejected, champion stays current, candidate.rolled_back once), graduate (healthy canary -> current, candidate.auto_promoted once), and the exactly-once rolled_back dedup under two racing Harvest calls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n CI (#484) The full-chain canary E2E hand-wired both seams (Mailbox.CanaryEnabled and a &canaryRegressionHarvester{...}), leaving the daemon's config->component glue — canaryRoutingEnabled and daemonOutcomeHarvesterWithCanary — with no test coverage, so a regression in that gating (forgotten wrap, wrong minSamples, wrong gate) could ship silently despite the "full-chain" billing. - Add the wired_through_daemon_config_assembly subtest: write a real [skillopt] canary config into a fixture home and drive routing via canaryRoutingEnabled and the harvester via daemonOutcomeHarvesterWithCanary, asserting the wrapper type + policy-sourced minSamples, the ON/OFF gate, and a store-observable rollback. - Run internal/cli under -race in CI so the concurrent dedup subtest is actually exercised by the detector; document the MaxOpenConns(1) serialization that makes it reliable, and add a deterministic, scheduling-independent re-reject assertion pinning the idempotent exactly-once contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this drives
A deterministic, NO-LLM, no-network full-chain integration E2E for the #484 canary lifecycle (
internal/cli/canary_e2e_test.go). It exercises the canary lifecycle end to end in one flow:runCandidateNotifyin canary mode so v2 becomes a livecanaryrow behind the still-current champion (the prod promote path, not a hand-writtenCanaryPromote). Asserts the champion stays current.workflow.Mailbox{CanaryEnabled: true}with a deterministic injectedCanaryRand. EachMailbox.Enqueueresolves the template throughtemplateSnapshot -> routeCanary; draws below the sample resolve the CANARY snapshot, draws at/above resolve the CHAMPION (asserted via the resolved commit on the persisted job payload).canaryRegressionHarvester.Harvest()(the Promotion follow-on: canary + auto-rollback for auto-promote #484 decorator wrapping the genuine Mode A: automatic trace-harvested feedback for implement agents (verifiable outcomes → SkillOpt) #465skillopt.OutcomeHarvester), NOTevaluate()directly. The inner harvester attributes each outcome to the canary version's OWN auto-trace run (skillopt.AutoTraceRunID(canaryID)) via the sameUpsertFeedbackEventpath prod uses — the canary run is never hand-seeded. With regressing outcomes the canary is rejected, the prior champion stays the live current version, andcandidate.rolled_backfires exactly once.CombinedStatusReaderstub) graduate the canary to current withcandidate.auto_promotedonce.Harvest()calls crossingmin_samplesemitrolled_backexactly once (thechanged-bool gate over the idempotentRejectAgentTemplateVersion).Why it fails if the integration breaks
The two seams the existing
candidate_canary_test.go(callsevaluate()with hand-seeded canary feedback) andmailbox_test.go(routing in isolation) bypass are genuinely connected here: routing produces the persisted job payload whose resolved commit is fed back into the harvester. If routing stopped sending sampled traffic to the canary (or the harvest stopped attributing a canary-routed outcome to the canary version's run), the canary's auto-trace run would be empty — soassertCanaryFeedbackCountfails and the regression window holds (no rollback/graduate), failing the state and event-count assertions. Verified by a temporary mutation (forcingrouteCanaryto never route): the routing-split assertions fail immediately. Avoids a hollow test that would pass if routing or attribution were no-ops.Test only
No production behavior changes. Uses only the existing injection seams (
Mailbox.CanaryRand, the real harvester constructor) and a small offlineCombinedStatusReaderstub for the strong-positive merge band.Relates #484.
Gate green in the worktree:
go build ./...,go vet ./...,go test ./internal/cli/ ./internal/workflow/ ./internal/skillopt/ ./internal/db/, andgo test -race ./internal/cli/ ./internal/workflow/.🤖 Generated with Claude Code