test(skillopt): cross-repo pairwise round-trip E2E (#507/#508)#514
Merged
Conversation
Add a deterministic, offline, CI-runnable cross-repo round-trip E2E that catches a contract-shape mismatch between the gitmoot-skillopt fork's emitted blinded live-pairwise packet (#507) and the Go importer (#508, `gitmoot skillopt pairwise import`). The two were built against an assumed shared shape on each side and never exercised as one real round-trip. The fixture under internal/cli/testdata/pairwise_roundtrip/ is GENERATED by executing the fork's real emission code (run_pairwise_eval) with a stubbed, offline rollout (regen.py), so it reflects the fork's ACTUAL packet + secret-map shape rather than JSON hand-authored to match the Go parser. The Go test runs the real importer over that fixture and asserts: the import parses the fork shape; each pick unblinds to the correct champion/challenger with the correct option join (computed from the fork's real secret map); the distinct live-pairwise source tag is set; and re-import is idempotent. A second test proves the round-trip genuinely fails on a fork-side contract key drift. CI runs Go only; the committed fixture means no Python is needed at test time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ersion (#507/#508) Address two review findings on the cross-repo pairwise round-trip E2E: - The two-item fixture was degenerate: with seed=42/run-1 both items placed champion on side B, so a Go importer that ignored the per-item secret map and hardcoded "B=champion" would still pass every assertion — the round-trip never forced per-item secret-map consultation. Regenerate the fork-emitted fixture with FOUR val items so placements become ['B','B','A','A'] (both champion-on-A and champion-on-B). Add a regen-time guard asserting both A/B placements are present so a future seed/run_id change can't silently revert to a hollow single-placement fixture. Verified the assertion is now load-bearing: a degenerate secret map fails val-3's winner check. - The frozen, Go-only fixture can't catch a future fork-side shape change in CI (regen.py never runs there). Pin the one drift axis we can: add TestSkillOptPairwiseRoundTripContractVersionPinned asserting the committed packet/secret-map contract_version equals skillopt.ContractVersion, so a bump on either side that leaves the fixture stale turns red. Soften the test doc comment and README to state the round-trip validates the frozen fork shape as of the last manual regen, not live fork-vs-Go drift. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this validates
A deterministic, offline, CI-runnable cross-repo round-trip E2E for the
live-pairwise flow. It closes a real gap: the fork's emitted blinded pairwise
packet (#507,
gitmoot-skillopt) and the Go importer (#508,gitmoot skillopt pairwise import) were each built/tested against an assumedshared shape and never run as one real producer -> consumer round-trip.
How
internal/cli/testdata/pairwise_roundtrip/isfork-GENERATED, not hand-authored:
regen.pyexecutes the fork's realemission code (
gitmoot_skillopt/pairwise.py:run_pairwise_eval) with adeterministic, offline stubbed rollout (no live LLM, no network) and copies
the emitted
pairwise-review.json+pairwise-secret-map.jsonverbatim (onlythe volatile absolute temp-path root of the admin/debug trace fields the Go
importer never reads is normalized to
/FIXTURE, for byte-stable regen).pairwise-picks.jsonis the authored reviewer input (A/B labels only — thefork does not emit picks).
expected.jsonis the ground-truth unblind outcomecomputed from the fork's REAL secret map.
skillopt_pairwise_roundtrip_test.goruns the REALrunSkillOptPairwiseImportover the fixture into a temp Store and asserts:
option join (champion option carries the promoted output, challenger the
candidate output) — fails on an inversion or wrong join;
live-pairwise/pairwise_valsetsource tags are set;...DetectsShapeDrift) renames a contract key in a copy of thefork fixture and asserts the import then fails and persists nothing — proving
the round-trip genuinely catches a contract-shape mismatch.
CI runs Go only; the committed fixture means no Python is required at test
time. Refresh recipe documented in
internal/cli/testdata/pairwise_roundtrip/README.md.Relates #77 #507 #508
🤖 Generated with Claude Code