
test(skillopt): cross-repo pairwise round-trip E2E (#507/#508) #514

Merged
jerryfane merged 2 commits into main from test/pairwise-roundtrip-e2e
Jun 27, 2026


@jerryfane

Owner

What this validates

A deterministic, offline, CI-runnable cross-repo round-trip E2E for the
live-pairwise flow. It closes a real gap: the fork's emitted blinded pairwise
packet (#507, gitmoot-skillopt) and the Go importer (#508,
`gitmoot skillopt pairwise import`) were each built and tested against an
assumed shared shape but never run together as one real producer -> consumer
round-trip.

How

  • The fixture under internal/cli/testdata/pairwise_roundtrip/ is
    fork-GENERATED, not hand-authored: regen.py executes the fork's real
    emission code (gitmoot_skillopt/pairwise.py:run_pairwise_eval) with a
    deterministic, offline stubbed rollout (no live LLM, no network) and copies
    the emitted pairwise-review.json and pairwise-secret-map.json verbatim. The
    only change is that the volatile absolute temp-path root in the admin/debug
    trace fields, which the Go importer never reads, is normalized to /FIXTURE
    so that regeneration is byte-stable.
  • pairwise-picks.json is the authored reviewer input (A/B labels only; the
    fork does not emit picks). expected.json is the ground-truth unblind outcome
    computed from the fork's REAL secret map.
  • skillopt_pairwise_roundtrip_test.go runs the REAL runSkillOptPairwiseImport
    over the fixture into a temp Store and asserts:
    • the import parses the fork's actual packet shape;
    • each pick unblinds to the correct champion/challenger, including the
      option join (the champion option carries the promoted output, the
      challenger option the candidate output); the test fails on an inversion
      or a wrong join;
    • the distinct live-pairwise / pairwise_valset source tags are set;
    • re-import is idempotent (no double-count).
  • A second test (...DetectsShapeDrift) renames a contract key in a copy of the
    fork fixture and asserts that the import then fails and persists nothing,
    showing that the round-trip catches a contract-shape mismatch.
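The unblinding and idempotency assertions above can be illustrated with a minimal Go sketch. The types, field names (`ChampionSide`, `ChampionOutput`, `ChallengerOutput`), the keyed-map store, and the helpers `unblind` and `importPicks` are hypothetical stand-ins, not the real packet schema or the internals of `runSkillOptPairwiseImport`. The point is that each A/B label has to be resolved through that item's own secret-map entry, that staging every pick before writing means a failure persists nothing, and that keying stored outcomes by item ID makes re-import idempotent.

```go
package main

import "fmt"

// SecretEntry is a hypothetical per-item secret-map record: which blinded
// side holds the champion, plus the output joined to each option.
type SecretEntry struct {
	ChampionSide     string // "A" or "B"
	ChampionOutput   string // promoted output
	ChallengerOutput string // candidate output
}

// Pick is a hypothetical reviewer pick carrying only a blinded A/B label.
type Pick struct {
	ItemID string
	Label  string
}

// Outcome is the unblinded result stored for one item.
type Outcome struct {
	Winner       string // "champion" or "challenger"
	WinnerOutput string // output joined to the winning option
	Source       string // hypothetical source tag
}

// unblind resolves one pick through that item's own secret-map entry.
func unblind(p Pick, secrets map[string]SecretEntry) (Outcome, error) {
	s, ok := secrets[p.ItemID]
	if !ok {
		return Outcome{}, fmt.Errorf("item %q not in secret map", p.ItemID)
	}
	if p.Label != "A" && p.Label != "B" {
		return Outcome{}, fmt.Errorf("item %q: invalid label %q", p.ItemID, p.Label)
	}
	if p.Label == s.ChampionSide {
		return Outcome{Winner: "champion", WinnerOutput: s.ChampionOutput, Source: "live-pairwise"}, nil
	}
	return Outcome{Winner: "challenger", WinnerOutput: s.ChallengerOutput, Source: "live-pairwise"}, nil
}

// importPicks unblinds every pick before writing anything, so one bad pick
// persists nothing. Writes are keyed by item ID, so re-importing the same
// picks overwrites entries instead of double-counting them.
func importPicks(picks []Pick, secrets map[string]SecretEntry, store map[string]Outcome) error {
	staged := make(map[string]Outcome, len(picks))
	for _, p := range picks {
		o, err := unblind(p, secrets)
		if err != nil {
			return err
		}
		staged[p.ItemID] = o
	}
	for id, o := range staged {
		store[id] = o
	}
	return nil
}
```

With one item whose champion sits on B and another whose champion sits on A, the same "B" pick unblinds to champion for the first and challenger for the second, which is exactly the distinction a hardcoded mapping would miss.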

CI runs Go only; because the fixture is committed, no Python is required at
test time. The refresh recipe is documented in
internal/cli/testdata/pairwise_roundtrip/README.md.

Relates #77 #507 #508

🤖 Generated with Claude Code

jerryfane and others added 2 commits June 27, 2026 21:13
Add a deterministic, offline, CI-runnable cross-repo round-trip E2E that
catches a contract-shape mismatch between the gitmoot-skillopt fork's emitted
blinded live-pairwise packet (#507) and the Go importer (#508,
`gitmoot skillopt pairwise import`). The two were built against an assumed
shared shape on each side and never exercised as one real round-trip.

The fixture under internal/cli/testdata/pairwise_roundtrip/ is GENERATED by
executing the fork's real emission code (run_pairwise_eval) with a stubbed,
offline rollout (regen.py), so it reflects the fork's ACTUAL packet + secret-map
shape rather than JSON hand-authored to match the Go parser. The Go test runs
the real importer over that fixture and asserts: the import parses the fork
shape; each pick unblinds to the correct champion/challenger with the correct
option join (computed from the fork's real secret map); the distinct
live-pairwise source tag is set; and re-import is idempotent. A second test
proves the round-trip genuinely fails on a fork-side contract key drift.

CI runs Go only; the committed fixture means no Python is needed at test time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ersion (#507/#508)

Address two review findings on the cross-repo pairwise round-trip E2E:

- The two-item fixture was degenerate: with seed=42/run-1 both items placed
  champion on side B, so a Go importer that ignored the per-item secret map and
  hardcoded "B=champion" would still pass every assertion; the round-trip never
  forced a per-item secret-map lookup. Regenerate the fork-emitted fixture
  with FOUR val items so placements become ['B','B','A','A'] (both champion-on-A
  and champion-on-B). Add a regen-time guard asserting both A/B placements are
  present so a future seed/run_id change can't silently revert to a hollow
  single-placement fixture. Verified the assertion is now load-bearing: a
  degenerate secret map fails val-3's winner check.

- The frozen, Go-only fixture can't catch a future fork-side shape change in CI
  (regen.py never runs there). Pin the one drift axis we can: add
  TestSkillOptPairwiseRoundTripContractVersionPinned asserting the committed
  packet/secret-map contract_version equals skillopt.ContractVersion, so a bump
  on either side that leaves the fixture stale turns red. Soften the test doc
  comment and README to state the round-trip validates the frozen fork shape as
  of the last manual regen, not live fork-vs-Go drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jerryfane merged commit 06b9cdd into main on Jun 27, 2026
1 check passed
jerryfane deleted the test/pairwise-roundtrip-e2e branch on June 27, 2026 19:47