feat(skillopt): ingest blinded live-pairwise packet → canonical feedback (#77b) #511
Merged
…edback (#77b)

Add `gitmoot skillopt pairwise import <packet-dir>`, the consumer side of the #77a seam: it ingests a REVIEWED blinded live-pairwise packet (per-item anonymized A/B options), the reviewer's per-item picks, and the SEPARATE secret map, UNBLINDS each pick back to champion vs challenger via the secret map ONLY, and writes canonical RankedFeedbackEvents through the existing Mode B recording path (recordSkillOptABPick -> ensureSkillOptABRunRows + upsertSkillOptABRankedEvent).

- Distinct source tag (source=live-pairwise, feedback_source=pairwise_valset) so validation-set live-pairwise feedback is separable from single-prompt skillopt-ab.
- Unblind comes solely from the secret map and cross-checks mapping vs champion/challenger labels; an inverted/garbage entry errors instead of silently flipping the preference (tested for both A->champion and A->challenger).
- Idempotent: a stable per-item source_url keeps the (run_id,item_id,reviewer,source,source_url) conflict key identical across re-imports, so no double-count.
- Manual promotion preserved: ingestion writes feedback only and never promotes.
- Fail-safe per item: a missing/garbage secret entry or missing pick is reported per item without aborting the rest of the import.
- Additive to contract_version=1: reuses eval_review_options/eval_artifacts/ranked_feedback_events; the Mode B helpers are parameterized by itemID/source/feedback_source with existing callers passing the prior defaults.

Closes #508

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
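The unblind-with-cross-check rule described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the `SecretEntry` type, its field names, and the `Unblind` function are hypothetical stand-ins for whatever the secret map actually contains.

```go
package main

import "fmt"

// SecretEntry is a HYPOTHETICAL shape for one item's secret-map entry.
// The real pairwise-secret-map.json layout is not shown in this PR text.
type SecretEntry struct {
	Mapping         map[string]string // anonymous label ("A"/"B") -> "champion" or "challenger"
	ChampionLabel   string            // anonymous label the champion was shown under
	ChallengerLabel string            // anonymous label the challenger was shown under
}

// Unblind resolves an anonymous pick to "champion" or "challenger".
// It refuses to answer when the mapping and the explicit labels disagree,
// so an inverted or garbage entry errors instead of flipping the preference.
func Unblind(pick string, e SecretEntry) (string, error) {
	if len(e.Mapping) != 2 || e.ChampionLabel == e.ChallengerLabel {
		return "", fmt.Errorf("secret entry is malformed")
	}
	if e.Mapping[e.ChampionLabel] != "champion" || e.Mapping[e.ChallengerLabel] != "challenger" {
		return "", fmt.Errorf("mapping disagrees with champion/challenger labels")
	}
	role, ok := e.Mapping[pick]
	if !ok {
		return "", fmt.Errorf("pick %q is not a label in the secret entry", pick)
	}
	return role, nil
}
```

Checking the mapping against both labels before reading the pick means either orientation (A as champion or A as challenger) is accepted only when the two sources of truth inside the entry agree.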
The packet<->secret-map run_id cross-check guarded against unblinding a packet with a foreign mapping, but the reviewer picks file, the artifact that decides each winner, was never validated against the packet. UnblindPairwisePacket joins picks by item_id only, so a picks file from a different pairwise run whose items share generic ids (item-1, …) would silently unblind the WRONG reviewer preferences and persist them as canonical RankedFeedbackEvents for THIS run with exit 0.

Add the symmetric run_id guard next to the existing secret-map check. Picks supplied as a bare map shape carry no run_id and skip the check.

Adds TestRunSkillOptPairwiseImportForeignPicksRunID, which fails (exit 0, foreign preference persisted) without the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
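The symmetric run_id guard can be sketched roughly like this. The function name `CheckPicksRunID` and the convention of using an empty string for a picks file without a run_id are assumptions made for illustration; the actual guard in the PR may be shaped differently.

```go
package main

import "fmt"

// CheckPicksRunID is a HYPOTHETICAL helper mirroring the guard described above.
// An empty picksRunID models the bare-map picks shape, which carries no run_id
// and therefore skips the check.
func CheckPicksRunID(packetRunID, picksRunID string) error {
	if picksRunID == "" {
		return nil // bare map shape: nothing to compare against
	}
	if picksRunID != packetRunID {
		return fmt.Errorf("picks run_id %q does not match packet run_id %q", picksRunID, packetRunID)
	}
	return nil
}
```

Running the check before any join on item_id keeps a foreign picks file from being matched against items that merely share generic ids.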
What it does
Adds `gitmoot skillopt pairwise import <packet-dir>`, the consumer side of the #77a seam. It ingests a REVIEWED blinded live-pairwise packet on review close:

- the blinded packet of per-item anonymized A/B options (`pairwise-review.json`),
- the reviewer's per-item picks (`pairwise-picks.json`),
- the separate secret map (`pairwise-secret-map.json`).

For each item it unblinds the anonymous pick (A/B) back to champion vs challenger using the secret_map only, then writes a canonical `RankedFeedbackEvent` through the EXISTING Mode B recording path (`recordSkillOptABPick` → `ensureSkillOptABRunRows` + `upsertSkillOptABRankedEvent`).

Invariants
- Unblind from the secret map only: cross-checks `mapping` against `champion_label`/`challenger_label`; an inverted/garbage entry errors instead of silently flipping the preference. Tested for both `A→champion` and `A→challenger` orientations (the test FAILS on an inversion), guarding against a #473-style label-shuffle inversion (#473: Mode B: champion-challenger Thompson-sampling bandit for ask/research agents (pairwise preference → SkillOpt)).
- Distinct source tag: `source=live-pairwise`, `feedback_source=pairwise_valset`, run id prefix `skillopt-pairwise:`, so validation-set live-pairwise feedback is separable from single-prompt `skillopt-ab`.
- Idempotent: a stable per-item `source_url` keeps the `(run_id,item_id,reviewer,source,source_url)` conflict key identical across re-imports, so no double-count (tested: 3× import → 1 row).
- Additive to `contract_version=1`: reuses `eval_review_options`/`eval_artifacts`/`ranked_feedback_events`. The Mode B helpers are parameterized by `itemID`/`source`/`feedback_source`; existing `skillopt ab` + judge callers pass the prior defaults (byte-identical behavior).

Reuse
Consumes the #507 (#77a) packet contract exactly (`run_pairwise_eval` blinded packet + separate `secret_map`). Reuses the Mode B path rather than reimplementing recording.

Closes #508
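The idempotency invariant, where a stable conflict key means repeated imports collapse to one row, can be illustrated with an in-memory sketch. Everything here is hypothetical: the `ConflictKey` and `Store` types stand in for the real `ranked_feedback_events` upsert, and the `StableSourceURL` format is invented purely to show that the key must be derived deterministically from the run and item.

```go
package main

// ConflictKey mirrors the (run_id,item_id,reviewer,source,source_url) tuple.
// The struct itself is a HYPOTHETICAL stand-in for the database conflict key.
type ConflictKey struct {
	RunID, ItemID, Reviewer, Source, SourceURL string
}

// Store is an in-memory stand-in for the ranked-feedback table.
type Store struct {
	rows map[ConflictKey]string // conflict key -> preferred role
}

func NewStore() *Store { return &Store{rows: make(map[ConflictKey]string)} }

// Upsert overwrites the row with the same conflict key instead of appending.
func (s *Store) Upsert(k ConflictKey, preferred string) { s.rows[k] = preferred }

// Len reports how many distinct rows are stored.
func (s *Store) Len() int { return len(s.rows) }

// StableSourceURL derives a deterministic per-item URL. The format is an
// assumption for illustration; only its stability across imports matters.
func StableSourceURL(runID, itemID string) string {
	return "pairwise://" + runID + "/" + itemID
}
```

Because the source URL depends only on the run and item, importing the same packet three times produces the same key three times, so the store ends with a single row per item and reviewer.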