feat(skillopt): ingest blinded live-pairwise packet → canonical feedback (#77b) by jerryfane · Pull Request #511 · jerryfane/gitmoot

jerryfane · 2026-06-27T13:54:49Z

What it does

Adds gitmoot skillopt pairwise import <packet-dir>, the consumer side of the #77a seam. It ingests a REVIEWED blinded live-pairwise packet on review close:

the per-item anonymized A/B options (pairwise-review.json),
the reviewer's per-item picks (pairwise-picks.json),
and the SEPARATE secret unblinding map (pairwise-secret-map.json).

For each item it unblinds the anonymous pick (A/B) back to champion vs challenger using the secret_map only, then writes a canonical RankedFeedbackEvent through the EXISTING Mode B recording path (recordSkillOptABPick → ensureSkillOptABRunRows + upsertSkillOptABRankedEvent).
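The unblinding step can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `SecretEntry` shape, its field names, and the `Unblind` function are hypothetical, since the real schema of pairwise-secret-map.json is not shown here. It resolves an anonymous pick from the secret entry alone and rejects an entry whose mapping disagrees with its champion/challenger labels.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretEntry is a HYPOTHETICAL shape for one item's entry in
// pairwise-secret-map.json; the real schema is not shown in this PR.
type SecretEntry struct {
	ChampionLabel   string            // "A" or "B"
	ChallengerLabel string            // the other label
	Mapping         map[string]string // label -> "champion" | "challenger"
}

// ErrBadSecretEntry marks an entry that is internally inconsistent.
var ErrBadSecretEntry = errors.New("inconsistent secret-map entry")

func isLabel(s string) bool { return s == "A" || s == "B" }

// Unblind resolves an anonymous pick ("A" or "B") to "champion" or
// "challenger" using only the secret entry. It errors rather than guessing
// when the mapping and the labels disagree.
func Unblind(pick string, e SecretEntry) (string, error) {
	if !isLabel(pick) {
		return "", fmt.Errorf("pick %q is not A or B", pick)
	}
	// The two labels must be distinct and each must be A or B.
	if !isLabel(e.ChampionLabel) || !isLabel(e.ChallengerLabel) ||
		e.ChampionLabel == e.ChallengerLabel {
		return "", fmt.Errorf("%w: labels %q/%q", ErrBadSecretEntry,
			e.ChampionLabel, e.ChallengerLabel)
	}
	// Cross-check: the mapping must agree with the labels, so an inverted
	// entry fails here instead of silently flipping the preference.
	if e.Mapping[e.ChampionLabel] != "champion" ||
		e.Mapping[e.ChallengerLabel] != "challenger" {
		return "", fmt.Errorf("%w: mapping disagrees with labels", ErrBadSecretEntry)
	}
	return e.Mapping[pick], nil
}

func main() {
	// Orientation where A is the challenger.
	e := SecretEntry{
		ChampionLabel:   "B",
		ChallengerLabel: "A",
		Mapping:         map[string]string{"A": "challenger", "B": "champion"},
	}
	winner, err := Unblind("A", e)
	fmt.Println(winner, err)
}
```

Returning an error for an inconsistent entry, rather than trusting either field, is what lets a caller skip that item and report it.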

Invariants

Unblind via secret_map only: resolution comes solely from the secret map, whose mapping is cross-checked against champion_label/challenger_label; an inverted or garbage entry errors instead of silently flipping the preference. Tested for both A→champion and A→challenger orientations (the test FAILS on an inversion), guarding against a label-shuffle inversion of the kind seen in #473 (Mode B: champion-challenger Thompson-sampling bandit for ask/research agents (pairwise preference → SkillOpt)).
Distinct source tag: source=live-pairwise, feedback_source=pairwise_valset, and run id prefix skillopt-pairwise:, so validation-set live-pairwise feedback is separable from single-prompt skillopt-ab.
Idempotent: a STABLE per-item source_url keeps the (run_id,item_id,reviewer,source,source_url) conflict key identical across re-imports, so nothing is double-counted (tested: 3× import → 1 row).
Manual promotion preserved: ingestion writes feedback ONLY; it never promotes or auto-promotes (tested: current version unchanged, no pending version created).
Fail-safe per item: a missing/garbage secret entry or a missing pick is reported per item without aborting the rest of the import (tested: 1 imported / 2 skipped, exit 1, good row persisted).
Additive to contract_version=1: reuses eval_review_options/eval_artifacts/ranked_feedback_events. The Mode B helpers are parameterized by itemID/source/feedback_source; existing skillopt ab + judge callers pass the prior defaults (byte-identical behavior).
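The idempotency and per-item fail-safe invariants above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `item` type, the `importPacket` function, the map standing in for the ranked_feedback_events table, and the `pairwise://` source_url scheme are all hypothetical. Only the conflict-key fields, the source tag, and the run id prefix come from the description.

```go
package main

import "fmt"

// conflictKey mirrors the (run_id,item_id,reviewer,source,source_url) tuple.
type conflictKey struct {
	RunID, ItemID, Reviewer, Source, SourceURL string
}

// item is a HYPOTHETICAL, already-joined view of one packet item.
type item struct {
	ID            string
	Pick          string // reviewer's anonymous pick; "" if missing
	ChampionLabel string // from the secret map; "" if the entry is missing
}

func isLabel(s string) bool { return s == "A" || s == "B" }

// importPacket upserts one row per valid item into rows (a stand-in for the
// real table) and reports bad items without aborting the rest.
func importPacket(rows map[conflictKey]string, runID, reviewer string, items []item) (imported, skipped int) {
	for _, it := range items {
		if !isLabel(it.Pick) {
			fmt.Printf("skip %s: missing or invalid pick\n", it.ID)
			skipped++
			continue
		}
		if !isLabel(it.ChampionLabel) {
			fmt.Printf("skip %s: missing or invalid secret entry\n", it.ID)
			skipped++
			continue
		}
		winner := "challenger"
		if it.Pick == it.ChampionLabel {
			winner = "champion"
		}
		key := conflictKey{
			RunID:    "skillopt-pairwise:" + runID,
			ItemID:   it.ID,
			Reviewer: reviewer,
			Source:   "live-pairwise",
			// Assumed scheme: derived only from stable ids, so a re-import
			// produces the same key and overwrites instead of appending.
			SourceURL: "pairwise://" + runID + "/" + it.ID,
		}
		rows[key] = winner
		imported++
	}
	return imported, skipped
}

func main() {
	rows := map[conflictKey]string{}
	items := []item{
		{ID: "item-1", Pick: "A", ChampionLabel: "B"},
		{ID: "item-2", Pick: "", ChampionLabel: "A"},  // missing pick
		{ID: "item-3", Pick: "B", ChampionLabel: ""}, // missing secret entry
	}
	for i := 0; i < 3; i++ {
		imported, skipped := importPacket(rows, "run-1", "alice", items)
		fmt.Printf("pass %d: imported=%d skipped=%d rows=%d\n", i+1, imported, skipped, len(rows))
	}
}
```

A real CLI would presumably turn a non-zero skipped count into a non-zero exit code, matching the "exit 1, good row persisted" behavior described above; the sketch only returns the counts.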

Reuse

Consumes the #507 (#77a) packet contract exactly (run_pairwise_eval blinded packet + separate secret_map). Reuses the Mode B path rather than reimplementing recording.

Closes #508

…edback (#77b)

Add `gitmoot skillopt pairwise import <packet-dir>`, the consumer side of the #77a seam: it ingests a REVIEWED blinded live-pairwise packet (per-item anonymized A/B options), the reviewer's per-item picks, and the SEPARATE secret map, UNBLINDS each pick back to champion vs challenger via the secret map ONLY, and writes canonical RankedFeedbackEvents through the existing Mode B recording path (recordSkillOptABPick -> ensureSkillOptABRunRows + upsertSkillOptABRankedEvent).

- Distinct source tag (source=live-pairwise, feedback_source=pairwise_valset) so validation-set live-pairwise feedback is separable from single-prompt skillopt-ab.
- Unblind comes solely from the secret map and cross-checks mapping vs champion/challenger labels; an inverted/garbage entry errors instead of silently flipping the preference (tested for both A->champion and A->challenger).
- Idempotent: a stable per-item source_url keeps the (run_id,item_id,reviewer,source,source_url) conflict key identical across re-imports, so no double-count.
- Manual promotion preserved: ingestion writes feedback only and never promotes.
- Fail-safe per item: a missing/garbage secret entry or missing pick is reported per item without aborting the rest of the import.
- Additive to contract_version=1: reuses eval_review_options/eval_artifacts/ranked_feedback_events; the Mode B helpers are parameterized by itemID/source/feedback_source with existing callers passing the prior defaults.

Closes #508

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The packet<->secret-map run_id cross-check guarded against unblinding a packet with a foreign mapping, but the reviewer picks file (the artifact that decides each winner) was never validated against the packet. UnblindPairwisePacket joins picks by item_id only, so a picks file from a different pairwise run whose items share generic ids (item-1, …) would silently unblind the WRONG reviewer preferences and persist them as canonical RankedFeedbackEvents for THIS run with exit 0.

Add the symmetric run_id guard next to the existing secret-map check. Picks supplied as a bare map shape carry no run_id and skip the check.

Adds TestRunSkillOptPairwiseImportForeignPicksRunID, which fails (exit 0, foreign preference persisted) without the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
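A run_id guard of this kind might look like the sketch below. The JSON shapes are assumptions: a wrapped form `{"run_id": ..., "picks": {...}}` and a bare `{"item-id": "A"}` map are hypothetical stand-ins for whatever pairwise-picks.json actually contains, and `loadPicks` is not a function from the PR. The sketch also assumes that a wrapped file with an empty run_id is rejected, which the commit message does not specify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// picksFile is a HYPOTHETICAL wrapped shape for pairwise-picks.json.
type picksFile struct {
	RunID string            `json:"run_id"`
	Picks map[string]string `json:"picks"`
}

// loadPicks parses a picks file and rejects one that names a different run.
// A bare item_id -> pick map carries no run_id, so the check is skipped.
func loadPicks(data []byte, packetRunID string) (map[string]string, error) {
	var wrapped picksFile
	if err := json.Unmarshal(data, &wrapped); err == nil && wrapped.Picks != nil {
		// Wrapped shape: the run_id must match the packet being imported.
		if wrapped.RunID != packetRunID {
			return nil, fmt.Errorf("picks run_id %q does not match packet run_id %q",
				wrapped.RunID, packetRunID)
		}
		return wrapped.Picks, nil
	}
	// Bare map shape: nothing to compare against.
	bare := map[string]string{}
	if err := json.Unmarshal(data, &bare); err != nil {
		return nil, fmt.Errorf("parse picks: %w", err)
	}
	return bare, nil
}

func main() {
	foreign := []byte(`{"run_id":"run-2","picks":{"item-1":"B"}}`)
	if _, err := loadPicks(foreign, "run-1"); err != nil {
		fmt.Println("rejected:", err)
	}
	bare := []byte(`{"item-1":"A"}`)
	picks, err := loadPicks(bare, "run-1")
	fmt.Println(picks, err)
}
```

Without the comparison, the foreign file above would join cleanly on `item-1` and its pick would be recorded against run-1, which is the failure the commit describes.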

jerryfane and others added 2 commits June 27, 2026 15:54

jerryfane merged commit 93803fb into main Jun 27, 2026
1 check passed

jerryfane deleted the feat/508-pairwise-ingest branch June 27, 2026 14:50

jerryfane mentioned this pull request Jun 27, 2026

Track future SkillOpt live pairwise evaluation mode #77

Closed

jerryfane merged 2 commits into main from feat/508-pairwise-ingest