feat(skillopt): ingest blinded live-pairwise packet → canonical feedback (#77b) #511

Merged: jerryfane merged 2 commits into main from feat/508-pairwise-ingest on Jun 27, 2026
@jerryfane

Owner

What it does

Adds gitmoot skillopt pairwise import <packet-dir>, the consumer side of the #77a seam. It ingests a REVIEWED blinded live-pairwise packet on review close:

  • the per-item anonymized A/B options (pairwise-review.json),
  • the reviewer's per-item picks (pairwise-picks.json),
  • and the SEPARATE secret unblinding map (pairwise-secret-map.json).

For each item it unblinds the anonymous pick (A/B) back to champion vs challenger using the secret_map only, then writes a canonical RankedFeedbackEvent through the EXISTING Mode B recording path (recordSkillOptABPick → ensureSkillOptABRunRows + upsertSkillOptABRankedEvent).
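The unblinding step can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `secretEntry` struct, its field names, and the `unblind` helper are hypothetical stand-ins for whatever shape pairwise-secret-map.json actually has. It only shows the idea of resolving a pick from the secret entry alone and refusing an entry whose mapping disagrees with its champion/challenger labels.

```go
package main

import (
	"errors"
	"fmt"
)

// secretEntry is a HYPOTHETICAL shape for one item's secret-map entry;
// the real field names in pairwise-secret-map.json may differ.
type secretEntry struct {
	Mapping         map[string]string // anonymous option ("A"/"B") -> role
	ChampionLabel   string            // anonymous label hiding the champion
	ChallengerLabel string            // anonymous label hiding the challenger
}

var errBadSecret = errors.New("inconsistent secret-map entry")

// unblind resolves an anonymous pick to "champion" or "challenger" using the
// secret entry only. It rejects entries whose mapping disagrees with the
// labels, so an inverted entry fails loudly instead of flipping the result.
func unblind(pick string, e secretEntry) (string, error) {
	if len(e.Mapping) != 2 ||
		e.ChampionLabel == e.ChallengerLabel ||
		e.Mapping[e.ChampionLabel] != "champion" ||
		e.Mapping[e.ChallengerLabel] != "challenger" {
		return "", fmt.Errorf("%w: mapping=%v champion=%q challenger=%q",
			errBadSecret, e.Mapping, e.ChampionLabel, e.ChallengerLabel)
	}
	role, ok := e.Mapping[pick]
	if !ok {
		return "", fmt.Errorf("pick %q is not an option in the secret map", pick)
	}
	return role, nil
}

func main() {
	aIsChampion := secretEntry{
		Mapping:       map[string]string{"A": "champion", "B": "challenger"},
		ChampionLabel: "A", ChallengerLabel: "B",
	}
	aIsChallenger := secretEntry{
		Mapping:       map[string]string{"A": "challenger", "B": "champion"},
		ChampionLabel: "B", ChallengerLabel: "A",
	}
	// Labels contradict the mapping: this must be an error, not a flipped pick.
	inverted := secretEntry{
		Mapping:       map[string]string{"A": "champion", "B": "challenger"},
		ChampionLabel: "B", ChallengerLabel: "A",
	}
	for _, e := range []secretEntry{aIsChampion, aIsChallenger, inverted} {
		role, err := unblind("A", e)
		fmt.Println(role, err)
	}
}
```

Note that the reviewed packet itself is never consulted for the role. Only the secret entry decides it, which matches the "secret_map only" wording above.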

Invariants

  • Unblind via secret_map only: resolution comes solely from the secret map and cross-checks mapping against champion_label/challenger_label, so an inverted or garbage entry errors instead of silently flipping the preference. Tested for both A→champion and A→challenger orientations (the test FAILS on an inversion), guarding against a #473-style label-shuffle inversion (#473: "Mode B: champion-challenger Thompson-sampling bandit for ask/research agents (pairwise preference → SkillOpt)").
  • Distinct source tag: source=live-pairwise, feedback_source=pairwise_valset, and run id prefix "skillopt-pairwise:", so validation-set live-pairwise feedback is separable from single-prompt skillopt-ab.
  • Idempotent — a STABLE per-item source_url keeps the (run_id,item_id,reviewer,source,source_url) conflict key identical across re-imports, so no double-count (tested: 3× import → 1 row).
  • Manual promotion preserved — ingestion writes feedback ONLY; it never promotes or auto-promotes (tested: current version unchanged, no pending version created).
  • Fail-safe per item — a missing/garbage secret entry or a missing pick is reported per item without aborting the rest of the import (tested: 1 imported / 2 skipped, exit 1, good row persisted).
  • Additive to contract_version=1 — reuses eval_review_options/eval_artifacts/ranked_feedback_events. The Mode B helpers are parameterized by itemID/source/feedback_source; existing skillopt ab + judge callers pass the prior defaults (byte-identical behavior).
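
The idempotency and per-item fail-safe invariants above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `conflictKey` struct stands in for the database conflict key, the `store` map stands in for ranked_feedback_events, and the `pairwise://` URL scheme in `stableSourceURL` is invented. The real import goes through the Mode B helpers named earlier.

```go
package main

import "fmt"

// conflictKey mirrors the (run_id,item_id,reviewer,source,source_url) tuple
// from the PR text. The struct is a HYPOTHETICAL stand-in for the DB constraint.
type conflictKey struct {
	RunID, ItemID, Reviewer, Source, SourceURL string
}

// store is an in-memory stand-in for the ranked_feedback_events table:
// conflict key -> preferred role.
type store map[conflictKey]string

// stableSourceURL derives the URL from ids only (no timestamps, no random
// parts), so a re-import yields the same key. The scheme is made up.
func stableSourceURL(runID, itemID string) string {
	return "pairwise://" + runID + "/" + itemID
}

// item is one unblinded result; an empty Role models a missing pick or a
// missing/garbage secret entry.
type item struct {
	ID   string
	Role string
}

// importItems upserts every good item and records failures per item instead
// of aborting the whole import.
func importItems(s store, runID, reviewer string, items []item) (imported int, skipped []string) {
	for _, it := range items {
		if it.Role == "" {
			skipped = append(skipped, it.ID)
			continue
		}
		k := conflictKey{
			RunID:     runID,
			ItemID:    it.ID,
			Reviewer:  reviewer,
			Source:    "live-pairwise",
			SourceURL: stableSourceURL(runID, it.ID),
		}
		s[k] = it.Role // same key on re-import: overwrite, never a second row
		imported++
	}
	return imported, skipped
}

func main() {
	s := store{}
	items := []item{{ID: "item-1", Role: "champion"}, {ID: "item-2"}, {ID: "item-3"}}
	for i := 0; i < 3; i++ {
		imported, skipped := importItems(s, "skillopt-pairwise:demo", "alice", items)
		fmt.Printf("imported=%d skipped=%d rows=%d\n", imported, len(skipped), len(s))
	}
}
```

The sketch never touches any notion of a current or pending version, which is the point of the manual-promotion invariant: importing feedback and promoting a skill stay separate operations.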

Reuse

Consumes the #507 (#77a) packet contract exactly (run_pairwise_eval blinded packet + separate secret_map). Reuses the Mode B path rather than reimplementing recording.

Closes #508

jerryfane and others added 2 commits June 27, 2026 15:54
…edback (#77b)

Add `gitmoot skillopt pairwise import <packet-dir>`, the consumer side of the
#77a seam: it ingests a REVIEWED blinded live-pairwise packet (per-item
anonymized A/B options), the reviewer's per-item picks, and the SEPARATE secret
map, UNBLINDS each pick back to champion vs challenger via the secret map ONLY,
and writes canonical RankedFeedbackEvents through the existing Mode B recording
path (recordSkillOptABPick -> ensureSkillOptABRunRows + upsertSkillOptABRankedEvent).

- Distinct source tag (source=live-pairwise, feedback_source=pairwise_valset) so
  validation-set live-pairwise feedback is separable from single-prompt skillopt-ab.
- Unblind comes solely from the secret map and cross-checks mapping vs
  champion/challenger labels; an inverted/garbage entry errors instead of
  silently flipping the preference (tested for both A->champion and A->challenger).
- Idempotent: a stable per-item source_url keeps the
  (run_id,item_id,reviewer,source,source_url) conflict key identical across
  re-imports, so no double-count.
- Manual promotion preserved: ingestion writes feedback only and never promotes.
- Fail-safe per item: a missing/garbage secret entry or missing pick is reported
  per item without aborting the rest of the import.
- Additive to contract_version=1: reuses eval_review_options/eval_artifacts/
  ranked_feedback_events; the Mode B helpers are parameterized by
  itemID/source/feedback_source with existing callers passing the prior defaults.

Closes #508

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The packet<->secret-map run_id cross-check guarded against unblinding a
packet with a foreign mapping, but the reviewer picks file — the artifact
that decides each winner — was never validated against the packet.
UnblindPairwisePacket joins picks by item_id only, so a picks file from a
different pairwise run whose items share generic ids (item-1, …) would
silently unblind the WRONG reviewer preferences and persist them as
canonical RankedFeedbackEvents for THIS run with exit 0.

Add the symmetric run_id guard next to the existing secret-map check.
Picks supplied as a bare map shape carry no run_id and skip the check.
Adds TestRunSkillOptPairwiseImportForeignPicksRunID, which fails (exit 0,
foreign preference persisted) without the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
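
The picks run_id guard described in the second commit message might look something like the sketch below. The function name `checkPicksRunID` and its parameters are hypothetical and do not come from the diff. An empty picks run_id models the bare-map picks shape, which carries no run_id and so skips the check.

```go
package main

import "fmt"

// checkPicksRunID rejects a picks file that belongs to a different pairwise
// run. HYPOTHETICAL helper: name and signature are illustrative only.
func checkPicksRunID(packetRunID, picksRunID string) error {
	if picksRunID == "" {
		// Bare-map picks carry no run_id, so there is nothing to compare.
		return nil
	}
	if picksRunID != packetRunID {
		return fmt.Errorf("picks run_id %q does not match packet run_id %q",
			picksRunID, packetRunID)
	}
	return nil
}

func main() {
	fmt.Println(checkPicksRunID("skillopt-pairwise:run-1", "skillopt-pairwise:run-1"))
	fmt.Println(checkPicksRunID("skillopt-pairwise:run-1", "skillopt-pairwise:run-2"))
	fmt.Println(checkPicksRunID("skillopt-pairwise:run-1", ""))
}
```

Because picks are joined to the packet by item_id alone, generic ids such as item-1 would otherwise match across runs. Comparing run_id before the join is what stops a foreign picks file from being unblinded against this packet.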
@jerryfane jerryfane merged commit 93803fb into main Jun 27, 2026
1 check passed
@jerryfane jerryfane deleted the feat/508-pairwise-ingest branch June 27, 2026 14:50
Development

Successfully merging this pull request may close these issues.

SkillOpt live-pairwise (Go): ingest blinded paired packet → canonical feedback events (#77b)

1 participant