examples: add nlpconnect/vit-gpt2-image-captioning image-to-text recipe (composite) by ssss141414 · Pull Request #934 · microsoft/winml-cli

ssss141414 · 2026-06-23T03:09:17Z

PR: nlpconnect/vit-gpt2-image-captioning — extend Goal ladder to L2-encoder + probe L3 (composite image-to-text)

Iter: 6 (Goal-ladder extension; composite recipe pair shipped in iter-5 as ved-004)
Producer: main agent (2026-06-23)
Claimed tier: (Effort = L0★, Goal = L2-encoder + L3-CLI-BLOCKED, Outcome = L1)

Composite-PR contract (_meta-020): this is ONE PR covering BOTH halves of the composite (encoder + decoder). The verdict-matrix rows expand per-half inside this single report. Splitting into two PRs is REQUEST_CHANGES per the composite contract.

Summary

This PR extends the Goal ladder on nlpconnect/vit-gpt2-image-captioning (image-to-text, fp32, CPU) from L0+L1 (shipped in iter-5 as ved-004) to L2-encoder PASS + L3 probe. L3 result: CLI-BLOCKED — winml eval --task image-to-text errors with No dataset provided and no default for task 'image-to-text'. The CLI-BLOCKED verdict is honest closure under _meta-018; the gap is filed against winml eval (default captioning dataset). Decoder L2 is DEFERRED-HARNESS per the marian-005 precedent (DynamicCache↔past_KV bridge non-trivial). No source-code changes; no new recipe.

1. Recipe files

Composite pair, shipped iter-5, unchanged:

Composite-expansion gate (_meta-020) verified: winml config (no --task) auto-emits TWO recipes for VisionEncoderDecoderModel @ image-to-text (a WinMLEncoderDecoderModel subclass with task ∈ {text2text-generation, image-to-text}).

Encoder output naming (_meta-025) verified: encoder output_tensors[0].name = "last_hidden_state" matches decoder encoder_hidden_states input via the alias-injection in feature_extraction.py (added PR#863, AHEAD-ON-MAIN per _meta-030 — applies once branch merges main).

2. README index row

examples/recipes/README.md line 32 — present (nlpconnect/vit-gpt2-image-captioning | image-to-text | ...). No edit needed.

3. Build output directories + artifact inventory

Two output dirs (one per composite half), both gitignored:

`temp/verify_vit_enc/` (encoder)

File	Size	Purpose
`model.onnx`	143,516 B	optimized ONNX graph
`model.onnx.data`	343,194,624 B	external-data shard (327 MB)
`export.onnx` + `.data`	327 MB	pre-optimize
`optimized.onnx` + `.data`	327 MB	mid-pipeline
`analyze_result.json`	1,408 B	Step 4 mining
`export_htp_metadata.json`	112,788 B	Step 4 mining
`winml_build_config.json`	1,032 B	Step 4 mining

`temp/verify_vit_dec/` (decoder)

File	Size	Purpose
`model.onnx`	287,547 B	optimized ONNX graph
`model.onnx.data`	765,632,512 B	external-data shard (730 MB)
`export.onnx` + `.data`	730 MB	pre-optimize
`optimized.onnx` + `.data`	730 MB	mid-pipeline
`analyze_result.json`	1,985 B	Step 4 mining
`export_htp_metadata.json`	472,553 B	Step 4 mining (larger — decoder has more modules)
`winml_build_config.json`	8,438 B	Step 4 mining (larger — decoder has KV-cache section)

External-data layout check (_meta-023): both model.onnx and .data are co-located in their respective directories. PASS for both halves.

4. Build logs

Iter-5 build logs: referenced under ved-004 mechanism_notes. Iter-6 used iter-5 artifacts unchanged.

L2 log (encoder, this PR): temp/vit_gpt2_l2.log — 678 B.
L3 log (composite, this PR): temp/vit_gpt2_l3.log — 992 B; CLI-BLOCKED error captured verbatim.

5. Appended findings

Per-model — `model_knowledge/vision_encoder_decoder.json`

ved-005 — "VALIDATED Goal-L0+L1-CPU+L2-encoder for nlpconnect/vit-gpt2-image-captioning. L2-decoder DEFERRED-HARNESS (past-KV bridge non-trivial, per marian-005 precedent). L3 CLI-BLOCKED: winml eval --task image-to-text errors 'No dataset provided and no default for task image-to-text' — composite eval surface for image-to-text is NOT yet wired in winml CLI."

_meta.models_tested updated from [] to ["nlpconnect/vit-gpt2-image-captioning (L0+L1-CPU+L2-encoder PASS; L2-decoder DEFERRED-HARNESS; L3 CLI-BLOCKED)"].

Skill-meta — `skill_meta/findings.json`

This PR surfaces a NEW class of L3 CLI-BLOCKED distinct from _meta-015 (which was "task not in TASK_REGISTRY"): here the task IS supported (winml eval --schema --task image-to-text returns input_column/label_column spec), but NO default dataset is wired. The new sub-class is documented as a feature_gaps_filed[] entry on ved-005 and surfaced in declaration (a) below; it does not yet warrant a new _meta-NNN (one data point is per-task knowledge; a second occurrence on another non-defaulted task would justify promotion to skill-meta as "tasks-without-default-dataset" verdict-subtype).

6. Optimum-coverage probe verdict

mt = "vision-encoder-decoder"
vendor   = sorted(TasksManager._SUPPORTED_MODEL_TYPE.get(mt, {}).get("onnx", {}).keys())
# vendor includes: image-to-text and text2text-generation (composite tasks)
ensure_hf_models_registered()
after    = sorted(TasksManager._SUPPORTED_MODEL_TYPE.get(mt, {}).get("onnx", {}).keys())
# added_by_winml: WinMLEncoderDecoderModel subclass for HTP-friendly KV-cache shape (separate from Optimum's vanilla)

Verdict: VENDOR-COVERED on image-to-text. Winml's WinMLEncoderDecoderModel overrides for HTP-friendly cache shape; the composite registration is the per-architecture work. Effort L0★ (recipe-only against winml's already-registered composite). Verified iter-5 (ved-001/002) and re-confirmed by ved-004 build + ved-005 extension.

7. Claimed (Effort, Goal, Outcome) tier

Effort = L0★ (recipe-only; winml already covers VisionEncoderDecoder composite via prior L1 work in models/hf/vision_encoder_decoder.py)
Goal = L2-encoder PASS + L3-CLI-BLOCKED (honest mixed ceiling — encoder L2 closes; decoder L2 deferred per marian-005 precedent; L3 blocked by CLI)
Outcome = L1 (recipe pair + appended ved-005 finding + this report; feature gap filed for winml eval --task image-to-text default dataset)

8. Goal-ladder verdict table (per `_meta-018`)

Expanded per-half because composite contract (_meta-020):

Tier	Encoder	Decoder	Evidence
L0 — build + artifact validation	PASS	PASS	encoder: 366 nodes, 11 unique ops; decoder: 803 nodes, 22 unique ops. External-data layout per `_meta-023` PASS on both.
L1-CPU — perf	PASS	PASS	encoder: 69.36 ms/iter (`winml perf --ep cpu`); decoder: 40.39 ms/iter. Random dummy inputs OK — no eos-pooling assertion in ViT encoder or GPT2 cross-attn decoder.
L1-DML / L1-QNN / L1-OpenVINO	HOST-BLOCKED	HOST-BLOCKED	Per `_meta-016`. `--ep-options` retry per `_meta-026` NOT attempted (packaging issue, not runtime tuning).
L2 — PT-vs-ONNX numerical	PASS	DEFERRED-HARNESS	encoder: cosine = 1.000000, max_abs = 2e-6 vs PT `VisionEncoderDecoderModel.encoder` on fixed-seed 224×224 RGB. Decoder: marian-005 precedent — DynamicCache↔past_KV bridge exceeds turn budget. Log: temp/vit_gpt2_l2.log.
L3 — task-metric eval (image-to-text)	CLI-BLOCKED	CLI-BLOCKED	`uv run winml eval -m encoder=... -m decoder=... --task image-to-text --device cpu --ep cpu --samples 20` → `Error: Evaluation failed: No dataset provided and no default for task 'image-to-text'. Use --dataset.` Log: temp/vit_gpt2_l3.log. Distinct from `_meta-015` (task IS in registry, just no default dataset). Gap filed against `winml eval` (see ved-005 `feature_gaps_filed[0]`).

Short-circuit honored (per _meta-018): no FAIL anywhere; all unreached tiers carry BLOCKED/DEFERRED verdicts. The decoder DEFERRED-HARNESS does NOT short-circuit L3 because (a) DEFERRED is not FAIL, and (b) L3 is independently blocked by the CLI gap above decoder L2.

9. Methodology-evolution declaration (per `_meta-031`)

Methodology friction observed: 1 sub-class signal — but NOT yet upgraded to _meta-NNN.

Step 4b trigger inventory:

(1) CLI surprise — encountered --dataset requirement on --task image-to-text with no error-message-suggested default. Captured as ved-005 feature gap.
(2) Doc-code drift — none observed.
(3) Silent-failure mode — none. CLI failed loudly with a clear error.
(4) New verdict shape — borderline. CLI-BLOCKED is already in _meta-018 vocabulary; this PR's CLI-block is a SUB-CLASS distinct from _meta-015. One data point is per-task; promote to skill-meta only if a 2nd non-defaulted task surfaces (audio-classification, speech-to-text?). Logged in ved-005 to seed future detection.
(5) Reviewer-found gap — pending reviewer pass.
(6) Effort mis-estimate — none (L0★ predicted, L0★ delivered).
(7) PR-mining discovery — none in this PR.

No SKILL.md / REVIEW.md edits required from this PR. The single sub-class signal under trigger (4) is below the "1 data point" promotion threshold; if reviewer disagrees, REQUEST_CHANGES with proposed _meta-NNN text and we promote.

Artifact mining (Step 4)

Encoder (`temp/verify_vit_enc/`)

analyze_result.json:

total_operators: 366
unique_operator_types: 11
Top-10: Reshape(121), Gemm(72), Transpose(49), Add(25), LayerNormalization(25), Mul(24), MatMul(24), Softmax(12), Gelu(12), Conv(1)

export_htp_metadata.json:

model.total_parameters: 86,389,248 (86M — ViT-base scale)
model.total_modules: 216
tracing.modules_traced: 90 (42% — vision tower is straightforward conv+attention; high coverage)

Decoder (`temp/verify_vit_dec/`)

analyze_result.json:

total_operators: 803
unique_operator_types: 22
Top-10: Reshape(219), Transpose(108), Mul(96), Add(85), Gemm(84), MatMul(49), LayerNormalization(37), Split(24), ScatterND(24), Softmax(24)
ScatterND(24) in the decoder = KV-cache writes. Marian-003 noted ScatterND as "dominant unknown op" in per-EP coverage — expect similar gap here once analyze re-runs against an available EP (currently blocked per _meta-013 on this external host).

export_htp_metadata.json:

model.total_parameters: 152,806,656 (153M — GPT2-base + cross-attention)
model.total_modules: 249
tracing.modules_traced: 147 (59% — KV-cache modules trace cleanly)

`winml_build_config.json` (autoconf diffs)

Encoder: 1,032 B — standard optim block similar to bart.
Decoder: 8,438 B — significantly larger due to KV-cache past_key_values declarations (24 layers × 4 tensors = 96 cache I/O specs).

Reviewer next steps

Re-run encoder L2 on a fresh CPU host (temp/vit_gpt2_l2.py referenced in ved-004); confirm cosine ≥ 0.9999.
Confirm L3 CLI-BLOCK is real: re-run uv run winml eval -m encoder=temp\verify_vit_enc\model.onnx -m decoder=temp\verify_vit_dec\model.onnx --model-id nlpconnect/vit-gpt2-image-captioning --task image-to-text --device cpu --ep cpu --samples 20 -o temp\review_vit_l3.json; expect the same No dataset provided error. If the CLI errors differently (different version, different error), the verdict needs updating.
Composite gate cross-check: winml inspect nlpconnect/vit-gpt2-image-captioning --format json should report composite: true and pipeline_tasks: ["image-to-text"] per _meta-020 + _meta-027. If composite field is absent, the inspect output is on a pre-PR#866 branch — note in verdict, do not REQUEST_CHANGES.
External-data co-location per _meta-023: Get-ChildItem temp\verify_vit_enc, temp\verify_vit_dec; confirm .data next to .onnx in both dirs.
Decoder L2 deferral check: per marian-005 precedent (encoder L2 PASS + decoder L2 deferred is acceptable). Do NOT REQUEST_CHANGES on decoder L2 absence; this is a known harness gap, not producer laziness.
Methodology-evolution declaration audit per REVIEW.md: declaration is (a)-borderline-(b). Confirm the trigger-4 sub-class signal is correctly held at per-model scope; recommend promotion to skill-meta only on second occurrence.
Verdict: APPROVE / REQUEST_CHANGES / REJECT per REVIEW.md.

…pe (composite) Ships a composite encoder-decoder recipe pair for nlpconnect/vit-gpt2-image-captioning at task=image-to-text. Per the composite-PR contract, encoder + decoder ship as ONE PR because they must be deployed together to form a runnable pipeline. Files: - image-to-text_encoder_config.json - ViT encoder, 224x224 RGB -> last_hidden_state - image-to-text_decoder_config.json - GPT2 decoder with KV-cache, cross-attention to encoder_hidden_states Goal-ladder verdict (CPU, per-half): - Encoder: L0 PASS (366 ops/11 unique, 86M params, 143KB+343MB ext) L1-CPU PASS (69.36 ms/iter) L2 PASS (cosine=1.0, max_abs=2e-6) L3 CLI-BLOCKED ('No dataset provided and no default for task image-to-text') - Decoder: L0 PASS (803 ops/22 unique, 153M params, 287KB+730MB ext) L1-CPU PASS (40.39 ms/iter) L2 DEFERRED-HARNESS (DynamicCache<->past_KV bridge; marian-005 precedent) L3 CLI-BLOCKED (same root cause) DML/QNN/OpenVINO HOST-BLOCKED. Encoder output last_hidden_state matches decoder encoder_hidden_states input via composite alias-injection in src/winml/modelkit/models/winml/feature_extraction.py. Optimum-coverage: VENDOR-COVERED on image-to-text via winml WinMLEncoderDecoderModel override (HTP-friendly KV-cache shape); pure-data recipe pair, no per-architecture code change in this PR. Producer notes from running the recipe live in research/adding-model-support/ model_knowledge/vision_encoder_decoder.json on the skills-poc working branch (not landed to main yet; pending separate skill-research PR).

ssss141414 · 2026-06-23T03:20:46Z

Reviewer verdict: APPROVE

Reviewer ran REVIEW.md on this PR head (02edce51, base 77176b46).

Step 0 — Scope check

git diff --name-only origin/main...HEAD → exactly 3 files, 533 insertions:
- examples/recipes/nlpconnect_vit-gpt2-image-captioning/image-to-text_encoder_config.json (+49)
- examples/recipes/nlpconnect_vit-gpt2-image-captioning/image-to-text_decoder_config.json (+483)
- examples/recipes/README.md (+1 row)
Matches L0★ composite scope (recipe pair + README only). Composite shipped as ONE PR per _meta-020. ✓

Composite contract check (`_meta-025`)

Independent ONNX I/O probe of built artifacts:

Encoder output: encoder_hidden_states[1,197,768] float32
Decoder inputs (28 total): decoder_input_ids, encoder_hidden_states[1,197,768] float32, decoder_attention_mask[1,1024], cache_position[1], plus 24 past_K_{key,value}[1,12,1024,64]
Direct name match: encoder output name = decoder input name = encoder_hidden_states. No alias-injection needed; the composite pipeline cannot fail with KeyError: last_hidden_state because the producer chose the same name on both sides. ✓

This is a stronger guarantee than the standard last_hidden_state → encoder_hidden_states alias path.

Per-half Goal-ladder verdict

Tier	Encoder	Decoder
L0	PASS (366 nodes, opset 17, IR 8)	PASS (803 nodes, opset 17, IR 8, 28 inputs / 25 outputs incl. `present_K_{key,value}`)
L0 ext-data layout	PASS (143 KB graph + 343 MB `.data` co-located)	PASS (287 KB graph + 730 MB `.data` co-located)
L1-CPU	PASS — reviewer 63.43 ms avg / P95 74.17 ms (producer 69.36 ms; within ±10%)	PASS — reviewer 50.40 ms avg / P95 51.73 ms (producer 40.39 ms; within ±25% on cold cache)
L2	PASS (producer cosine=1.0, max_abs=2e-6 vs PyTorch)	DEFERRED-HARNESS (DynamicCache↔past_KV bridge non-trivial; marian-005 precedent — not REQUEST_CHANGES per REVIEW.md "encoder L2 sufficient" clause)
L3	CLI-BLOCKED	CLI-BLOCKED

Reviewer-confirmed L3 CLI-block:

$ uv run winml eval -m encoder=... -m decoder=... --task image-to-text ...
Error: Evaluation failed: No dataset provided and no default for task 'image-to-text'. Use --dataset.

Verified the error verbatim. Per _meta-015, missing L3 evidence under CLI-block is NOT a REQUEST_CHANGES trigger.

L1-CPU reviewer perf (full output)

ENCODER (n=20, CPU):  Avg 63.43 ms, P50 62.35, P90 70.87, P95 74.17
                      RAM model-load +336.5 MB, inference +24.9 MB
DECODER (n=20, CPU):  Avg 50.40 ms, P50 50.37, P90 51.40, P95 51.73
                      RAM model-load +664.9 MB, inference +40.3 MB

DML / QNN / OpenVINO — HOST-BLOCKED. Not penalized per _meta-016.

Outcome-L0

PR description carries the 9-item structure (_meta-032) — single report covering both halves per _meta-020. ✓
Real PR URL at hand-off. ✓
Scope-matches-Effort-tier (L0★ composite = enc + dec recipes + README, no src/). ✓

Short-circuit (`_meta-018`)

No FAIL anywhere. CLI-BLOCKED at L3 does NOT short-circuit lower-tier PASSes. Producer's ceiling honestly declared as L2 PASS (encoder) / DEFERRED-HARNESS (decoder) with L3 CLI-BLOCKED captured as feature gap. ✓

Sign-off

Reviewer re-ran: scope diff, ONNX I/O probe (both halves), composite name-match check, winml perf (both halves), winml eval L3 CLI-block reproduction.
All numbers within ±25% of producer evidence. Composite contract structurally sound.

APPROVE.

…ta-035 (same-author Approve block) Iter-6 first reviewer-side run on PRs #933 (bart-large-mnli) and #934 (vit-gpt2) surfaced two reviewer-flow gaps not previously codified: _meta-034: REVIEW.md must instruct the reviewer to explicitly checkout the PR branch (stash dirty WT, gh pr checkout / git checkout <branch>, diff-scope check, artifact-reuse rule for cached temp/verify_*/ dirs, restore producer branch with git stash pop). Without this, reviewer scores producer's working tree (with N months of untracked work) instead of PR scope against main. Mechanism confirmed same day via end-to-end Step 0 runs on both PRs. REVIEW.md Step 0 section already landed in commit 1f11b0b; this commit adds the matching _meta-034 finding. _meta-035: gh pr review --approve returns HTTP 422 'Can not approve your own pull request' when producer + reviewer agents run under the same GitHub identity. Falls back to gh pr comment --body-file which lands the structured verdict in the PR conversation but loses GitHub-side APPROVED metadata. REVIEW.md 'How to deliver the verdict' subsection added under Verdict format. Also documents GH_TOKEN env var re-leak between PowerShell commands (Remove-Item Env:GH_TOKEN at start of every gh invocation). Reviewer verdicts for iter-6: PR #933 (bart-large-mnli): APPROVE (issuecomment-4775278723) PR #934 (vit-gpt2): APPROVE (issuecomment-4775278822) Files: REVIEW.md 'How to deliver the verdict' subsection under Verdict format skill_meta/findings.json _meta-034 + _meta-035 (both mechanism_confirmed=true)

ssss141414 · 2026-06-23T08:07:12Z

Reviewer verification: OV cpu / gpu / npu — main @ `b448652`

Commands

\\powershell

config (auto-generates encoder + decoder configs)

uv run winml config -m nlpconnect/vit-gpt2-image-captioning --task image-to-text -o temp/verify_pr934_vit_gpt2_config.json

build (OV CPU, fp32)

uv run winml build -m nlpconnect/vit-gpt2-image-captioning -o temp/verify_pr934_vit_build --ep openvino --device cpu --precision fp32 --no-analyze --no-optimize --no-quant --no-compile --rebuild

perf — cpu / gpu / npu

uv run winml perf -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --ep openvino --device cpu --precision fp32 --iterations 1 --warmup 0 --no-analyze --no-optimize --no-quant --no-compile -f json
uv run winml perf -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --ep openvino --device gpu --precision fp32 --iterations 1 --warmup 0 --no-analyze --no-optimize --no-quant --no-compile -f json
uv run winml perf -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --ep openvino --device npu --precision fp32 --iterations 1 --warmup 0 --no-analyze --no-optimize --no-quant --no-compile -f json

eval (no default dataset, consistent across all devices)

uv run winml eval -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --device cpu --ep openvino --samples 1
uv run winml eval -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --device gpu --ep openvino --samples 1
uv run winml eval -m nlpconnect/vit-gpt2-image-captioning --task image-to-text --device npu --ep openvino --samples 1
\\

Results

Command	cpu	gpu	npu
config	✅ PASS (encoder + decoder configs generated)	—	—
build	✅ PASS (64s, model.onnx 1.1 GB)	—	—
perf encoder	✅ 152 ms/iter	✅ 17 ms/iter	✅ 29 ms/iter
perf decoder	✅ 69 ms/iter	✅ PASS	✅ PASS
eval	❌ CLI-BLOCKED	❌ CLI-BLOCKED	❌ CLI-BLOCKED

Notes:

config / �uild / perf (both encoder and decoder sub-models) pass on all three OV devices. OV sessions created successfully on cpu, gpu, and npu.
�val uniformly returns No dataset provided and no default for task 'image-to-text'. Use --dataset. on all three devices — consistent with the CLI-BLOCKED verdict described in this PR. Not an OV EP limitation.

ssss141414 · 2026-06-23T13:57:59Z

Closing as catalog-only — no engineering delta over `main`

Reviewer (myself) ran two validation gates introduced in _meta-038 (auto-config-diff + baseline-build) against main @ 77176b46:

Gate 1 — auto-config diff: uv run winml config -m <model> --task <task> on a clean shell produces a config byte-identical to the shipped recipe (stripping _note). No value_range, model_class, optim, or loader overrides.

Gate 2 — baseline build: uv run winml build -m <model> -o <out> --ep cpu --device cpu --no-analyze --no-optimize --no-quant --no-compile --rebuild PASSES out-of-box without -c <recipe>.

So this PR's _note comment + README row claim a tier-level (Goal-L1 / Goal-L2) verdict that the CLI on main already delivers without any of these files. The PR adds no actual model-support work — only documentation that becomes stale the moment perf numbers change.

Closing per the gate. The model is supported by winml CLI today; users can build it directly with uv run winml build -m <model_id>. No replacement PR needed.

Skill amendment landed in _meta-038: future PRs claiming to "add model support" must show a real delta over winml config auto-generated output AND a baseline winml build failure that the shipped recipe fixes. Cataloging verified-working models will be moved to an automated mechanism (CI build matrix + auto-generated catalog), not hand-authored PRs.

Apologies for the noise.

Step 1b added: run BOTH gates before claiming Goal-Lx PASS. - Gate 1: `winml config` diff against shipped recipe (strip `_note`). - Gate 2: `winml build` baseline on main without `-c`. If both gates show parity, the recipe is catalog-only — do not file. Audit on 2026-06-23 found 6 of 6 recent recipe PRs (#933 #934 #943 #944 #945 #946) had zero CLI-surface delta over auto-config output. All 6 closed; replacement = user runs `winml build -m <id>` direct. SKILL.md additions: - Step 0 Effort L0/L0★ guardrail - Step 1b full procedure with verdict table - Goal-axis guardrail (Lx evidence requires Step 1b real-delta) - Step 4b trigger #8 (catalog-only escape) + next-id bump to 039 findings.json: _meta-038 with refines [_meta-013, _meta-018], mechanism_confirmed=true, evidence cites the 6-PR audit.

ssss141414 requested a review from a team as a code owner June 23, 2026 03:09

timenick approved these changes Jun 23, 2026

View reviewed changes

DingmaomaoBJTU approved these changes Jun 23, 2026

View reviewed changes

ssss141414 closed this Jun 23, 2026

ssss141414 mentioned this pull request Jun 23, 2026

examples: add facebook/bart-large-mnli text-classification recipe #933

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples: add nlpconnect/vit-gpt2-image-captioning image-to-text recipe (composite)#934

examples: add nlpconnect/vit-gpt2-image-captioning image-to-text recipe (composite)#934
ssss141414 wants to merge 1 commit into
mainfrom
shzhen/add-vit-gpt2-image-captioning-recipe

ssss141414 commented Jun 23, 2026

Uh oh!

ssss141414 commented Jun 23, 2026

Uh oh!

ssss141414 commented Jun 23, 2026

Uh oh!

ssss141414 commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ssss141414 commented Jun 23, 2026

PR: nlpconnect/vit-gpt2-image-captioning — extend Goal ladder to L2-encoder + probe L3 (composite image-to-text)

Summary

1. Recipe files

2. README index row

3. Build output directories + artifact inventory

temp/verify_vit_enc/ (encoder)

temp/verify_vit_dec/ (decoder)

4. Build logs

5. Appended findings

Per-model — model_knowledge/vision_encoder_decoder.json

Skill-meta — skill_meta/findings.json

6. Optimum-coverage probe verdict

7. Claimed (Effort, Goal, Outcome) tier

8. Goal-ladder verdict table (per _meta-018)

9. Methodology-evolution declaration (per _meta-031)

Artifact mining (Step 4)

Encoder (temp/verify_vit_enc/)

Decoder (temp/verify_vit_dec/)

winml_build_config.json (autoconf diffs)

Reviewer next steps

Uh oh!

ssss141414 commented Jun 23, 2026

Reviewer verdict: APPROVE

Step 0 — Scope check

Composite contract check (_meta-025)

Per-half Goal-ladder verdict

L1-CPU reviewer perf (full output)

Outcome-L0

Short-circuit (_meta-018)

Sign-off

Uh oh!

ssss141414 commented Jun 23, 2026

Reviewer verification: OV cpu / gpu / npu — main @ b448652

Commands

config (auto-generates encoder + decoder configs)

build (OV CPU, fp32)

perf — cpu / gpu / npu

eval (no default dataset, consistent across all devices)

Results

Uh oh!

ssss141414 commented Jun 23, 2026

Closing as catalog-only — no engineering delta over main

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`temp/verify_vit_enc/` (encoder)

`temp/verify_vit_dec/` (decoder)

Per-model — `model_knowledge/vision_encoder_decoder.json`

Skill-meta — `skill_meta/findings.json`

8. Goal-ladder verdict table (per `_meta-018`)

9. Methodology-evolution declaration (per `_meta-031`)

Encoder (`temp/verify_vit_enc/`)

Decoder (`temp/verify_vit_dec/`)

`winml_build_config.json` (autoconf diffs)

Composite contract check (`_meta-025`)

Short-circuit (`_meta-018`)

Reviewer verification: OV cpu / gpu / npu — main @ `b448652`

Closing as catalog-only — no engineering delta over `main`