recipe(DepthPro-hf): add depth-estimation recipe (Goal-L1 PASS on CPU) #943
Draft
ssss141414 wants to merge 1 commit into
PR: apple/DepthPro-hf — depth-estimation recipe (fp32, CPU)
Iter: 6 (build closure shipped iter-3 as depth_pro-002/003; this PR adds the L1-CPU evidence on top)
Producer: main agent (2026-06-23)
Claimed tier: (Effort = L0★, Goal = L1-CPU, Outcome = L0)

Summary
This PR ships the `apple/DepthPro-hf` depth-estimation recipe. DepthPro is a 952M-param model with 3 independent DINOv2 backbones (patch, image, and fov encoders) plus neck/fov/fusion stages, which makes the recipe structurally one of the largest single-graph models in the catalog. It builds cleanly via the standard L0★ template. L1-CPU PASSes via a custom Python harness that reuses the cached artifact, because the `winml perf` path triggers a re-export per invocation, which is wasteful for a 3.6 GB model (see the Diligence ladder note in §8). No source-code changes.

1. Recipe file
examples/recipes/apple_DepthPro-hf/depth-estimation_fp16_config.json
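This section records an fp32 build (102 FLOAT32 initializers, 0 FLOAT16) checked via `onnx.load`. A minimal sketch of such a tally follows; the helper names are hypothetical and the PR does not show the script actually used. It assumes the `onnx` package is installed when reading a real file, and it skips loading the external weight file because only the data-type codes are needed.

```python
from collections import Counter

# ONNX TensorProto.DataType codes for the two types of interest.
FLOAT32, FLOAT16 = 1, 10


def tally_initializer_dtypes(data_types):
    """Count initializers per ONNX data-type code (hypothetical helper)."""
    counts = Counter(data_types)
    other = sum(n for code, n in counts.items() if code not in (FLOAT32, FLOAT16))
    return {
        "FLOAT32": counts.get(FLOAT32, 0),
        "FLOAT16": counts.get(FLOAT16, 0),
        "other": other,
    }


def tally_from_file(path):
    """Load the graph without its external weights and tally initializer dtypes."""
    import onnx  # imported lazily so the tally helper works without onnx installed

    model = onnx.load(path, load_external_data=False)
    return tally_initializer_dtypes(init.data_type for init in model.graph.initializer)
```

Calling `tally_from_file("temp/depth_pro_build/model.onnx")` on the built artifact would be one way for a reviewer to re-check the reported counts.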
Filename `fp16_*` is cosmetic per `_meta-014`; the recipe ships fp32 (102 FLOAT32 initializers, 0 FLOAT16, verified by `onnx.load`).

Recipe input shape: `pixel_values [1, 3, 1536, 1536] float32 range [0, 1]`. Outputs: `predicted_depth [1, 1536, 1536]` + `field_of_view [1]`.

2. README index row
examples/recipes/README.md: row to add for `apple/DepthPro-hf | depth-estimation | single (no composite) | recipe`.

3. Build output directory + artifact inventory
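The external-data layout check recorded in this section (`_meta-023`) amounts to confirming that the graph file and its `.data` sidecar sit in the same directory. A minimal sketch, assuming the sidecar is named by appending `.data` to the model filename (as with `model.onnx` and `model.onnx.data`); the function name is hypothetical and this is not the project's own checker.

```python
from pathlib import Path


def external_data_colocated(model_path):
    """Return True if <model> and <model>.data both exist in the same directory."""
    model = Path(model_path)
    # with_name keeps the parent directory, so the sidecar is looked up next to the model.
    sidecar = model.with_name(model.name + ".data")
    return model.is_file() and sidecar.is_file()
```

For the build in this PR the call would be `external_data_colocated("temp/depth_pro_build/model.onnx")`.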
`temp/depth_pro_build/` (gitignored; referenced by path for reviewer re-execution):
- `model.onnx`
- `model.onnx.data`
- `export.onnx` + `.data`
- `optimized.onnx` + `.data`
- `analyze_result.json`
- `export_htp_metadata.json`
- `winml_build_config.json`

External-data layout check (`_meta-023`): `model.onnx` and `model.onnx.data` are co-located in the same directory. PASS.

4. Build log
temp/depth_pro_build.log: `Build complete in 758.0s` (export 375s + optimize 355s). Build artifact path: `temp/depth_pro_build/`.

5. Appended findings
Per-model: `model_knowledge/depth_pro.json`

Skill-meta

No new `_meta-NNN` findings in this PR (Lane B).

6. Optimum-coverage probe verdict
Verdict: WINML-ONLY.
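A probe of this kind can be sketched as a membership test against Optimum's private `TasksManager._SUPPORTED_MODEL_TYPE` mapping. The function names and the `OPTIMUM-COVERED` label below are hypothetical (the PR only names the `WINML-ONLY` outcome), and because the attribute is private its layout may differ between Optimum versions.

```python
def classify_coverage(model_type, supported_model_types):
    """Classify a model_type by whether the given registry knows it.

    'OPTIMUM-COVERED' is an illustrative label, not one taken from the PR.
    """
    return "OPTIMUM-COVERED" if model_type in supported_model_types else "WINML-ONLY"


def load_optimum_registry():
    """Fetch Optimum's exporter registry (requires the optimum package)."""
    from optimum.exporters.tasks import TasksManager  # lazy import: optional dependency

    # Private attribute: a mapping keyed by model_type string.
    return TasksManager._SUPPORTED_MODEL_TYPE
```

With Optimum installed, `classify_coverage("depth_pro", load_optimum_registry())` would reproduce the lookup described here.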
`depth_pro` model_type is not registered in Optimum's `TasksManager._SUPPORTED_MODEL_TYPE`; winml's `register_onnx_overwrite` decorator at `src/winml/modelkit/models/hf/depth_pro.py` is what makes export work. Despite the WINML-ONLY classification, no code is needed in THIS PR (the per-arch file already exists): the recipe is a pure consumer of the existing registration. Effort L0★ confirmed.

7. Claimed (Effort, Goal, Outcome) tier
- … `depth_pro.py` already exists from prior iter; this PR adds only the recipe + finding append)
- … `_meta-015` analogue for depth-estimation)

8. Goal-ladder verdict table (per `_meta-018`)

- `winml build` → `model.onnx` + `model.onnx.data` co-located; opset 17, fp32, 2822 nodes, 19 unique op types; Build complete in 758.0s. Log: temp/depth_pro_build.log
- `pixel_values [1,3,1536,1536]` input; warmup 29582 ms (cold); throughput 0.035 samples/sec on CPU. Custom Python harness per `_meta-017` (avoids re-export). Log: temp/depth_pro_perf_cpu.log; script: temp/depth_pro_perf.py
- … `_meta-016`. 49% layout-move ops (Reshape/Transpose/Slice = 1378/2822 per `depth_pro-003`) means QNN-NPU would likely be heavily move-bound even when available.
- … `_meta-018`. PT-vs-ONNX comparison would need DepthPro pipeline reconstruction (preprocessor → 3-backbone forward → neck → fusion → head); script not written this turn.
- `winml eval` task registry does not include `depth-estimation` (analogous to translation per `_meta-015`).

Short-circuit honored: no FAIL anywhere. L2/L3 deferred-or-blocked do not halt the march per `_meta-018`. Honest ceiling is L1-CPU PASS.

Diligence ladder (`_meta-037`), invoked during L1 attempt:

- `depth_pro.json`: no prior perf workaround documented for this model.
- `winml config`: N/A, recipe already exists.
- `--ep-options` retry: N/A, CPU not failing.
- `value_range` / shape pinning: recipe shape already pinned to `[1,3,1536,1536]`.
- `winml perf` triggered full re-export (~13 min per invocation since each `uv run winml perf` rebuilds the artifact); switched to direct `onnxruntime.InferenceSession` against cached `temp/depth_pro_build/model.onnx`. Loaded in 15.44s, ran 3 iters in 86s total.

Feature gap from step 6 trigger:
`winml perf` should accept a pre-built artifact path (e.g. `--artifact temp/depth_pro_build/model.onnx`) and skip the build phase entirely. For a 3.6 GB model, the build-per-perf-invocation cost is prohibitive. Captured under `depth_pro-003` `feature_gaps_filed[]` as a follow-up.

9. Methodology-evolution declaration (per `_meta-031`)

No NEW methodology friction in this PR. The custom-harness pattern is `_meta-017`; the `winml perf` re-export cost is a new observation but rolls into the existing `_meta-017` gotcha rather than a fresh `_meta-NNN`.

Triggers:
- … `--warmup-iterations` vs `--warmup`, recovered via `winml perf --help`; not a doc-cited flag).

Reviewer should confirm "no methodology friction observed" per `_meta-031` anti-trigger.

Reviewer hand-off package: Step 6 9-item self-check
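For reviewer re-execution, the direct `onnxruntime.InferenceSession` measurement described in §8 can be sketched as follows. This is not the PR's `temp/depth_pro_perf.py` (which is not shown here); the function names are hypothetical, the random input is an assumption that only matches the recipe's shape, dtype, and [0, 1] range, and `numpy` plus `onnxruntime` are assumed to be installed.

```python
import time


def summarize(latencies_s, batch_size=1):
    """Turn per-iteration wall-clock latencies (seconds) into summary stats."""
    total = sum(latencies_s)
    return {
        "iterations": len(latencies_s),
        "mean_latency_s": total / len(latencies_s),
        "throughput_samples_per_s": batch_size * len(latencies_s) / total,
    }


def benchmark(model_path, iterations=3, shape=(1, 3, 1536, 1536)):
    """Time a cached ONNX model on CPU without going through a rebuild."""
    import numpy as np  # lazy imports: only needed for a real run
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Random float32 values in [0, 1), matching the recipe's pinned input shape.
    feed = {"pixel_values": np.random.rand(*shape).astype(np.float32)}

    start = time.perf_counter()
    session.run(None, feed)  # cold warmup run, timed separately
    warmup_s = time.perf_counter() - start

    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        session.run(None, feed)
        latencies.append(time.perf_counter() - start)
    return warmup_s, summarize(latencies)
```

As a sanity check on the reported figures, three iterations totalling 86 s give 3 / 86 ≈ 0.035 samples/sec, which is consistent with the throughput stated in §8.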