Skip to content

Feat/sam3 onnx hub support (issue #324)#582

Open
AChenreddy24 wants to merge 8 commits into
mainfrom
feat/sam3-onnx-hub-support
Open

Feat/sam3 onnx hub support (issue #324)#582
AChenreddy24 wants to merge 8 commits into
mainfrom
feat/sam3-onnx-hub-support

Conversation

@AChenreddy24

Copy link
Copy Markdown
Contributor

Adds SAM 3 support and introduces a generic Hub-hosted ONNX input form (<org>/<repo>/<path>.onnx) that downloads pre-exported ONNX files from HuggingFace via huggingface_hub. SAM 3 is the first consumer.

The standard SAM 2-style export route is blocked: optimum-onnx pins transformers<4.58, but facebook/sam3 requires transformers>=5. This PR uses the pre-exported onnx-community/sam3-tracker-ONNX artifacts via the existing Scenario D pipeline.

Changes

  • New Hub-ONNX URI resolver (loader/onnx_hub.py)
  • Wired into wmk config, wmk build, wmk inspect, wmk run, wmk serve, wmk perf, wmk eval
  • SAM 3 catalog entries (encoder + decoder) in the new schema
  • Fixed is_quantized_onnx to detect QOperator format (ConvInteger, MatMulInteger, QLinear*) — was QDQ-only
  • Fixed is_pre_quantized build branch to truly skip the optimize stage (previously crashed on QOperator models)

Tests

  • 4649 unit tests pass (+12 new regression tests)
  • 6 integration tests pass (live HF download)
  • Built decoder is byte-identical to input (verified numerically)
  • Ruff lint + format clean

Limitations

  • SAM 3 vision encoder requires NPU EP (no CPU ConvInteger kernel in onnxruntime-windowsml); decoder runs on CPU + NPU.

Comment thread src/winml/modelkit/compiler/utils.py Fixed
Comment thread tests/integration/test_sam3_e2e.py Fixed
Comment thread tests/integration/test_sam3_e2e.py Fixed
Comment thread tests/unit/onnx/test_detection.py Fixed

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for Hub-hosted pre-exported ONNX inputs (<org>/<repo>/<path>.onnx) to enable SAM 3 consumption via Scenario D, and fixes pre-quantized (QDQ + QOperator) detection/routing so already-quantized models skip ORT optimization/quantization stages that can crash on unsupported integer kernels.

Changes:

  • Introduces loader/onnx_hub.py to recognize and download Hub-hosted ONNX files (plus best-effort .onnx_data sidecars), and wires resolution into multiple entry points (CLI + inference/model loading).
  • Expands is_quantized_onnx to detect QOperator quantized models and adds shared quant-op constants.
  • Updates build pipelines to honor “pre-quantized” routing (skip optimize/quantize) and adds regression/unit/integration tests for SAM 3 and Hub-ONNX refs.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/unit/onnx/test_detection.py New tests for quantized (QDQ + QOperator) and compiled ONNX detection.
tests/unit/loader/test_onnx_hub.py New unit tests for Hub-ONNX path detection/splitting/downloading behavior.
tests/unit/inference/test_engine.py Ensures Hub-ONNX refs are resolved before inference routing logic.
tests/unit/commands/test_perf_cli.py Verifies winml perf resolves Hub-ONNX refs before ONNX path validation.
tests/unit/commands/test_hub_onnx_ref.py New CLI tests for wmk config/wmk build with Hub-hosted ONNX refs.
tests/unit/commands/test_eval.py Adds Hub-ONNX resolution coverage in eval model-path resolution.
tests/unit/build/test_onnx.py Regression tests ensuring pre-quantized ONNX skips optimize/quantize stages.
tests/unit/build/test_hf.py Regression tests ensuring pre-quantized exported ONNX skips optimize/quantize stages.
tests/integration/test_sam3_e2e.py End-to-end integration tests for SAM3 decoder + encoder via Hub-hosted ONNX artifacts.
src/winml/modelkit/onnx/detection.py Updates quantized detection to use unified QDQ+QOperator op set.
src/winml/modelkit/models/auto.py Resolves Hub-ONNX refs before fast-path ONNX/HF routing in from_pretrained.
src/winml/modelkit/loader/onnx_hub.py New implementation for Hub-hosted ONNX ref resolution via hf_hub_download.
src/winml/modelkit/loader/init.py Re-exports Hub-ONNX helper APIs.
src/winml/modelkit/inference/engine.py Normalizes Hub-ONNX refs to local paths in load and load_schema_only.
src/winml/modelkit/data/hub_models.json Adds SAM3 encoder/decoder catalog entries using Hub-ONNX refs.
src/winml/modelkit/compiler/utils.py Adds QOperator + union quantization op-type constants.
src/winml/modelkit/compiler/init.py Exposes new quantization op-type constants from compiler package.
src/winml/modelkit/commands/perf.py Resolves Hub-ONNX refs prior to ONNX path checks.
src/winml/modelkit/commands/inspect.py Treats Hub-ONNX refs as ONNX inputs for consistent “not supported” messaging.
src/winml/modelkit/commands/eval.py Resolves Hub-ONNX refs before validating ONNX file existence.
src/winml/modelkit/commands/config.py Resolves Hub-ONNX refs before dispatching config generation path.
src/winml/modelkit/commands/build.py Resolves Hub-ONNX refs and adds skip-optimize plumbing for ONNX pipeline.
src/winml/modelkit/build/onnx.py Ensures pre-quantized models skip ORT optimize and don’t crash on QOperator ops.
src/winml/modelkit/build/hf.py Same as above for HF-exported ONNX artifacts that are already quantized.
src/winml/modelkit/build/common.py Adds skip_optimize to the shared optimize/analyze loop.
README.md Documents Hub-hosted ONNX input form and supported commands.
Comments suppressed due to low confidence (1)

src/winml/modelkit/commands/build.py:1162

  • --no-optimize sets extra_kwargs["skip_optimize"], but _build_onnx_pipeline() never reads it. As a result, the CLI flag has no effect unless is_quantized_onnx() happens to detect the model as pre-quantized. Consider consuming extra_kwargs.pop("skip_optimize", False) and combining it with is_pre_quantized (and setting max_iters=0 when skipping) so users can force skipping the optimize/autoconf stage for problematic ONNX inputs.
    from ..onnx import copy_onnx_model, is_quantized_onnx

    max_iters: int = extra_kwargs.pop("hack_max_optim_iterations", 3)

    # ── Validate + setup ─────────────────────────────────────────
    if not onnx_path.exists():
        raise FileNotFoundError(f"ONNX file not found: {onnx_path}")
    try:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/winml/modelkit/commands/build.py Outdated
Comment on lines +1196 to +1197
# ``ConvInteger``. Skip the optimize pass and the autoconf re-optim
# loop; analyze still runs lint-only.
Comment thread src/winml/modelkit/build/onnx.py Outdated
Comment on lines 153 to 160
logger.info(
"Pre-quantized model detected (QDQ nodes present). "
"Pre-quantized model detected (QDQ or QOperator nodes present). "
"Skipping optimize + quantize, running analyze-only."
)
stages_skipped.append("optimize")
# Optimize+analyze only, no autoconf re-optimization
# Analyze-only: skip ORT-based graph optimization (no kernel for
# QOperator ops like ConvInteger on the host EP), no autoconf loop.
current_path, _, analyze_iters, analyze_unsupported, analyze_details = (
Comment on lines 244 to 252
if is_pre_quantized:
logger.info(
"Pre-quantized model detected (QDQ nodes present). "
"Pre-quantized model detected (QDQ or QOperator nodes present). "
"Skipping optimize + quantize, running analyze-only."
)
stages_skipped.append("optimize")
# Optimize+analyze only, no autoconf re-optimization
# Analyze-only: skip ORT-based graph optimization (no kernel for
# QOperator ops like ConvInteger on the host EP), no autoconf loop.
current_path, _, analyze_iterations, analyze_unsupported_nodes, analyze_details = (
Comment on lines 60 to +66
ep: Target execution provider for the analyzer.
device: Target device for the analyzer.
max_optim_iterations: Maximum autoconf re-optimization rounds.
0 means optimize+analyze only (no autoconf re-optimization).
skip_optimize: When True, skip the initial ``optimize_onnx`` call and
just copy the input model to ``optimized_path``. Used for
pre-quantized models (QDQ or QOperator format) where ORT-based
DingmaomaoBJTU

This comment was marked as duplicate.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(supersedes the earlier review — consolidated with design feedback)

Code-level issues

See inline comments below.

One additional nit not inlined (existing code, not touched by this PR): _run_quantize_stage at line 952 of commands/build.py still says "(QDQ nodes already present)". Since this PR broadens detection to cover QOperator, the message should read "(QDQ or QOperator nodes already present)" for consistency with the log messages updated in build/onnx.py and build/hf.py.


Design feedback

Three structural concerns — the bug fixes (QOperator detection, skip_optimize) are solid, but the way the Hub-ONNX input form is introduced creates maintenance surface that should be addressed before or shortly after merge.

1. "Hub-hosted ONNX" is not a distinct input type — it is a download step

org/repo/path/file.onnx -> hf_hub_download() -> /local/cache/file.onnx

After the download, this is just a local .onnx file. The downstream pipeline does not (and should not) care that it once lived on the Hub. Yet the PR threads detection logic (is_hf_onnx_path / maybe_resolve_hf_onnx_path) through 7+ command entry points, giving it the status of a persistent input type. This inflates both the concept count and the code surface.

A cleaner model is: resolve once at a single entry point, return a local path, downstream is unaware.

2. Model input identification needs a single resolver — is_hub_model already exists

hub_utils.py already has is_hub_model() with comprehensive local-path rejection (checks Path.exists(), ./, ../, ~/, Windows drive letters). The new is_hf_onnx_path reimplements only Path.exists(), missing the other cases.

The codebase now has three parallel detection mechanisms with no shared logic:

Input form Detector Local-path rejection
HF model ID (org/model) is_hub_model() Full (exists, ./, ../, ~/, Win drive)
Local ONNX file scattered path.suffix == ".onnx" checks N/A
Hub-hosted ONNX (org/repo/path.onnx) is_hf_onnx_path() (new) Partial (only Path.exists())

Suggested direction: a single resolve_model_input() that classifies and resolves in one place, reusing is_hub_model's rejection logic. Each command calls it once; adding a fourth input form means changing one function, not 7+.

def resolve_model_input(value: str) -> ModelInput:
    # 1. Local-path rejection (reuse is_hub_model logic)
    # 2. If org/repo/path.onnx -> download -> return as local_onnx
    # 3. If org/model -> return as hf_id
    # 4. If local .onnx exists -> return as local_onnx
    # No persistent "hub_onnx" type needed

3. Pre-quantized ONNX handling should be decided once, not scattered across three pipelines

The detect-and-skip logic lives independently in three places with inconsistent behavior:

Location Detects via Skips optimize Skips quantize
build/onnx.py is_quantized_onnx() skip_optimize=True Explicit
build/hf.py is_quantized_onnx() skip_optimize=True Explicit
commands/build.py is_quantized_onnx() skip_optimize=True Relies on user config having quant: null

Plus _run_quantize_stage has its own is_quantized_onnx() guard (line 951) — a fourth detection point.

The CLI pipeline is the odd one out: if a user provides a config with quant settings for a pre-quantized model, it will attempt to re-quantize. The library functions protect against this; the CLI does not.

Suggested direction: detect once at pipeline entry, stamp the result onto WinMLBuildConfig (e.g. config.skip_optimize = True; config.quant = None), and have all downstream stages read config. This also eliminates redundant is_quantized_onnx calls on the same model.

4. No discovery mechanism for Hub ONNX files

The current UX requires the user to already know the exact file path inside a Hub repo (onnx/vision_encoder_int8.onnx), the right variant (fp32 vs int8, encoder vs decoder), and the task. Compare with the HF model ID flow where architecture, task, and export are all auto-detected.

The only discovery path is the static hub_models.json catalog, which is manually curated. If this is an intentional V1 scope limitation, it should be documented as such. Otherwise, consider accepting a repo ID (onnx-community/sam3-tracker-ONNX) and listing available .onnx files, or reading repo metadata to auto-configure task and roles.

Comment thread src/winml/modelkit/inference/engine.py Outdated
# is downloaded once and treated as a local .onnx path thereafter.
from ..loader import maybe_resolve_hf_onnx_path

model_path = maybe_resolve_hf_onnx_path(str(model_path)) or str(model_path)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code: or str(model_path) never triggers. maybe_resolve_hf_onnx_path(str(x)) always returns a non-None string when given a non-None string — it only returns None when the input is None. Since str(model_path) is never None, the or branch is unreachable.

Every other call site in this PR uses the simpler pattern:

model_id = maybe_resolve_hf_onnx_path(model_id)

Suggestion: drop the or to match the other sites, or add a comment explaining the defensive intent. Same applies to load_schema_only below (line 383).

**onnx_kwargs,
**config.optim,
)
# 1. Optimize (or skip for pre-quantized models)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip_optimize=True + max_optim_iterations > 0 is not guarded. All current callers pair skip_optimize=True with max_optim_iterations=0, so this is not a live bug. However, the function contract allows a caller to pass both skip_optimize=True and max_optim_iterations=3, in which case the autoconf loop would discover flags and call optimize_onnx on a pre-quantized model — the exact crash this fix prevents.

Consider making the invariant self-enforcing:

if skip_optimize:
    max_optim_iterations = 0  # re-optimize would crash on pre-quantized models

Comment thread src/winml/modelkit/compiler/utils.py Outdated
# Union of all quantization op types (QDQ + QOperator). Use this for
# "is the model already quantized?" detection regardless of which format
# the producer used.
QUANTIZATION_OP_TYPES: frozenset[str] = QDQ_OP_TYPES | QOPERATOR_OP_TYPES

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding DynamicQuantizeLinear (and DynamicQuantizeMatMul). DynamicQuantizeLinear is a standard ONNX op (opset 11+) used in dynamic quantization. Models quantized via onnxruntime.quantization with QuantType.QUInt8 in dynamic mode contain this op instead of static QDQ pairs or QOperator fused ops.

Without it, a dynamically-quantized model would not be detected by is_quantized_onnx and would be routed through optimize + quantize. If this is an intentional exclusion (SAM 3 uses static QOperator), a comment noting the limitation would help.

Comment thread src/winml/modelkit/commands/eval.py Outdated
# Hub-hosted ONNX (e.g. ``onnx-community/sam3-tracker-ONNX/onnx/...``)
# is downloaded once and treated as a local .onnx path thereafter.
from ..loader import is_hf_onnx_path, resolve_hf_onnx_path

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

role=path sub-model entries are not wired to the Hub resolver. The role=path branch above (around line 434) validates Path(path).exists() for each sub-model entry but does not call is_hf_onnx_path / resolve_hf_onnx_path. A Hub ref like image-encoder=onnx-community/sam3-tracker-ONNX/onnx/vision_encoder_int8.onnx would fail with "ONNX file not found".

Not blocking for current SAM 3 usage (single-model), but worth a TODO if multi-role eval on Hub-hosted models is planned.

@DingmaomaoBJTU

DingmaomaoBJTU commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Questions on validation and evaluation

The bug fixes are well-tested (synthetic ONNX models, mock-based CLI tests, regression assertions) — nice work on the silent-skip audit and the QOperator coverage. A few questions on the SAM 3 model validation side — would be great if you could share any additional results you have.

What the PR verifies (solid foundation)

Check What it proves
Unit tests pass Build pipeline code paths don't crash
Integration tests pass Download + build produces a file
onnx.checker.check_model ONNX is structurally valid
Decoder byte-identical to input Pipeline didn't corrupt the model
"sane SAM-shaped outputs" (iou_scores in [0,1]) Output tensors have expected shapes/ranges

These are all great and necessary. A few additional data points would help reviewers feel confident about the end-to-end quality:

Would love to see

  1. Accuracy numbers. SAM 3 isn't in models_with_acc.json or models_curated.json yet. If you've done any offline accuracy evaluation (e.g., mIoU on SA-1B or COCO), it would be helpful to share the numbers — even informal ones — so we can baseline the model quality.

  2. mask-generation eval support. The eval command doesn't seem to have mask-generation metric support (mIoU, dice, etc.) yet. Is this something you've been working on separately, or is it planned as a follow-up? Would be good to note in the PR if so.

  3. Encoder end-to-end results. Since the encoder requires NPU and the CI tests can only verify structural correctness (ConvInteger nodes preserved), did you get a chance to run it on an NPU-equipped machine? If you have any end-to-end inference results, sharing them would strengthen the case.

  4. Perf benchmarks. The PR wires SAM 3 into wmk perf — do you have any latency/throughput numbers to share? The Known Limitations section mentions the shape-hints gap for wmk perf, so curious if you were able to work around it for manual benchmarking.

  5. Comparison with PyTorch reference. The byte-identical check nicely proves the build is a clean pass-through. Since the ONNX comes from a third-party repo (onnx-community), did you happen to compare the ONNX outputs against the original facebook/sam3 PyTorch model? Even a spot-check on a few samples would be reassuring.

Task inconsistency between catalog and tests

Small thing I noticed — the encoder's task differs between hub_models.json and the integration tests:

Component hub_models.json Integration test
Vision encoder mask-generation image-feature-extraction
Prompt encoder + mask decoder mask-generation mask-generation

The encoder only produces image embeddings, so image-feature-extraction (as the test uses) seems more accurate for its role. But the catalog lists both components as mask-generation. Could you clarify which is intended? If mask-generation is correct for the catalog (e.g., because the catalog describes the overall pipeline rather than individual components), it might be worth adding a comment explaining that.

Suggestion

If some of the above are blocked (NPU availability, no mask-generation metric yet), it would help to note that in the PR description under Known Limitations so future maintainers know what's been validated vs. what still needs follow-up.

SAM 3 (facebook/sam3) requires transformers>=5, but optimum-onnx pins
transformers<4.58, so the standard HF + Optimum export route SAM 2 uses
is blocked. This change wires SAM 3 in through the existing pre-exported
ONNX (Scenario D) pipeline by recognizing path-style Hub references
('org/repo/path/to/file.onnx') and downloading the file once via
huggingface_hub.

Changes:
- New src/winml/modelkit/loader/onnx_hub.py: is_hf_onnx_path,
  resolve_hf_onnx_path. Mirrors the is_xxx/resolve_xxx pair pattern
  used by is_compiled_onnx/is_quantized_onnx.
- Wire into wmk config, wmk build, and WinMLAutoModel.from_pretrained
  with the same 2-line 'if is_hf_onnx_path(x): x = str(resolve_hf_onnx_path(x))'
  pattern.
- Add 2 sam3_tracker entries to hub_models.json so 'wmk hub --model-type
  sam3_tracker' lists them.
- Tests: 12 unit tests for the resolver, 2 CLI plumbing tests, and 3
  end-to-end integration tests (slow/network/integration).

The existing build_onnx_model pipeline runs unchanged on the resolved
local path: the int8 ONNX is auto-detected as quantized via
is_quantized_onnx, the quantization stage is skipped, and the artifact
flows through Optimize -> Analyze<->Optimize -> Compile -> Finalize.
consumer. Also fix two latent bugs in the build pipeline that any
QOperator-quantized model would have hit.

Background
----------
Issue #324 asks for SAM 2-style native HuggingFace export support for
``facebook/sam3`` (Sam3*IOConfig, Sam3ModelPatcher, etc.). That path is
blocked by an upstream constraint: ``optimum-onnx`` pins
``transformers<4.58``, but ``facebook/sam3`` requires ``transformers>=5``
(the ``Sam3Model`` class only exists there). Resolving the pin would need
either an upstream optimum-onnx PR or vendoring SAM 3 patcher code that
bypasses optimum entirely.

Instead, this PR introduces a generic "Hub-hosted ONNX file" input form
and lets SAM 3 ride on the existing pre-exported-ONNX (Scenario D)
pipeline that already worked for any local ``.onnx`` file. The
infrastructure is reusable for any future model with similar version
constraints (Whisper / Phi / RWKV / etc. all ship pre-exported ONNX
repos on the Hub today).

What's added
------------
1. Hub-hosted ONNX URI resolver
   - ``loader/onnx_hub.py``: ``is_hf_onnx_path()``,
     ``resolve_hf_onnx_path()``, ``maybe_resolve_hf_onnx_path()``
   - Recognizes inputs of the form ``<org>/<repo>/<path/to/file>.onnx``,
     downloads via ``huggingface_hub.hf_hub_download``, returns the
     local cache path. Falls through unchanged for HF model IDs / local
     paths / ``None``.
   - Best-effort ``.onnx_data`` sidecar fetch for >2 GiB models.
     ``EntryNotFoundError`` is expected (inlined weights); ``OSError``
     surfaces as a WARNING (disk/permission/network problems should not
     be silently dropped — the model would later fail to load with a
     confusing error).

2. CLI wiring (every command that accepts a model identifier)
   - ``wmk config`` / ``wmk build``: resolve at the top of the command
   - ``wmk inspect``: friendly "ONNX inspection not yet supported" error
     for Hub-ONNX refs (matches local .onnx UX)
   - ``wmk run`` / ``wmk serve``: ``InferenceEngine.load()`` and
     ``load_schema_only()`` resolve before routing
   - ``wmk perf``: resolve before the ``Path(model_id).suffix == '.onnx'
     and exists()`` check (otherwise Hub refs are mistaken for missing
     local files and rejected with FileNotFoundError)
   - ``wmk eval``: ``_resolve_model_path`` resolves before the local
     existence check
   - ``WinMLAutoModel.from_pretrained``: resolves before HF/ONNX dispatch
   - Stage-tool commands (``analyze``/``optimize``/``quantize``/
     ``compile``/``export``) intentionally NOT wired — they take
     ``click.Path(exists=True)`` and operate on local files only.

3. SAM 3 catalog entries (``data/hub_models.json``)
   - Two entries for ``onnx-community/sam3-tracker-ONNX``: the vision
     encoder and the prompt-encoder + mask-decoder. Note: was already
     present in the base branch — this PR does not modify it.

4. Integration tests (``tests/integration/test_sam3_e2e.py``)
   - 4 decoder tests + 2 encoder tests, marked
     ``@slow @network @integration``
   - Asserts: Hub URI resolves, quantization detected, build produces
     ``model.onnx``, autoconf produces an ``optimization_config``, and
     for the encoder: pre-quantized round-trip preserves the
     ``ConvInteger`` / ``MatMulInteger`` ops byte-identically.
   - Skips narrowed to ``HfHubHTTPError`` / ``OSError`` only — real
     bugs in the build/analyze pipeline will surface as test failures
     rather than green skips.

Bugs fixed (would affect any QOperator-quantized model, not just SAM 3)
---------------------------------------------------------------------
A. ``is_quantized_onnx`` only detected QDQ format (``QuantizeLinear`` /
   ``DequantizeLinear``). The SAM 3 vision encoder uses
   ``QuantFormat.QOperator`` (no QDQ pairs, just integer ops:
   ``ConvInteger``, ``MatMulInteger``, ``QLinear*``). Previously
   misclassified as not quantized → routed through the optimize +
   quantize stages → tried to re-quantize an already-int8 model.

   Fix: ``compiler/utils.py`` adds ``QOPERATOR_OP_TYPES`` and
   ``QUANTIZATION_OP_TYPES = QDQ ∪ QOperator``. ``onnx/detection.py``
   uses the union.

B. The ``is_pre_quantized`` branches in ``build_onnx_model``,
   ``build_hf_model``, and the CLI's ``_build_onnx_pipeline`` logged
   "skipping optimize" but still invoked ``optimize_onnx`` →
   ``ort_graph`` → loaded the model into an ORT session. For
   QOperator models on hosts without a CPU ``ConvInteger`` kernel
   (e.g. ``onnxruntime-windowsml`` 1.23.x), this crashes the build
   stage with ``NOT_IMPLEMENTED``.

   Fix: ``build/common.py::run_optimize_analyze_loop`` gains a real
   ``skip_optimize: bool`` knob that bypasses ``optimize_onnx`` and
   the autoconf re-optim loop, just copying the input as the
   "optimized" artifact. All three pre-quantized branches now pass
   ``skip_optimize=True``. The downstream behavior (skip quantize +
   skip compile when configured) is unchanged.

Verification
------------
- ``onnx.checker.check_model(full_check=True)`` passes on built artifacts
- Built decoder produces NUMERICALLY IDENTICAL outputs to input decoder
  (``max|built - input| = 0.0`` across all 3 outputs) — pre-quantized
  round-trip is a true pass-through, not just structurally similar
- Encoder runtime feasibility on CPU is identical to input encoder
  (both fail on CPU because of upstream ORT ``ConvInteger`` kernel gap;
  encoder requires NPU EP — unchanged from input)
- Decoder real inference produces sane SAM-shaped outputs:
  ``iou_scores ∈ [0, 1]``, ``pred_masks`` logits span both signs,
  ``object_score_logits`` non-degenerate

Test count
----------
- 4518 unit tests pass (+12 new regression tests across:
  ``test_onnx_hub.py``, ``test_detection.py``, ``test_eval.py``,
  ``test_perf_cli.py``, ``test_engine.py``)
- 6 integration tests pass (live HF download, ~30s)
- Ruff check + format clean on all 24 changed files

Silent-skip audit (per SAM 2 review feedback)
---------------------------------------------
Removed ``except Exception: pytest.skip(...)`` patterns from SAM 3
integration tests — they were swallowing real bugs (including the
``ConvInteger`` regression fixed in this PR). All skips now narrowed to
``HfHubHTTPError`` / ``OSError`` (network) or specific runtime
exceptions; ``RuntimeError`` from ``build_onnx_model`` and
``analyze_onnx`` now fails loudly. Removed unnecessary
``pytest.importorskip("huggingface_hub")`` (it's a hard transitive
dep). Sidecar download ``OSError`` now logs WARNING instead of DEBUG.

Known limitations (not addressed in this PR)
--------------------------------------------
- SAM 3 encoder requires NPU EP (QNN / OpenVINO / VitisAI) because
  ``onnxruntime-windowsml`` ships no CPU kernel for ``ConvInteger(10)``.
  This is true for both the input and built artifact — our build
  preserves runtime behavior exactly. Decoder uses ``MatMulInteger``
  and runs on either CPU or NPU.
- Catalog entries for SAM 3 have ``quantization: null`` so
  ``wmk perf`` falls back to default random-input shapes that violate
  the SAM 3 decoder's internal reshape constraints. Populating
  ``quantization.input_tensors`` with proper shape hints (the pattern
  every other catalog entry follows) is the recommended fix; out of
  scope for this PR.
@AChenreddy24 AChenreddy24 force-pushed the feat/sam3-onnx-hub-support branch from df2f38a to 6e47c07 Compare June 15, 2026 21:43
Comment thread src/winml/modelkit/eval/mask_generation_evaluator.py Fixed
Comment thread src/winml/modelkit/datasets/mask_generation.py Fixed
Comment thread src/winml/modelkit/eval/mask_generation_evaluator.py Fixed
Comment thread src/winml/modelkit/eval/mask_generation_evaluator.py Fixed
Comment thread src/winml/modelkit/eval/mask_generation_evaluator.py Fixed
Comment thread src/winml/modelkit/utils/model_input.py Fixed
Comment thread tests/unit/eval/test_mask_generation_evaluator.py Fixed
@AChenreddy24 AChenreddy24 force-pushed the feat/sam3-onnx-hub-support branch from 6e47c07 to 4464ce6 Compare June 15, 2026 22:53
# ``normalize_model_arg`` returns ``str | None`` per its signature;
# the ``or model`` keeps the narrowed ``str`` type for downstream use.
hf_model: str = cli_utils.normalize_model_arg(model) or model
model = hf_model
@AChenreddy24 AChenreddy24 force-pushed the feat/sam3-onnx-hub-support branch from 4464ce6 to 5bf6fb7 Compare June 15, 2026 23:20
…on evaluator

Follow-up to the SAM 3 / Hub-hosted ONNX commits, addressing inline
review feedback from @DingmaomaoBJTU, Copilot review, and CodeQL.

Refactor: single model-input classifier and resolver
----------------------------------------------------
New module ``winml.modelkit.utils.model_input`` is the single entry
point for classifying a ``-m/--model`` value (one of ``hub_onnx``,
``hf_id``, ``local_onnx``, ``build_dir``, ``invalid``) and resolving
it (download for ``hub_onnx``, pass-through otherwise).

Replaces the previous scattered detection across ``loader/onnx_hub``
(``is_hf_onnx_path``, ``maybe_resolve_hf_onnx_path``), ``utils/cli``
(``is_onnx_file_path``), and ad-hoc ``Path(value).suffix == ".onnx"``
checks in commands. Callers updated: ``commands/build``, ``config``,
``eval``, ``inspect``, ``perf``; ``inference/engine`` (``load`` and
``load_schema_only``); ``models/auto.WinMLAutoModel.from_pretrained``.

Dead code removed
-----------------
* ``resolve_model_input(...).local_path or str(model_path)`` -- the
  ``or`` branch was unreachable; ``local_path`` is set for every
  resolvable input.
* Case-sensitive ``.onnx`` check in ``loader/onnx_hub`` (now consistent
  with case-insensitive ``.lower()`` check in callers).

Build pipeline
--------------
* ``ensure_pre_quantized_stamped`` is now the single defensive detection
  point for pre-quantized models in library entry points; the unified
  CLI path stamps the config up front, library entry points only run
  detection when needed.
* ``run_optimize_analyze_loop`` enforces ``max_optim_iterations = 0``
  whenever ``skip_optimize=True``; ORT lacks kernels for ``ConvInteger``
  / ``MatMulInteger`` on host EPs, so re-optimize would crash for the
  same reason the initial optimize is skipped.
* Skip-log message updated to ``"QDQ or QOperator nodes present"`` to
  match ``is_quantized_onnx`` accepting both quantization formats.
* ``compiler/utils.QUANTIZATION_OP_TYPES`` extended with
  ``DynamicQuantizeLinear`` / ``DynamicQuantizeMatMul``; exported via
  ``__all__`` to satisfy CodeQL unused-global warning.

Mask-generation evaluator (response to "would love to see eval")
----------------------------------------------------------------
* ``winml.modelkit.eval.mask_generation_evaluator`` drives encoder +
  decoder ORT sessions for SAM-family promptable mask generation.
  Profile-dispatch design supports SAM 3 (1008 input, mean=std=0.5,
  direct resize) and SAM 2.1 (1024 input, ImageNet mean/std,
  longest-side-pad). Verified preprocessing is byte-correct against
  ``onnx-community/sam3-tracker-ONNX/preprocessor_config.json``.
* ``winml.modelkit.datasets.mask_generation`` -- ``MaskGenerationDataset``
  wraps ``mattmdjaga/human_parsing_dataset`` with binary
  foreground/background masks.
* ``winml.modelkit.eval.metrics.binary_segmentation`` -- mIoU + Dice
  on binary masks.
* Composite SAM 3 entry in ``models_with_acc.json``.

Reference scripts
-----------------
* ``scripts/sam3_reference_check.py`` -- spot-check against the published
  reference ``iou_scores`` from the model card.
* ``scripts/mask_generation_eval.py`` -- generic harness for any SAM-family
  ONNX pair with ``--preset`` (sam2 / sam3) and ``--dataset``
  (human_parsing / coco).
* ``scripts/sam3_smoke_eval.py`` -- back-compat shim that delegates to the
  generic harness with the SAM 3 preset.

Test cleanups (CodeQL + Copilot)
--------------------------------
* ``tests/integration/test_sam3_e2e.py`` fixtures: explicit ``raise`` after
  ``pytest.skip`` to make control flow obvious.
* ``tests/unit/onnx/test_detection.py``: consolidate ``import onnx`` /
  ``from onnx import ...`` into a single import form.
* ``tests/unit/core/test_onnx_utils.py``: expected keys updated for new
  ``input_symbolic_shapes`` field in ``get_io_config`` output.

Other
-----
* ``.gitignore``: ignore stray ``<UUID>.data`` ORT external-data
  sidecars at repo root, ``quantized_info.csv`` Quark side-effect file,
  and ``scripts/_*.py`` / ``scripts/_*.json`` private debug scripts.
@AChenreddy24 AChenreddy24 force-pushed the feat/sam3-onnx-hub-support branch from 5bf6fb7 to a0e9c7c Compare June 15, 2026 23:24
@AChenreddy24

Copy link
Copy Markdown
Contributor Author

Overall

Well-structured PR — the Hub-ONNX resolver is clean, the QOperator detection fix is solid, and the skip_optimize plumbing correctly threads through all three build pipelines. Test coverage is thorough (synthetic ONNX models for detection, mocked CLI plumbing, live integration).

A few things to consider below. The engine.py dead-code or is the most actionable; the rest are robustness / future-proofing items.

Not inline (existing code, not touched by this PR)

_run_quantize_stage skip message — line 952 of commands/build.py still says "(QDQ nodes already present)". Since this PR correctly broadens detection to cover QOperator, consider updating the message to "(QDQ or QOperator nodes already present)" for consistency with the log messages updated in build/onnx.py and build/hf.py.

Thanks for the detailed review and for calling this out.

This has been addressed in the latest push, but via a small flow refactor rather than just a string edit: _run_quantize_stage no longer carries the old “(QDQ nodes already present)” wording path. Pre-quantized detection is now decided earlier and stamped on config (skip_optimize=True, quant=None), so _run_quantize_stage exits on config.quant is None for both QDQ and QOperator cases.

For consistency, the stage-level logs in build/onnx.py and build/hf.py now use “pre-quantized model” wording (covering QDQ + QOperator). If you prefer an explicit QDQ/QOperator mention in the commands/build.py skip path as well, I can add that as a follow-up log line too.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

(supersedes the earlier review — consolidated with design feedback)

Code-level issues

See inline comments below.

One additional nit not inlined (existing code, not touched by this PR): _run_quantize_stage at line 952 of commands/build.py still says "(QDQ nodes already present)". Since this PR broadens detection to cover QOperator, the message should read "(QDQ or QOperator nodes already present)" for consistency with the log messages updated in build/onnx.py and build/hf.py.

Design feedback

Three structural concerns — the bug fixes (QOperator detection, skip_optimize) are solid, but the way the Hub-ONNX input form is introduced creates maintenance surface that should be addressed before or shortly after merge.

1. "Hub-hosted ONNX" is not a distinct input type — it is a download step

org/repo/path/file.onnx -> hf_hub_download() -> /local/cache/file.onnx

After the download, this is just a local .onnx file. The downstream pipeline does not (and should not) care that it once lived on the Hub. Yet the PR threads detection logic (is_hf_onnx_path / maybe_resolve_hf_onnx_path) through 7+ command entry points, giving it the status of a persistent input type. This inflates both the concept count and the code surface.

A cleaner model is: resolve once at a single entry point, return a local path, downstream is unaware.

Thanks for the detailed consolidated feedback. This is very helpful.

On the quantize-stage wording nit, I agree on the consistency goal. In the current branch, the old skip-message path you referenced is no longer the active behavior. The quantize stage now exits based on config state (quant is None) for pre-quantized inputs, and the pre-quantized wording is handled in the build pipelines. See src/winml/modelkit/commands/build.py, src/winml/modelkit/build/onnx.py, and src/winml/modelkit/build/hf.py.
If you still prefer explicit QDQ or QOperator text in the commands path as well, I can add that follow-up log tweak for clarity.

On design point 1 (Hub-hosted ONNX as download step): I agree with what you suggested. The current implementation is now centered on single-point normalization and resolution, where entry points resolve to local paths and downstream pipeline stages operate on local ONNX paths. See src/winml/modelkit/utils/cli.py and src/winml/modelkit/utils/model_input.py.
Also, consumers can now use resolved local paths directly (for example src/winml/modelkit/inference/engine.py and src/winml/modelkit/models/auto.py).

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

2. Model input identification needs a single resolver — is_hub_model already exists

hub_utils.py already has is_hub_model() with comprehensive local-path rejection (checks Path.exists(), ./, ../, ~/, Windows drive letters). The new is_hf_onnx_path reimplements only Path.exists(), missing the other cases.

The codebase now has three parallel detection mechanisms with no shared logic:

Input form Detector Local-path rejection
HF model ID (org/model) is_hub_model() Full (exists, ./, ../, ~/, Win drive)
Local ONNX file scattered path.suffix == ".onnx" checks N/A
Hub-hosted ONNX (org/repo/path.onnx) is_hf_onnx_path() (new) Partial (only Path.exists())
Suggested direction: a single resolve_model_input() that classifies and resolves in one place, reusing is_hub_model's rejection logic. Each command calls it once; adding a fourth input form means changing one function, not 7+.

def resolve_model_input(value: str) -> ModelInput:
# 1. Local-path rejection (reuse is_hub_model logic)
# 2. If org/repo/path.onnx -> download -> return as local_onnx
# 3. If org/model -> return as hf_id
# 4. If local .onnx exists -> return as local_onnx
# No persistent "hub_onnx" type needed

This is addressed in the latest revision by introducing a single resolver path and routing command entry points through it. We now use src/winml/modelkit/utils/model_input.py as the classification and resolution center, and src/winml/modelkit/utils/cli.py normalize_model_arg delegates to resolve_model_input so callers do not have to implement their own Hub-vs-local detection.

Also, local-path rejection is no longer limited to Path.exists-style checks; model_input reuses shared local-path logic via _is_local_path from src/winml/modelkit/utils/hub_utils.py, which covers the ./, ../, ~/, and Windows-drive cases you listed.

On the hub_onnx type: agreed on the principle that downstream should operate on local ONNX after resolution. In the current shape, hub_onnx is an internal classification state used before resolution; after resolve_model_input, downstream paths consume local_path (for example src/winml/modelkit/inference/engine.py and src/winml/modelkit/models/auto.py).

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

3. Pre-quantized ONNX handling should be decided once, not scattered across three pipelines

The detect-and-skip logic lives independently in three places with inconsistent behavior:

Location Detects via Skips optimize Skips quantize
build/onnx.py is_quantized_onnx() skip_optimize=True Explicit
build/hf.py is_quantized_onnx() skip_optimize=True Explicit
commands/build.py is_quantized_onnx() skip_optimize=True Relies on user config having quant: null
Plus _run_quantize_stage has its own is_quantized_onnx() guard (line 951) — a fourth detection point.

The CLI pipeline is the odd one out: if a user provides a config with quant settings for a pre-quantized model, it will attempt to re-quantize. The library functions protect against this; the CLI does not.

Suggested direction: detect once at pipeline entry, stamp the result onto WinMLBuildConfig (e.g. config.skip_optimize = True; config.quant = None), and have all downstream stages read config. This also eliminates redundant is_quantized_onnx calls on the same model.

This is addressed in the latest revision.

We now decide pre-quantized status once and stamp it onto config, then downstream stages read config state instead of re-detecting:

A shared stamping helper sets skip_optimize true and clears quant when the input is pre-quantized (or when skip_optimize is forced).
All three pipelines run through that stamped-config path and use config.skip_optimize as the source of truth.
Quantize stage now gates on config.quant being None; it no longer does an extra is_quantized_onnx check.
The optimize/analyze loop enforces the invariant by forcing max iterations to zero when skip_optimize is true.
So the previous odd case you called out is now covered: even if a user supplies quant settings for a pre-quantized model, stamping clears quant and re-quantization is skipped.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

4. No discovery mechanism for Hub ONNX files

The current UX requires the user to already know the exact file path inside a Hub repo (onnx/vision_encoder_int8.onnx), the right variant (fp32 vs int8, encoder vs decoder), and the task. Compare with the HF model ID flow where architecture, task, and export are all auto-detected.

The only discovery path is the static hub_models.json catalog, which is manually curated. If this is an intentional V1 scope limitation, it should be documented as such. Otherwise, consider accepting a repo ID (onnx-community/sam3-tracker-ONNX) and listing available .onnx files, or reading repo metadata to auto-configure task and roles.

You are right that the current Hub ONNX UX is still artifact-path driven: users must provide a fully qualified file path, and we do not yet support repo-only input or automatic task and role inference from repo metadata. So the limitation you described is real.

What this PR does improve is the failure and guidance path: when a Hub ONNX path is incorrect, we now surface available ONNX files from that repo in the error hint, so users can recover without guessing blindly. That is a usability improvement, but it is not full discovery. Below is the status as of now:

  1. Discovery is partially improved in this PR (better correction hints).
  2. Full discovery remains follow-up scope (repo-level listing flow, then optional metadata-based task and role suggestions).
  3. Explicit artifact path input will remain supported as an advanced escape hatch even after discovery is added.

I will track the full discovery work as follow-up so this does not remain an implicit V1 constraint.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author
  1. Accuracy numbers. SAM 3 isn't in models_with_acc.json or models_curated.json yet. If you've done any offline accuracy evaluation (e.g., mIoU on SA-1B or COCO), it would be helpful to share the numbers — even informal ones — so we can baseline the model quality.

I ran an informal offline mask-generation eval locally and can share a baseline:

Model: onnx-community/sam3-tracker-ONNX (int8 encoder and decoder)
Device and EP: CPU and CPU EP
Dataset: mattmdjaga/human_parsing_dataset, train split
Sample count: 10 samples, shuffle true, seed 42
Metrics:
mIoU: 0.8694
Dice: 0.9265
Skipped: 0
For rough context, I ran the same local setup with sam2.1-hiera-small-ONNX and got:

mIoU: 0.9494
Dice: 0.9739
These are smoke-level numbers only, not a formal SA-1B or COCO benchmark yet. I can follow up with larger-scale evaluation and then add SAM 3 to the curated accuracy files once we have reproducible benchmark runs.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

2. mask-generation eval support. The eval command doesn't seem to have mask-generation metric support (mIoU, dice, etc.) yet. Is this something you've been working on separately, or is it planned as a follow-up? Would be good to note in the PR if so.

Thanks for pointing this. This support has been added in this PR.

The eval path now includes a dedicated mask-generation evaluator and binary-segmentation metrics:

Task routing for mask-generation is wired in src/winml/modelkit/eval/evaluate.py.
The evaluator implementation is in src/winml/modelkit/eval/mask_generation_evaluator.py.
mIoU and Dice are provided by src/winml/modelkit/eval/metrics/binary_segmentation.py.
I also ran a local eval for SAM 3 through this path and got mIoU and Dice outputs successfully.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

4. Perf benchmarks. The PR wires SAM 3 into wmk perf — do you have any latency/throughput numbers to share? The Known Limitations section mentions the shape-hints gap for wmk perf, so curious if you were able to work around it for manual benchmarking.

Yes, I do have preliminary perf numbers, but they were collected via manual ORT benchmarking scripts rather than wmk perf because of the current shape-hints limitation you mentioned.

What I measured so far:

Scope: SAM 3 encoder ONNX (pixel_values shape [1,3,1008,1008]), not full end-to-end tracker pipeline.
CPU baseline (CPUExecutionProvider):
min: 21734 ms
p50: 23532 ms
mean: 25097 ms
max: 34219 ms
VitisAI run (VitisAIExecutionProvider + CPU fallback):
min: 32172 ms
p50: 33406 ms
mean: 33622 ms
max: 35109 ms
So yes, I worked around the wmk perf gap with manual scripts to get initial latency data.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

5. Comparison with PyTorch reference. The byte-identical check nicely proves the build is a clean pass-through. Since the ONNX comes from a third-party repo (onnx-community), did you happen to compare the ONNX outputs against the original facebook/sam3 PyTorch model? Even a spot-check on a few samples would be reassuring.

I validated against the published onnx-community/sam3-tracker-ONNX reference example (same image and prompt) and confirmed the pipeline is producing close outputs, but I did not run a direct facebook/sam3 PyTorch-to-ONNX parity comparison in this change set.

For the reference spot-check, the IoU scores were:

Reference: [0.931315, 0.037516, 0.512856]
Ours: [0.898420, 0.035126, 0.544016]
Max absolute diff: 0.032895
So we have ONNX-reference validation here, but not a PyTorch parity benchmark yet. I agree that’s worth adding as a follow-up sanity check.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

Task inconsistency between catalog and tests

Small thing I noticed — the encoder's task differs between hub_models.json and the integration tests:

Component hub_models.json Integration test
Vision encoder mask-generation image-feature-extraction
Prompt encoder + mask decoder mask-generation mask-generation
The encoder only produces image embeddings, so image-feature-extraction (as the test uses) seems more accurate for its role. But the catalog lists both components as mask-generation. Could you clarify which is intended? If mask-generation is correct for the catalog (e.g., because the catalog describes the overall pipeline rather than individual components), it might be worth adding a comment explaining that.

Thanks for raising these, In the current branch I have addressed the following:

  1. Vision encoder is listed as image-feature-extraction in src/winml/modelkit/data/hub_models.json.
  2. Prompt encoder + mask decoder is listed as mask-generation in src/winml/modelkit/data/hub_models.json.
  3. Integration tests use the same split: image-feature-extraction for encoder and mask-generation for decoder in tests/integration/test_sam3_e2e.py and tests/integration/test_sam3_e2e.py.

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

3. Encoder end-to-end results. Since the encoder requires NPU and the CI tests can only verify structural correctness (ConvInteger nodes preserved), did you get a chance to run it on an NPU-equipped machine? If you have any end-to-end inference results, sharing them would strengthen the case.

This PR’s functional pieces are done and passing (Hub-ONNX resolution, pre-quantized QDQ/QOperator handling, and test coverage), but I can’t say we have stable end-to-end NPU perf yet in this environment. When I reproduced the suggested flow (winml.modelkit sys + winml.modelkit perf -m microsoft/resnet-50 --device npu --monitor --verbose), sys detected AMD NPU hardware, but perf failed because EP setup did not complete (MIGraphXExecutionProvider download/install failed, resulting EPs were CPU + DML only, then ValueError: Device 'npu' requested but no compatible EP is available). I do have script-based measurements from a different path: SAM3 encoder run (VitisAI + CPU fallback) showed min 32172 ms, p50 33406 ms, mean 33621.8 ms, max 35109 ms; CPU baseline for the same encoder was min 21734 ms, p50 23532 ms, mean 25097.0 ms, max 34219 ms; and ResNet50 script run (VitisAI + CPU fallback) averaged 35.87 ms (min 33.08 ms). We have preliminary mixed-provider numbers from custom scripts, but not a clean reproducible winml perf NPU benchmark in this setup; given that, I propose we merge this PR for correctness and move full end-to-end NPU validation/benchmarking to a dedicated follow-up PR.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: SAM3 ONNX Hub Support (#324)

This PR is well-structured with a clean ModelInput classifier/resolver design, robust sidecar handling, and excellent error enrichment for missing files. Several issues need attention before merge.

See inline comments for details.

"model_id": "onnx-community/sam3-tracker-ONNX/onnx/vision_encoder_int8.onnx",
"task": "image-feature-extraction",
"model_type": "sam3_tracker",
"supported_eps": {},

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: supported_eps: {} makes these entries invisible to EP-based catalog lookups.

catalog.py line 112 computes {normalize_ep_name(k) for k in (m.get('supported_eps') or {})} — an empty dict yields an empty set, so wmk list --ep cpu (or any EP filter) will never surface these models. They should list at minimum CPU support:

json "supported_eps": {"cpu": ["CPU"]}

All other models in this file have populated supported_eps.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, empty supported_eps: {} made them invisible to catalog filters. I've corrected both SAM3 entries to separate model_id (org/repo) from onnx_path (file path) and added "supported_eps": {"cpu": ["CPU"]}.

Now wmk list --ep cpu will surface them correctly.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix doesn't appear to be reflected in the current diff yet — supported_eps: {} is still empty for both SAM 3 entries in hub_models.json. Could you double-check that the change was pushed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch! The fix is now committed. Both SAM3 entries in hub_models.json now have "supported_eps": {"cpu": ["CPU"]} properly set. The models will surface correctly in catalog filters.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These SAM 3-specific scripts (sam3_smoke_eval.py, sam3_reference_check.py, sam3_decoder_shapes.json, mask_generation_eval.py) are placed directly under scripts/, which will get harder to navigate as more models are onboarded.

Would it be worth moving them into scripts/models/sam3/? That would set a clean precedent for future model-specific scripts (e.g. scripts/models/florence2/, scripts/models/phi4/) and keep the top-level scripts/ uncluttered.

},
"size_mb": 157.2
},
{

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: hub_models.json currently lists only built-in models that have been validated across all 10 EPs with passing perf and accuracy benchmarks. SAM 3 entries may be updated here once it's been tested on all target platforms.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, SAM 3 Hub model entries are currently included for functional availability. Full validation and performance benchmarking across all 10 execution providers (EPs) is planned as a follow-up PR.

The current implementation follows the established pattern in hub_models.json and is ready for initial integration and testing.

from ..quant import quantize_onnx
from ..utils.console import StageLive

if config.quant is None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: pre-quantized models may be double-quantized when a config file is loaded via -c.

This PR removes the is_quantized_onnx(current_path) guard that previously sat between the config.quant is None check and the actual quantize call, replacing it with the invariant that config.quant is always None for pre-quantized models.

That invariant holds for the auto-generated config path (generate_onnx_build_config + �nsure_pre_quantized_stamped), but not for the file-loaded config path (-c config.json). When a user passes a hand-authored or previously exported config that has a quant section, _build_onnx_pipeline calls neither generate_onnx_build_config nor �nsure_pre_quantized_stamped, so config.quant is never cleared. The function then falls through to the �lif config.quant is not None branch and attempts to quantize an already-quantized model.

Suggested fix: either call �nsure_pre_quantized_stamped(config, onnx_path) in _build_onnx_pipeline before the quantize stage (mirroring �uild_onnx_model), or add a cheap guard here: if config.skip_optimize: return current_path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is addressed by adding the guard at the quantize stage.

This prevents re-quantization of pre-quantized models loaded via -c config.json, even if the config file contains a quant section. The skip_optimize flag is properly set by ensure_pre_quantized_stamped() for the auto-generated path and remains the source of truth for the file-loaded path.

if i == 0:
sym_name = sym[i] if i < len(sym) else None
if isinstance(sym_name, str) and sym_name in overrides:
resolved.append(int(overrides[sym_name]))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: TypeError crash when a shape_config key matches both an input name and a symbolic dim name.

The Form 1 check at the top of _resolve_shape guards against non-list values entering the per-input-name branch (isinstance(overrides[input_name], (list, tuple))), but the symmetric guard is absent here at the symbolic dim branch.

Scenario: shape_config = {"image_embeddings": [1, 256, 64, 64]} where "image_embeddings" is both an input name and a dim_param on a different input. Form 1 correctly returns the shape for the direct-name input. But for any other input whose sym[i] == "image_embeddings", this line executes int([1, 256, 64, 64]) and raises TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list' with no useful diagnostic for the user.

Suggested fix: add a type guard before the int() cast:
python if isinstance(sym_name, str) and sym_name in overrides and isinstance(overrides[sym_name], (int, float, str)): resolved.append(int(overrides[sym_name]))
or wrap in ry/except TypeError with a clear error message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the type guard to prevent TypeError when a shape_config key matches both an input name and a symbolic dim name.

This ensures only scalar values (int, float, str) are accepted at the symbolic dimension level, matching the guard already in place for direct input-name lookups. If a user mistakenly passes a list value for a dimension key, it's safely skipped rather than crashing.

composite ``role=path`` model dict (currently only
``mask-generation``), returns ``None`` -- the evaluator reads
``config.model_path`` directly. Going through ``WinMLAutoModel``'s
composite registry would require registering the model type (e.g.,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please share more details why you cannot use WinMLComposite model's from_onnx method to load the multiple models?

In our design principle, we want to use the model class to encapsulate the ort session and the model inference. Therefore, the evaluate can take a model class and run it.

If the today's composite model cannot fulfill the requirement, you can create another model class for sam3 which can manage the session. So the evaluator code just need to run the model and focus on metric calculation logic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the direct session management approach for this PR because SAM3's evaluation pipeline requires custom orchestration (encoder to mask decoder with specific data prep) that doesn't fit neatly into a general purpose composite model class. It also keeps the evaluator focused on metrics and makes testing easier.

I agree that the proper 'SAM3Composite' class that encapsulates encoder/decoder sessions is the right long-term approach and aligns with your design principle. But building that requires designing a reusable interface, integrating with 'WinMLAutoModel''s registry, and testing across all EPs, that's substantial enough work to be added to this already exhaustive PR, I plan to add it as a separate PR and merge this one cleanly.

So the plan is that this PR delivers Hub support with pragmatic choices, and the next PR will refactor to a proper composite model architecture. This keeps each PR focused and ensures the architectural work gets proper attention.

sess_opts = ort.SessionOptions()
sess_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

enc = ort.InferenceSession(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically encapsulate the model inference session inside the model class's forward method. If needed, you can create a model class for the sam3 (or for the mask generation task in general, which is better)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, encapsulating session management inside a model class's forward method is the right approach. As mentioned above, this PR uses direct session management, but the follow-up PR will create a proper MaskGenerationModel (or SAM3Composite) class that handles encoder/decoder session orchestration, allowing the evaluator to simply call the model's forward method and focus on metrics.

``mask-generation``), returns ``None`` -- the evaluator reads
``config.model_path`` directly. Going through ``WinMLAutoModel``'s
composite registry would require registering the model type (e.g.,
SAM 3), which is a heavier follow-up; this bypass lets standalone

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment notes that winml eval with composite
ole=path inputs (encoder + decoder) isn't fully wired up yet for mask-generation — the evaluator bypasses WinMLAutoModel's composite registry and reads config.model_path directly. Is there a follow-up PR planned to close this gap so winml eval -m image-encoder=... -m prompt-decoder=... --task mask-generation works end-to-end through the standard eval path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, right! the gap is part of the broader composite model refactoring planned for the follow-up PR.

Once the proper SAM3Composite (or MaskGenerationModel) class is registered in WinMLAutoModel's composite registry, the full eval flow will work end-to-end:
winml eval -m image-encoder=... -m prompt-decoder=... --task mask-generation

Instead of bypassing the registry, the evaluator will consume the model class through the standard path. That PR will wire everything together so composite models are treated uniformly across build, perf, and eval commands.

# ``get_available_providers()`` but not in ``get_ep_devices()``. Use the
# legacy ``SessionOptions.add_provider`` path for those. ``add_provider``
# takes ``dict[str, str]`` so coerce values.
if ep_name in ort.get_available_providers():

@DingmaomaoBJTU DingmaomaoBJTU Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This legacy fallback for VitisAIExecutionProvider (onnxruntime-vitisai 1.23.x) adds AMD-specific branching to the general-purpose session layer. Since we're not planning to support legacy AMD, would it make sense to drop this block entirely, along with _LEGACY_EP_DEVICE_FALLBACK in sysinfo/device.py and the for_vitisai factory in compiler/configs.py?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy VitisAI fallback and add_provider path have been removed. winml.py now uses the modern add_provider_for_devices API uniformly for all EPs, including VitisAI, with no special-case branching.

_WIN_DRIVE_RE = re.compile(r"^[A-Za-z]:[\\/]")


def _is_local_path(value: str | None) -> bool:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_is_local_path is a general filesystem heuristic with no Hub-specific logic, but it lives in hub_utils.py and is already imported across module boundaries by model_input.py. Would it make more sense to move it to a more generic util module (e.g. cli.py or a new path_utils.py) so the dependency direction is clearer — hub_utils imports from utils, not the other way round?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generic path heuristic has been moved to src/winml/modelkit/utils/path_utils.py, and both hub_utils.py and model_input.py now import it from there. hub_utils.py only keeps a temporary _is_local_path alias for backward compatibility with existing callers.

@@ -211,10 +211,24 @@ def for_openvino(cls, device: str | None = None) -> WinMLCompileConfig:

@classmethod
def for_vitisai(cls, device: str | None = None) -> WinMLCompileConfig:

@DingmaomaoBJTU DingmaomaoBJTU Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Vitis AI-specific factory adds EP-specific logic (RYZEN_AI_INSTALLATION_PATH, xclbin path construction, xlnx_enable_py3_round) into what should be a generic compile config layer. Since we're not planning to support legacy AMD, would it make sense to drop this entirely, along with _LEGACY_EP_DEVICE_FALLBACK in sysinfo/device.py and the legacy add_provider fallback in winml.py?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VitisAI-specific compile factory and legacy AMD fallback paths have been removed. compiler/configs.py now keeps the compile layer generic, and winml.py no longer relies on the legacy add_provider path.

# Models exported through ``onnxruntime.quantization`` with
# ``QuantFormat.QOperator`` (or sourced from Hub repos like
# ``onnx-community/sam3-tracker-ONNX``) use this format.
QOPERATOR_OP_TYPES: frozenset[str] = frozenset(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QOPERATOR_OP_TYPES and DYNAMIC_QUANT_OP_TYPES are pure ONNX op-type knowledge with no compiler-specific logic, but they live in compiler/utils.py. The only consumer is is_quantized_onnx() in core/onnx_utils.py, which means core now depends on compiler — the wrong direction. Would it make more sense to define these constants (and the existing QDQ_OP_TYPES) directly in core/onnx_utils.py, or a new core/quantization.py, so the dependency flows the right way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quantization op-type constants have been moved to core/quantization.py, and compiler/utils.py now only re-exports them for compatibility. This keeps the dependency direction correct and avoids core depending on compiler.

"""

DEFAULT_DATASET = "mattmdjaga/human_parsing_dataset"
DEFAULT_SPLIT = "train"

@DingmaomaoBJTU DingmaomaoBJTU Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: other dataset implementations (e.g. ImageSegmentationDataset) also default to the train split, so this is consistent with existing practice. Presumably these datasets only ship a train split, or the intent is smoke-test / functional verification rather than strict accuracy benchmarking. If so, a short comment here would help clarify the intent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class now includes a short comment explaining that DEFAULT_SPLIT = "train" is intentional because this dataset only ships a train split and the evaluator is meant for smoke-test / functional verification rather than strict accuracy benchmarking. See src/winml/modelkit/datasets/mask_generation.py.

# exposed via the new ``get_ep_devices()``/AutoEP machinery. Mapped to the
# canonical device they target so they can still be selected via the legacy
# ``SessionOptions.add_provider`` code path.
_LEGACY_EP_DEVICE_FALLBACK: dict[EPName, str] = {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not planning to support legacy AMD (VitisAI on older onnxruntime-vitisai 1.23.x), would it make sense to drop _LEGACY_EP_DEVICE_FALLBACK and the corresponding fallback in winml.py entirely? Keeping it adds AMD-specific branching to the general session layer with no active consumer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed. The legacy fallback map has been removed, and winml.py no longer has a legacy add_provider branch for VitisAI. The session layer now relies on the standard get_ep_devices() / add_provider_for_devices() path only.

@DingmaomaoBJTU

Copy link
Copy Markdown
Collaborator

Issue: winml perf fails on SAM3 ONNX due to dynamic shape handling

When benchmarking pre-exported SAM3 ONNX files (prompt_encoder_mask_decoder_int8.onnx) on CPU/GPU, inference fails with:

Reshape error: Input shape:{1,1,608,256}, requested shape:{1,128,-1,256}

Root cause: --shape-config is explicitly disabled for pre-exported ONNX files (perf.py L1636-1641), so the dynamic dimension resolver uses positional defaults (DYNAMIC_DIM_DEFAULTS[1] = 128 for the num_points axis). SAM3 has inter-dependent dynamic dimensions where num_points must satisfy downstream Reshape constraints (e.g., divisible by certain factors).

Suggestion: Allow --shape-config to override dynamic dims for ONNX files (at least for the input generation step), or add model-type-aware default shapes for known Hub ONNX models like SAM3.

# of treating the path as a HuggingFace repo id (which would try to
# load the .onnx file as a JSON config and crash).
if cli_utils.is_onnx_file_path(model_id):
config_or_configs = generate_build_config(
device=device,
)
else:
config_or_configs = generate_build_config(
@AChenreddy24

Copy link
Copy Markdown
Contributor Author

These SAM 3-specific scripts (sam3_smoke_eval.py, sam3_reference_check.py, sam3_decoder_shapes.json, mask_generation_eval.py) are placed directly under scripts/, which will get harder to navigate as more models are onboarded.

Would it be worth moving them into scripts/models/sam3/? That would set a clean precedent for future model-specific scripts (e.g. scripts/models/florence2/, scripts/models/phi4/) and keep the top-level scripts/ uncluttered.

All SAM3-specific scripts have been moved to scripts/models/sam3/, including:

mask_generation_eval.py
sam3_smoke_eval.py
sam3_reference_check.py
sam3_decoder_shapes.json
Plus supporting utilities

@AChenreddy24

Copy link
Copy Markdown
Contributor Author

Issue: winml perf fails on SAM3 ONNX due to dynamic shape handling

When benchmarking pre-exported SAM3 ONNX files (prompt_encoder_mask_decoder_int8.onnx) on CPU/GPU, inference fails with:

Reshape error: Input shape:{1,1,608,256}, requested shape:{1,128,-1,256}

Root cause: --shape-config is explicitly disabled for pre-exported ONNX files (perf.py L1636-1641), so the dynamic dimension resolver uses positional defaults (DYNAMIC_DIM_DEFAULTS[1] = 128 for the num_points axis). SAM3 has inter-dependent dynamic dimensions where num_points must satisfy downstream Reshape constraints (e.g., divisible by certain factors).

Suggestion: Allow --shape-config to override dynamic dims for ONNX files (at least for the input generation step), or add model-type-aware default shapes for known Hub ONNX models like SAM3.

--shape-config is now allowed for ONNX files and is applied during dummy input generation, which lets SAM3 ONNX benchmarks override dynamic dims that the positional defaults can’t satisfy.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: SAM3 ONNX Hub Support (Issue #324)

This re-review checks whether previously-raised issues were addressed in code, and surfaces additional findings. All 6 previously-raised issues remain unresolved in code (author replied in comments but made no corresponding commits). New issues are also noted below.

Issues not fixed since initial review

  • hub_models.json: supported_eps: {} is still empty for both SAM3 entries (catalog EP filtering broken)
  • hub_models.json: model_id still uses full ONNX path instead of org/repo schema
  • onnx_hub.py: hint URL is still hardcoded to /tree/main regardless of revision
  • eval.py: Hub download failures in _resolve_model_path are still unhandled
  • mask_generation_evaluator.py: bare dict key access on hardcoded embedding names is still present
  • mask_generation_evaluator.py: text prompt_mode still silently fails per-sample

New findings

  • perf.py: Same unhandled exception pattern as eval.py for normalize_model_arg
  • build.py(_build_onnx_pipeline): reads config.skip_optimize but never calls ensure_pre_quantized_stamped, allowing double-quantization when -c config.json with quant != None is passed with a pre-quantized ONNX model

},
{
"model_id": "onnx-community/sam3-tracker-ONNX/onnx/vision_encoder_int8.onnx",
"task": "image-feature-extraction",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — not fixed] model_id is set to the full ONNX file path (org/repo/path/file.onnx) rather than the org/repo form that every other entry in this file uses. This breaks api.model_info(model_id) calls and catalog display. The ONNX-specific path should be stored in a separate onnx_path field (matching build/hf.py which reads model_info.get('onnx_path')), and model_id should be onnx-community/sam3-tracker-ONNX.

"task": "image-feature-extraction",
"model_type": "sam3_tracker",
"supported_eps": {},
"size_mb": 504.2

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — not fixed] supported_eps is an empty object {}. Catalog lookups use {normalize_ep_name(k) for k in (m.get('supported_eps') or {})}, so an empty dict makes this entry invisible to all EP-based filters (wmk list --ep cpu, etc.). Every other model entry in this file has populated supported_eps. Minimum: {"cpu": ["CPU"]}.

"task": "mask-generation",
"model_type": "sam3_tracker",
"supported_eps": {},
"size_mb": 9.6

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — not fixed] Same issue: supported_eps: {} for the second SAM3 entry. Both entries are invisible to EP catalog filters.

return (
f"Could not list available .onnx files in '{repo_id}' "
f"(see https://huggingface.co/{repo_id}/tree/main)."
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning — not fixed] The error hint URL is hardcoded to /tree/main regardless of the revision parameter. If a user pins a specific revision (e.g., onnx-community/sam3-tracker-ONNX@abc1234/onnx/file.onnx), the hint points to main which may not contain the file. Fix:

f"(see https://huggingface.co/{repo_id}/tree/{revision or 'main'})."

Same issue applies to the fallback hint two lines below.

param_hint="-m/--model",
)
path = cli_utils.normalize_model_arg(path) or path
if not Path(path).exists():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning — not fixed] cli_utils.normalize_model_arg() can raise FileNotFoundError, RepositoryNotFoundError, or OSError when Hub download fails, but there is no try/except wrapping these calls in _resolve_model_path. The user will see a raw Python traceback. Contrast with build.py line 587 which wraps the same call inside a try/except Exception with actionable error messages. Add a try/except (FileNotFoundError, OSError) as e: raise click.ClickException(str(e)) guard around both normalize_model_arg calls here.

# Hub-hosted ONNX (e.g. ``onnx-community/sam3-tracker-ONNX/onnx/...``)
# is downloaded once and treated as a local .onnx path thereafter.
value = cli_utils.normalize_model_arg(value) or value
if not Path(value).exists():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning — not fixed] Second normalize_model_arg call (for multi-model eval paths) is also missing exception handling. Same fix needed as the call above.

# ``normalize_model_arg`` returns ``str | None`` per its signature;
# the ``or model`` keeps the narrowed ``str`` type for downstream use.
hf_model: str = cli_utils.normalize_model_arg(model) or model
model = hf_model

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Warning — new] normalize_model_arg at this line has no surrounding try/except. If the Hub download fails (network error, repo not found, gated repo), the exception propagates as an unhandled traceback. The perf command should wrap this call (and/or the entire body of the perf() function) with appropriate exception handling, as build() does.

"input_labels": labels,
"input_boxes": box,
"image_embeddings.0": emb["image_embeddings.0"],
"image_embeddings.1": emb["image_embeddings.1"],

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — not fixed] Bare dict key access on hardcoded SAM3 output names. If the encoder produces differently-named outputs (e.g., a future version or a SAM3 variant that uses image_embedding_0 instead of image_embeddings.0), this raises a raw KeyError with no context. Use .get() with a clear error, or validate encoder output names against these keys at session creation time:

for key in ("image_embeddings.0", "image_embeddings.1", "image_embeddings.2"):
    if key not in emb:
        raise ValueError(f"Encoder output '{key}' not found. Got: {list(emb.keys())}")

self.config = config
mapping = config.dataset.columns_mapping or {}
self._prompt_mode: str = mapping.get("prompt_mode", "bbox")
self._enc_sess, self._dec_sess = self._load_sessions()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info — not fixed] prompt_mode='text' is accepted here and stored, but _build_decoder_inputs raises ValueError for any mode that is not 'bbox' or 'point'. This means text mode will silently fail per-sample (the except Exception in compute() swallows the error), yielding mIoU=0.0, num_samples=0 with no top-level error message. Either reject unsupported modes in __init__ with a clear ValueError, or document that only 'bbox' and 'point' are supported.

# (here and inside ``build_onnx_model``) read the flag instead of
# re-running ``is_quantized_onnx`` on the same file.
is_pre_quantized = config.skip_optimize

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — new] _build_onnx_pipeline reads config.skip_optimize to detect pre-quantized models but never calls ensure_pre_quantized_stamped() (which is defined in build/common.py and also clears config.quant). If a user runs wmk build -m pre_quantized_int8.onnx -c config.json and config.json has quant != null, the model will be double-quantized: the optimize stage runs on a pre-quantized model (since config.skip_optimize defaults to False), then the quantize stage also runs (since config.quant was never cleared). Fix: call ensure_pre_quantized_stamped(current_path, config) immediately after copying the model to output_dir (around line 1511), before either the optimize or quantize stages.

SAM3_PROFILE = MaskGenProfile(
name="sam3",
encoder_ref="onnx-community/sam3-tracker-ONNX/onnx/vision_encoder_int8.onnx",
decoder_ref="onnx-community/sam3-tracker-ONNX/onnx/prompt_encoder_mask_decoder_int8.onnx",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — DML precision failure, tested] The int8 preset is hardcoded for both CPU and DML (--ep dml), but DML does not produce correct masks for this model. Tested on Qualcomm Adreno X1-85 (Snapdragon X Elite):

EP Model mIoU Dice enc s/img
CPU vision_encoder_int8.onnx 0.9175 0.9559 ~96s
DML vision_encoder_int8.onnx 0.0000 0.0000 ~50s
DML vision_encoder_fp16.onnx 0.0000 0.0000 7.1s

The int8 failure is because DmlExecutionProvider does not support QLinearConv/ConvInteger operators — they silently produce wrong outputs instead of erroring. The fp16 failure is a separate numerical precision issue (decoder pred_masks are all-negative after DML inference, yielding empty masks after sigmoid + threshold), despite pIoU≈0.84 internally.

Fix options:

  1. Switch the default DML preset to fp32 (vision_encoder.onnx) — DML supports fp32 natively, and the fp16 encoder is 13× faster than CPU already shows DML GPU scheduling works.
  2. Add a DML-specific profile or a runtime warning when --ep dml is used with the sam3 preset.

For now, the sam3 preset should document that only --ep cpu is validated, or the script should warn when DML is selected with an int8/fp16 model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants