Summary
winml perf --module is handled by a standalone _perf_modules() function in src/winml/modelkit/commands/perf.py, separate from the PerfBenchmark class used for single-model and composite runs. The two paths duplicate build + benchmark orchestration (input generation, session compile, monitored/simple loops, HW monitor wiring, result collection) and have already drifted (e.g. device/EP resolution lives in different places).
Motivation
While fixing #931 (perf with no --ep should target one concrete EP, not aggregate all), device/EP resolution was moved inside PerfBenchmark (it resolves config.device/config.ep at the start of _load_model, failing fast before the build). The --module path still resolves the EP in the CLI perf() function because _perf_modules is not part of PerfBenchmark. This leaves a transitional duplication:
- PerfBenchmark._resolve_device_ep(): single-model / composite
- inline resolve_device/resolve_eps block in perf()'s module branch: per-module
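To make the target state concrete, here is a minimal sketch of a single resolver that both paths could call before any build work starts. Everything in it is an assumption for illustration: BenchConfig, resolve_device_ep, the placeholder device/EP table, the "cpu" default, and the "first candidate wins" rule are hypothetical and are not the actual PerfBenchmark._resolve_device_ep() implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Placeholder device -> EP table (hypothetical); the real lookup is not shown here.
_EPS_BY_DEVICE = {"cpu": ["cpu_ep"], "npu": ["npu_ep_a", "npu_ep_b"]}


@dataclass
class BenchConfig:
    """Hypothetical stand-in for the benchmark config (device / ep only)."""
    device: Optional[str] = None
    ep: Optional[str] = None


def resolve_device_ep(config: BenchConfig) -> Tuple[str, str]:
    """Return exactly one (device, ep) pair, or raise before any build starts."""
    device = config.device or "cpu"  # assumed default, for illustration only
    candidates = _EPS_BY_DEVICE.get(device)
    if candidates is None:
        raise ValueError(f"unknown device: {device!r}")
    if config.ep is None:
        # No --ep given: pick one concrete EP instead of aggregating all of them.
        return device, candidates[0]
    if config.ep not in candidates:
        raise ValueError(f"EP {config.ep!r} is not available on {device!r}")
    return device, config.ep
```

With one function like this owned by the benchmark class, the per-module path would get the #931 behavior without a second resolution block in the CLI.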
Proposal
Fold per-module benchmarking into PerfBenchmark so there is a single orchestration path:
- A per-module mode on PerfBenchmark (or a small subclass) that builds each submodule ONNX and benchmarks it through the same input-gen / compile / loop / collect machinery used for single models.
- Remove the standalone _perf_modules() and the duplicated device/EP resolution in the CLI module branch; the CLI just constructs the benchmark and runs it.
- Reuse _run_monitored_loop / _run_simple_loop and result collection instead of the parallel implementations currently in _perf_modules.
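The shape of the proposal can be sketched as a toy class in which single-model and per-module runs share one path. This is an illustration under stated assumptions, not the real PerfBenchmark: SketchConfig, SketchBenchmark, the build hook, and _benchmark_one are hypothetical names, and only _resolve_device_ep, _run_monitored_loop, and _run_simple_loop echo names from this issue (their bodies here are stubs).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchConfig:
    # Hypothetical fields; the real config is not shown in this issue.
    model: str = "model"
    modules: List[str] = field(default_factory=list)  # empty -> single-model mode
    monitor: bool = False
    iterations: int = 3


class SketchBenchmark:
    """Toy stand-in for PerfBenchmark with a single orchestration path."""

    def __init__(self, config: SketchConfig,
                 build: Callable[[str], Callable[[], None]]):
        self.config = config
        self._build = build  # hypothetical hook: target name -> runnable session
        self.resolve_calls = 0

    def _resolve_device_ep(self) -> None:
        # Real resolution elided; the counter only shows that it runs once.
        self.resolve_calls += 1

    def _run_simple_loop(self, session: Callable[[], None]) -> int:
        for _ in range(self.config.iterations):
            session()
        return self.config.iterations

    def _run_monitored_loop(self, session: Callable[[], None]) -> int:
        # A real version would wrap the loop with HW monitoring.
        return self._run_simple_loop(session)

    def _benchmark_one(self, target: str) -> Dict[str, object]:
        session = self._build(target)
        loop = (self._run_monitored_loop if self.config.monitor
                else self._run_simple_loop)
        return {"target": target, "iterations": loop(session)}

    def run(self) -> List[Dict[str, object]]:
        self._resolve_device_ep()  # once, before any build
        # Per-module mode is just "more than one target" on the same path.
        targets = self.config.modules or [self.config.model]
        return [self._benchmark_one(t) for t in targets]
```

In this shape the CLI's module branch would reduce to constructing the config with a list of modules and calling run(); whether that lives on PerfBenchmark itself or on a small subclass is left open, as in the proposal.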
Acceptance
- winml perf --module <Class> produces the same per-instance table + JSON report.
- Device/EP resolution happens in exactly one place (PerfBenchmark).
- Existing tests/unit/commands/test_perf_module.py passes (adjusted only for the new call path).
References
perf()'s --module branch in src/winml/modelkit/commands/perf.py points here.