refactor: fold _perf_modules into PerfBenchmark #939

## Summary

`winml perf --module` is handled by a standalone `_perf_modules()` function in `src/winml/modelkit/commands/perf.py`, separate from the `PerfBenchmark` class used for single-model and composite runs. The two paths duplicate the build and benchmark orchestration (input generation, session compile, monitored/simple loops, HW monitor wiring, result collection) and have already drifted: for example, device/EP resolution now lives in a different place on each path.

## Motivation

While fixing #931 (perf with no `--ep` should target one concrete EP rather than aggregate all of them), device/EP resolution was moved **inside** `PerfBenchmark`: it resolves `config.device`/`config.ep` at the start of `_load_model`, so a bad device or EP fails fast before the build. The `--module` path still resolves the EP in the CLI `perf()` function because `_perf_modules` is not part of `PerfBenchmark`. This leaves a transitional duplication:

- `PerfBenchmark._resolve_device_ep()` — single-model / composite
- inline `resolve_device`/`resolve_eps` block in `perf()`'s module branch — per-module

## Proposal

Fold per-module benchmarking into `PerfBenchmark` so there is a single orchestration path:

- Add a per-module mode to `PerfBenchmark` (or a small subclass) that builds each submodule ONNX and benchmarks it through the same input-gen / compile / loop / collect machinery used for single models.
- Remove the standalone `_perf_modules()` and the duplicated device/EP resolution in the CLI module branch, so the CLI only constructs the benchmark and runs it.
- Reuse `_run_monitored_loop` / `_run_simple_loop` and result collection instead of the parallel implementations currently in `_perf_modules`.
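
One possible shape for the folded path, as a minimal sketch rather than the actual implementation. Only `PerfBenchmark`, `_resolve_device_ep`, `_load_model`, `_run_monitored_loop`, `_run_simple_loop`, `config.device` and `config.ep` come from this issue; `PerfConfig`, its other fields, `_discover_submodules`, `_benchmark_one`, `run` and the `resolve_calls` counter are hypothetical names, and every method body is a stub standing in for the real build and benchmark logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfConfig:
    # Hypothetical config shape: only `device` and `ep` are named in the issue.
    model: str
    device: Optional[str] = None
    ep: Optional[str] = None
    module: Optional[str] = None  # class name passed via --module (assumed field)
    monitor: bool = False         # assumed switch between the two loops
    iterations: int = 3


class PerfBenchmark:
    def __init__(self, config: PerfConfig) -> None:
        self.config = config
        self.resolve_calls = 0  # illustration only: counts resolution passes

    def _resolve_device_ep(self) -> None:
        # Stub: the real method resolves a concrete device/EP. The defaults
        # below are placeholders, not the project's actual resolution rules.
        self.resolve_calls += 1
        self.config.device = self.config.device or "cpu"
        self.config.ep = self.config.ep or "CPUExecutionProvider"

    def _discover_submodules(self) -> list:
        # Hypothetical helper: pretend two instances of the class were found.
        return [f"{self.config.module}[{i}]" for i in range(2)]

    def _load_model(self) -> list:
        # Resolve first so a bad device/EP fails before any build work.
        self._resolve_device_ep()
        if self.config.module:
            return self._discover_submodules()
        return [self.config.model]

    def _run_simple_loop(self, target: str) -> list:
        return [1.0] * self.config.iterations  # stub latencies in ms

    def _run_monitored_loop(self, target: str) -> list:
        return [1.0] * self.config.iterations  # stub: would also sample HW

    def _benchmark_one(self, target: str) -> dict:
        # Same loop selection for single-model and per-module targets.
        loop = self._run_monitored_loop if self.config.monitor else self._run_simple_loop
        latencies = loop(target)
        return {
            "target": target,
            "ep": self.config.ep,
            "runs": len(latencies),
            "mean_ms": sum(latencies) / len(latencies),
        }

    def run(self) -> list:
        return [self._benchmark_one(t) for t in self._load_model()]
```

With this shape the CLI branch for `--module` would reduce to something like `PerfBenchmark(PerfConfig(model=..., module=...)).run()`, and the per-module results share the record layout of the single-model case. Whether a mode flag or a small subclass is the better fit is left open above; the sketch uses a flag only because it is shorter.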

## Acceptance

- `winml perf --module <Class>` produces the same per-instance table and JSON report as it did before the refactor.
- Device/EP resolution happens in exactly one place (`PerfBenchmark`).
- The existing `tests/unit/commands/test_perf_module.py` still passes, with changes limited to the new call path.

## References

- Follow-up to #931.
- Code marker: a comment in the `--module` branch of `perf()` in `src/winml/modelkit/commands/perf.py` points to this issue.
