Add play (inference) benchmark with PlayBundle (benchmark refactor, Part 4/4)#6201
AntoineRichard wants to merge 15 commits into
Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).
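The attach_bundle hook lets one run emit several output backends. The following is a minimal sketch of that fan-out pattern. It assumes a toy backend interface with a single write method, and FanOutBenchmark, InMemoryBackend and write are hypothetical names, not the BaseIsaacLabBenchmark API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class OutputBackend(Protocol):
    """Hypothetical minimal interface: a backend persists the bundles it is given."""

    def write(self, bundles: List[Dict[str, Any]]) -> None: ...


@dataclass
class InMemoryBackend:
    """Toy backend that records what it was asked to write (stands in for a file writer)."""

    name: str
    written: List[Dict[str, Any]] = field(default_factory=list)

    def write(self, bundles: List[Dict[str, Any]]) -> None:
        self.written.extend(bundles)


@dataclass
class FanOutBenchmark:
    """Hypothetical stand-in for a benchmark that emits to several backends in one run."""

    backends: List[OutputBackend]
    bundles: List[Dict[str, Any]] = field(default_factory=list)

    def attach_bundle(self, bundle: Dict[str, Any]) -> None:
        # Bundles are collected during the run; nothing is written yet.
        self.bundles.append(bundle)

    def finalize(self) -> None:
        # At finalize time every backend receives the same list of bundles.
        for backend in self.backends:
            backend.write(list(self.bundles))


schema_out = InMemoryBackend("schema")
legacy_out = InMemoryBackend("legacy")
bench = FanOutBenchmark([schema_out, legacy_out])
bench.attach_bundle({"kind": "play", "fps": 1.0})
bench.finalize()
```

Collecting bundles first and writing them only at finalize keeps every backend consistent: each one sees the complete, identical set of bundles for the run.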
Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).
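startup.py profiles the startup phases with cProfile. The following is a minimal sketch of per-phase profiling. It assumes each phase can be wrapped in a context manager. profile_phase and the layout of the results dict are hypothetical and are not the structure of the StartupBundle.

```python
import cProfile
import io
import pstats
import time
from contextlib import contextmanager


@contextmanager
def profile_phase(name, results, top_n=5):
    """Profile one startup phase with cProfile; store wall time and a short hotspot report."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        elapsed = time.perf_counter() - start
        report = io.StringIO()
        # Sort by cumulative time so the slowest call chains of the phase come first.
        pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top_n)
        results[name] = {"wall_time_s": elapsed, "report": report.getvalue()}


results = {}
with profile_phase("build_scene", results):
    sum(i * i for i in range(100_000))  # placeholder for real startup work
with profile_phase("reset_env", results):
    sorted(range(50_000), reverse=True)  # placeholder for real startup work
```

Using a fresh Profile per phase keeps the reports separate, so a slow phase's hotspots are not diluted by the phases around it.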
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
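Here is a minimal sketch of a --rl_library dispatcher. It assumes argparse.parse_known_args peels off the library flag and forwards every other token unchanged and in order, including task flags and Hydra-style overrides. ADAPTERS, dispatch and the stub adapters are hypothetical, and the override token is only illustrative.

```python
import argparse


def _stub_adapter(name):
    # Stand-in for a per-library entry point; it just echoes what it would receive.
    def run(forwarded_args):
        return name, forwarded_args

    return run


# Hypothetical registry: the real scripts point each value at a per-backend module.
ADAPTERS = {lib: _stub_adapter(lib) for lib in ("rsl_rl", "rl_games", "skrl", "sb3")}


def dispatch(argv):
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--rl_library", required=True, choices=sorted(ADAPTERS))
    # parse_known_args leaves every unrecognised token (task flags, Hydra-style
    # overrides) untouched and in order, so the adapter can parse them itself.
    args, forwarded = parser.parse_known_args(argv)
    return ADAPTERS[args.rl_library](forwarded)


result = dispatch(["--rl_library", "skrl", "--task", "Isaac-Cartpole-v0", "env.scene.num_envs=64"])
# result == ("skrl", ["--task", "Isaac-Cartpole-v0", "env.scene.num_envs=64"])
```

The dispatcher only needs to know the library flag. Every adapter keeps ownership of its own argument parsing, which makes it easy to add another backend.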
Introduce scripts/benchmarks/play.py, a --rl_library dispatcher mirroring training.py, plus the rsl_rl inference adapter scripts/benchmarks/rsl_rl/bench_play_rsl_rl.py. The adapter resolves a checkpoint via resolve_play_checkpoint, loads the policy the way the rsl_rl play script does, rolls it out under a BenchmarkMonitor using run_play_loop, and emits a PlayBundle.
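The following is a list-based sketch of a policy-driven rollout in the spirit of run_play_loop. It is not the PR's torch implementation. It assumes per-env Python lists in place of tensors and omits success tracking. It accepts both 4-tuple (Gym-style) and 5-tuple (Gymnasium-style) step returns, and only completed episodes enter the statistics. play_loop_sketch and CountdownEnv are hypothetical.

```python
import statistics


def play_loop_sketch(env, policy, num_steps):
    """List-based stand-in for a policy rollout that aggregates completed episodes.

    Assumes env.reset() returns per-env observations and env.step(actions) returns
    either (obs, reward, done, info) or (obs, reward, terminated, truncated, info),
    with one entry per environment in each sequence.
    """
    obs = env.reset()
    num_envs = len(obs)
    running_return = [0.0] * num_envs
    running_length = [0] * num_envs
    done_returns, done_lengths = [], []
    for _ in range(num_steps):
        out = env.step(policy(obs))
        if len(out) == 5:
            # Gymnasium-style step: an episode ends on termination or truncation.
            obs, reward, terminated, truncated, _info = out
            done = [a or b for a, b in zip(terminated, truncated)]
        else:
            # Gym-style step with a single done flag per env.
            obs, reward, done, _info = out
        for i in range(num_envs):
            running_return[i] += reward[i]
            running_length[i] += 1
            if done[i]:
                # Only finished episodes contribute to the statistics.
                done_returns.append(running_return[i])
                done_lengths.append(running_length[i])
                running_return[i], running_length[i] = 0.0, 0
    if not done_returns:
        return None  # rollout shorter than one episode: nothing to report
    return {
        "reward_mean": statistics.fmean(done_returns),
        "reward_std": statistics.pstdev(done_returns),
        "ep_length_mean": statistics.fmean(done_lengths),
    }


class CountdownEnv:
    """Toy two-env task: reward 1.0 per step, every episode lasts `horizon` steps."""

    def __init__(self, horizon, five_tuple=False):
        self.horizon, self.five_tuple, self.t = horizon, five_tuple, 0

    def reset(self):
        self.t = 0
        return [0, 0]

    def step(self, actions):
        self.t += 1
        flags = [self.t % self.horizon == 0] * 2
        if self.five_tuple:
            return [self.t] * 2, [1.0, 1.0], flags, [False, False], {}
        return [self.t] * 2, [1.0, 1.0], flags, {}


def zero_policy(obs):
    return [0] * len(obs)


stats = play_loop_sketch(CountdownEnv(horizon=3), zero_policy, num_steps=7)
# Two episodes per env finish in 7 steps (at t=3 and t=6): returns 3.0, lengths 3.
```

With num_steps=2 and horizon=3, no episode finishes and the sketch returns None. This is the same reason the PR description says --num_frames must exceed the task's episode length.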
Roll out a checkpointed skrl policy under a BenchmarkMonitor and emit a PlayBundle. The skrl env wrapper returns reward and done tensors shaped (num_envs, 1); reshape them to (num_envs,) in run_play_loop so the per-environment return accumulator broadcasts correctly across backends.
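The following shows why the (num_envs, 1) shape matters. This sketch uses NumPy arrays in place of torch tensors, which follow the same broadcasting rules. A column-shaped reward silently turns the per-env accumulator into a matrix unless it is flattened first.

```python
import numpy as np

num_envs = 4
ep_return = np.zeros(num_envs)   # per-env accumulator, shape (num_envs,)
reward = np.ones((num_envs, 1))  # skrl-style reward, shape (num_envs, 1)

# (4,) + (4, 1) broadcasts to (4, 4): every env's return picks up every env's reward.
# An in-place `ep_return += reward` would raise ValueError instead.
broken = ep_return + reward

# Flattening first keeps exactly one running return per environment.
fixed = ep_return + reward.reshape(-1)

print(broken.shape, fixed.shape)  # (4, 4) (4,)
```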
Roll out a checkpointed Stable-Baselines3 policy under a BenchmarkMonitor and emit a PlayBundle. The sb3 vec env returns NumPy reward/done arrays and a per-environment list of info dicts; coerce reward and dones onto the env device in run_play_loop so CPU NumPy returns do not clash with the on-device accumulators, and skip success extraction when the info value is not a dict.
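Here is a minimal sketch of the "skip success extraction when the info value is not a dict" guard. success_from_info is hypothetical, and the "success" key is an assumption: the key the adapters actually read is not stated here. Device coercion of the reward/done arrays is not shown because it needs torch.

```python
from typing import Any, Optional


def success_from_info(info: Any, key: str = "success") -> Optional[float]:
    """Mean success flag from a batch-level info dict, or None if it cannot be read.

    Hypothetical helper: only a dict is inspected. Any other shape, such as sb3's
    list with one info dict per environment, is skipped instead of indexed by key.
    """
    if not isinstance(info, dict):
        return None
    value = info.get(key)
    if value is None:
        return None
    flags = list(value) if isinstance(value, (list, tuple)) else [value]
    if not flags:
        return None
    return sum(float(f) for f in flags) / len(flags)


batch_info = {"success": [1, 0, 1, 1]}        # one flag per env in a single dict
sb3_infos = [{"success": 1}, {"success": 0}]  # sb3-style list of per-env dicts
print(success_from_info(batch_info), success_from_info(sb3_infos))  # 0.75 None
```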
- rl_games adapter: read obs_groups/concate_obs_groups from the agent cfg and pass them to RlGamesVecEnvWrapper, so tasks with asymmetric/non-default observation layouts feed the policy the same observation it was trained on.
- _common: key the published-checkpoint lookup on the bare training-task name (drop any namespace prefix and the -Play suffix), matching the reinforcement_learning play scripts; add a unit-testable _published_task_name helper.
- rl_games adapter: drop the inaccurate RNN-state-reset claim from the policy docstring.
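The following is a minimal sketch of the bare-task-name mapping. published_task_name is a hypothetical stand-in for _published_task_name. It assumes the namespace prefix is separated by a colon, as in Gymnasium's module:env_id form, and that -Play sits before the version suffix, as in Isaac Lab play task IDs such as Isaac-Velocity-Flat-Anymal-D-Play-v0.

```python
def published_task_name(task: str) -> str:
    """Map a (possibly namespaced) play task ID to the training task ID used for lookup.

    Hypothetical re-implementation: drops a "module:" prefix and every "-Play" segment.
    """
    bare = task.split(":")[-1]
    return bare.replace("-Play", "")


print(published_task_name("my_ext.tasks:Isaac-Velocity-Flat-Anymal-D-Play-v0"))
# Isaac-Velocity-Flat-Anymal-D-v0
```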
Updated from 53df4a3 to c1f234f.
Greptile Summary
This PR is Part 4/4 of a benchmark refactor series, adding a checkpoint-driven play (inference) benchmark with a new
Confidence Score: 3/5
The new play benchmark scaffolding (schema, builders, capture, metrics) is clean and well-tested. The single actionable defect is in
The core schema types, builder functions, and
- scripts/benchmarks/_common.py (resolve_play_checkpoint fallback)
- source/isaaclab/isaaclab/test/benchmark/stepping.py (success_rate aggregation methodology)
- scripts/benchmarks/skrl/bench_play_skrl.py (unnecessary env.state() call in policy closure)
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant CLI as play.py dispatcher
participant Adapter as bench_play adapter
participant Common as _common.py
participant StepFn as stepping.run_play_loop
participant BM as BenchmarkMonitor
participant Build as builders
participant Out as SchemaBundleFile
CLI->>Adapter: dispatch_library_entrypoint()
Adapter->>Common: resolve_play_checkpoint(checkpoint, framework, task)
alt checkpoint provided by user
Common-->>Adapter: _retrieve_file_path(checkpoint) - local path
else fallback published Nucleus checkpoint
Common-->>Adapter: raw Nucleus URI (missing _retrieve_file_path)
end
Adapter->>BM: enter BenchmarkMonitor context
Adapter->>StepFn: run_play_loop(env, policy, num_frames)
StepFn-->>Adapter: step_times, reward, ep_length, success_rate
BM-->>Adapter: exit + update_manual_recorders
Adapter->>Build: build_play_bundle(run, versions, hardware, runtime, ...)
Build-->>Adapter: PlayBundle
Adapter->>Out: attach_bundle then _finalize_impl writes play.json
    Raises:
        FileNotFoundError: When no *checkpoint* is given and no published Nucleus
            checkpoint exists for *framework*/*task*.
    """
    if checkpoint:
        return _retrieve_file_path(checkpoint)

    logger.warning(
        "No --checkpoint given; falling back to the published Nucleus checkpoint for %s / %s.",
        framework,
        task,
    )
    path = _published_checkpoint(framework, task)
    if not path:
        raise FileNotFoundError(
            f"No checkpoint available for framework {framework!r} and task {task!r}: pass --checkpoint"
            " with a local or Nucleus path, or publish a Nucleus checkpoint for this task."
        )
    return path
Missing _retrieve_file_path call on published checkpoint fallback
When the user omits --checkpoint, resolve_play_checkpoint calls _published_checkpoint and returns its result directly — skipping the _retrieve_file_path download step that is applied to user-supplied paths. get_published_pretrained_checkpoint typically returns a Nucleus URI (omniverse://…), so every downstream adapter (runner.load(resume_path), agent.restore(resume_path), PPO.load(resume_path), etc.) will receive a raw URI string and fail with a file-not-found / invalid-path error. The explicit --checkpoint branch is correctly guarded; the fallback is not.
Resolve the published Nucleus URI through _retrieve_file_path so it is downloaded to a local path before being returned, matching the behaviour for user-supplied paths.
Current:
    path = _published_checkpoint(framework, task)
    if not path:
        raise FileNotFoundError(
            f"No checkpoint available for framework {framework!r} and task {task!r}: pass --checkpoint"
            " with a local or Nucleus path, or publish a Nucleus checkpoint for this task."
        )
    return path
Suggested:
    path = _published_checkpoint(framework, task)
    if not path:
        raise FileNotFoundError(
            f"No checkpoint available for framework {framework!r} and task {task!r}: pass --checkpoint"
            " with a local or Nucleus path, or publish a Nucleus checkpoint for this task."
        )
    return _retrieve_file_path(path)
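The suggested change can be checked in isolation. The following sketch injects the lookup and download helpers as parameters so it runs without Nucleus. resolve_checkpoint_sketch, fake_retrieve, fake_published and the omniverse:// URI are hypothetical stand-ins for resolve_play_checkpoint, _retrieve_file_path and _published_checkpoint, not the PR's code.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_checkpoint_sketch(
    checkpoint: Optional[str],
    framework: str,
    task: str,
    published_lookup: Callable[[str, str], Optional[str]],
    retrieve: Callable[[str], str],
) -> str:
    """Resolution chain with the fallback also routed through `retrieve`."""
    if checkpoint:
        return retrieve(checkpoint)
    logger.warning("No --checkpoint given; falling back to the published checkpoint for %s / %s.", framework, task)
    path = published_lookup(framework, task)
    if not path:
        raise FileNotFoundError(f"No checkpoint available for framework {framework!r} and task {task!r}")
    # Same treatment as a user-supplied path: a remote URI becomes a local file path.
    return retrieve(path)


def fake_retrieve(uri: str) -> str:
    # Pretend download: map any URI to a file in a local cache directory.
    return "/tmp/ckpt_cache/" + uri.rsplit("/", 1)[-1]


def fake_published(framework: str, task: str) -> Optional[str]:
    # Pretend registry with a single published checkpoint.
    return "omniverse://server/checkpoints/model.pt" if task == "Isaac-Cartpole-v0" else None


local = resolve_checkpoint_sketch(None, "rsl_rl", "Isaac-Cartpole-v0", fake_published, fake_retrieve)
# local == "/tmp/ckpt_cache/model.pt": the fallback no longer returns a raw URI.
```

Injecting the two helpers also makes the fallback branch unit-testable without network access, which is the branch the review flags.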
Description
Part 4 of 4 of the benchmark refactor series — a checkpoint-driven play (inference) benchmark, the inference counterpart to the training benchmark.
Series: Part 1/4 core (#6197) → Part 2/4 runtime + startup (#6198) → Part 3/4 training (#6199) → Part 4/4 play (this PR).
Loads a trained checkpoint, runs the policy-driven rollout, and emits a new typed PlayBundle capturing inference/step performance plus the played policy's reward / episode-length / success.
Adds:
- PlayBundle schema type (mirrors RuntimeBundle with run.framework set, plus typed success_rate / reward / ep_length (MeanStd) / checkpoint_path / video_path; no learning curve). Additive: Odin gains a play.json shape; existing bundles are unchanged.
- build_play_bundle, run_play_loop (policy-driven rollout; aggregates per-episode return/length/success; handles 4- and 5-tuple step signatures and NumPy returns), and resolve_play_checkpoint (chain: --checkpoint <path or Nucleus URI> → else the published Nucleus checkpoint with a warning → else a clear error).
- scripts/benchmarks/play.py dispatcher over --rl_library {rsl_rl, rl_games, skrl, sb3} plus per-backend bench_play_<backend>.py adapters (each mirrors its reinforcement_learning/<backend>/play.py checkpoint load and inference policy; develop launch API).
- benchmarks.rst play section + arg table) and a 3.0 migration-guide entry.

Validated on develop (Newton/MJWarp): all four backends generate-then-play and emit a valid PlayBundle (rsl_rl ≈7.5k inference FPS, rl_games ≈209k @512 envs, skrl ≈5.4k, sb3 ≈4.6k; reward/ep_length populated). Note: reward/ep_length/success_rate aggregate only completed episodes, so --num_frames must exceed the task's episode length (documented).
Fixes # (n/a)
Type of change
Checklist
- pre-commit checks with ./isaaclab.sh --format
- source/<pkg>/changelog.d/ for every touched package
- CONTRIBUTORS.md or my name already exists there