createContainer hook disable-device-node-modification fails: /proc/driver/nvidia/params not available in containerRootFs (gvisor#13034 follow-up) #13283
After #13034, three of four NVIDIA CDI createContainer hooks emitted by k8s-device-plugin (DEVICE_LIST_STRATEGY=cdi-annotations) now run successfully inside the gofer (create-symlinks, enable-cuda-compat, update-ldcache). The fourth — disable-device-node-modification — still fails because /proc/driver/nvidia/params does not exist inside the prepared containerRootFs at hook time:
nvidia-ctk hook disable-device-node-modification
stderr: failed to mount modified params file: open o_path procfd:
open /run/containerd/io.containerd.runtime.v2.task/k8s.io/<id>/rootfs/proc/driver/nvidia/params:
no such file or directory
FATAL ERROR: error executing CreateContainer hooks
The hook is part of every spec the NVIDIA k8s-device-plugin generates today (no opt-out yet — NVIDIA/k8s-device-plugin#1245), so any cluster using gVisor + GPU via the plugin currently can't start a gVisor sandbox without patching the spec.
Environment
runsc --version: release-20260520.0 (includes #13034)
NVIDIA driver: 580.159.04 on the host; nvproxy-driver-version = 580.126.20 (latest in runsc nvproxy list-supported-drivers for the 580 series)
k8s-device-plugin: DEVICE_LIST_STRATEGY=cdi-annotations
Sequence
sequenceDiagram
participant Gofer
participant CRoot as containerRootFs
participant Hook as nvidia-cdi-hook
participant Sentry
Note over Gofer,CRoot: After #13034
Gofer->>CRoot: SetupMounts + SetupDev<br/>(libs, /dev/nvidia*, /dev/nvidiactl, ...)
Note over CRoot: /proc is an empty dir<br/>(procfs not mounted)
Gofer->>Hook: create-symlinks (libcuda etc.)
Hook-->>Gofer: success
Gofer->>Hook: enable-cuda-compat --host-driver-version=580.159.04
Hook-->>Gofer: success
Gofer->>Hook: update-ldcache --folder /usr/lib/...
Hook-->>Gofer: success
Gofer->>Hook: disable-device-node-modification
Hook--xCRoot: open o_path procfd:<br/>open /...rootfs/proc/driver/nvidia/params → ENOENT
Note over Gofer: FATAL — sandbox never reaches Sentry
Live evidence from /tmp/runsc-debug/runsc.<ts>.gofer.log:
hooks.go:63] Executing hook nvidia-ctk hook create-symlinks ...
hooks.go:118] Execute hook success!
hooks.go:63] Executing hook nvidia-ctk hook enable-cuda-compat --host-driver-version=580.159.04
hooks.go:118] Execute hook success!
hooks.go:63] Executing hook nvidia-ctk hook update-ldcache --folder ...
hooks.go:118] Execute hook success!
hooks.go:63] Executing hook nvidia-ctk hook disable-device-node-modification
util.go:107] FATAL ERROR: error executing CreateContainer hooks: failure executing hook "/usr/bin/nvidia-ctk", err: exit status 1
stderr: failed to mount modified params file: open o_path procfd:
open /run/containerd/.../rootfs/proc/driver/nvidia/params: no such file or directory
Re-running with the failing hook removed from the device-plugin spec → sandbox starts cleanly, torch.cuda.is_available() returns True, T4 visible.
Why the hook fails on gVisor
The hook's implementation (NVIDIA Container Toolkit, PR #927) opens <containerRootFs>/proc/driver/nvidia/params, writes a tmpfs copy with ModifyDeviceFiles: 0, and bind-mounts that tmpfs copy back over the original path. Under runc, /proc has already been mounted into the container's mount namespace before createContainer hooks run (per OCI spec), so the kernel-provided /proc/driver/nvidia/params is visible. Under gVisor with #13034, the containerRootFs has device cdevs and library bind-mounts but /proc is left empty until the sentry boots its synthetic procfs later.
The hook's semantic purpose (prevent in-container libnvidia-ml from auto-creating extra /dev/nvidiaN nodes) is moot under gVisor anyway — nvproxy mediates all device access, and the sentry owns /dev. The hook just needs to be able to complete so sandbox setup proceeds.
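To make the hook's rewrite step concrete, here is a minimal Go sketch of the text transformation described above: take the contents of the params file and force the ModifyDeviceFiles line to 0. This is not the NVIDIA Container Toolkit's code; the function name disableModifyDeviceFiles is hypothetical, and the sketch assumes the file is a list of "Key: value" lines. The tmpfs copy and the bind mount are left out.

```go
package main

import "strings"

// disableModifyDeviceFiles returns a copy of the params text in which the
// ModifyDeviceFiles entry is set to 0. All other lines are left untouched.
// Hypothetical helper for illustration; it assumes one "Key: value" pair
// per line.
func disableModifyDeviceFiles(params string) string {
	lines := strings.Split(params, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "ModifyDeviceFiles:") {
			lines[i] = "ModifyDeviceFiles: 0"
		}
	}
	return strings.Join(lines, "\n")
}
```

The hook can only perform this rewrite if it can open the original file first, which is exactly the step that fails when /proc under containerRootFs is empty.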
Proposed fix
In runsc/cmd/sandboxsetup/gofer_mount.go, when nvproxy is enabled, bind-mount the host's /proc/driver/nvidia directory into <containerRootFs>/proc/driver/nvidia before running createContainer hooks (i.e. right alongside the SetupDev step that already exposes /dev/nvidia*). This:
Makes <containerRootFs>/proc/driver/nvidia/params available for the hook's open(o_path).
Lets the hook complete (its overmount lives in the gofer-side view of containerRootFs; the sentry's procfs is unaffected, which is fine — the hook's effect doesn't apply under gVisor regardless).
Has no observable effect when the hook isn't present in the spec.
Sketch:
// In SetupDev (or a sibling), when specutils.NVProxyEnabled(spec, conf):
nvidiaProcSrc := "/proc/driver/nvidia"
if _, err := os.Stat(nvidiaProcSrc); err == nil {
	nvidiaProcDst := filepath.Join(root, "proc/driver/nvidia")
	if err := os.MkdirAll(nvidiaProcDst, 0755); err != nil {
		return fmt.Errorf("creating nvidia procfs mount target: %w", err)
	}
	if err := specutils.SafeSetupAndMount(nvidiaProcSrc, nvidiaProcDst,
		"bind", unix.MS_BIND|unix.MS_RDONLY, procPath); err != nil {
		return fmt.Errorf("bind-mounting %q to %q: %w", nvidiaProcSrc, nvidiaProcDst, err)
	}
}
Happy to send a PR for this — I'll prepare against master with unit-test coverage for the helper.
Workaround today
For anyone hitting this on release-20260520.0+, on each GPU node (re-run when the device-plugin pod restarts):
sudo python3 -c 'import json
p="/var/run/cdi/k8s.device-plugin.nvidia.com-gpu.json"
d=json.load(open(p))
d["containerEdits"]["hooks"]=[h for h in d["containerEdits"]["hooks"] if h["args"][2] != "disable-device-node-modification"]
json.dump(d,open(p,"w"),indent=2)'
Related
#13034 — fixed the other three hooks; container cgroupRoot setup is now done before hooks