meta-nvidia: bump NVIDIA stack to 595.58.03 #57
Tested on RTX PRO 6000 Blackwell Server Edition (10de:2bb5) with TDX CC mode ON: nvidia-smi reports driver 595.58.03 + CUDA 13.2, and the GPU initializes cleanly without the RmInitAdapter errors seen on 570.x.
Pull request overview
Bumps the Yocto/OE NVIDIA driver stack in meta-nvidia to version 595.58.03 and updates the distro configuration in meta-dstack to prefer the new NVIDIA recipe versions.
Changes:
- Add `nvidia_595.58.03.bb` recipe for the driver stack.
- Add `nvidia-fabricmanager_595.58.03.bb` and `libnvidia-nscq_595.58.03.bb` recipes for matching userspace components.
- Update `meta-dstack/conf/distro/dstack.conf` to set `NVIDIA_VERSION = "595.58.03"`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| meta-nvidia/recipes-graphics/nvidia/nvidia_595.58.03.bb | Adds recipe for NVIDIA 595.58.03 driver stack (kernel module + libs via includes). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_595.58.03.bb | Adds Fabric Manager 595.58.03 archive-based install + systemd unit packaging. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_595.58.03.bb | Adds NSCQ 595.58.03 archive-based library packaging. |
| meta-dstack/conf/distro/dstack.conf | Bumps NVIDIA_VERSION and preferred versions to 595.58.03. |
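
As a rough illustration of the `dstack.conf` row above, the bump might look like the fragment below. Only `NVIDIA_VERSION = "595.58.03"` is confirmed by this PR; the `PREFERRED_VERSION_*` lines are an assumption about how the distro config pins the three recipes, and the actual file may phrase them differently.

```bitbake
# meta-dstack/conf/distro/dstack.conf (sketch, not the actual file contents)

# Confirmed by this PR: the distro-wide driver version.
NVIDIA_VERSION = "595.58.03"

# Assumed: pin each recipe to the same version so BitBake does not
# pick another recipe version that is still present in the layer.
PREFERRED_VERSION_nvidia = "${NVIDIA_VERSION}"
PREFERRED_VERSION_nvidia-fabricmanager = "${NVIDIA_VERSION}"
PREFERRED_VERSION_libnvidia-nscq = "${NVIDIA_VERSION}"
```

Deriving the pins from one variable would keep the kernel module and the userspace components on the same version with a single edit.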

    do_install() {
        install -d ${D}${libdir}

        install -m 0755 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}

The shared library is installed with mode 0755, which marks it executable. In this layer other shared libraries (e.g. Fabric Manager's libnvfm.so.1) are installed as 0644; consider using 0644 here as well to keep permissions consistent and avoid QA surprises.

Suggested change:

    - install -m 0755 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}
    + install -m 0644 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}

    SRC_URI = "https://developer.download.nvidia.com/compute/nvidia-driver/redist/fabricmanager/linux-x86_64/fabricmanager-linux-x86_64-${PV}-archive.tar.xz"
    SRC_URI[md5sum] = "8761b4b5f371b7e87b2b82141c901311"
    SRC_URI[sha256sum] = "ea86951987ced2c2bbed1a9a9037e917dd2268add0ec484db9216e843d8b7de8"

    S = "${WORKDIR}/fabricmanager-linux-x86_64-${PV}-archive"

This recipe hard-codes the x86_64 Fabric Manager archive in both SRC_URI and S, but it doesn't restrict the recipe to x86_64 targets. On non-x86_64 builds this will fetch/install the wrong binaries into the target rootfs. Add a COMPATIBLE_HOST (e.g. x86_64.*-linux) or parameterize the URL/S by TARGET_ARCH and provide per-arch sources.
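
The first option could be sketched for the Fabric Manager recipe roughly as follows. The `SRC_URI` and `S` values are the ones quoted in this thread; the `COMPATIBLE_HOST` pattern is the example given in the comment and has not been build-tested here.

```bitbake
# nvidia-fabricmanager_595.58.03.bb (sketch)

# Skip this recipe unless the target is x86_64 Linux, because the
# archive below only ships x86_64 binaries.
COMPATIBLE_HOST = "x86_64.*-linux"

SRC_URI = "https://developer.download.nvidia.com/compute/nvidia-driver/redist/fabricmanager/linux-x86_64/fabricmanager-linux-x86_64-${PV}-archive.tar.xz"
S = "${WORKDIR}/fabricmanager-linux-x86_64-${PV}-archive"
```

With this set, BitBake should treat the recipe as incompatible on other hosts instead of packaging mismatched binaries into the rootfs.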

    SRC_URI[md5sum] = "5b559a614f2c2c79b6eb796b68b6fb0f"
    SRC_URI[sha256sum] = "1c008787111fc55a7fc1d3521a6d41c2523b8949354e5651eadb4498554e5469"

    S = "${WORKDIR}/libnvidia_nscq-linux-x86_64-${PV}-archive"

This recipe hard-codes the linux-x86_64 NSCQ archive and installs an x86_64 shared library, but it doesn't restrict the recipe to x86_64 targets. If someone builds for a different TARGET_ARCH, the resulting package will contain incompatible binaries. Add a COMPATIBLE_HOST (e.g. x86_64.*-linux) or make SRC_URI/S arch-aware.

Suggested change:

      S = "${WORKDIR}/libnvidia_nscq-linux-x86_64-${PV}-archive"
    + COMPATIBLE_HOST = "x86_64.*-linux"

Thanks @copilot-pull-request-reviewer. All three comments are fair in isolation, but the patterns (0755 on libnvidia-nscq.so, hard-coded x86_64 in SRC_URI/S without COMPATIBLE_HOST) are copied verbatim from the existing 570.x/580.x recipes already on main. Changing them only in 595 would leave the tree inconsistent. Since the whole dstack-nvidia stack is x86_64-only by design (TDX is x86_64-only), and this PR is a straight version bump rather than a refactor, I'd prefer to address these across all nvidia recipes (570/580/595) in a follow-up cleanup PR if we want to tighten them. Merging this one as-is.

Summary
- `NVIDIA_VERSION` in `meta-dstack/conf/distro/dstack.conf`

Test plan
- Built dstack-nvidia-0.5.9 image on tdxlab and deployed on rtx6000 (RTX PRO 6000 Blackwell SE 10de:2bb5, TDX, GPU passthrough 15:00.0, CC mode ON).
- Inside guest:
  - `nvidia-smi`: NVIDIA-SMI 595.58.03, Driver 595.58.03, CUDA 13.2; GPU initializes cleanly (no `RmInitAdapter failed` like 570.x)
  - `nvidia-smi conf-compute -f`: `CC status: ON`

Notes
- `nvidia-smi conf-compute -gars` flag was renamed in 595 (returns "Option not recognized"); update any tooling that depended on it.