meta-nvidia: bump NVIDIA stack to 595.58.03 #57

Merged: kvinwang merged 1 commit into main from feat/nvidia-595.58.03 on Apr 23, 2026

@kvinwang
Collaborator

Summary

  • Add 595.58.03 recipes for nvidia, fabricmanager, nscq (mirrors 580.95.05 layout)
  • Bump NVIDIA_VERSION in meta-dstack/conf/distro/dstack.conf

Test plan

Built dstack-nvidia-0.5.9 image on tdxlab and deployed on rtx6000 (RTX PRO 6000 Blackwell SE 10de:2bb5, TDX, GPU passthrough 15:00.0, CC mode ON).

Inside guest:

  • nvidia-smi: NVIDIA-SMI 595.58.03, Driver 595.58.03, CUDA 13.2 — GPU initializes cleanly (no RmInitAdapter failed like 570.x)
  • nvidia-smi conf-compute -f: CC status: ON
  • nvbandwidth (single GPU, CC ON): H→D 4.41 GB/s, D→H 5.31 GB/s (vs 580.95.05: 4.28 / 3.66 GB/s — D→H ~+45%)
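
The relative changes behind the "~+45%" figure can be recomputed from the four nvbandwidth values quoted above. This is a minimal sketch of that arithmetic only; the helper name `pct_change` and the dictionary layout are illustrative and not part of the PR.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# nvbandwidth results in GB/s (single GPU, CC ON), as reported in the test plan.
results = {
    "H->D": {"595.58.03": 4.41, "580.95.05": 4.28},
    "D->H": {"595.58.03": 5.31, "580.95.05": 3.66},
}

deltas = {
    direction: pct_change(values["595.58.03"], values["580.95.05"])
    for direction, values in results.items()
}

for direction, delta in deltas.items():
    print(f"{direction}: {delta:+.1f}%")
```

With these inputs, H→D improves by about 3% and D→H by about 45%, so the gain is concentrated in the device-to-host direction.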

Notes

  • nvidia-smi conf-compute -gars flag was renamed in 595 (returns "Option not recognized"); update any tooling that depended on it.
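
Tooling that still calls the old flag could detect this failure instead of misreading the output. The sketch below is one possible guard, assuming the only signal is the "Option not recognized" message quoted above; the function names are hypothetical, and because this PR does not state the replacement flag, the sketch does not guess one.

```python
import subprocess
from typing import Optional

UNRECOGNIZED = "Option not recognized"  # message quoted in the note above


def flag_is_unsupported(output: str) -> bool:
    """True if nvidia-smi rejected the flag rather than answering the query."""
    return UNRECOGNIZED.lower() in output.lower()


def query_conf_compute(flag: str) -> Optional[str]:
    """Run `nvidia-smi conf-compute <flag>` and return its combined output.

    Returns None if nvidia-smi is not installed or the driver does not
    recognize the flag, so callers can fall back or report clearly.
    """
    try:
        proc = subprocess.run(
            ["nvidia-smi", "conf-compute", flag],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return None
    combined = proc.stdout + proc.stderr
    return None if flag_is_unsupported(combined) else combined
```

A caller would treat a `None` result from `query_conf_compute("-gars")` as "flag unavailable on this driver" rather than as a CC status.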

Tested on RTX PRO 6000 Blackwell Server Edition (10de:2bb5) with TDX
CC mode ON: nvidia-smi reports driver 595.58.03 + CUDA 13.2, and the GPU
initializes cleanly without the RmInitAdapter errors seen on 570.x.
Copilot AI review requested due to automatic review settings April 23, 2026 06:47

Copilot AI left a comment

Pull request overview

Bumps the Yocto/OE NVIDIA driver stack in meta-nvidia to version 595.58.03 and updates the distro configuration in meta-dstack to prefer the new NVIDIA recipe versions.

Changes:

  • Add nvidia_595.58.03.bb recipe for the driver stack.
  • Add nvidia-fabricmanager_595.58.03.bb and libnvidia-nscq_595.58.03.bb recipes for matching userspace components.
  • Update meta-dstack/conf/distro/dstack.conf to set NVIDIA_VERSION = "595.58.03".
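
A version bump like this only works if a recipe file exists for every component at the configured version. The following is a small consistency check one could run against the tree, assuming the `<name>_<version>.bb` naming seen in this PR; the helper names are hypothetical and the parsing only handles a plain quoted assignment.

```python
import re
from pathlib import Path
from typing import List, Optional

# Recipe base names added by this PR.
RECIPE_NAMES = ("nvidia", "nvidia-fabricmanager", "libnvidia-nscq")


def read_nvidia_version(conf_text: str) -> Optional[str]:
    """Extract the value of NVIDIA_VERSION from distro config text, if set."""
    match = re.search(
        r'^\s*NVIDIA_VERSION\s*\??=\s*"([^"]*)"', conf_text, re.MULTILINE
    )
    return match.group(1) if match else None


def missing_recipes(recipe_dir: Path, version: str) -> List[str]:
    """List expected <name>_<version>.bb files that are absent from recipe_dir."""
    expected = [f"{name}_{version}.bb" for name in RECIPE_NAMES]
    return [fname for fname in expected if not (recipe_dir / fname).is_file()]
```

In this repository the inputs would be the text of meta-dstack/conf/distro/dstack.conf and the directory meta-nvidia/recipes-graphics/nvidia; an empty list from `missing_recipes` means all three recipes are present for that version.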

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| meta-nvidia/recipes-graphics/nvidia/nvidia_595.58.03.bb | Adds recipe for NVIDIA 595.58.03 driver stack (kernel module + libs via includes). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_595.58.03.bb | Adds Fabric Manager 595.58.03 archive-based install + systemd unit packaging. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_595.58.03.bb | Adds NSCQ 595.58.03 archive-based library packaging. |
| meta-dstack/conf/distro/dstack.conf | Bumps NVIDIA_VERSION and preferred versions to 595.58.03. |

do_install() {
    install -d ${D}${libdir}

    install -m 0755 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}

Copilot AI Apr 23, 2026

The shared library is installed with mode 0755, which marks it executable. In this layer other shared libraries (e.g. Fabric Manager's libnvfm.so.1) are installed as 0644; consider using 0644 here as well to keep permissions consistent and avoid QA surprises.

Suggested change
- install -m 0755 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}
+ install -m 0644 ${S}/lib/libnvidia-nscq.so.${PV} ${D}${libdir}
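
To see how consistent the permissions actually are across an install tree, one could list every shared library that carries an execute bit. This is a rough audit sketch, not a BitBake QA check; the function name and the filename pattern (`.so` optionally followed by numeric version components) are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List

# Matches names such as libfoo.so, libfoo.so.1, libnvidia-nscq.so.595.58.03
SHARED_LIB_RE = re.compile(r"\.so(\.[0-9]+)*$")


def executable_shared_libs(root: Path) -> List[Path]:
    """Return shared libraries under root that have any execute bit set."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip symlinks (e.g. libfoo.so -> libfoo.so.1) and non-files.
        if path.is_symlink() or not path.is_file():
            continue
        if SHARED_LIB_RE.search(path.name) and path.stat().st_mode & 0o111:
            hits.append(path)
    return hits
```

Pointing this at a recipe's staged install directory would show whether 0755 libraries are an exception in the layer or the prevailing pattern.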

Comment on lines +7 to +12
SRC_URI = "https://developer.download.nvidia.com/compute/nvidia-driver/redist/fabricmanager/linux-x86_64/fabricmanager-linux-x86_64-${PV}-archive.tar.xz"
SRC_URI[md5sum] = "8761b4b5f371b7e87b2b82141c901311"
SRC_URI[sha256sum] = "ea86951987ced2c2bbed1a9a9037e917dd2268add0ec484db9216e843d8b7de8"

S = "${WORKDIR}/fabricmanager-linux-x86_64-${PV}-archive"

Copilot AI Apr 23, 2026

This recipe hard-codes the x86_64 Fabric Manager archive in both SRC_URI and S, but it doesn't restrict the recipe to x86_64 targets. On non-x86_64 builds this will fetch/install the wrong binaries into the target rootfs. Add a COMPATIBLE_HOST (e.g. x86_64.*-linux) or parameterize the URL/S by TARGET_ARCH and provide per-arch sources.
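
COMPATIBLE_HOST is a regular expression that is compared against the target system string. The sketch below shows how the suggested pattern would behave, assuming the comparison is a start-anchored match like Python's `re.match` (worth confirming against the OE-core release in use); the host triples are hypothetical examples, not values taken from this build.

```python
import re

COMPATIBLE_HOST = "x86_64.*-linux"  # pattern suggested in the review comment


def host_is_compatible(pattern: str, host_sys: str) -> bool:
    """True if host_sys matches pattern from the start of the string."""
    return re.match(pattern, host_sys) is not None


# Hypothetical target system strings for illustration.
for host in ("x86_64-poky-linux", "aarch64-poky-linux", "x86_64-poky-linux-musl"):
    verdict = "compatible" if host_is_compatible(COMPATIBLE_HOST, host) else "skipped"
    print(f"{host}: {verdict}")
```

Because the match is not anchored at the end, the pattern also accepts x86_64 triples with a libc suffix, while any non-x86_64 target is rejected.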

SRC_URI[md5sum] = "5b559a614f2c2c79b6eb796b68b6fb0f"
SRC_URI[sha256sum] = "1c008787111fc55a7fc1d3521a6d41c2523b8949354e5651eadb4498554e5469"

S = "${WORKDIR}/libnvidia_nscq-linux-x86_64-${PV}-archive"

Copilot AI Apr 23, 2026

This recipe hard-codes the linux-x86_64 NSCQ archive and installs an x86_64 shared library, but it doesn't restrict the recipe to x86_64 targets. If someone builds for a different TARGET_ARCH, the resulting package will contain incompatible binaries. Add a COMPATIBLE_HOST (e.g. x86_64.*-linux) or make SRC_URI/S arch-aware.

Suggested change
  S = "${WORKDIR}/libnvidia_nscq-linux-x86_64-${PV}-archive"
+ COMPATIBLE_HOST = "x86_64.*-linux"

@kvinwang
Collaborator Author

Thanks @copilot-pull-request-reviewer. All three comments are fair in isolation, but the patterns (0755 on libnvidia-nscq.so, hard-coded x86_64 in SRC_URI/S without COMPATIBLE_HOST) are copied verbatim from the existing 570.x/580.x recipes already on main. Changing them only in 595 would leave the tree inconsistent.

Since the whole dstack-nvidia stack is x86_64-only by design (TDX is x86_64-only), and this PR is a straight version bump rather than a refactor, I'd prefer to address these across all nvidia recipes (570/580/595) in a follow-up cleanup PR if we want to tighten them. Merging this one as-is.
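
For the follow-up cleanup, a quick scan could list which recipes hard-code the x86_64 archive without declaring COMPATIBLE_HOST. This is only a text-level sketch under a stated limitation: it reads each .bb file on its own and does not follow `require`/`include` files, so a restriction set in a shared include would be missed. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def recipes_missing_compatible_host(recipe_dir: Path) -> List[str]:
    """Names of .bb files that mention linux-x86_64 but never set COMPATIBLE_HOST."""
    flagged = []
    for recipe in sorted(recipe_dir.glob("*.bb")):
        text = recipe.read_text(encoding="utf-8", errors="replace")
        if "linux-x86_64" in text and "COMPATIBLE_HOST" not in text:
            flagged.append(recipe.name)
    return flagged
```

Running it over meta-nvidia/recipes-graphics/nvidia would give the list of 570/580/595 recipes to touch in one pass, which keeps the tree consistent as intended.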

Copilot AI commented Apr 23, 2026

@kvinwang I've opened a new pull request, #58, to work on those changes. Once the pull request is ready, I'll request review from you.

@kvinwang kvinwang merged commit 3fb107b into main Apr 23, 2026
7 checks passed