[feat] Enable openYuanrong RDMA support#108

Merged
0oshowero0 merged 3 commits into Ascend:main from KaisennHu:feat/enable-yr-rdma
May 29, 2026

@KaisennHu
Collaborator

Description

For 910B nodes with an additional RoCE NIC (besides the NPU-side RoCE), the openYuanrong datasystem supports host-to-host (H2H) RDMA transport via UCX. Because TQ routes CPU tensors through the KV client and NPU tensors through the tensor client according to tensor location, H2H RDMA and RH2D can be enabled simultaneously; they are not mutually exclusive.

Previously, enabling RDMA required manually adding --enable_rdma true to worker_args and setting UCX_TLS=rc_x in the environment. This PR introduces dedicated config options for one-click RDMA enablement.

Changes

  1. config.yaml: Added enable_rdma (default false) and ucx_env_vars (default {}). When enable_rdma=true, TQ automatically appends --enable_rdma true to the dscli command and defaults UCX_TLS to rc_x. ucx_env_vars lets users set UCX environment variables (UCX_TLS, UCX_LOG_FILE, UCX_LOG_LEVEL, UCX_NET_DEVICES, UCX_TCP_CM_ROUTE); these values take priority over the parent environment.

  2. yuanrong_bootstrap.py: Wired enable_rdma and ucx_env_vars through config → actor → start_datasystem_worker. Env priority: ucx_env_vars > parent env > default UCX_TLS=rc_x.

  3. openyuanrong_datasystem.md: Added RDMA Options section, updated config examples, added manual RDMA startup instructions, and added RDMA FAQ (endpoint timeout, verification, container memlock).
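
The environment precedence described in item 2 can be sketched roughly as follows. This is an illustrative sketch only, not the code in yuanrong_bootstrap.py: the function names `build_worker_env` and `build_worker_args` are hypothetical, and applying `ucx_env_vars` even when `enable_rdma` is false is an assumption that the PR description does not confirm.

```python
from typing import Mapping

DEFAULT_UCX_TLS = "rc_x"  # default value named in the PR description


def build_worker_env(
    parent_env: Mapping[str, str],
    ucx_env_vars: Mapping[str, object],
    enable_rdma: bool,
) -> dict:
    """Hypothetical helper: merge env with ucx_env_vars > parent env > default."""
    env = dict(parent_env)
    if enable_rdma:
        # Lowest priority: only used when the parent env has no UCX_TLS.
        env.setdefault("UCX_TLS", DEFAULT_UCX_TLS)
    # Highest priority: explicit overrides from the config (assumed to apply always).
    env.update({key: str(value) for key, value in ucx_env_vars.items()})
    return env


def build_worker_args(worker_args: str, enable_rdma: bool) -> str:
    """Hypothetical helper: append the RDMA flag unless it is already present."""
    if enable_rdma and "--enable_rdma" not in worker_args.split():
        return f"{worker_args} --enable_rdma true".strip()
    return worker_args
```

For instance, with `enable_rdma=True`, a parent environment that sets `UCX_TLS=tcp`, and an empty `ucx_env_vars`, this sketch keeps `tcp`; adding `UCX_TLS` to `ucx_env_vars` would override it.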

Related Issues

Closes #98

Signed-off-by: Haichuan Hu <kaisennhu@gmail.com>
@ascend-robot

CLA Signature Pass

KaisennHu, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Contributor

Copilot AI left a comment

Pull request overview

This PR adds configurable openYuanrong host RDMA support so TransferQueue can start Yuanrong datasystem workers with RDMA enabled and UCX environment overrides.

Changes:

  • Added enable_rdma and ucx_env_vars configuration options.
  • Wired RDMA flags and UCX environment handling into Yuanrong worker bootstrap.
  • Updated openYuanrong documentation with RDMA setup and troubleshooting guidance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| transfer_queue/config.yaml | Adds default RDMA and UCX environment configuration fields. |
| transfer_queue/storage/bootstrap/yuanrong_bootstrap.py | Passes RDMA options through actor startup and applies UCX env precedence for dscli. |
| docs/storage_backends/openyuanrong_datasystem.md | Documents RDMA configuration, manual startup, and troubleshooting. |

Comment thread transfer_queue/config.yaml Outdated
Comment thread docs/storage_backends/openyuanrong_datasystem.md Outdated
# Additional config for yuanrong worker.
# Recommended options for NPU environments:
# --remote_h2d_device_ids Enable RH2D for efficient cross-node data transfer. Specify NPU device IDs (comma-separated).
# --enable_huge_tlb Enable huge page memory to improve performance. Required for >21GB shared memory on 910B.
Contributor

Add some useful comments here:

If you want to use RDMA or NPU transport with more than 20GB of shared memory, enable huge pages to accelerate startup and transfer. Before setting enable_huge_tlb, the following OS configuration is required (root privileges needed):

# Each huge page is 2MB. For example, to allocate 128GB, reserve 65536 pages
sysctl -w vm.nr_hugepages=65536
# This allows the current user to pin enough memory pages so that RDMA/Ascend can work
ulimit -l unlimited
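
The page count in the comment above follows from dividing the target size by the 2MB page size. A minimal sketch of that arithmetic, assuming 2MB huge pages as stated in the comment (the helper name `hugepages_needed` is hypothetical and not part of TransferQueue or openYuanrong):

```python
import math


def hugepages_needed(size_gb: float, page_size_mb: int = 2) -> int:
    """Return the vm.nr_hugepages value needed to back size_gb of memory."""
    # Convert GB to MB, then round up to a whole number of huge pages.
    return math.ceil(size_gb * 1024 / page_size_mb)


print(hugepages_needed(128))  # 65536, matching the sysctl example
```

The same calculation can be applied to whatever shared memory size the worker is configured with, for example 8192MB (8GB) corresponds to 4096 pages.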

--enable_rdma true \
--arena_per_tenant 1 \
--enable_worker_worker_batch_get true \
--shared_memory_size_mb 8192
Collaborator

We may need more explanation of this parameter. For instance, is this shared memory per node or per client, or is it the total memory size across nodes?

Collaborator Author

@KaisennHu KaisennHu May 29, 2026

Sure. Added. Per-node shared memory size in MB. All clients on the same node share this shared memory space.

@0oshowero0 0oshowero0 merged commit dc7d203 into Ascend:main May 29, 2026
8 checks passed