This repo contains runnable examples for:
- NCCL collectives (CUDA + NCCL + MPI)
- DPDK packet processing (EAL + RX/TX loops)
Note: These are Linux-first examples. NCCL examples require NVIDIA GPUs and CUDA. DPDK examples require Linux, hugepages, and appropriate NIC access.

Build (top-level):
mkdir -p build && cd build
cmake ..
cmake --build . -j

Requires (NCCL examples):
- CUDA toolkit
- NCCL
- MPI (e.g., OpenMPI)
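
Before building, it can help to confirm the command-line tools behind these requirements are on PATH. The sketch below is a minimal check, not part of this repo: the function name `missing_tools` is hypothetical, and it only looks for executables, so it cannot tell whether the NCCL library itself is installed.

```shell
# Print each named command that is not found on PATH.
# Returns non-zero if at least one is missing.
# (missing_tools is an illustrative helper, not part of this repo.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools cmake nvcc mpirun` covers the build tool, the CUDA compiler, and the MPI launcher; it prints nothing when all three are found.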
Run (single node):
mpirun -np 2 ./nccl_allreduce_mpi

Requires (DPDK examples):
- DPDK installed (and pkg-config able to find it)
- hugepages configured
- appropriate permissions for hugepage and NIC access (the run command below uses sudo)
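
A quick preflight for the first two requirements might look like the sketch below. It assumes DPDK's pkg-config module is named `libdpdk` and that hugepage counts are read from `/proc/meminfo`; the function names and the optional file argument are illustrative, not part of this repo.

```shell
# Print the HugePages_Total value from a meminfo-style file.
# $1 is optional and defaults to /proc/meminfo (the argument is an
# illustrative convenience so the function can be run on a saved copy).
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Report whether pkg-config can find DPDK.
# The module name libdpdk is an assumption about the local install.
check_dpdk_pkgconfig() {
    if pkg-config --exists libdpdk 2>/dev/null; then
        echo "DPDK found: $(pkg-config --modversion libdpdk)"
    else
        echo "DPDK not found via pkg-config"
    fi
}
```

Running `check_dpdk_pkgconfig` and then `hugepages_total` gives a quick read on both; if the second prints 0, no hugepages are reserved yet.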
Run:
sudo ./dpdk_rx_burst_stats -l 0-1 -n 4

See each example's README for details.