Dev/update readme pip install #766

Open
wenxie-amd wants to merge 12 commits into main from dev/update-readme-pip-install

@wenxie-amd

Collaborator

No description provided.

amd-fuyuajin and others added 3 commits June 12, 2026 20:48
Reorganize the Setup & Deployment section around the AMD published
training Docker images and split the walkthrough into two options.

- Quick Start Option 1: git clone the repo and run training in a
  container (now recommended), with release-branch checkout and
  submodule init steps
- Quick Start Option 2: install the Primus wheel into a venv and run
  training in a container (or directly on a host)
- add the MaxText (rocm/jax-training) image alongside rocm/primus
- remove the older git+pip / deps-sync install walkthrough

Provide documentation and tooling to reproduce the training Docker
image's software stack directly on a host machine, for users who
cannot run containers.

- docs/install-on-host.md: step-by-step host install derived from the
  training Dockerfile (ROCm via pip rocm-sdk-devel so no system ROCm or
  sudo, source-built kernel libs, optional multi-node UCX/OpenMPI/
  rocSHMEM, verification, and operational notes)
- tools/installation/{setup.sh,env.sh,README.md}: staged, re-runnable
  venv installer with automatic GPU arch detection (gfx942/gfx950),
  fail-fast error handling, and host-specific fixes (mamba via pip,
  apache-tvm-ffi/z3-solver pins, Megatron helpers_cpp build)
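
As an illustration of the "staged, re-runnable" and "fail-fast" behaviour described in the commit message above, here is a minimal sketch of one way such an installer could be structured. It is not taken from the PR's setup.sh or env.sh: the helper names (`run_stage`, `detect_gpu_arch`), the marker directory, and the placeholder stages are all assumptions made for this example. The arch detection simply looks for the first `gfx…` token in rocminfo-style text and accepts only the two targets named above.

```shell
#!/bin/sh
# Sketch only: helper names, stage names, and the marker directory are
# hypothetical and not taken from the PR's setup.sh or env.sh.
set -eu  # fail fast on errors and on unset variables

STATE_DIR="${STATE_DIR:-./.install-state}"  # where per-stage markers live

# Read rocminfo-style text on stdin and print the first gfx target found,
# accepting only the two architectures named in the commit message.
detect_gpu_arch() {
    arch=$(grep -o 'gfx[0-9a-f]*' | head -n 1 || true)
    case "$arch" in
        gfx942|gfx950) printf '%s\n' "$arch" ;;
        *) echo "unsupported or undetected GPU arch: '$arch'" >&2; return 1 ;;
    esac
}

# Run a stage once; a marker file makes later runs skip it.
# The marker is written only if the stage command succeeds.
run_stage() {
    name=$1
    shift
    marker="$STATE_DIR/$name.done"
    mkdir -p "$STATE_DIR"
    if [ -f "$marker" ]; then
        echo "[skip] $name"
        return 0
    fi
    echo "[run ] $name"
    if "$@"; then
        touch "$marker"
    else
        echo "[fail] $name" >&2
        return 1
    fi
}

# Placeholder stages; a real installer would build the stack here.
stage_venv()    { echo "create venv (placeholder)"; }
stage_kernels() { echo "build kernel libs (placeholder)"; }

run_stage venv stage_venv
run_stage kernels stage_kernels
```

On a host with ROCm tools available, the detection helper could be fed with `rocminfo | detect_gpu_arch`. Running the script a second time skips any stage whose marker already exists, which matters when individual stages take a long time to build from source.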
Copilot AI review requested due to automatic review settings June 12, 2026 23:47

Copilot AI left a comment

Contributor

Pull request overview

Adds a no-docker/no-sudo host installation path for reproducing the Primus training environment via a Python venv, and updates top-level setup guidance to emphasize running training in AMD-published Docker images (with an alternative wheel/CLI path).

Changes:

  • Added staged installer scripts under tools/installation/ to build a full ROCm + PyTorch + kernel-lib stack inside a venv.
  • Added a WIP guide for installing the full training stack on bare metal (no Docker), including references to the new scripts.
  • Updated the root README.md to focus on Docker-based training workflows and add a wheel/CLI container-launch option.
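
For readers who want a feel for the Docker-based workflow summarized above (clone the repo, check out a release branch, initialize submodules, run in a container), the following is a rough sketch rather than the README's actual commands. The repository URL, release branch name, image tag, and container mount path are placeholders chosen for this example; only the `rocm/primus` image name comes from the PR. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: REPO_URL, RELEASE_BRANCH, the IMAGE tag, and the container
# mount path are placeholders, not values taken from the PR.
set -eu

REPO_URL="${REPO_URL:-https://example.com/your-org/Primus.git}"  # placeholder
RELEASE_BRANCH="${RELEASE_BRANCH:-release/X.Y}"                   # placeholder
IMAGE="${IMAGE:-rocm/primus:TAG}"                                 # tag is a placeholder
WORKDIR="${WORKDIR:-$PWD/Primus}"
DRY_RUN="${DRY_RUN:-1}"  # default: print commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

quickstart() {
    run git clone "$REPO_URL" "$WORKDIR"
    run git -C "$WORKDIR" checkout "$RELEASE_BRANCH"
    run git -C "$WORKDIR" submodule update --init --recursive
    # /dev/kfd and /dev/dri expose AMD GPUs to the container.
    run docker run -it --rm \
        --device=/dev/kfd --device=/dev/dri --group-add video \
        --ipc=host -v "$WORKDIR":/workspace/Primus -w /workspace/Primus \
        "$IMAGE"
}

quickstart
```

Setting `DRY_RUN=0` together with real values for the placeholder variables would execute the commands instead of printing them.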

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tools/installation/setup.sh | New staged installer script to reproduce the training environment in a venv. |
| tools/installation/env.sh | New activation script exporting ROCm/toolchain/runtime environment variables and auto-detecting GPU arch. |
| tools/installation/README.md | New README explaining how to use the venv installer scripts and what’s skipped. |
| README.md | Updated setup/deployment guidance and quickstart flows (Docker + CLI, wheel install). |
| docs/install-on-host.md | New WIP guide for host installation; references and partially duplicates scripted flow. |

Comment thread tools/installation/setup.sh Outdated
Comment thread tools/installation/setup.sh Outdated
Comment thread tools/installation/setup.sh Outdated
Comment thread docs/install-on-host.md
Comment thread docs/install-on-host.md Outdated
Comment thread docs/install-on-host.md
Comment on lines +28 to +29
- `**setup.sh**` — runs the install in re-runnable **stages**. It sources
`env.sh` automatically.
Comment thread docs/install-on-host.md Outdated
Comment thread README.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:27

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:28

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:29

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:29

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:30

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:31

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 16, 2026 15:36

Copilot AI review requested due to automatic review settings June 16, 2026 17:16

@amd-fuyuajin

Collaborator

Adding one testing data point here: I tested the setup.sh script on MI355X. Setting up the whole training environment took quite a few hours because several packages are built from source. This is noted in the instructions.
After installing the packages on the host, I tested the llama3.1-8B and Qwen3.5-35B-A3B models. Both ran.

3 participants