Production training framework for fine-tuning open-weight LLMs on AMD MI300X
and serving them through an OpenAI-compatible API. Single ordered package,
canonical layout per docs/blueprints/mindXtrain2.md §Part 4.

What distinguishes mindxtrain architecturally from Axolotl, LLaMA-Factory, Unsloth, torchtune, and Primus is its 60-second ahead-of-time (AOT) autotune probe. The probe selects CK vs. Triton attention, the hipBLASLt heuristic, and the RCCL config. The resulting plan is fixed at training start, and JIT autotune is forbidden in the production loop.
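The probe-then-freeze idea can be illustrated with a minimal sketch. It is not mindxtrain's implementation: the `Plan` class, the `probe` function, and the stand-in candidates are hypothetical, and real candidates would be GPU kernels rather than plain Python callables. The sketch only shows the shape of the technique: time each candidate under a wall-clock budget, record the winner per knob, and return an immutable plan.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Plan:
    """Hypothetical immutable probe result; frozen so later code cannot retune it."""
    choices: tuple[tuple[str, str], ...]  # (knob name, winning candidate name)


def time_once(fn: Callable[[], None]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def probe(
    knobs: Mapping[str, Mapping[str, Callable[[], None]]],
    budget_s: float = 60.0,
    repeats: int = 3,
) -> Plan:
    """Pick the fastest candidate per knob within a shared time budget."""
    deadline = time.perf_counter() + budget_s
    choices = []
    for knob, candidates in knobs.items():
        best_name, best_time = None, float("inf")
        for name, fn in candidates.items():
            # Every candidate is timed at least once; once the budget is
            # exhausted, the remaining repeats are skipped.
            for _ in range(repeats):
                elapsed = time_once(fn)
                if elapsed < best_time:
                    best_name, best_time = name, elapsed
                if time.perf_counter() > deadline:
                    break
        choices.append((knob, best_name))
    return Plan(tuple(choices))


# Stand-in workloads (assumptions for illustration only, not real kernels).
plan = probe(
    {"attention": {"candidate_a": lambda: sum(range(300_000)),
                   "candidate_b": lambda: None}},
    budget_s=1.0,
)
print(plan)
```

Because `Plan` is a frozen dataclass, any attempt to reassign a choice after the probe raises an error, which mirrors the rule that the plan is fixed once training starts.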
Status: production deployment in progress. The CPU-only base install passes
its full pytest suite (ruff + mypy clean); with the training extras installed, the
suite reports 672 passing tests. Many modules run as real Python on a CPU-only laptop;
heavyweight training, eval, and quantization paths are gated behind opt-in extra
dependency groups. See docs/actualization_status.md for the
per-module map and HANDOFF.md for the operator checklist.
- Operator + Coach UI: https://mindx.pythai.net/coach
- Public training-jobs API: https://mindx.pythai.net/v1/training/jobs (bearer auth via MINDXTRAIN_API_KEY)
- mindX self-training loop: mindX's dream cycle writes JSONL training
data; this framework consumes it via the
mindx_dreams data source and fine-tunes a small fallback model on a single MI300X.
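As a rough illustration of the bearer-auth note above, the following sketch builds an authenticated request for the jobs endpoint with the standard library. The helper name `build_jobs_request` is hypothetical, and using `GET` on this URL (for example, to list jobs) is an assumption; the README states only the URL and that the key comes from `MINDXTRAIN_API_KEY`.

```python
import os
import urllib.request

JOBS_URL = "https://mindx.pythai.net/v1/training/jobs"


def build_jobs_request(api_key: str, url: str = JOBS_URL) -> urllib.request.Request:
    """Build a request carrying the key as a standard HTTP bearer token."""
    if not api_key:
        raise ValueError("MINDXTRAIN_API_KEY is empty")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",  # assumption: the endpoint answers GET
    )


# Sending the request is left commented out so the sketch has no side effects:
# req = build_jobs_request(os.environ["MINDXTRAIN_API_KEY"])
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Reading the key from the environment rather than hard-coding it keeps the secret out of scripts and shell history.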
mindXtrain doesn't just assert that training works — it proves recall. The
dcoach proof loop (/coach/dcoach) imprints a persona onto a
tiny model on CPU, then measures whether the model recalls it: the classroom
scores recall before vs after training, the boardroom rules success or failure,
and the verdict feeds an autotune feedback loop that tunes the next run. A clean
CPU run reports a positive imprint Δ (e.g. recall 0.07 → 0.28) and an approved
verdict. docs/NAV.md is the full documentation hub.
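The before/after comparison can be made concrete with a small sketch. This is not the dcoach code: the `recall` and `verdict` functions, the substring-match scoring, and the rule "approve when Δ is positive" are assumptions chosen to match the description above.

```python
def recall(answers: list[str], expected: list[str]) -> float:
    """Fraction of probes whose answer contains the expected fact (case-insensitive)."""
    if not expected:
        return 0.0
    hits = sum(e.lower() in a.lower() for a, e in zip(answers, expected))
    return hits / len(expected)


def verdict(before: float, after: float, min_delta: float = 0.0) -> dict:
    """Compare recall before and after training; approve only on a positive imprint."""
    delta = round(after - before, 4)
    return {"delta": delta, "approved": delta > min_delta}


# Using the recall figures quoted above (0.07 before, 0.28 after):
print(verdict(0.07, 0.28))  # {'delta': 0.21, 'approved': True}
```

A run whose recall does not improve yields a non-positive Δ and is not approved under this rule; a stricter `min_delta` would demand a larger improvement.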
uv sync # base install
uv run pytest -q # → 564 passed
uv run mindxtrain --help # 9 verbs
uv run mindxtrain init --template qwen3_8b_sft_lora --out run.yaml
uv run mindxtrain bench --dry-run --out plan.json
uv run uvicorn mindxtrain.operator.app:app --host 0.0.0.0 --port 8080
# open http://localhost:8080/coach/ for the interactive UI

To unlock training / eval / quantize / publish, install the matching dep group:
uv sync --extra ml --extra eval --extra data # train + eval + curate
# or
uv sync --all-extras # everything except amd-quark

GPU steps (bench without --dry-run, train, quantize, serve) require
an AMD MI300X with ROCm 7.2.1; run inside rocm/primus:v26.2. The full
operator checklist lives in HANDOFF.md.
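One common way to gate heavyweight paths on optional dependency groups is a lazy import that fails with an actionable hint. The sketch below is an assumption about how such a gate could look, not the project's actual helper: the function name `require_extra` is hypothetical, as is the example mapping of `torch` to the `ml` extra.

```python
import importlib
import importlib.util
from types import ModuleType


def require_extra(module: str, extra: str) -> ModuleType:
    """Import a top-level module lazily, pointing at the extra that provides it."""
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"'{module}' is not installed; "
            f"run `uv sync --extra {extra}` to enable this path"
        )
    return importlib.import_module(module)


# Hypothetical use inside a training entry point, so the base install
# never imports the heavy dependency at module load time:
# torch = require_extra("torch", "ml")
```

Deferring the import to the call site keeps `mindxtrain --help` and the CPU-only test suite usable without the extras installed.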
mindxtrain/{cli,config,data,models,train,eval,autotune,
operator,storage,provenance,deploy,budget}/ # 99 modules
contracts/ Foundry workspace for ERC-8004 attestation registry
ops/ containerfiles, compose, k8s, vmm, gensyn
tests/ pytest suite — 566 tests, CPU-only smoke
examples/ demo YAML configs
docs/ user-facing documentation + frozen blueprints
scripts/ dev helpers
| Doc | What it covers |
|---|---|
| HANDOFF.md | Operator checklist: ordered steps from local setup to live deployment. |
| docs/quickstart.md | Install + base-vs-extras command tour. |
| docs/architecture.md | Canonical layout + 5-layer architecture + MI300X invariants. |
| docs/actualization_status.md | Per-module map of what's real vs. requires extras. |
| docs/autotune.md | The 60-second AOT probe, the architectural differentiator. |
| docs/coach.md | Interactive /coach/ web UI bundled in the operator. |
| docs/dcoach.md | The dcoach proof loop: prove a CPU model recalls its training; decentralized-training fit. |
| docs/cli.md | Every mindxtrain verb with synopsis, options, exit codes. |
| docs/yaml_schema.md | Every field of the 10-section XTrainConfig. |
| docs/benchmarks.md | Target metrics + the 7-cell framework comparison. |
| docs/development.md | Toolchain, optional-deps, lazy-import pattern, invariants. |
| docs/blueprints/ | Source design briefs (frozen specification). |

Apache-2.0. See LICENSE, NOTICE, and the upstream-license
notices in LICENSE-MIT-upstream-glm51 and LICENSE-NOTICE.md.
Version history is in CHANGELOG.md.