Professor-Codephreak/mindXtrain

mindxtrain

Production training framework for fine-tuning open-weight LLMs on AMD MI300X and serving them through an OpenAI-compatible API. It ships as a single, ordered package with the canonical layout defined in docs/blueprints/mindXtrain2.md §Part 4.
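Because serving speaks the OpenAI wire format, any client that can POST to `/v1/chat/completions` should be able to talk to a served model. Below is a minimal standard-library sketch. The base URL and model name are placeholders, and `chat_request` is a hypothetical helper, not part of mindxtrain.

```python
import json
import urllib.request
from typing import Dict, List


def chat_request(base_url: str, model: str, messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request.

    base_url and model are placeholders for wherever the fine-tuned model is served.
    """
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON response. Any OpenAI SDK pointed at the same base URL should work equally well.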

The single architectural feature that distinguishes mindxtrain from Axolotl, LLaMA-Factory, Unsloth, torchtune, and Primus is its 60-second ahead-of-time (AOT) autotune probe. The probe chooses between CK and Triton attention, picks a hipBLASLt heuristic, and settles the RCCL configuration. The resulting plan is fixed at training start; JIT autotuning is forbidden in the production loop.
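The probe's contract is to measure once within a time budget, freeze the plan, and never re-tune inside the loop. That contract can be sketched in plain Python. This is an illustration under stated assumptions, not mindxtrain's implementation. `Candidate`, `time_candidate`, and `aot_probe` are hypothetical names, and the timed callables stand in for the real CK/Triton attention kernels, hipBLASLt heuristics, and RCCL settings that would be measured on the MI300X.

```python
"""Hypothetical sketch of an ahead-of-time (AOT) autotune probe."""
import time
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping


@dataclass(frozen=True)
class Candidate:
    name: str                  # illustrative label, e.g. "ck" or "triton"
    run: Callable[[], None]    # one representative invocation to time


def time_candidate(c: Candidate, repeats: int = 3) -> float:
    """Return the best wall-clock time over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c.run()
        best = min(best, time.perf_counter() - start)
    return best


def aot_probe(choices: Dict[str, List[Candidate]], budget_s: float = 60.0) -> Mapping[str, str]:
    """Pick the fastest candidate per decision, stopping once the budget is spent."""
    deadline = time.perf_counter() + budget_s
    plan: Dict[str, str] = {}
    for decision, candidates in choices.items():
        timings: Dict[str, float] = {}
        for c in candidates:
            if time.perf_counter() > deadline and timings:
                break  # budget exhausted: keep the best measured so far
            timings[c.name] = time_candidate(c)
        plan[decision] = min(timings, key=timings.__getitem__)
    # Read-only view: the training loop can consult the plan but not re-tune it.
    return MappingProxyType(plan)


if __name__ == "__main__":
    # Toy stand-ins: one cheap and one expensive callable for a single decision.
    demo = aot_probe({"attention": [
        Candidate("triton", lambda: sum(range(300_000))),
        Candidate("ck", lambda: sum(range(100))),
    ]}, budget_s=5.0)
    print(dict(demo))
```

Returning a `MappingProxyType` makes the frozen plan read-only. Any attempt to assign into it raises `TypeError`, so runtime re-tuning fails loudly instead of silently drifting.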

Status: production deployment is in progress. The CPU-only base install passes its full pytest suite (ruff and mypy clean); with the training extras installed, all 672 tests pass. Many modules run as real Python on a CPU-only laptop, while the heavyweight training, eval, and quantization paths are gated behind opt-in extra dependency groups. See docs/actualization_status.md for the per-module map and HANDOFF.md for the operator checklist.

Where this runs

  • Operator + Coach UI: https://mindx.pythai.net/coach
  • Public training-jobs API: https://mindx.pythai.net/v1/training/jobs (bearer auth via MINDXTRAIN_API_KEY)
  • mindX self-training loop: mindX's dream cycle writes JSONL training data; this framework consumes it via the mindx_dreams data source and fine-tunes a small fallback model on a single MI300X.
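Calls to the training-jobs API carry the key from `MINDXTRAIN_API_KEY` as a bearer token. The sketch below only builds the request. Two details are assumptions and do not come from the API documentation: it treats GET as "list jobs" and POST with a JSON body as "submit a job", and the payload shape is invented. `build_jobs_request` is a hypothetical helper.

```python
import json
import os
import urllib.request
from typing import Optional

JOBS_URL = "https://mindx.pythai.net/v1/training/jobs"


def build_jobs_request(payload: Optional[dict] = None, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the training-jobs API.

    Assumption: GET lists jobs, POST with a JSON body submits one.
    """
    key = api_key or os.environ["MINDXTRAIN_API_KEY"]
    headers = {"Authorization": f"Bearer {key}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if data is None else "POST"
    return urllib.request.Request(JOBS_URL, data=data, headers=headers, method=method)
```

Sending it is one more line, `urllib.request.urlopen(build_jobs_request())`. Keeping construction separate makes the auth header easy to check without touching the network.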

Prove it trains

mindXtrain doesn't just assert that training works; it proves recall. The dcoach proof loop (/coach/dcoach) imprints a persona onto a tiny model on CPU and then measures whether the model recalls it. The classroom scores recall before and after training, the boardroom rules the run a success or failure, and the verdict feeds an autotune feedback loop that tunes the next run. A clean CPU run reports a positive imprint Δ (e.g. recall 0.07 → 0.28) and an approved verdict. docs/NAV.md is the full documentation hub.
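Here is a rough model of what the classroom and boardroom compute. It scores recall as the fraction of persona facts a model reproduces and then rules on the before/after delta. Several parts are assumptions made for illustration: the substring-match scoring, the `Probe`/`recall`/`verdict` names, and the zero `min_delta` threshold. See docs/dcoach.md for the real scoring.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Probe:
    prompt: str
    expected: str  # persona fact the model should recall (hypothetical format)


def recall(generate: Callable[[str], str], probes: Sequence[Probe]) -> float:
    """Fraction of probes whose expected fact appears in the model's answer."""
    hits = sum(p.expected.lower() in generate(p.prompt).lower() for p in probes)
    return hits / len(probes)


def verdict(before: float, after: float, min_delta: float = 0.0) -> Tuple[float, str]:
    """Return the imprint delta and a simplified approved/rejected ruling."""
    delta = after - before
    return delta, "approved" if delta > min_delta else "rejected"
```

With the example numbers above, `verdict(0.07, 0.28)` gives a Δ of about 0.21 and an `approved` ruling. A run that leaves recall unchanged is rejected.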

Quickstart

uv sync                                                    # base install
uv run pytest -q                                           # → 564 passed
uv run mindxtrain --help                                   # 9 verbs
uv run mindxtrain init --template qwen3_8b_sft_lora --out run.yaml
uv run mindxtrain bench --dry-run --out plan.json
uv run uvicorn mindxtrain.operator.app:app --host 0.0.0.0 --port 8080
# open http://localhost:8080/coach/  for the interactive UI

To unlock training / eval / quantize / publish, install the matching dep group:

uv sync --extra ml --extra eval --extra data         # train + eval + curate
# or
uv sync --all-extras                                  # everything except amd-quark
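Optional groups like these are usually paired with lazy imports, so the base install never loads heavyweight packages until a feature needs them. docs/development.md documents the project's own pattern. The helper below is a hypothetical sketch of the idea, not mindxtrain's code.

```python
import importlib
from types import ModuleType


def require(module: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra group provides it."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"{module!r} is not installed; run `uv sync --extra {extra}` to enable this feature"
        ) from exc
```

The intended usage is to call it inside the function that needs the dependency, e.g. `torch = require("torch", "ml")`, rather than at module top level. The `torch` → `ml` mapping is an assumption.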

GPU steps (bench without --dry-run, train, quantize, serve) require an AMD MI300X with ROCm 7.2.1; run inside rocm/primus:v26.2. The full operator checklist lives in HANDOFF.md.
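A preflight guard can refuse GPU steps early instead of failing deep inside a run. This sketch assumes the device name and ROCm version string are obtained elsewhere, for example from the container environment. `gpu_steps_allowed` is hypothetical. It checks for exactly ROCm 7.2.1 because that is the version named above.

```python
from typing import Tuple

REQUIRED_ROCM: Tuple[int, ...] = (7, 2, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.2.1' (optionally with a suffix such as '7.2.1-rc') into (7, 2, 1)."""
    core = text.strip().split("-")[0]
    return tuple(int(part) for part in core.split("."))


def gpu_steps_allowed(device_name: str, rocm_version: str) -> bool:
    """True only for an MI300X-class device running exactly ROCm 7.2.1."""
    is_mi300x = "mi300x" in device_name.lower()
    return is_mi300x and parse_version(rocm_version) == REQUIRED_ROCM
```

Relax the equality to `>=` only if newer ROCm releases are confirmed to work with the rocm/primus:v26.2 image.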

Layout

mindxtrain/{cli,config,data,models,train,eval,autotune,
            operator,storage,provenance,deploy,budget}/   # 99 modules
contracts/        Foundry workspace for ERC-8004 attestation registry
ops/              containerfiles, compose, k8s, vmm, gensyn
tests/            pytest suite — 566 tests, CPU-only smoke
examples/         demo YAML configs
docs/             user-facing documentation + frozen blueprints
scripts/          dev helpers

Documentation

Doc                           What it covers
HANDOFF.md                    Operator checklist: ordered steps from local setup to live deployment.
docs/quickstart.md            Install + base-vs-extras command tour.
docs/architecture.md          Canonical layout + 5-layer architecture + MI300X invariants.
docs/actualization_status.md  Per-module map of what's real vs. requires extras.
docs/autotune.md              The 60-second AOT probe, the architectural differentiator.
docs/coach.md                 Interactive /coach/ web UI bundled in the operator.
docs/dcoach.md                The dcoach proof loop: prove a CPU model recalls its training; decentralized-training fit.
docs/cli.md                   Every mindxtrain verb with synopsis, options, exit codes.
docs/yaml_schema.md           Every field of the 10-section XTrainConfig.
docs/benchmarks.md            Target metrics + the 7-cell framework comparison.
docs/development.md           Toolchain, optional-deps, lazy-import pattern, invariants.
docs/blueprints/              Source design briefs (frozen specification).

License

Apache-2.0. See LICENSE, NOTICE, and the upstream-license notices in LICENSE-MIT-upstream-glm51 and LICENSE-NOTICE.md. Version history in CHANGELOG.md.

About

Production training framework for AMD MI300X with a 60-second AOT autotune probe. Built for the AMD x lablab.ai hackathon, May 4-10 2026; the coach and the classroom imprint-from-training loop were tested on CPU.
