MATE (MUSA AI Tensor Engine) is a centralized library for Generative AI workloads on MUSA. It provides high-performance attention and GEMM operators, and compatibility wrappers for CUDA-oriented Python APIs.
- High-performance attention and GEMM operators for MUSA
- Compatibility wrappers for `flash_attn_3`, `sageattention`, `flash_mla`, and `deep_gemm`
- CLI tools for environment checks, configuration inspection, and replay
- CLI documentation: `docs/mate_cli.md`
- FlashAttention-3 compatibility summary: `docs/flash_attention.md`
- FlashAttention-3 wrapper: `wrappers/flash-attention/README.md`
- SageAttention wrapper: `wrappers/SageAttention/README.md`
- FlashMLA wrapper: `wrappers/FlashMLA/README.md`
- DeepGEMM wrapper: `wrappers/DeepGEMM/README.md`
| Component | Requirement |
|---|---|
| MUSA Toolkit | 4.3.6 or later |
| TorchMUSA | 2.7 or later |
| Architecture | Pinghu (MP31) |
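The minimums above can also be gated programmatically before loading MATE. A minimal sketch of a dotted-version comparison; the `meets_minimum` helper is hypothetical and not part of MATE, whose supported path for environment validation is the `mate check` command:

```python
# Illustrative version gate; `meets_minimum` is a hypothetical helper,
# not a MATE API. Handles plain dotted numeric versions only.
def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '4.3.10' >= '4.3.6'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

print(meets_minimum("4.3.6", "4.3.6"))  # MUSA Toolkit minimum is satisfied
print(meets_minimum("2.7.1", "2.7"))    # TorchMUSA minimum is satisfied
```

Tuple comparison makes `2.10` correctly rank above `2.7`, which naive string comparison would get wrong.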
```shell
git clone https://github.com/MooreThreads/mate.git --recursive
cd mate
python -m build --wheel --no-isolation
```

For local development, install MATE in editable mode:
```shell
git clone https://github.com/MooreThreads/mate.git --recursive
cd mate
pip install --no-build-isolation -e . -v
```

If you forgot `--recursive` when cloning, initialize submodules before building or installing:

```shell
git submodule update --init --recursive
```

Pre-build AOT kernels before packaging the wheel:
```shell
git clone https://github.com/MooreThreads/mate.git --recursive
cd mate
MATE_MUSA_ARCH_LIST=3.1 python -m mate.aot
python -m build --wheel --no-isolation
```

Customize AOT coverage by operator family when needed:

```shell
python -m mate.aot --attention-aot-level 0 --add-gemm true --add-moe false
```

| Path | Purpose |
|---|---|
| `mate/` | Core Python package and public APIs |
| `wrappers/` | Compatibility wrapper packages for existing Python ecosystems |
| `docs/` | Markdown docs and Sphinx sources |
| `tests/` | Correctness and integration tests |
| `benchmarks/` | Performance and benchmarking scripts |
MATE provides a command-line interface for configuration, debugging, diagnostics, and replay.
| Command | Purpose |
|---|---|
| `mate check` | Validate the runtime environment |
| `mate show-config` | Display installation and runtime configuration |
| `mate env` | Show relevant environment variables |
| `mate replay --dir PATH` | Replay API calls from Level 10 dumps |
| `mate list-dumps PATH` | List recorded dump directories |
Example:

```shell
mate check
mate show-config
mate env
mate replay --dir mate_dumps/
mate list-dumps mate_dumps/
```

See `docs/mate_cli.md` for full CLI documentation.
MATE uses the packages under wrappers/ as a compatibility layer for CUDA-oriented software stacks on MUSA. These wrappers preserve familiar package names and high-level APIs while routing execution to MATE operators and kernels on MUSA, which helps existing integrations migrate with smaller code changes.
| Wrapper | Package | Import Path | Purpose | Documentation |
|---|---|---|---|---|
| `wrappers/flash-attention` | `flash_attn_3` | `flash_attn_interface` | FlashAttention-3-compatible APIs on top of MATE attention operators on MUSA | wrapper README, compatibility summary |
| `wrappers/SageAttention` | `sageattention` | `sageattention` | SageAttention-compatible dense quantized attention wrapper on top of MATE on MUSA | wrapper README |
| `wrappers/FlashMLA` | `flash_mla` | `flash_mla` | FlashMLA-compatible MLA dense/sparse decode and sparse prefill APIs on top of MATE MLA operators on MUSA | wrapper README |
| `wrappers/DeepGEMM` | `deep-gemm` | `deep_gemm` | DeepGEMM-compatible APIs on top of MATE GEMM operators on MUSA | wrapper README |
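Because each wrapper keeps its upstream import path, existing code can probe for the familiar module names without any MUSA-specific branching. A minimal sketch assuming only the import paths from the table above; the `attention_backend` helper and its preference order are illustrative, not MATE APIs:

```python
import importlib.util

def attention_backend() -> str:
    """Return the first available attention backend by probing for the
    wrapper-provided module names. Order is illustrative, not MATE policy."""
    candidates = [
        ("flash_attn_interface", "flash_attn_3"),  # FlashAttention-3 wrapper
        ("sageattention", "sageattention"),        # SageAttention wrapper
        ("flash_mla", "flash_mla"),                # FlashMLA wrapper
    ]
    for module_name, backend in candidates:
        # find_spec checks importability without actually importing the module
        if importlib.util.find_spec(module_name) is not None:
            return backend
    return "none"

print(attention_backend())
```

Because the wrapper satisfies the same `find_spec`/`import` checks as the CUDA package, this kind of backend-selection code runs unchanged on MUSA.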
After installing `mate`, build the Sphinx docs with:

```shell
pip install sphinx furo
cd docs
make html
```

MATE is inspired by FlashInfer, FlashAttention, CUTLASS, FlashMLA, and DeepGEMM.