Uniting Models, Algorithms, and System Innovators with Top-Down Evolutionary Benchmarks.
🌐 Website: www.teasbench.com
| Category | Description |
|---|---|
| MoE-Benchmark | Benchmarks for Mixture-of-Experts models |
| TTS-Benchmark | Benchmarks for Test-Time Scaling methods |
| Agentic-Benchmark | Benchmark for Agentic Workflows (Under construction) |
Most of our benchmarking code is developed in the following projects:
| Component | Repository | Description |
|---|---|---|
| MoE-CAP | GitHub | MoE benchmarking framework (~4K LoC, Python) |
| AgentCAP | GitHub | Agentic Workflow benchmarking framework (currently ~3K LoC, Python) |
| ServerlessLLM + Pylet | GitHub | Benchmark platform (~103K LoC, Python + C++) |
Run benchmarks directly with standalone Python scripts, with no serverless platform setup required.
Note: The direct test scripts are developed and tested on Kubernetes clusters with NVIDIA GPU support, and they currently work only in Kubernetes environments.
Prerequisites:
- Python 3.10+
- GPU with sufficient VRAM for the model
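To gauge whether a GPU has enough VRAM for a given model, a rough lower bound is the memory needed for the weights alone (parameter count times bytes per parameter); KV cache, activations, and framework overhead come on top. The sketch below is a back-of-envelope helper, not part of the benchmark suite:

```python
# Back-of-envelope VRAM estimate for holding model weights only.
# KV cache, activations, and runtime overhead are NOT included.

DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}

def weight_vram_gib(n_params: float, dtype: str = "bf16") -> float:
    """Approximate GiB required just to store the weights."""
    return n_params * DTYPE_BYTES[dtype] / 2**30

if __name__ == "__main__":
    # A 7B-parameter model in bf16 needs roughly 13 GiB for weights alone.
    print(f"{weight_vram_gib(7e9, 'bf16'):.1f} GiB")
```

Treat the result as a floor: budget extra headroom for the KV cache, which grows with batch size and sequence length.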
Installation:
For SGLang backend:
conda create -n sglang python=3.10 -y
conda activate sglang
pip install sglang==0.5.8 transformers datasets

For vLLM backend:
conda create -n vllm python=3.10 -y
conda activate vllm
pip install vllm==0.11.0 transformers==4.56.0 datasets

For Kubernetes-based serverless deployment with auto-scaling, see the detailed guide:
Note: The serverless scripts are developed and tested on Kubernetes clusters with NVIDIA GPU support, and they currently work only in Kubernetes environments.
This approach is recommended for:
- Production deployments
- Multi-model serving
- Auto-scaling based on load
- Kubernetes-native infrastructure
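Once a serverless endpoint is up, client-side latency can be measured against it. Both SGLang and vLLM expose an OpenAI-compatible `/v1/completions` route; the endpoint URL and model name below are placeholders, and this is a minimal stdlib-only sketch rather than the project's own harness:

```python
import json
import time
import urllib.request

# Placeholders: point these at your own deployment.
ENDPOINT = "http://localhost:8000/v1/completions"
MODEL = "my-model"

def build_payload(prompt: str, max_tokens: int = 64) -> bytes:
    """Serialize an OpenAI-style completion request body."""
    return json.dumps(
        {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    ).encode()

def time_request(prompt: str) -> float:
    """Return end-to-end latency in seconds for one completion."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"latency: {time_request('Hello'):.3f}s")
```

Under auto-scaling, the first request after scale-from-zero also includes cold-start time, so measure it separately from steady-state latency.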
- Add new benchmark scripts to the appropriate category folder:
  - For direct tests, add to `<Category>/direct-test-scripts/`
  - For serverless tests, add to `<Category>/serverless-scripts/`
- Update documentation as needed
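A new benchmark script typically times a backend call over a set of prompts and reports per-request latency and aggregate throughput. The skeleton below is a hypothetical sketch of that shape, not the repositories' actual harness; `generate` stands in for whatever client call (SGLang, vLLM, etc.) the script targets:

```python
import time
from statistics import mean

def run_benchmark(generate, prompts):
    """Time generate(prompt) over a list of prompts and report
    mean per-request latency plus aggregate throughput (requests/s)."""
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        generate(p)  # stand-in for the real backend call
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    return {
        "mean_latency_s": mean(latencies),
        "throughput_rps": len(prompts) / wall,
    }

if __name__ == "__main__":
    # Dummy backend for illustration only.
    stats = run_benchmark(lambda p: p.upper(), ["a", "b", "c"])
    print(stats)
```

Keeping the measurement loop backend-agnostic like this makes the same script reusable across the direct and serverless folders.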