Uniting Models, Algorithms, and System Innovators with Top-Down Evolutionary Benchmarks.
🌐 Website: www.teasbench.com
| Category | Description |
|---|---|
| MoE-Benchmark | Benchmarks for Mixture-of-Experts models |
| TTS-Benchmark | Benchmarks for Test-Time Scaling methods |
| Agentic-Benchmark | Benchmark for Agentic Workflows (Under construction) |
Most of our benchmarking code is developed in the following projects:
| Component | Repository | Description |
|---|---|---|
| MoE-CAP | GitHub | MoE benchmarking framework (~4K LoC, Python) |
| AgentCAP | GitHub | Agentic Workflow benchmarking framework (currently ~3K LoC, Python) |
| ServerlessLLM + Pylet | GitHub | Benchmark platform (~103K LoC, Python + C++) |
Run benchmarks directly with standalone Python scripts, with no serverless platform setup required.
Note: The direct test scripts are developed and tested on Kubernetes clusters with NVIDIA GPU support, and they currently work only in Kubernetes environments.
Prerequisites:
- Python 3.10+
- GPU with sufficient VRAM for the model
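To gauge whether a GPU has enough VRAM for a given model, a rough lower bound is the memory needed for the weights alone (parameter count times bytes per parameter); KV cache, activations, and framework overhead come on top. The sketch below is a back-of-envelope helper, not part of the benchmark suite:

```python
# Back-of-envelope VRAM estimate for holding model weights only.
# KV cache, activations, and runtime overhead are NOT included.

DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}

def weight_vram_gib(n_params: float, dtype: str = "bf16") -> float:
    """Approximate GiB required just to store the weights."""
    return n_params * DTYPE_BYTES[dtype] / 2**30

if __name__ == "__main__":
    # A 7B-parameter model in bf16 needs roughly 13 GiB for weights alone.
    print(f"{weight_vram_gib(7e9, 'bf16'):.1f} GiB")
```

Treat the result as a floor: budget extra headroom for the KV cache, which grows with batch size and sequence length.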
Installation:
For SGLang backend:
conda create -n sglang python=3.10 -y
conda activate sglang
pip install sglang==0.5.8 transformers datasets

For vLLM backend:
conda create -n vllm python=3.10 -y
conda activate vllm
pip install vllm==0.11.0 transformers==4.56.0 datasets

For Kubernetes-based serverless deployment with auto-scaling, see the detailed guide:
Note: The serverless scripts are developed and tested on Kubernetes clusters with NVIDIA GPU support, and they currently work only in Kubernetes environments.
This approach is recommended for:
- Production deployments
- Multi-model serving
- Auto-scaling based on load
- Kubernetes-native infrastructure
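Once a serverless endpoint is up, client-side latency can be measured against it. Both SGLang and vLLM expose an OpenAI-compatible `/v1/completions` route; the endpoint URL and model name below are placeholders, and this is a minimal stdlib-only sketch rather than the project's own harness:

```python
import json
import time
import urllib.request

# Placeholders: point these at your own deployment.
ENDPOINT = "http://localhost:8000/v1/completions"
MODEL = "my-model"

def build_payload(prompt: str, max_tokens: int = 64) -> bytes:
    """Serialize an OpenAI-style completion request body."""
    return json.dumps(
        {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    ).encode()

def time_request(prompt: str) -> float:
    """Return end-to-end latency in seconds for one completion."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"latency: {time_request('Hello'):.3f}s")
```

Under auto-scaling, the first request after scale-from-zero also includes cold-start time, so measure it separately from steady-state latency.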
- Add new benchmark scripts to the appropriate category folder:
  - For direct tests, add to `<Category>/direct-test-scripts/`
  - For serverless tests, add to `<Category>/serverless-scripts/`
- Update documentation as needed
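A new benchmark script typically times a backend call over a set of prompts and reports per-request latency and aggregate throughput. The skeleton below is a hypothetical sketch of that shape, not the repositories' actual harness; `generate` stands in for whatever client call (SGLang, vLLM, etc.) the script targets:

```python
import time
from statistics import mean

def run_benchmark(generate, prompts):
    """Time generate(prompt) over a list of prompts and report
    mean per-request latency plus aggregate throughput (requests/s)."""
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        generate(p)  # stand-in for the real backend call
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    return {
        "mean_latency_s": mean(latencies),
        "throughput_rps": len(prompts) / wall,
    }

if __name__ == "__main__":
    # Dummy backend for illustration only.
    stats = run_benchmark(lambda p: p.upper(), ["a", "b", "c"])
    print(stats)
```

Keeping the measurement loop backend-agnostic like this makes the same script reusable across the direct and serverless folders.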