Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 3 additions & 5 deletions architecture/ai-pipeline.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -59,13 +59,11 @@ The routing is opinionated, not magic: clear rules in `~/CLAUDE.md` and `AGENTS.

## Local AI gateway (Bifrost)

Bifrost is the OpenAI-compatible HTTP gateway that sits between every AI tool on the workstation and whichever provider eventually answers the call. It exposes `http://localhost:30080/v1/chat/completions` and fans out to OpenAI, Gemini, OpenRouter, and the local MLX server based on the task class.
[Bifrost](/tools/bifrost) is the OpenAI-compatible HTTP gateway that sits between every AI tool on the workstation and whichever provider eventually answers the call. It exposes `http://localhost:30080/v1/chat/completions` and fans out to OpenAI, Gemini, OpenRouter, and the local MLX server based on the task class.

- **Never hardcode model identifiers in committed config.** Models change weekly; identifiers rot. Tools resolve task classes (Research, Coding, Review, Pre-commit) to a current model at call time via `listmodels`.
- **Local MLX models carry an `mlx-local/` prefix** when called through Bifrost (it expects `provider/model` format). Calling the vllm-mlx server directly on port 11434 uses the bare HuggingFace model ID β€” no prefix.
- **Cloud models go in unprefixed.** Bifrost handles routing; do not add a provider prefix yourself.
Tools resolve task classes (Research, Coding, Review, Pre-commit) to a current model at call time β€” never hardcode model identifiers in committed config. When `localOnlyMode` is enabled, every request routes exclusively to the local MLX inference server on port 11434.

When `localOnlyMode` is enabled (or the `--local` flag is passed), every task routes to the MLX inference server on port 11434 and no cloud API calls happen. Verify the LaunchAgent is running before invoking: `launchctl list | grep vllm-mlx`.
See [Bifrost](/tools/bifrost) for routing conventions, local-only mode details, and provider capabilities.

### PAL MCP

Expand Down
3 changes: 2 additions & 1 deletion docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,8 @@
"pages": [
"tools/overview",
"tools/mlx-benchmarks",
"tools/automation"
"tools/automation",
"tools/bifrost"
]
},
{
Expand Down
68 changes: 68 additions & 0 deletions tools/bifrost.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
---
title: "Bifrost AI gateway"
description: "OpenAI-compatible HTTP gateway that routes AI requests from local tools to the right provider β€” OpenAI, Gemini, OpenRouter, or local MLX inference."
tier: 2
---

> One endpoint for every AI tool. Bifrost handles the routing.

Bifrost is the OpenAI-compatible HTTP gateway that sits between every AI tool on the workstation and whichever provider eventually answers the call. It exposes `http://localhost:30080/v1/chat/completions` and fans out to OpenAI, Gemini, OpenRouter, and the local MLX server based on the task class.

- **GitHub:** https://github.com/maximhq/bifrost
- **Homepage:** https://www.getmaxim.ai/bifrost

## Model routing conventions

Never hardcode model identifiers in committed config. Models change frequently; identifiers rot. Tools resolve task classes (Research, Coding, Review, Pre-commit) to a current model at call time via `listmodels`.

| Context | Format |
| --- | --- |
| Local MLX models through Bifrost | `mlx-local/<model>` β€” Bifrost expects `provider/model` |
| Direct vllm-mlx on port 11434 | bare HuggingFace model ID β€” no prefix |
| Cloud models through Bifrost | unprefixed β€” Bifrost routes by task class |

## Local-only mode

When `localOnlyMode` is enabled or the `--local` flag is passed, every request routes to the MLX inference server on port 11434. No cloud API calls occur.

Verify the LaunchAgent is running before enabling local-only mode:

```bash
launchctl list | grep vllm-mlx
```

## Priority in the AI gateway stack

Bifrost is the second layer in the gateway priority order:

1. **Anthropic official** β€” Claude Code plugins, skills, patterns
2. **Bifrost AI gateway** β€” multi-provider routing at `localhost:30080`
3. **PAL MCP** β€” only for `clink` (parallel calls) and `consensus` (multi-model agreement)
4. **Personal or custom** β€” only when no alternative exists

## Capabilities

Bifrost supports 23+ providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex, and local inference servers. Key features:

- **Intelligent failover** β€” transparent routing to a configured fallback when a provider is unavailable
- **Semantic caching** β€” caches responses by semantic similarity, reducing cost and latency
- **MCP support** β€” Model Context Protocol integration for multi-tool coordination
- **Prometheus metrics** β€” built-in observability for latency, throughput, and cost tracking

Performance at scale: <100 Β΅s gateway overhead at 5,000 RPS.

## Deployment

Bifrost runs locally as a lightweight gateway process. Options:

```bash
npx bifrost@latest # 30-second startup via NPX
```

Docker containers and a Go SDK are also available for embedded or orchestrated deployments.

## See also

- [AI development pipeline](/architecture/ai-pipeline) β€” how Bifrost fits into the full model-routing pipeline
- [PAL MCP](/architecture/ai-pipeline#pal-mcp) β€” the multi-model coordination layer on top of Bifrost
- [nix-ai](/nix/nix-ai) β€” Nix package and config layer that manages the Bifrost process
Loading