A transparent proxy that forwards API requests to upstream LLM providers and sends telemetry to Langfuse in the background. Zero latency overhead on the response path.
Supports OpenAI, Anthropic, and Google Gemini APIs natively.
```
                            +--> Upstream OpenAI (/v1/*)
Consumer --> Proxy ---------+--> Upstream Anthropic (/v1/messages)
                 |          +--> Upstream Gemini (/v1beta/*)
                 |
                 v  (background, non-blocking)
              Langfuse
```

How it works:
- Consumer sends a standard API request to the proxy
- The proxy forwards it to the appropriate upstream provider
- The upstream response stream is split via `ReadableStream.tee()`: one branch goes to the consumer immediately, the other is consumed in the background for telemetry
- Langfuse receives a trace with full input/output, model, token usage, TTFB, and total duration
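The splitting step can be sketched with the web streams API. This is an illustration of the technique, not the proxy's actual code; the helper name `splitForTelemetry` is made up here.

```typescript
// Sketch of the response-splitting step, assuming a fetch()-style upstream
// Response. `splitForTelemetry` is an illustrative name, not the proxy's API.
async function splitForTelemetry(upstream: Response): Promise<Response> {
  if (!upstream.body) return upstream;

  const [toConsumer, toTelemetry] = upstream.body.tee();

  // Consume the telemetry branch in the background; it is never awaited on
  // the response path, so the consumer sees no added latency.
  void (async () => {
    const reader = toTelemetry.getReader();
    let bytes = 0;
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
    }
    console.log(`telemetry branch consumed ${bytes} bytes`);
  })();

  // Return the other branch to the consumer immediately.
  return new Response(toConsumer, {
    status: upstream.status,
    headers: upstream.headers,
  });
}
```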
Key features:
- Multi-provider — native support for OpenAI, Anthropic, and Gemini APIs with provider-specific stream parsing and telemetry
- Passthrough auth — consumers send their own API key; the proxy forwards it upstream. No user management.
- OpenAI catch-all — `ALL /v1/*` forwards any OpenAI-compatible request. Chat completions, embeddings, audio, images, assistants — all work automatically.
- Streaming support — SSE streams are split and returned immediately. For OpenAI, the proxy injects `stream_options.include_usage` so Langfuse always gets token counts.
- Full telemetry — every request is logged to Langfuse with input messages, output content, model, full token usage breakdown, TTFB, and total duration.
- Optional auth gate — set `PROXY_API_KEY` to require consumers to authenticate with the proxy itself (timing-safe comparison).
- Upstream key override — set `UPSTREAM_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY` to use a single key for all upstream requests regardless of what consumers send.
- Graceful shutdown — SIGTERM/SIGINT stops accepting connections, waits for in-flight requests, and flushes Langfuse before exiting.
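The `stream_options.include_usage` injection can be sketched as a small rewrite of the forwarded request body. This is a simplified illustration of the idea under the assumption that the body is a JSON chat completions request; it is not the proxy's exact code.

```typescript
// Sketch: for streaming OpenAI chat requests, ensure the final SSE chunk
// carries token usage by injecting stream_options.include_usage.
// Illustrative only; the real proxy applies this to the forwarded body.
function injectIncludeUsage(rawBody: string): string {
  let body: Record<string, unknown>;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return rawBody; // not JSON, forward untouched
  }
  if (body.stream !== true) return rawBody; // only relevant for streams

  const streamOptions = (body.stream_options ?? {}) as Record<string, unknown>;
  body.stream_options = { ...streamOptions, include_usage: true };
  return JSON.stringify(body);
}
```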
Prerequisites: Bun v1.0+

```sh
# Install dependencies
bun install

# Configure environment
cp .env.example .env
```

Edit `.env` with your settings. At minimum, configure Langfuse credentials to enable telemetry:

```sh
LANGFUSE_BASE_URL=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```

Start the server:

```sh
# Development (hot reload)
bun dev

# Production
bun start
```

Point any OpenAI-compatible SDK at the proxy:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="sk-your-openai-key",  # forwarded to upstream
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: "sk-your-openai-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```

Use the Anthropic SDK pointed at the proxy:
```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:3000",
    api_key="sk-ant-your-key",  # forwarded to upstream
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:3000",
  apiKey: "sk-ant-your-key",
});

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello!" }],
});
```

Send requests to the `/v1beta/*` endpoints:
```sh
curl "http://localhost:3000/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "x-goog-api-key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello!"}]}]}'
```

```sh
# Non-streaming
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello!"}]}'

# Streaming
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Hello!"}]}'
```

```sh
curl http://localhost:3000/api/health
```

| Endpoint | Description |
|---|---|
| `ALL /v1/messages` | Anthropic pass-through — forwards to Anthropic API |
| `ALL /v1beta/*` | Gemini pass-through — forwards to Gemini API |
| `ALL /v1/*` | OpenAI catch-all — forwards any request to upstream provider |
| `GET /api/health` | Health check — returns app version and per-provider reachability |
Routes are matched in order: `/v1/messages` is matched before the `/v1/*` catch-all, so Anthropic requests are routed correctly.
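First-match ordering can be illustrated with a tiny matcher. This is a hypothetical helper for explanation only, not the proxy's actual router.

```typescript
// Hypothetical first-match router illustrating why /v1/messages must be
// registered before the /v1/* catch-all.
const routes: Array<{ pattern: string; name: string }> = [
  { pattern: "/v1/messages", name: "anthropic" },
  { pattern: "/v1beta/*", name: "gemini" },
  { pattern: "/v1/*", name: "openai-catch-all" },
  { pattern: "/api/health", name: "health" },
];

function matchRoute(path: string): string | undefined {
  for (const { pattern, name } of routes) {
    if (pattern.endsWith("/*")) {
      // "/v1/*" -> prefix match on "/v1/"
      if (path.startsWith(pattern.slice(0, -1))) return name;
    } else if (path === pattern) {
      return name;
    }
  }
  return undefined;
}
```

Swapping the first and third entries would send Anthropic requests to the OpenAI upstream, which is exactly what the ordering rule prevents.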
The health endpoint returns per-provider status:

```json
{
  "name": "langfuse-proxy",
  "version": "0.0.0",
  "status": "ok",
  "upstream": {
    "openai": "ok",
    "anthropic": "ok",
    "gemini": "not_configured"
  }
}
```

- `status` is `"degraded"` if OpenAI is unreachable or any configured provider has errors
- Anthropic and Gemini show `"not_configured"` if their API key is not set
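The aggregate `status` rule can be sketched roughly like this. It is a simplified reading of the rule described above, not the actual handler.

```typescript
type UpstreamState = "ok" | "error" | "not_configured";

// Sketch of the aggregation rule: "degraded" if OpenAI is unreachable or any
// configured provider reports an error; "not_configured" providers are skipped.
function overallStatus(
  upstream: Record<string, UpstreamState>,
): "ok" | "degraded" {
  if (upstream.openai !== "ok") return "degraded";
  const anyConfiguredError = Object.values(upstream).some((s) => s === "error");
  return anyConfiguredError ? "degraded" : "ok";
}
```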
Every proxied request creates a Langfuse trace with:
- Trace: request path, input messages, output content, HTTP metadata
- Generation: model name, full input/output, token usage with detailed breakdowns, timing
The `usageDetails` field includes the full OpenAI token breakdown:
| Field | Description |
|---|---|
| `input` | Non-cached prompt tokens |
| `input_cached_tokens` | Prompt tokens served from OpenAI's cache |
| `input_audio_tokens` | Audio input tokens |
| `output` | Completion tokens |
| `output_reasoning_tokens` | Reasoning/chain-of-thought tokens (o1, etc.) |
| `output_audio_tokens` | Audio output tokens |
Anthropic and Gemini providers report their native token usage in the same format.
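As an illustration, an OpenAI chat completions `usage` object could map onto these fields as below. The input shape follows OpenAI's public usage schema; the mapping function itself is an assumption for explanation, not the proxy's code.

```typescript
// Illustrative mapping from an OpenAI chat completions `usage` object to the
// usageDetails fields above. Cached/audio/reasoning details are optional in
// the upstream payload, so missing values default to 0.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number; audio_tokens?: number };
  completion_tokens_details?: {
    reasoning_tokens?: number;
    audio_tokens?: number;
  };
}

function toUsageDetails(u: OpenAIUsage) {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    input: u.prompt_tokens - cached, // non-cached prompt tokens
    input_cached_tokens: cached,
    input_audio_tokens: u.prompt_tokens_details?.audio_tokens ?? 0,
    output: u.completion_tokens,
    output_reasoning_tokens: u.completion_tokens_details?.reasoning_tokens ?? 0,
    output_audio_tokens: u.completion_tokens_details?.audio_tokens ?? 0,
  };
}
```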
Timing metadata on each generation:
| Field | Description |
|---|---|
| `startTime` | When the proxy received the request |
| `completionStartTime` | When the first byte was received from upstream (TTFB) |
| `endTime` | When the full response was consumed |
Set `TELEMETRY_MAX_BODY_BYTES` to limit how much response data is buffered for telemetry (default 1 MB). The consumer always receives the full response regardless of this limit.

Leave `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` empty to disable telemetry entirely.
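The buffering cap can be sketched as follows. This is illustrative: it assumes the cap is applied while draining the telemetry branch of the tee, and the helper name `readCapped` is made up here.

```typescript
// Sketch: drain a stream fully but retain at most `maxBytes` for telemetry.
// The stream is still read to the end (so timing and usage parsing work);
// only the retained copy is capped.
async function readCapped(
  stream: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<{ body: string; truncated: boolean; totalBytes: number }> {
  const chunks: Uint8Array[] = [];
  let kept = 0;
  let total = 0;
  const reader = stream.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (kept < maxBytes) {
      const room = maxBytes - kept;
      const slice = value.byteLength > room ? value.slice(0, room) : value;
      chunks.push(slice);
      kept += slice.byteLength;
    }
  }
  const body = Buffer.concat(chunks).toString("utf8");
  return { body, truncated: total > maxBytes, totalBytes: total };
}
```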
| Variable | Description | Default |
|---|---|---|
| `NODE_ENV` | Environment mode | `development` |
| `PORT` | Server port | `3000` |
| `LOG_LEVEL` | Pino log level (`debug`, `info`, `warn`, `error`, `silent`) | `info` |
| **OpenAI / catch-all** | | |
| `UPSTREAM_BASE_URL` | Upstream LLM provider base URL | `https://api.openai.com` |
| `UPSTREAM_API_KEY` | Override consumer's key for upstream (optional) | - |
| `PROXY_API_KEY` | Gate consumers with this key (optional) | - |
| `PROXY_TIMEOUT_MS` | Upstream request timeout in ms | `300000` (5 min) |
| `TELEMETRY_MAX_BODY_BYTES` | Max response body to buffer for telemetry | `1048576` (1 MB) |
| **Anthropic** | | |
| `ANTHROPIC_BASE_URL` | Anthropic API base URL | `https://api.anthropic.com` |
| `ANTHROPIC_API_KEY` | Override consumer's key for Anthropic (optional) | - |
| `ANTHROPIC_VERSION` | Default `anthropic-version` header | `2023-06-01` |
| **Gemini** | | |
| `GEMINI_BASE_URL` | Gemini API base URL | `https://generativelanguage.googleapis.com` |
| `GEMINI_API_KEY` | Override consumer's key for Gemini (optional) | - |
| **Langfuse** | | |
| `LANGFUSE_BASE_URL` | Langfuse instance URL | `https://cloud.langfuse.com` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key (empty = telemetry disabled) | - |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key (empty = telemetry disabled) | - |
```sh
docker build -t langfuse-proxy .
docker run -p 3000:3000 --env-file .env langfuse-proxy
```

The Dockerfile uses a multi-stage build that compiles the app to a standalone binary (~50 MB image).
Two ways to deploy on Coolify:

- Dockerfile build pack: point Coolify at this repo, configure environment variables in the dashboard, and set `/api/health` on port 3000 as the health check.
- Prebuilt image (recommended when running alongside other Coolify services like n8n): deploy `ghcr.io/fazer-ai/langfuse-proxy:latest` as a Docker Image service on Coolify's predefined network. Other services on the same network reach the proxy via Docker DNS, with no public exposure or TLS needed. See n8n Integration below for the full recipe.

No database or external services are required either way.
A common deployment pairs this proxy with n8n on Coolify so all LLM calls from n8n workflows are transparently logged to Langfuse.

- On the n8n service in Coolify, enable Connect To Predefined Network and restart it.
- Create a new service of type Docker Image using `ghcr.io/fazer-ai/langfuse-proxy:latest`.
- Configure environment variables using `.env.example` as a template. At minimum, paste your Langfuse public and secret keys from the Langfuse project settings.
- Set the health check. Type: `CMD`, Command: `wget -qO- http://localhost:3000/api/health`.
- Set Network Aliases to `langfuse-proxy` so other services on the network resolve the proxy by that hostname in their credentials.
Create new LLM credentials in n8n (OpenAI, Anthropic, or Google Gemini) pointing at the proxy:

- API Key: the same key you would use against the upstream provider directly. The proxy forwards it.
- Base URL:
  - OpenAI: `http://langfuse-proxy:3000/v1`
  - Anthropic: `http://langfuse-proxy:3000`
  - Gemini: `http://langfuse-proxy:3000/v1beta`
Editing an existing credential to switch the base URL also works, but is riskier without testing first. Creating a fresh credential and validating end-to-end before swapping nodes over is the safer path.
For multi-tenant setups, send `X-User-Id` on every request so traces are grouped per client in Langfuse. On the n8n credential, click Add Custom Header:

- Header name: `X-User-Id`
- Value: a workflow expression resolving to a tenant identifier, e.g. `{{ $('Trigger').item.json.client_id }}`

n8n supports dynamic expressions in credential fields, but the referenced node must exist in every workflow that uses the credential; otherwise the expression fails to resolve and requests break. The same applies to `X-Session-Id` (groups related traces in Langfuse) and `X-Request-Id` (used as the Langfuse trace ID).
```sh
bun install   # Install dependencies
bun dev       # Start with hot reload
bun test      # Run tests with coverage
bun lint      # Lint with Biome
bun format    # Auto-fix lint and formatting
bun check     # Lint + type-check + tests (runs in pre-commit hook)
```

```
src/
├── api/
│   ├── features/
│   │   ├── anthropic/                   # ALL /v1/messages
│   │   │   ├── anthropic.controller.ts    Anthropic handler, auth, header forwarding
│   │   │   └── anthropic.stream.ts        Anthropic SSE parsing
│   │   ├── gemini/                      # ALL /v1beta/*
│   │   │   ├── gemini.controller.ts       Gemini handler, API key forwarding
│   │   │   └── gemini.stream.ts           Gemini stream parsing
│   │   ├── health/                      # GET /api/health
│   │   │   └── health.controller.ts       Per-provider reachability checks
│   │   └── proxy/                       # ALL /v1/*
│   │       ├── proxy.controller.ts        Catch-all handler, auth gate, header forwarding
│   │       ├── proxy.stream.ts            Stream consumption, SSE parsing, JSON parsing
│   │       ├── proxy.telemetry.ts         Background Langfuse reporting (all providers)
│   │       └── proxy.types.ts             TypeScript interfaces
│   └── lib/
│       ├── langfuse.ts                    Langfuse client singleton + shutdown
│       └── logger.ts                      Pino logger with pretty-print (dev) / JSON (prod)
├── app.ts                                 Elysia app setup (logging, error handling, routes)
├── config.ts                              Environment configuration
└── index.ts                               Entry point, server startup, graceful shutdown

tests/
└── api/features/
    ├── anthropic/   Anthropic controller and stream parser tests
    ├── gemini/      Gemini controller and stream parser tests
    ├── health/      Health endpoint tests
    └── proxy/       Proxy controller and stream parser tests
```