TokenAltar is a self-contained LLM capacity console for small teams, research groups, private circles, and API-sharing communities.
It pools upstream OpenAI, Anthropic, Gemini, and compatible model capacity behind one local gateway. Members call TokenAltar with local API keys; providers contribute upstream channels; the console keeps routing, quotas, pricing, health, point settlement, and social-economy flows visible from one embedded Vue UI.
TokenAltar runs as a single Rust process with SQLite persistence and embedded frontend assets. There is no separate web server to deploy after the console is built.
TokenAltar turns scattered LLM accounts into a governed shared pool:
- Unified gateway: expose OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and Gemini Generate Content routes through local
sk-...keys. - Provider channels: let users or admins contribute upstream model channels with model coverage, quota windows, fire-sale policy, and globally governed provider-share settlement.
- Scoped client keys: issue keys with enable/disable, soft deletion, rotation, optional expiration, model allow-lists, spend limits, and explicit channel allow-lists.
- Dynamic routing: choose healthy channels by key scope, model coverage, quota state, cooldown, route weight, fire-sale state, and affinity binding.
- Point economy: settle usage into points, reward providers, support P2P transfers, phrase red packets, and day/month leaderboards.
- Operator console: manage users, API keys, channels, prices, affinity rules, ledger entries, runtime settings, health windows, and the visual Guide page from the same binary.
member account
-> local API key
-> TokenAltar gateway
-> scoped, healthy upstream channel
-> upstream model response
-> usage settlement
-> consumer debit + provider reward + ledger record
The gateway estimates input usage before forwarding a request, reserves local points and channel quota, then settles against upstream usage when the response completes. Failed retry attempts release their reservation before another eligible channel is tried.
| Page | Purpose |
|---|---|
| Dashboard | Live capacity, surge state, available points, enabled channels, and current point spend. |
| Users | Admin-only account management, roles, balances, password resets, and account suspension. |
| API Keys | Client credentials, model fences, spend ceilings, channel allow-lists, rotation, and soft deletion. |
| Channels | Upstream providers, model coverage, quota windows, fire-sale economics, cloning, testing, and batch enable/disable. |
| Health | Passive request-derived health windows, TTFT, empty replies, degraded samples, and down windows. |
| Pricing | Per-1M-token model tariffs with channel overrides before global defaults. |
| Affinity | Admin-only sticky routing rules for tenant/session/cache locality. |
| Economy | Point transfers, phrase red packets, claim history, and anonymous leaderboard preference. |
| Leaderboards | Day/month provider-token and consumer-point rankings. |
| Ledger | Settlement archive with token counts, tokenizer notes, and formula text. |
| Guide | Visual product map plus a user-facing rulebook for access, capacity, routing, settlement, health, and economy. |
| Settings | Admin-only runtime controls for invitation, economy, routing, pricing fallback, and defaults. |
Regular users see owner-scoped channel and pricing views. Admin-only data remains behind backend permission checks; console live updates are topic invalidations, not privileged data snapshots.
Client requests authenticate with Authorization: Bearer sk-....
Supported gateway routes:
| Client route | Intended compatibility |
|---|---|
POST /v1/chat/completions |
OpenAI Chat Completions-style clients |
POST /v1/responses |
OpenAI Responses-style clients |
POST /v1/messages |
Anthropic Messages-style clients |
POST /v1beta/models/{model}:generateContent |
Gemini Generate Content-style clients |
POST /v1beta/models/{model}:streamGenerateContent |
Gemini streaming clients |
Text protocol conversion supports text messages, image inputs, system, temperature, max-token controls, and basic tool/function fields across OpenAI, Anthropic, and Gemini. Same-protocol traffic is passed through unchanged to matching upstream channel types; cross-protocol traffic is converted only when needed.
Files, embeddings, rerank, realtime, audio, and other non-text extensions are intentionally outside the current gateway surface.
Routing starts from the local API key. A request can only use channels allowed by that key, then must pass model coverage, enabled/status checks, quota-window checks, cooldown checks, and affinity policy. Fire-sale channels stay in the same weighted selection pool as regular channels; their route weight is multiplied by the configured fire-sale factor instead of becoming a hard priority lane.
Retryable upstream failures release local reservations and can fall back to another healthy channel before semantic content reaches the client. Retryable cases include connection errors, 408, 429, 5xx, and semantic-empty replies. For streams, heartbeat/comment frames, usage-only metadata, whitespace-only deltas, and terminal markers do not count as semantic content; once real text or tool-call semantics have been forwarded, the gateway does not interrupt or replay that stream.
Built-in affinity presets cover:
- GPT Responses:
^gpt-.*$,/v1/responses,prompt_cache_key - Claude Messages:
^claude-.*$,/v1/messages,metadata.user_id - Gemini Generate Content:
^gemini-.*$, generate/stream routes,cachedContent
These presets use a 3600-second TTL, enable successful fallback switching, and keep cache-local bound traffic pinned on bound-channel failure. They intentionally omit model name from the cache key, matching model-family locality behavior.
Channels can define arbitrary quota windows. Each window has a point limit, period unit/count, anchor timestamp, and IANA timezone. Every configured window is enforced. If any configured window is exhausted, the channel stops routing until the relevant window refreshes.
The first quota window is the primary inventory window used for dashboard availability, route weighting, surge normalization, and fire-sale reset timing. API and Settings payloads use limit_points / used_points for quota windows.
Prices are stored per 1M tokens and split into input, output, and cache-token rates. Price resolution is:
- channel-scoped model pattern
- channel-scoped
default - global model pattern
- global
default - runtime fallback rates from Settings
Settlement applies the selected tariff, surge multiplier, fire-sale discount, admin-configured global provider share, and configured rounding. Ledger entries keep the final point amount and a readable formula note. When an upstream reports cached prompt/input tokens inside usage details, TokenAltar splits those tokens out of normal input usage so cached tokens are charged once at the cache rate rather than double-counted.
Built-in global presets currently cover GPT-5.5, GPT-5.4, GPT-5.3-codex, GPT-5.2, GPT-5.2-codex, Claude Opus 4.7/4.6/4.5/4.1/4, Claude Sonnet 4.6/4.5/4, and Claude Haiku 4.5. The single cache price field represents cached-input/cache-hit pricing.
Channel health is passive by default. Real gateway attempts append health events for available, empty, degraded, and down outcomes. Manual health tests are still available from the console, but there is no scheduled background probing.
The Health page displays 48 fixed 30-minute windows over the most recent 24 hours. TTFT averages include successful non-empty samples only. Failed, degraded, and empty events affect status color/counts but do not contribute to TTFT averages. Windows with no records render as gray.
Console sessions authenticate with Authorization: Bearer ta-....
The live stream is:
GET /api/events
Accept: text/event-stream
Authorization: Bearer ta-...Events carry resource topics such as ledger, channels, settings, economy, health, users, prices, and leaderboards. The Vue console reloads the affected REST endpoints after receiving an invalidation.
Prerequisites:
- Rust toolchain with
cargo - Node.js and
pnpm - SQLite is used through the Rust application; no external database service is required
Build the console and start the server:
pnpm --dir frontend install
pnpm --dir frontend build
TOKENALTAR_ADMIN_EMAIL=admin@example.com \
TOKENALTAR_ADMIN_PASSWORD='change-me-now' \
cargo runThe server listens on 127.0.0.1:8080 by default and stores data in tokenaltar.sqlite3.
Run pnpm --dir frontend build before compiling or releasing Rust whenever the console changes. Built assets are embedded into the Rust binary, so runtime deployment does not need a frontend/dist directory.
Build and run locally with Docker Compose:
cp .env.example .env
$EDITOR .env
docker compose up -d --buildThe compose service publishes TokenAltar on http://localhost:8080, binds the application to 0.0.0.0:8080 inside the container, and persists SQLite data in the fixed Docker volume tokenaltar-data at /data/tokenaltar.sqlite3. The volume is explicitly named so updates from a different checkout directory or Compose project still reuse the same database volume.
To pull the image built by GitHub Actions instead of building locally:
docker pull ghcr.io/codeboy2006/tokenaltar:latest
docker compose up -dIf an earlier Compose deployment created a project-prefixed volume such as oldproject_tokenaltar-data, migrate it into the fixed volume before starting the new stack:
docker volume create tokenaltar-data
docker run --rm \
-v oldproject_tokenaltar-data:/from:ro \
-v tokenaltar-data:/to \
alpine sh -c 'cp -a /from/. /to/'
docker compose up -dDo not run docker compose down -v unless you intentionally want to delete the SQLite database volume.
The image build is automated by .github/workflows/docker-image.yml. Pushes to main, tags matching v*, and manual workflow runs publish to GitHub Container Registry with branch, tag, semantic-version, SHA, and default-branch latest tags. The workflow uses the repository GITHUB_TOKEN; ensure GitHub Actions has package write permission in the repository settings if GHCR publishing is blocked.
Environment variables:
| Variable | Default | Purpose |
|---|---|---|
TOKENALTAR_BIND |
127.0.0.1:8080 |
HTTP bind address. |
TOKENALTAR_DATABASE_URL |
sqlite://tokenaltar.sqlite3 |
SQLite database URL. |
TOKENALTAR_ADMIN_EMAIL |
unset | Creates the first admin if no admin exists. |
TOKENALTAR_ADMIN_PASSWORD |
unset | Initial password for the first admin. |
TOKENALTAR_LEADERBOARD_TIMEZONE |
server local timezone | Optional IANA timezone for day/month leaderboard windows, for example Asia/Shanghai. |
Runtime settings live in the console Settings page and /api/runtime-settings. Admins can configure invitation policy, seed balances, per-1M fallback prices, rounding, surge thresholds and multipliers, provider payout multiplier, retry/cooldown behavior, route weighting, queue/cache capacities, and defaults for new keys and channels.
Most request-time economy and routing settings apply without rebuilding. Startup-sized values, such as ledger queue capacity and affinity cache capacity, apply when the process starts.
- Channel token windows refresh on startup, dashboard/channel reads, and gateway requests.
- Disabled users cannot log in or authenticate through existing sessions or API keys.
- Disabling a user also disables their active API keys and channels while preserving ledger history.
- Deleted API keys and channels are hidden and unusable, but historical ledger rows remain.
- Regular users can add and manage their own upstream channels; upstream API secrets are always redacted from console responses.
- New API keys default to every current route channel. Keys that still cover the full route pool are automatically granted newly created channels; manually narrowed keys stay narrowed.
- Invite-gated registration is controlled by
invite_requiredandinvite_code_defaultin Settings. - Red packet claims are transaction guarded with unique
(packet, user)claims. - Leaderboards support
period=dayandperiod=month, count successful ledger entries only, and mask users that enable anonymous ranking.
cargo test
cargo clippy -- -D warnings
pnpm --dir frontend build
cargo build --releaseFor visual changes, run the app against a temporary SQLite database and inspect the relevant console pages with Playwright or a browser.


