Track token usage and cost for OpenRouter, Groq, and OpenAI, with FastAPI integration
Monitor every token spent per request and chat session, plus cost estimation using real model pricing
- Introduction 📖
- Features ✨
- Architecture 🏗️
- Workflow 🔄
- Tech Stack 🛠️
- Installation 📥
- Project Structure 🗂️
- Usage 🚀
AI-Token-Monitor solves the challenge of tracking token consumption and cost across multiple AI providers such as OpenRouter, Groq, and OpenAI. It provides fine-grained monitoring per chat session and request, enabling developers to optimize usage and control expenses effectively.
This project benefits AI developers, API integrators, and product teams who rely on OpenAI-compatible APIs and want transparent, accurate token usage and billing metrics integrated directly into their FastAPI applications.
| Feature | AI-Token-Monitor | Alternative A | Alternative B |
|---|---|---|---|
| Multi-provider token tracking | ✅ OpenRouter, Groq, OpenAI | ❌ Limited to OpenAI only | ❌ Limited to single provider |
| FastAPI middleware support | ✅ Built-in middleware | ❌ Requires custom setup | ❌ No middleware |
| Per chat session tracking | ✅ Yes | ❌ No | ❌ No |
| Real-time cost estimation | ✅ Based on live pricing | ❌ Static pricing only | ❌ No cost estimate |
| Simple in-memory storage | ✅ Default with option to extend | ❌ No storage or DB only | ❌ Only cloud DB |
| Open-source & extensible | ✅ Fully open and modular | ❌ Closed or proprietary | ❌ Limited extensibility |
- 🔍 Track tokens used per request and chat session with detailed logs
- 💰 Real-time cost estimation using up-to-date model pricing from OpenRouter
- 🔗 Support for multiple AI providers: OpenRouter, Groq, OpenAI
- 🛠️ FastAPI middleware integration for automatic request tracking
- 🧩 Modular design with components like TokenMonitor, ChatManager, and Storage
- ⚡ Lightweight in-memory storage by default, easily replaceable with any DB adapter
- 🐍 Pythonic API with simple methods for adding messages and tracking tokens
- 📦 Packaged as a PyPI-installable library for easy integration
- 🚀 Ready to deploy with FastAPI applications out-of-the-box
- 🔒 Secure environment variable management via dotenv support
- 📝 Clear logging with configurable verbosity for production and development
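As a rough illustration of the cost estimation feature listed above, the sketch below multiplies token counts by per-million-token rates. The names `PRICING_PER_MILLION` and `estimate_cost`, the model key `example-model`, and the prices are all hypothetical placeholders; they are not the package's actual API or real provider rates.

```python
# Hypothetical per-million-token prices in USD.
# These are placeholder values for illustration, not real provider rates.
PRICING_PER_MILLION = {
    "example-model": {"prompt": 0.50, "completion": 1.50},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated USD cost for a single response."""
    try:
        rates = PRICING_PER_MILLION[model]
    except KeyError:
        # Fail loudly instead of silently reporting a zero cost.
        raise ValueError(f"no pricing data for model {model!r}") from None
    prompt_cost = prompt_tokens * rates["prompt"] / 1_000_000
    completion_cost = completion_tokens * rates["completion"] / 1_000_000
    return prompt_cost + completion_cost
```

Prompt and completion tokens are priced separately here because providers commonly charge different rates for input and output; a single blended rate would also work if only `total_tokens` is available.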
flowchart LR
Client[Client Request] --> API[FastAPI Application]
API --> Middleware[TokenMonitorMiddleware]
Middleware --> ChatManager[ChatManager]
Middleware --> TokenMonitor[TokenMonitor]
TokenMonitor --> Storage[InMemoryStorage]
ChatManager --> Storage
API --> ExternalAI[OpenRouter Groq OpenAI APIs]
ExternalAI --> API
| Component | Role | Technology |
|---|---|---|
| FastAPI Application | Serves API endpoints and handles client requests | Python FastAPI |
| TokenMonitorMiddleware | Intercepts and tracks tokens for each API request | Starlette Middleware |
| ChatManager | Manages chat sessions and messages | Python Class |
| TokenMonitor | Tracks tokens and estimates cost per response | Python Class |
| InMemoryStorage | Stores token usage logs for retrieval | Python List-based |
| ExternalAI APIs | Provides AI model responses | OpenRouter, Groq, OpenAI |
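The table describes the storage component as list-based and the feature list says it can be replaced by a database adapter. One way that boundary might look is sketched below. `UsageRecord`, `UsageStorage`, and `ListStorage` are hypothetical names chosen for this example; the package's real `InMemoryStorage` interface may differ.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class UsageRecord:
    """One token usage log entry (hypothetical shape)."""
    chat_id: str
    model: str
    total_tokens: int
    cost_usd: float


class UsageStorage(Protocol):
    """Minimal interface a storage backend would need to satisfy."""

    def save(self, record: UsageRecord) -> None: ...

    def for_chat(self, chat_id: str) -> List[UsageRecord]: ...


class ListStorage:
    """In-memory backend that keeps records in a plain Python list."""

    def __init__(self) -> None:
        self._records: List[UsageRecord] = []

    def save(self, record: UsageRecord) -> None:
        self._records.append(record)

    def for_chat(self, chat_id: str) -> List[UsageRecord]:
        # Linear scan; fine for a demo, a DB adapter would use an index.
        return [r for r in self._records if r.chat_id == chat_id]
```

A database-backed adapter would implement the same two methods, so the code that records usage would not need to change when the backend is swapped.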
sequenceDiagram
actor User
participant ClientApp
participant FastAPI
participant Middleware
participant ChatManager
participant TokenMonitor
participant Storage
participant ExternalAI
User->>ClientApp: Sends chat message
ClientApp->>FastAPI: POST chat message
FastAPI->>Middleware: Process request
Middleware->>ChatManager: Create or retrieve chat session
FastAPI->>ExternalAI: Forward chat message
ExternalAI-->>FastAPI: Returns AI response
Middleware->>TokenMonitor: Track tokens and cost
TokenMonitor->>Storage: Save token logs
Middleware-->>FastAPI: Pass updated response
FastAPI-->>ClientApp: Send AI response with usage info
ClientApp-->>User: Display chat and cost
- User sends a chat message via client app.
- FastAPI receives the message and passes it through TokenMonitorMiddleware.
- Middleware invokes ChatManager to create or fetch the chat session.
- FastAPI forwards the message to the external AI provider (OpenRouter/Groq/OpenAI).
- AI provider returns a response with token usage metadata.
- Middleware calls TokenMonitor to extract tokens and calculate cost.
- TokenMonitor stores usage logs in InMemoryStorage.
- Response with usage details is returned to client app and displayed to the user.
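The last few steps above (read the usage metadata, compute a cost, store a log entry, return usage details to the client) can be sketched in plain Python. This is only an illustration under assumptions: `track_response`, the flat `usd_per_token` rate, and the log entry fields are hypothetical and do not reflect how `TokenMonitor` is actually implemented.

```python
from typing import Any, Dict, List


def track_response(
    chat_id: str,
    response: Dict[str, Any],
    usd_per_token: float,
    logs: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Extract usage from a response dict, log it, and build the client payload."""
    # Usage metadata may be missing; treat that as zero tokens.
    usage = response.get("usage") or {}
    total_tokens = int(usage.get("total_tokens", 0))

    # Flat per-token rate (an assumption made to keep the sketch short).
    cost = total_tokens * usd_per_token

    # Store a log entry; a plain list stands in for the storage component.
    logs.append(
        {
            "chat_id": chat_id,
            "model": response.get("model"),
            "total_tokens": total_tokens,
            "cost_usd": cost,
        }
    )

    # Return the AI reply together with its usage details.
    return {
        "response": response["choices"][0]["message"]["content"],
        "total_tokens": total_tokens,
        "cost_usd": cost,
    }
```

Calling it with a response shaped like `{"usage": {"total_tokens": 50}, "model": "...", "choices": [{"message": {"content": "..."}}]}` appends one entry to `logs` and returns the payload the client would display.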
| Layer | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI | Web API and middleware |
| Middleware | Starlette Middleware | Request interception and tracking |
| AI SDK | OpenAI Python SDK | Interact with AI providers |
| Storage | In-memory Python List | Temporary token usage storage |
| Environment | Python dotenv | Secure environment variable management |
| Logging | Python logging | Structured application logs |
- Python 3.8 or newer
- pip package manager
- OpenRouter API key (or keys for Groq/OpenAI)
git clone https://github.com/Tharanika-R-Git/AI-Token-Monitor.git
cd AI-Token-Monitor
pip install -r requirements.txt
cp .env.example .env
# Edit .env to add your OPENROUTER_API_KEY and other credentials
AI-Token-Monitor/
├── ai_token_monitor/
│ ├── __init__.py # Package initialization
│ ├── chat.py # Chat session management
│ ├── logger.py # Logging setup
│ ├── middleware.py # FastAPI middleware for token tracking
│ ├── monitor.py # Token tracking and cost calculation
│ ├── pricing.py # Model pricing data and utilities
│ ├── storage.py # In-memory storage for logs
│ └── utils.py # Utility functions for response normalization
├── fastapi_app.py # Demo FastAPI application using the package
├── requirements.txt # Python dependencies
├── .env.example # Example environment variables file
└── README.md # This documentation
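The tree lists `utils.py` as holding response normalization helpers. Responses can arrive either as plain dicts or as SDK objects with attributes, so a helper along the following lines could produce one consistent shape. `normalize_usage` is a hypothetical name, and this is a sketch of the idea rather than the contents of the real module.

```python
from types import SimpleNamespace
from typing import Any, Dict


def normalize_usage(response: Any) -> Dict[str, int]:
    """Return token counts as a plain dict for dict-like or object-like responses."""
    if isinstance(response, dict):
        usage = response.get("usage")
    else:
        usage = getattr(response, "usage", None)

    def read(field: str) -> int:
        # Missing usage or missing fields count as zero tokens.
        if usage is None:
            return 0
        if isinstance(usage, dict):
            value = usage.get(field)
        else:
            value = getattr(usage, field, None)
        return int(value or 0)

    prompt = read("prompt_tokens")
    completion = read("completion_tokens")
    # Fall back to the sum when the provider omits total_tokens.
    total = read("total_tokens") or prompt + completion
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


# Example inputs: a dict response and an object with attribute access.
dict_response = {"usage": {"total_tokens": 50}}
object_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=7, completion_tokens=3, total_tokens=10)
)
```

Both `normalize_usage(dict_response)` and `normalize_usage(object_response)` return a dict with the same three keys, which keeps the tracking code independent of the response type.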
from ai_token_monitor import TokenMonitor, ChatManager
monitor = TokenMonitor()
chat_manager = ChatManager()
chat_id = chat_manager.create_chat(user_id="user123")
# Simulate adding a message and AI response tracking
chat_manager.add_message(chat_id, role="user", content="Hello AI!")
response = {
    "usage": {"total_tokens": 50},
    "model": "openai-gpt-4",
    "choices": [{"message": {"content": "Hello user!"}}],
}
monitor.track(response, model="openai-gpt-4", chat_manager=chat_manager, chat_id=chat_id)
print(f"Total tokens used: {monitor.total_tokens}")
print(f"Total cost estimate USD: {monitor.total_cost:.4f}")
import os
from fastapi import FastAPI, Request
from openai import OpenAI
from dotenv import load_dotenv
from ai_token_monitor import TokenMonitor, ChatManager
load_dotenv()
app = FastAPI(title="AI Token Monitor Demo")
monitor = TokenMonitor()
chat_manager = ChatManager()
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
@app.post("/chat/{user_id}")
async def chat(user_id: str, request: Request):
    body = await request.json()
    message = body.get("message", "")
    chat_id = chat_manager.create_chat(user_id)
    chat_manager.add_message(chat_id, role="user", content=message)
    # Call AI provider
    response = client.chat.completions.create(
        model="openai-gpt-4o-mini",
        messages=[{"role": "user", "content": message}],
    )
    # Track tokens and cost
    monitor.track(response, model="openai-gpt-4o-mini", chat_manager=chat_manager, chat_id=chat_id)
    chat_manager.add_message(chat_id, role="assistant", content=response.choices[0].message.content)
    return {
        "response": response.choices[0].message.content,
        "total_tokens": monitor.total_tokens,
        "total_cost": monitor.total_cost,
    }
Thank you for using AI-Token-Monitor! For issues, feature requests, or contributions, please open an issue or pull request on GitHub.
This project is licensed under the MIT License.
🔗 GitHub Repo: https://github.com/Tharanika-R-Git/AI-Token-Monitor