Adds full observability to the Project 1 RAG pipeline:
- Langfuse tracing — one trace per query with spans, token counts, cost
- Latency tracking — p50/p95/p99 per pipeline stage (retrieval, reranking, generation)
- Cost-per-request — dollar cost calculated from token counts, logged to Langfuse
- LLM-as-judge — async quality scoring (1-5) after every response
- CI regression gate — GitHub Actions benchmark blocks PRs if p95 latency regresses or quality drops
py -3.11 -m venv venv
source venv/Scripts/activate # Git Bash on Windows
pip install -r requirements.txt
Install Docker Desktop from https://docker.com/products/docker-desktop
Then:
docker compose -f docker-compose.langfuse.yml up -d
Open http://localhost:3000 in your browser. Create an account, then go to Settings → API Keys and create a new key pair. Copy the public key and secret key.
cp .env.example .env
Fill in your keys:
ANTHROPIC_API_KEY=your_key
NEO4J_URI=bolt://localhost:7687
NEO4J_PASSWORD=your_password
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000
Copy these folders from Project 1 into this project:
app/ingestion/
app/retrieval/
app/generation/
app/evaluation/
data/documents/
eval/golden_set/
python -m app.ingestion.ingest --docs-dir data/documents
uvicorn app.main:app --reload --port 8000
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "What is Reciprocal Rank Fusion?"}'
The response now includes latency_ms and cost_usd.
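The cost_usd field is described as a dollar cost calculated from token counts. A minimal sketch of that arithmetic follows. The per-token prices, the `Usage` type, and the function name are illustrative assumptions, not the contents of app/observability/cost.py; substitute the real rates for your model.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; replace with your model's real rates.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


@dataclass(frozen=True)
class Usage:
    """Token counts for one request (hypothetical container type)."""
    input_tokens: int
    output_tokens: int


def cost_usd(usage: Usage) -> float:
    """Return the dollar cost of one request, computed from its token counts."""
    input_cost = usage.input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = usage.output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    # Round to micro-dollars so logged values stay readable.
    return round(input_cost + output_cost, 6)


print(cost_usd(Usage(input_tokens=1200, output_tokens=300)))
```

Input and output tokens are priced separately because providers typically charge different rates for each.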
Open http://localhost:3000 → Traces. You should see one trace per query with full span breakdown.
curl http://localhost:8000/metrics
Returns p50/p95/p99 latency per stage and cost statistics.
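One way the per-stage percentiles could be computed is sketched below using only the standard library. The `LatencyTracker` class and its method names are assumptions for illustration and may differ from what app/observability/latency.py actually does.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List


class LatencyTracker:
    """Collects latency samples per pipeline stage (hypothetical helper)."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, stage: str, latency_ms: float) -> None:
        self._samples[stage].append(latency_ms)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Return p50/p95/p99 for every stage with at least two samples."""
        out: Dict[str, Dict[str, float]] = {}
        for stage, samples in self._samples.items():
            if len(samples) < 2:
                continue  # statistics.quantiles needs at least two data points
            # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
            cuts = quantiles(samples, n=100, method="inclusive")
            out[stage] = {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
        return out


tracker = LatencyTracker()
for ms in range(1, 102):  # synthetic samples: 1 ms through 101 ms
    tracker.record("retrieval", float(ms))
print(tracker.summary())
```

This sketch keeps every sample in memory; a long-running service would likely cap the window or use a streaming estimator instead.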
python scripts/benchmark.py
Exits 0 (pass) or 1 (fail) based on p95 latency and quality thresholds.
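The pass/fail decision can be sketched as a pure function that maps measured values to an exit code. The threshold values and the `gate` function below are hypothetical; the real thresholds live in scripts/benchmark.py and are not stated here.

```python
import sys

# Hypothetical thresholds; the project's actual limits may differ.
P95_LATENCY_MS_MAX = 3000.0
MEAN_QUALITY_MIN = 4.0


def gate(p95_latency_ms: float, mean_quality: float) -> int:
    """Return 0 if both thresholds are met, 1 otherwise."""
    failures = []
    if p95_latency_ms > P95_LATENCY_MS_MAX:
        failures.append(f"p95 latency {p95_latency_ms:.0f} ms > {P95_LATENCY_MS_MAX:.0f} ms")
    if mean_quality < MEAN_QUALITY_MIN:
        failures.append(f"mean quality {mean_quality:.2f} < {MEAN_QUALITY_MIN:.2f}")
    for message in failures:
        print(f"FAIL: {message}", file=sys.stderr)
    return 1 if failures else 0


exit_code = gate(p95_latency_ms=2400.0, mean_quality=4.3)
print(exit_code)
```

A script would pass this value to `sys.exit`, and a nonzero exit status is what makes the CI job fail and block the PR.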
Add these secrets to your repo (Settings → Secrets → Actions):
- ANTHROPIC_API_KEY
- NEO4J_URI
- NEO4J_PASSWORD
- LANGFUSE_PUBLIC_KEY
- LANGFUSE_SECRET_KEY
- LANGFUSE_HOST (use your cloud Langfuse URL if not self-hosted)
Every PR now runs both the RAGAS eval (Project 1) and the latency/quality benchmark (Project 2).
| Endpoint | Method | Description |
|---|---|---|
| /query | POST | Full RAG pipeline with tracing + cost tracking |
| /metrics | GET | Live p50/p95/p99 latency + cost stats |
| /health | GET | Liveness + Langfuse enabled status |
observability/
├── app/
│ ├── config.py # Extended config with Langfuse settings
│ ├── main.py # FastAPI with observability instrumentation
│ ├── observability/
│ │ ├── tracer.py # Langfuse trace context manager
│ │ ├── latency.py # p50/p95/p99 decorator + tracker
│ │ ├── cost.py # Cost-per-request calculator
│ │ └── judge.py # LLM-as-judge async quality scorer
│ ├── ingestion/ # (copied from Project 1)
│ ├── retrieval/ # (copied from Project 1)
│ ├── generation/ # (copied from Project 1)
│ └── evaluation/ # (copied from Project 1)
├── scripts/
│ └── benchmark.py # CI regression gate
├── .github/workflows/
│ └── benchmark.yml # GitHub Actions CI
├── docker-compose.langfuse.yml # Local Langfuse setup
├── requirements.txt
└── .env.example
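
judge.py is listed above as an async quality scorer that rates each response from 1 to 5. The sketch below shows one way to run scoring off the response path and parse the rating. `fake_judge` is a stand-in for a real LLM call, and every name here is a hypothetical illustration rather than the project's actual code.

```python
import asyncio
import re
from typing import List, Optional


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the first standalone digit 1-5 from the judge's reply, or None."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


async def fake_judge(question: str, answer: str) -> str:
    """Stand-in for a real LLM call (hypothetical); always returns a fixed reply."""
    await asyncio.sleep(0)
    return "Score: 4"


async def score_response(question: str, answer: str, scores: List[int]) -> None:
    """Ask the judge for a rating and store it if one could be parsed."""
    value = parse_score(await fake_judge(question, answer))
    if value is not None:
        scores.append(value)


async def main() -> List[int]:
    scores: List[int] = []
    # Schedule scoring as a background task so it does not delay the response.
    task = asyncio.create_task(score_response("What is RRF?", "A rank fusion method.", scores))
    # The response would be returned to the caller here, before scoring finishes.
    await task  # awaited only so this demo can show the result
    return scores


print(asyncio.run(main()))
```

Replies without a parsable rating are dropped rather than guessed, so a malformed judge output cannot skew the quality average.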