Project 2 — LLM Monitoring & Observability

Adds full observability to the Project 1 RAG pipeline:

Langfuse tracing — one trace per query with spans, token counts, cost
Latency tracking — p50/p95/p99 per pipeline stage (retrieval, reranking, generation)
Cost-per-request — dollar cost calculated from token counts, logged to Langfuse
LLM-as-judge — async quality scoring (1-5) after every response
CI regression gate — GitHub Actions benchmark blocks PRs if p95 or quality drops
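As a rough illustration of the async LLM-as-judge step, the sketch below schedules scoring as a background task so the answer is returned without waiting for the judge. The names `parse_score`, `stub_judge`, `score_response`, and `handle_query` are hypothetical and are not taken from `app/observability/judge.py`. The judge itself is a stub standing in for a real model call.

```python
import asyncio
import re
from typing import Optional


def parse_score(judge_reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the judge's reply, else None."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


async def stub_judge(question: str, answer: str) -> str:
    """Hypothetical stand-in for the real judge-model call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "Score: 4. The answer addresses the question."


async def score_response(question: str, answer: str, judge=stub_judge) -> Optional[int]:
    """Ask the judge for a rating and parse it into an integer score."""
    reply = await judge(question, answer)
    return parse_score(reply)


async def handle_query(question: str):
    """Return the answer immediately, plus a task that resolves to the score."""
    answer = f"(stub answer to: {question})"
    # Schedule scoring without awaiting it, so the caller is not blocked.
    scoring_task = asyncio.create_task(score_response(question, answer))
    return answer, scoring_task
```

A real judge prompt would likely need to request a strict output format, since this parser simply takes the first digit between 1 and 5 that it finds.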

Setup

Step 1 — Install dependencies

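py -3.11 -m venv venv           # Windows launcher; on macOS/Linux: python3.11 -m venv venv
source venv/Scripts/activate    # Git Bash on Windows; on macOS/Linux: source venv/bin/activate
pip install -r requirements.txt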

Step 2 — Start Langfuse locally (Docker required)

Install Docker Desktop from https://docker.com/products/docker-desktop

Then:

docker compose -f docker-compose.langfuse.yml up -d

Open http://localhost:3000 in your browser. Create an account → go to Settings → API Keys → create a new key pair. Copy the public key and secret key.

Step 3 — Configure .env

cp .env.example .env

Fill in your keys:

ANTHROPIC_API_KEY=your_key
NEO4J_URI=bolt://localhost:7687
NEO4J_PASSWORD=your_password
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000
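A missing key usually surfaces only when the first request fails, so it can help to check the environment at startup. The following is a minimal sketch, assuming the variables have already been loaded into the process environment. The helper name `missing_keys` is hypothetical and not part of the project.

```python
import os
from typing import List, Mapping, Optional

# The six variables listed in the .env template above.
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY",
    "NEO4J_URI",
    "NEO4J_PASSWORD",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
]


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are absent or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no argument inspects `os.environ`. Passing a dict makes the check easy to unit test.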

Step 4 — Copy Project 1 modules

Copy these folders from Project 1 into this project:

app/ingestion/
app/retrieval/
app/generation/
app/evaluation/
data/documents/
eval/golden_set/

Step 5 — Run ingestion

python -m app.ingestion.ingest --docs-dir data/documents

Step 6 — Start the API

uvicorn app.main:app --reload --port 8000

Step 7 — Test a query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is Reciprocal Rank Fusion?"}'

The response now includes latency_ms and cost_usd fields.
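The cost figure is derived from token counts. The sketch below shows one way that arithmetic could look. The `Pricing` class, the `cost_usd` function, and the per-million-token rates are illustrative assumptions, not the project's actual `cost.py` or any provider's real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million tokens (values are supplied by the caller)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Dollar cost of one request, given its token counts and a price table."""
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Placeholder rates for illustration only; check your provider's price list.
EXAMPLE_PRICING = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
```

With these placeholder rates, a request with 1,200 input tokens and 300 output tokens costs (1200 × 3 + 300 × 15) / 1,000,000 = 0.0081 USD.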

Step 8 — View your traces

Open http://localhost:3000 → Traces. You should see one trace per query with full span breakdown.

Step 9 — Check live metrics

curl http://localhost:8000/metrics

Returns p50/p95/p99 latency per stage and cost statistics.
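To make the percentile figures concrete, here is a minimal sketch of a per-stage latency tracker with a timing decorator. It uses the nearest-rank percentile definition, which is an assumption. The project's `latency.py` may compute percentiles differently, and the names `LatencyTracker` and `timed` are hypothetical.

```python
import math
import time
from collections import defaultdict
from functools import wraps


class LatencyTracker:
    """Keeps raw latency samples (in ms) per pipeline stage."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, stage, ms):
        self._samples[stage].append(ms)

    def percentile(self, stage, pct):
        """Nearest-rank percentile; returns None when the stage has no samples."""
        data = sorted(self._samples.get(stage, []))
        if not data:
            return None
        rank = math.ceil(pct * len(data) / 100)  # 1-based rank
        return data[max(rank, 1) - 1]

    def summary(self):
        """p50/p95/p99 for every stage that has at least one sample."""
        return {
            stage: {f"p{p}": self.percentile(stage, p) for p in (50, 95, 99)}
            for stage in self._samples
        }


def timed(tracker, stage):
    """Decorator that records the wall-clock duration of each call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                tracker.record(stage, (time.perf_counter() - start) * 1000)
        return wrapper
    return decorator
```

Storing every sample is simple but unbounded. A long-running service would probably cap the list or use a sliding window.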

Step 10 — Run the benchmark

python scripts/benchmark.py

The script exits with code 0 (pass) or 1 (fail), depending on whether p95 latency and quality scores meet their thresholds.
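The pass/fail decision can be sketched as a pure function that is easy to test in isolation. The function names are hypothetical, and the thresholds are passed in by the caller, because the values used by `scripts/benchmark.py` are not stated here.

```python
import sys
from typing import List


def check_thresholds(
    p95_ms: float,
    mean_quality: float,
    max_p95_ms: float,
    min_quality: float,
) -> List[str]:
    """Return a human-readable message for each violated threshold."""
    failures = []
    if p95_ms > max_p95_ms:
        failures.append(f"p95 latency {p95_ms:.0f} ms exceeds {max_p95_ms:.0f} ms")
    if mean_quality < min_quality:
        failures.append(f"quality {mean_quality:.2f} is below {min_quality:.2f}")
    return failures


def exit_code(failures: List[str]) -> int:
    """0 when nothing failed, 1 otherwise, matching the CI gate convention."""
    return 1 if failures else 0


def run_gate(p95_ms, mean_quality, max_p95_ms, min_quality) -> None:
    """Print any failures and terminate the process with the gate's exit code."""
    failures = check_thresholds(p95_ms, mean_quality, max_p95_ms, min_quality)
    for message in failures:
        print(f"FAIL: {message}", file=sys.stderr)
    sys.exit(exit_code(failures))
```

A non-zero exit code is what causes a GitHub Actions step to fail, which in turn blocks the PR when the check is required.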

GitHub Actions CI

Add these secrets to your repo (Settings → Secrets → Actions):

ANTHROPIC_API_KEY
NEO4J_URI
NEO4J_PASSWORD
LANGFUSE_PUBLIC_KEY
LANGFUSE_SECRET_KEY
LANGFUSE_HOST (use your cloud Langfuse URL if not self-hosted; http://localhost:3000 is reachable from a CI runner only if the workflow starts Langfuse itself)

Every PR now runs both the RAGAS eval (Project 1) and the latency/quality benchmark (Project 2).

New API Endpoints

Endpoint	Method	Description
/query	POST	Full RAG pipeline with tracing + cost tracking
/metrics	GET	Live p50/p95/p99 latency + cost stats
/health	GET	Liveness + Langfuse enabled status
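The endpoints above can also be called from Python rather than curl. The sketch below uses only the standard library and assumes the API is running on port 8000 as in Step 6. The helper names `build_query_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_query_request(question: str, base_url: str = "http://localhost:8000"):
    """Build the same POST /query request as the curl example in Step 7."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_query_request("What is Reciprocal Rank Fusion?"))` should return a dict that includes the `latency_ms` and `cost_usd` fields once the server is up.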

Project Structure

observability/
├── app/
│   ├── config.py                        # Extended config with Langfuse settings
│   ├── main.py                          # FastAPI with observability instrumentation
│   ├── observability/
│   │   ├── tracer.py                    # Langfuse trace context manager
│   │   ├── latency.py                   # p50/p95/p99 decorator + tracker
│   │   ├── cost.py                      # Cost-per-request calculator
│   │   └── judge.py                     # LLM-as-judge async quality scorer
│   ├── ingestion/                       # (copied from Project 1)
│   ├── retrieval/                       # (copied from Project 1)
│   ├── generation/                      # (copied from Project 1)
│   └── evaluation/                      # (copied from Project 1)
├── scripts/
│   └── benchmark.py                     # CI regression gate
├── .github/workflows/
│   └── benchmark.yml                    # GitHub Actions CI
├── docker-compose.langfuse.yml          # Local Langfuse setup
├── requirements.txt
└── .env.example
