ai-evals

Here are 39 public repositories matching this topic...

Eval-first AI agent that triages property maintenance emails. The real work is the eval system around it: trace-driven error analysis, code graders and validated LLM-as-judge (TPR/TNR), component and end-to-end evals, a failure taxonomy, and a CI regression gate. LangGraph, FastAPI, Langfuse.

  • Updated May 26, 2026
  • Python

Dali is an open infrastructure project focused on citation integrity, evidentiary lineage, and reproducibility for legal AI systems. It evaluates whether AI-generated legal citations and workflows remain attributable, reconstructable, and verifiable across modern AI environments.

  • Updated May 28, 2026
  • Python
