Domain-agnostic eval framework for AI applications. Measure retrieval quality, generation accuracy, and policy compliance across any vertical.
```bash
pip install synapt-eval
```

Or install from source:

```bash
pip install git+https://github.com/synapt-dev/eval.git
```

```python
import asyncio

from synapt_eval import Fixture, EvalResult, CategoryMetrics
from synapt_eval.adapters import RetrievalAdapter, RetrievalCandidate
from synapt_eval.scoring import precision_at_k, recall_at_k
from synapt_eval.report_card import compose_report_card, generate_markdown
class MyRetrieval(RetrievalAdapter):
    async def retrieve(self, query: str, k: int = 10) -> list[RetrievalCandidate]:
        # Connect your vector store here and return scored candidates
        return [RetrievalCandidate(id="doc1", score=0.95)]

# The adapter is async, so drive it with asyncio
candidates = asyncio.run(MyRetrieval().retrieve("example query", k=5))

# Run eval and generate report
results = [EvalResult(
    category="retrieval",
    metrics=CategoryMetrics(p_at_5=0.85, r_at_10=0.72, n=50),
)]
card = compose_report_card(results, run_id="my-first-eval")
print(generate_markdown(card))
```

See docs/quickstart.md for a complete walkthrough and examples/ for runnable code.
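The scoring helpers imported in the quickstart can also be called directly on ranked candidate lists. A minimal sketch; the argument shapes here are assumptions, so check `synapt_eval.scoring` for the actual signatures:

```python
from synapt_eval.scoring import precision_at_k, recall_at_k

# Assumed call shape: ids in rank order vs. a set of relevant ids
retrieved = ["doc1", "doc7", "doc3", "doc9", "doc2"]
relevant = {"doc1", "doc2", "doc4"}

p5 = precision_at_k(retrieved, relevant, k=5)  # 2 hits in the top 5 -> 0.4
r5 = recall_at_k(retrieved, relevant, k=5)     # 2 of 3 relevant found -> ~0.67
```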
synapt-eval separates the eval framework (scoring, review, reporting) from domain-specific adapters (your retrieval backend, your generation pipeline, your fixtures).
| Layer | Module | Purpose |
|---|---|---|
| Types | `synapt_eval.types` | Core data types (`Fixture`, `EvalResult`, `CategoryMetrics`) |
| Scoring | `synapt_eval.scoring` | Precision@K, Recall@K, Kendall's Tau |
| Adapters | `synapt_eval.adapters` | Customer-facing ABCs (Retrieval, Generation, Judge, Fixture); see the sketch below |
| Runner | `synapt_eval.runner` | Eval execution, orchestration, PR gate |
| Reviewer | `synapt_eval.reviewer` | Verdict framework, predicate chains, LLM judge bridge |
| Suggestion Engine | `synapt_eval.suggestion_engine` | Rule-based actionable recommendations |
| Report Card | `synapt_eval.report_card` | Markdown + JSON report generation |
| Trending | `synapt_eval.trending` | Self-hosted JSON history store + delta computation |
| CLI | `synapt_eval.cli` | Command-line viewer (`synapt-eval trending`) |
| Actions | `synapt_eval.actions` | GitHub Actions PR-gate adapter |
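Generation pipelines plug in the same way as retrieval. A minimal sketch, assuming the generation ABC is named `GenerationAdapter` and exposes a single async `generate` method (the Adapter API guide documents the real signature):

```python
from synapt_eval.adapters import GenerationAdapter  # class name assumed from the ABC list above

class MyGeneration(GenerationAdapter):
    async def generate(self, prompt: str) -> str:
        # Call your LLM or generation pipeline here
        return "stubbed answer"
```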
| Feature | Description |
|---|---|
| Scoring primitives | Precision@K, Recall@K, Kendall's Tau rank correlation |
| Adapter pattern | Plug in any retrieval/generation backend via ABCs |
| Reviewer SDK | Composable predicate chains + LLM judge integration |
| Suggestion engine | 10 baseline rules with a decorator pattern for custom rules (sketched below) |
| Report card | Markdown + JSON output with schema versioning |
| PR gate | Regression detection with configurable thresholds |
| Trending | Self-hosted history store with CLI viewer |
| GitHub Action | `uses: synapt-dev/eval@v0.1.0` for CI integration |
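A custom suggestion rule might look like the following. This is a sketch only: the `rule` decorator name and the rule signature below are assumptions, not the documented API; see the Suggestions guide for the real one.

```python
from synapt_eval.suggestion_engine import rule  # hypothetical decorator name

@rule(category="retrieval")
def flag_low_recall(metrics):
    # Assumed contract: return a recommendation string to fire, None to stay silent
    if metrics.r_at_10 is not None and metrics.r_at_10 < 0.5:
        return "Recall@10 below 0.5: consider widening the candidate pool"
    return None
```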
Add eval gating to your PR workflow:
```yaml
- name: Run eval
  run: python my_eval_script.py --output results.json

- name: PR Gate
  uses: synapt-dev/eval@v0.1.0
  with:
    results-path: results.json
    baseline-path: baseline.json
    threshold: "0.05"
    fail-on: error
```

The action posts a report card comment on the PR and fails the workflow on regressions. See docs/pr-gate.md for full configuration.
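The results file consumed by the gate could be produced along these lines. A sketch only: the `to_dict()` serialization hook and the on-disk schema are assumptions; docs/pr-gate.md documents the actual format.

```python
# my_eval_script.py (sketch of the "Run eval" step above)
import argparse
import json

from synapt_eval import EvalResult, CategoryMetrics
from synapt_eval.report_card import compose_report_card

parser = argparse.ArgumentParser()
parser.add_argument("--output", required=True)
args = parser.parse_args()

results = [EvalResult(
    category="retrieval",
    metrics=CategoryMetrics(p_at_5=0.85, r_at_10=0.72, n=50),
)]
card = compose_report_card(results, run_id="pr-gate-run")

# Assumption: the report card exposes a to_dict()-style hook for the JSON
# side of "Markdown + JSON report generation"; swap in the real emitter.
with open(args.output, "w") as f:
    json.dump(card.to_dict(), f)
```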
```bash
# View eval trending history
synapt-eval trending --path .synapt-eval/history --format text

# Output as markdown or JSON
synapt-eval trending --format markdown
synapt-eval trending --format json --limit 5
```

| Guide | Description |
|---|---|
| Quickstart | End-to-end retrieval eval in 60 lines |
| Adapter API | Writing custom adapters |
| Reviewer Framework | Custom reviewers + judge integration |
| PR Gate | GitHub Actions CI integration |
| Suggestions | Writing custom suggestion rules |
| Trending | Self-hosted trending CLI |
Runnable examples in `examples/`:

- `retrieval-eval` -- mock retrieval backend + fixtures + report card
- `generation-eval` -- mock generation pipeline + judge
- `full-pipeline` -- combined retrieval + generation + reviewer + suggestions
Want vertical-specific eval packs, a hosted dashboard, or SOC2 attestations? Visit synapt.dev for synapt-eval Pro.
MIT