Evaluate LLM responses using Gemini as an automated judge. Score any AI response on accuracy, relevance, coherence, hallucination risk, and conciseness.
Updated Apr 2, 2026 - Python
Automated rubric-anchored response scorer with fairness auditing and explainability: a quadratic weighted kappa (QWK) of 0.931, adverse-impact detection, and attention-rollout highlights.
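The two metrics named above can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes QWK is measured between human and model scores on an integer rubric scale. It also assumes adverse impact is checked with the common four-fifths rule, which compares per-group pass rates. The function names, the 0.8 threshold, and the pass-rate input format are hypothetical choices made for this example.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two integer raters; disagreements cost the squared distance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two rating levels")
    if any(not min_rating <= r <= max_rating for r in list(rater_a) + list(rater_b)):
        raise ValueError("rating outside [min_rating, max_rating]")
    # observed[i][j] counts items scored i by rater A and j by rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    total = len(rater_a)
    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2      # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / total  # chance count if raters were independent
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters give one identical score")
    return 1.0 - weighted_observed / weighted_expected


def adverse_impact_ratios(pass_rates, threshold=0.8):
    """Four-fifths rule (hypothetical setup): each group's rate vs. the highest group's rate."""
    best = max(pass_rates.values())
    if best == 0:
        raise ValueError("at least one group needs a non-zero pass rate")
    # Map each group to (impact ratio, flagged as adverse impact?).
    return {group: (rate / best, rate / best < threshold) for group, rate in pass_rates.items()}
```

For example, `quadratic_weighted_kappa([1, 2, 3, 3, 4], [1, 2, 3, 4, 4], 1, 4)` scores a model against human raters on a 1 to 4 rubric. `adverse_impact_ratios({"group_a": 0.5, "group_b": 0.3})` flags `group_b` because 0.3 / 0.5 = 0.6 falls below 0.8. Quadratic weights suit ordinal rubrics because a one-point miss is penalized far less than a three-point miss.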