Add RAGAS evaluation system with golden dataset #26

Merged
robayedl merged 1 commit into main from eval-golden on May 5, 2026

robayedl commented on May 5, 2026

  • eval/golden.jsonl: 30-question golden dataset from "Attention Is All You Need" (15 factual, 8 reasoning, 5 multi_hop, 2 out_of_scope)
  • eval/run.py: evaluation runner. Calls the DocuMind pipeline directly, computes RAGAS metrics with Gemini 2.5 Flash as judge, prints a per-question table, saves timestamped JSON, overwrites latest.json, and auto-updates README scores after each run
  • eval/results/latest.json: latest evaluation results (30 questions)
  • eval/EVALUATION_GUIDE.md: dataset format, usage, cost estimates
  • Makefile: run / ui / test / lint / eval / update-readme targets
  • requirements.txt: bump ragas 0.0.22 → >=0.2.0
  • README.md: evaluation section with live scores, split code blocks,
  • Remove legacy eval/ragas_eval.py, eval/run_eval.py, eval/test_queries.json
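
For reviewers who want to sanity-check the category split of the golden dataset, here is a minimal sketch. It assumes each line of eval/golden.jsonl is a JSON object carrying a `category` field; that field name is a guess for illustration, and the actual schema is the one documented in eval/EVALUATION_GUIDE.md. The helper names are hypothetical and not part of eval/run.py.

```python
import json
from collections import Counter

# Category split as stated in this PR description.
EXPECTED_COUNTS = {"factual": 15, "reasoning": 8, "multi_hop": 5, "out_of_scope": 2}


def count_categories(jsonl_text: str, field: str = "category") -> Counter:
    """Count records per category in JSONL text (one JSON object per line).

    The default field name "category" is an assumption, not the confirmed schema.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        record = json.loads(line)
        counts[record[field]] += 1
    return counts


def check_split(counts: Counter) -> bool:
    """Return True when the counts match the split described in the PR."""
    return dict(counts) == EXPECTED_COUNTS
```

To try it against the real file, read eval/golden.jsonl as text and pass the contents to `count_categories`, then hand the result to `check_split`.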

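
The "timestamped JSON plus latest.json" behaviour described for eval/run.py could look roughly like the sketch below. This is an illustration only, not the code in this PR: the function name `save_results`, the UTC timestamp format, and the shape of the `results` dict are all assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_results(results: dict, results_dir: Path) -> Path:
    """Write results to a timestamped file and refresh latest.json.

    Hypothetical helper: the file naming scheme is assumed, not taken from run.py.
    """
    results_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp keeps file names sortable and unambiguous across machines.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    payload = json.dumps(results, indent=2)

    # Keep a history entry for this run.
    stamped_path = results_dir / f"{stamp}.json"
    stamped_path.write_text(payload, encoding="utf-8")

    # Overwrite the pointer to the most recent run.
    (results_dir / "latest.json").write_text(payload, encoding="utf-8")
    return stamped_path
```

Writing both files from the same serialized payload means latest.json always matches the newest timestamped file byte for byte, which keeps any README update step reading a single stable path.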
robayedl merged commit b5b88fc into main on May 5, 2026
1 check passed