📈 FinanceBench evaluation of Mafin 2.5 (Powered by PageIndex)
Updated Oct 20, 2025 - Python
Advanced RAG pipelines for medical (HealthBench, MedCaseReasoning, MetaMedQA, PubMedQA) and financial (FinanceBench, Earnings Calls) QA. LangGraph orchestration + BAML structured generation, Milvus hybrid search (dense + BM25 + RRF), three-layer metadata enrichment, Contextual AI instruction-following reranker, and DeepEval evaluation.
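The "dense + BM25 + RRF" step mentioned above refers to reciprocal rank fusion, which merges several ranked lists by summing 1 / (k + rank) per document. The following is a minimal sketch of that general technique, not the repository's code: the function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs (each ordered best-first).

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why the fusion needs no score normalization between the dense and lexical retrievers.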
Multi-agent LangGraph RAG for financial Q&A: 72.7% on FinanceBench under a calibrated judge (κ=0.932). RBAC at the vector layer, multi-party HITL on high-stakes answers, self-hosted LLM observability. Install: pip install financebench-rag-agent
🔍 Empower efficient retrieval with PageIndex, a reasoning-based system that eliminates the need for vector databases and chunking for human-like results.
An end-to-end RAG pipeline that answers questions about SEC 10-K filings with grounded citations instead of hallucinated numbers. Built on FinanceBench: indexing with FAISS + BGE embeddings, generation with Llama-3.3-70B, three-axis evaluation (correctness / faithfulness / page-hit@k), improvement cycles.
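The description above does not define page-hit@k. One plausible reading, assumed here, is a binary score that is 1 when any of the top-k retrieved pages is a gold evidence page. The function name and page numbers below are hypothetical.

```python
def page_hit_at_k(retrieved_pages, gold_pages, k):
    """Return 1 if any of the first k retrieved pages is a gold evidence page, else 0.

    This definition is an assumption; the repository may compute the metric differently.
    """
    gold = set(gold_pages)
    return int(any(page in gold for page in retrieved_pages[:k]))


# Hypothetical retrieval result: page numbers ordered best-first.
retrieved = [12, 40, 7]
print(page_hit_at_k(retrieved, gold_pages=[7], k=2))
print(page_hit_at_k(retrieved, gold_pages=[7], k=3))
```

Averaging this score over all questions gives a retrieval hit rate that can be tracked separately from answer correctness and faithfulness.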
RAG pipeline for FinanceBench with retrieval, evaluation, improvement cycles, and chunk-size experiments.
Rigorous evaluation of contextual retrieval techniques: comparing 5 embedders × 4 chunking strategies, with bootstrapped confidence intervals, on FinMTEB and FinanceBench.
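A bootstrapped confidence interval of the kind mentioned above can be sketched with a percentile bootstrap over per-question scores. This is a generic illustration, not the repository's procedure: the function name, the resample count, and the toy scores are assumptions.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical per-question correctness scores (1 = correct, 0 = incorrect).
toy_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
low, high = bootstrap_ci(toy_scores)
print(f"95% CI for accuracy: [{low:.2f}, {high:.2f}]")
```

When two embedder and chunking configurations have overlapping intervals on a benchmark as small as FinanceBench, the difference between them may not be meaningful.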