In the era of information density, extracting precise answers from large PDF collections is critical. InvenioAI is an advanced RAG system built on a hybrid retrieval architecture that combines dense and sparse search with multi-query fusion and cross-encoder reranking.
It transforms static enterprise PDF documents into a searchable, intelligent knowledge base, allowing users to ask complex questions and receive answers grounded in multi-stage retrieved context with verifiable source citations.
- Hugging Face Space: https://felixhrdyn-invenioai.hf.space
Interactive Demo: hybrid search retrieval and multi-stage reasoning.
- Running Header/Footer Elimination: Position-based boundary analysis that dynamically cleanses repetitive noise (page titles, section lines, and page numbers) from long PDFs without losing structural content.
- Structure-Aware Chunking: Hierarchical text splitting combining `MarkdownHeaderTextSplitter` (heading levels H1-H3) and `RecursiveCharacterTextSplitter` to naturally preserve document outlines in vector payloads.
- Cross-Page Context Propagation: Active header state machine that automatically inherits and propagates parent headings to continuation pages, preventing context starvation during retrieval.
- Hybrid Search: Combines dense semantic retrieval (MMR) with server-side sparse vector search (BM42), fused natively in Qdrant for superior accuracy and scalability.
- RAG Fusion: Implements Multi-Query generation to capture diverse user intents and improve retrieval coverage.
- Advanced Reranking: Utilizes Cross-Encoder models (`ms-marco-MiniLM-L-12-v2`) via FlashRank to re-evaluate top candidates, ensuring the most relevant context is provided to the LLM.
- Chain-of-Thought (CoT) Reasoning: Implements a 4-step structured reasoning protocol (Query Deconstruction, Filtering, Synthesis, Strategy) to ensure grounded and logical answers.
- Semantic Caching: Dual-layer caching strategy (Exact Match + Semantic Similarity > 0.90) to eliminate redundant LLM API calls and provide near-instant responses for paraphrased queries.
- Async Job Orchestration: Background indexing and query execution with real-time status polling for a smooth user experience.
- Deep Analytics Dashboard: Built-in metrics tracking for retrieval accuracy (nDCG, HitRate), latency, and API usage.
- Minimalist UI/UX: Centered branding with 'Outfit' typography, glassmorphism aesthetics, and a streamlined Knowledge Base management interface.
- Cloud-Ready Architecture: Ships with an all-in-one Docker configuration optimized for Hugging Face Spaces and Azure Container Apps.
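To make the running header/footer elimination listed above more concrete, here is a minimal sketch of one way position-based cleanup could work. It is an illustration under stated assumptions, not the project's actual code: the function names, the `zone` size, and the `min_ratio` threshold are all hypothetical. The idea is to look only at the first and last few lines of each page, collapse digits so that page numbers compare equal, and drop lines that recur on most pages.

```python
import math
import re
from collections import Counter


def _normalize(line: str) -> str:
    """Collapse digits so 'Page 3' and 'Page 12' compare equal."""
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages, zone=2, min_ratio=0.6):
    """Drop lines that recur in the top/bottom `zone` lines of most pages.

    `pages` is a list of pages, each a list of text lines.
    `zone` and `min_ratio` are hypothetical tuning knobs.
    """
    counts = Counter()
    for lines in pages:
        # Only the page edges are candidates for headers and footers.
        edge = lines[:zone] + lines[-zone:] if len(lines) > 2 * zone else lines
        # A set ensures each page votes at most once per normalized line.
        counts.update({_normalize(text) for text in edge if text.strip()})

    threshold = max(2, math.ceil(min_ratio * len(pages)))
    repeated = {key for key, seen in counts.items() if seen >= threshold}

    cleaned = []
    for lines in pages:
        kept = []
        for index, text in enumerate(lines):
            in_edge = index < zone or index >= len(lines) - zone
            if in_edge and _normalize(text) in repeated:
                continue  # running header, footer, or page number
            kept.append(text)
        cleaned.append(kept)
    return cleaned
```

Because only edge lines are ever removed, body text that happens to repeat in the middle of a page is left untouched.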
- Framework: FastAPI
- RAG Engine: LangChain
- PDF Parser: LlamaParse (High-fidelity Markdown extraction)
- LLM: Llama 3.3 70B & Llama 3.1 8B (Groq Cloud)
- Reasoning: Chain-of-Thought (CoT) structured 4-step protocol
- Embedding Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (Dense)
- Sparse Model: Qdrant/bm42-all-minilm-l6-v2-attentions (Sparse)
- Reranker: FlashRank (ms-marco-MiniLM-L-12-v2 Cross-Encoder)
- Search: Qdrant Native Hybrid (Dense + Sparse) + RAG Fusion (Multi-Query)
- Caching: Dual-Layer Semantic Caching (DiskCache / Redis + NumPy cosine similarity)
- Framework: Streamlit
- Visualization: Plotly, Pandas
- Styling: Vanilla CSS (Custom Design System)
- Icons: Lucide (SVG)
- Vector Database: Qdrant (Local / Server / Cloud)
- Deployment: Docker, GitHub Actions (CI/CD)
- Environment: Python 3.12
graph TD
subgraph Data_Layer [Ingestion Layer]
PDF[PDF Documents] -->|Upload| API[FastAPI Backend]
API -->|Chunking| Split[Text Splitter]
end
subgraph Intelligence_Layer [Processing & RAG]
Split -->|Dense + Sparse| QDR[Qdrant Vector DB]
API -->|Query Rewriting| Rewriter[Query Rewriter]
Rewriter -->|Semantic Lookup| Cache{Semantic Cache}
Cache -->|Miss| RAG[Hybrid Retriever]
Cache -->|Hit| LLM
RAG -->|Native Hybrid| QDR
QDR -->|Reranking| Rerank[Cross-Encoder]
Rerank -->|Context| LLM["Groq (Llama 3.1) LLM"]
end
subgraph Presentation_Layer [UI & Analytics]
UI[Streamlit Dashboard] -->|REST API| API
LLM -->|Answer| UI
API -->|Log| Metrics[Local Metrics Store]
Metrics -->|Visualize| Dashboard[Analytics Page]
end
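The cache node in the diagram above short-circuits the retriever on a hit. A minimal sketch of such a dual-layer lookup (exact match first, then semantic similarity above 0.90) might look like the following. Everything here is an assumption for illustration: the class name, the in-memory dictionary standing in for DiskCache or Redis, and the plain-Python cosine function standing in for the NumPy computation. The `embed` callable is expected to be supplied by the caller.

```python
import hashlib
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DualLayerCache:
    """Hypothetical in-memory cache: exact-match layer plus semantic layer."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.90):
        self.embed = embed          # assumed: maps a query string to a vector
        self.threshold = threshold  # similarity must exceed this to count as a hit
        self._exact = {}            # normalized-query hash -> answer
        self._semantic = []         # list of (query vector, answer)

    @staticmethod
    def _key(query: str) -> str:
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[str]:
        # Layer 1: exact match on the normalized query, no embedding needed.
        answer = self._exact.get(self._key(query))
        if answer is not None:
            return answer
        if not self._semantic:
            return None
        # Layer 2: nearest cached query by cosine similarity.
        query_vec = self.embed(query)
        score, answer = max(
            ((cosine_similarity(query_vec, vec), ans) for vec, ans in self._semantic),
            key=lambda pair: pair[0],
        )
        return answer if score > self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self._exact[self._key(query)] = answer
        self._semantic.append((self.embed(query), answer))
```

Checking the exact layer first avoids an embedding call for repeated queries; the linear scan in the semantic layer is only reasonable for small caches.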
InvenioAI is optimized for speed and retrieval precision while maintaining low operational costs.
| Parameter | Value | Description |
|---|---|---|
| Retrieval Mode | Native Hybrid | Dense (MMR) + Sparse (BM42) |
| Rerank Top-K | 5 Docs | Optimized context window for LLM |
| Avg. Response | ~15s | Total end-to-end latency (RAG Fusion + Reranking) |
| Avg. Retrieval | ~2s | Multi-query hybrid search & RRF fusion time |
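The retrieval row above mentions RRF fusion of multi-query results. As a rough sketch of Reciprocal Rank Fusion, each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed. The function name, the `top_n` parameter, and the constant `k = 60` (a commonly used default) are assumptions here, not values taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into one scored ranking.

    `rankings` holds one best-first list of IDs per generated query.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused if top_n is None else fused[:top_n]


# Example: three query variants return overlapping candidates.
results = reciprocal_rank_fusion(
    [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
)
print([doc_id for doc_id, _ in results])
```

Documents that rank well across several query variants rise to the top, even if no single variant ranked them first.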
- Python 3.12
- Groq API Key (Llama 3.3)
- Qdrant Instance (Optional, defaults to local storage)
Step 1: Environment Setup
python -m venv venv
source venv/bin/activate # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env
Step 2: Run Application
# Terminal 1: Backend API
uvicorn app.main:app --reload
# Terminal 2: Streamlit UI
streamlit run frontend/streamlit_app.py
Step 3: Docker (Production)
docker build -t invenioai .
docker run -p 7860:7860 invenioai
The application is configured via `.env`. Key variables include:
- `GROQ_API_KEY`: Required for LLM and Query Rewriting.
- `QDRANT_URL`: Optional server URL (defaults to local `./qdrant_storage`).
- `INVENIOAI_ENABLE_HYBRID_SEARCH`: Toggle dense+lexical mode (Default: `1`).
- `INVENIOAI_DELETE_UPLOADED_PDFS`: Clean up storage after indexing (Default: `0`).
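For illustration, the variables above could be read into a typed settings object along these lines. This is a sketch, not the application's actual loader: the `Settings` class, `load_settings` function, and the set of accepted truthy strings are hypothetical, and only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed; the README only shows 1 and 0


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    """Interpret an environment value as a boolean toggle."""
    return env.get(name, default).strip().lower() in _TRUTHY


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    qdrant_url: Optional[str]   # None means local ./qdrant_storage
    enable_hybrid_search: bool
    delete_uploaded_pdfs: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=api_key,
        qdrant_url=env.get("QDRANT_URL") or None,
        enable_hybrid_search=_flag(env, "INVENIOAI_ENABLE_HYBRID_SEARCH", "1"),
        delete_uploaded_pdfs=_flag(env, "INVENIOAI_DELETE_UPLOADED_PDFS", "0"),
    )
```

Failing fast on a missing API key surfaces configuration problems at startup rather than on the first query.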
Felix Hardyan
