InvenioAI — Advanced RAG for Document Q&A

Hybrid Search, RAG Fusion, and Chain-of-Thought (CoT) Reasoning.

Overview

Extracting precise answers from large PDF collections is difficult. InvenioAI is an Advanced RAG system built on a hybrid retrieval architecture: dense and sparse search, multi-query fusion, and cross-encoder reranking.

It turns static enterprise PDF documents into a searchable knowledge base. Users can ask complex questions and receive answers with verifiable source citations, grounded in context retrieved over multiple stages.

Live Demo

Hugging Face Space: https://felixhrdyn-invenioai.hf.space

🎬 Demo Video

Video walkthrough (InvenioAI.Demo.-.CoT.Reasoning.Table.Integrity.mp4) of hybrid search retrieval and multi-stage reasoning.

Technical Features

Ingestion & Document Processing

Running Header/Footer Elimination: Position-based boundary analysis that dynamically cleanses repetitive noise (page titles, section lines, and page numbers) from long PDFs without losing structural content.
Structure-Aware Chunking: Hierarchical text splitting that combines MarkdownHeaderTextSplitter (heading levels H1-H3) with RecursiveCharacterTextSplitter, so each chunk's vector payload keeps its place in the document outline.
Cross-Page Context Propagation: Active header state machine that automatically inherits and propagates parent headings to continuation pages, preventing context starvation during retrieval.
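The three ingestion steps above can be sketched in plain Python. This is a hypothetical, simplified illustration, not InvenioAI's code. It assumes each page is already a list of text lines (for example, LlamaParse Markdown split per page). A line counts as a running header or footer when its digit-masked form recurs within the first or last `edge` lines of most pages. Any page that does not open with its own heading gets the active H1-H3 headings prepended. The function names and the `edge`/`min_ratio` parameters are illustrative assumptions.

```python
import re
from collections import Counter

# Markdown H1-H3 heading: 1-3 '#' characters followed by whitespace.
HEADING = re.compile(r"^(#{1,3})\s+(.*)")


def _normalize(line):
    # Mask digits so "Page 3" and "Page 4" collapse into one pattern.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages, edge=2, min_ratio=0.6):
    """Remove lines that recur near the top/bottom boundary of most pages."""
    counts = Counter()
    for lines in pages:
        # Count each normalized boundary line at most once per page.
        boundary = {_normalize(line) for line in lines[:edge] + lines[-edge:] if line.strip()}
        counts.update(boundary)
    threshold = max(2, int(min_ratio * len(pages)))
    noise = {text for text, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in pages:
        last = len(lines) - edge
        cleaned.append([
            line for i, line in enumerate(lines)
            # Only boundary positions are eligible for removal.
            if not ((i < edge or i >= last) and _normalize(line) in noise)
        ])
    return cleaned


def propagate_headings(pages):
    """Prepend the active H1-H3 heading path to pages that open mid-section."""
    active = {}  # state machine: heading level -> current heading text
    result = []
    for lines in pages:
        inherited = [f"{'#' * level} {active[level]}" for level in sorted(active)]
        first = next((line for line in lines if line.strip()), "")
        result.append(lines if HEADING.match(first) else inherited + lines)
        # Update the state with headings that appear on this page.
        for line in lines:
            m = HEADING.match(line)
            if m:
                level = len(m.group(1))
                active[level] = m.group(2).strip()
                for deeper in [k for k in active if k > level]:
                    del active[deeper]  # a new heading closes deeper sub-sections
    return result
```

Masking digits lets `Page 1` and `Page 2` match as one pattern. Restricting removal to boundary positions keeps a repeated phrase in the body text intact.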

Retrieval & Search

Hybrid Search: Combines dense semantic retrieval (MMR) with sparse BM42 vectors. Both are stored in Qdrant and fused server-side, so no separate lexical index is required.
RAG Fusion: Generates multiple query variants (Multi-Query) to capture different phrasings of the user's intent, then fuses their results to improve retrieval coverage.
Advanced Reranking: Rescores the fused candidates with the ms-marco-MiniLM-L-12-v2 cross-encoder via FlashRank and passes only the top-ranked chunks to the LLM.
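RAG Fusion usually merges the ranked lists returned for each query variant with Reciprocal Rank Fusion (RRF), which the Performance & Limits table also mentions. The sketch below is a minimal version. It assumes each retriever result is an ordered list of chunk IDs and uses the conventional constant k = 60. InvenioAI's actual fusion parameters are not documented here.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: int = 60, top_n: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists: each doc scores sum(1 / (k + rank)) over the lists containing it."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]
```

RRF uses only ranks, not raw scores. That lets it merge lists whose scores are not comparable, such as dense cosine similarities and sparse BM42 scores.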

Logic & Intelligence

Chain-of-Thought (CoT) Reasoning: Implements a 4-step structured reasoning protocol (Query Deconstruction, Filtering, Synthesis, Strategy) to ensure grounded and logical answers.
Semantic Caching: A dual-layer cache first checks for an exact query match, then for a semantically similar cached query (cosine similarity > 0.90). This avoids redundant LLM API calls and returns near-instant answers for paraphrased questions.
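The dual-layer lookup can be sketched with an in-memory cache. The sketch assumes an `embed` callable that maps text to a vector. In InvenioAI that role would fall to the dense embedding model, and storage to DiskCache or Redis; here both are replaced by plain Python structures. The 0.90 threshold comes from the description above. The query normalization and the linear scan are assumptions kept simple for clarity.

```python
import math
from typing import Callable, List, Optional


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Layer 1: exact match on the normalized query. Layer 2: cosine similarity."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.90):
        self.embed = embed        # hypothetical embedding function
        self.threshold = threshold
        self.exact = {}           # normalized query -> answer
        self.entries = []         # (embedding, answer) pairs for semantic lookup

    @staticmethod
    def _key(query: str) -> str:
        # Lowercase and collapse whitespace before the exact-match lookup.
        return " ".join(query.lower().split())

    def get(self, query: str) -> Optional[str]:
        key = self._key(query)
        if key in self.exact:
            return self.exact[key]            # exact hit
        vec = self.embed(query)
        scored = ((_cosine(vec, emb), answer) for emb, answer in self.entries)
        best_score, best_answer = max(scored, key=lambda pair: pair[0], default=(0.0, None))
        if best_score > self.threshold:
            return best_answer                # semantic hit on a paraphrase
        return None                           # miss: caller runs the full RAG pipeline

    def put(self, query: str, answer: str) -> None:
        self.exact[self._key(query)] = answer
        self.entries.append((self.embed(query), answer))
```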

Core System & UX

Async Job Orchestration: Background indexing and query execution with real-time status polling for a smooth user experience.
Deep Analytics Dashboard: Built-in tracking of retrieval quality (nDCG, HitRate), latency, and API usage.
Minimalist UI/UX: Centered branding with 'Outfit' typography, glassmorphism aesthetics, and a streamlined Knowledge Base management interface.
Cloud-Ready Architecture: Ships with an all-in-one Docker configuration optimized for Hugging Face Spaces and Azure Container Apps.
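The dashboard's retrieval metrics can be computed as shown below. This sketch assumes binary relevance labels (a set of relevant chunk IDs per query) and the standard log2 rank discount. How InvenioAI labels relevance is not described here.

```python
import math
from typing import List, Sequence, Set


def hit_rate_at_k(results: List[Sequence[str]], relevant: List[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant chunk in the top k."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if set(ranked[:k]) & rel)
    return hits / len(results) if results else 0.0


def ndcg_at_k(ranked: Sequence[str], rel: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k for one query: DCG divided by the best possible DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in rel)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
    return dcg / ideal if ideal else 0.0
```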

Technology Stack

Backend

Framework: FastAPI
RAG Engine: LangChain
PDF Parser: LlamaParse (High-fidelity Markdown extraction)
LLM: Llama 3.3 70B & Llama 3.1 8B (Groq Cloud)
Reasoning: Chain-of-Thought (CoT) structured 4-step protocol
Embedding Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (Dense)
Sparse Model: Qdrant/bm42-all-minilm-l6-v2-attentions (Sparse)
Reranker: FlashRank (ms-marco-MiniLM-L-12-v2 Cross-Encoder)
Search: Qdrant Native Hybrid (Dense + Sparse) + RAG Fusion (Multi-Query)
Caching: Dual-Layer Semantic Caching (DiskCache / Redis + NumPy cosine similarity)
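The Reasoning entry above refers to a 4-step CoT protocol. A hypothetical prompt builder could list those four steps and number each reranked chunk for citation. The step wording, the `(source, page, text)` chunk format, and the prompt layout are all assumptions, not InvenioAI's actual prompt.

```python
from typing import List, Tuple

# Step names come from the README; the one-line descriptions are illustrative.
COT_STEPS = (
    "1. Query Deconstruction: break the question into the facts it requires.",
    "2. Filtering: keep only context passages relevant to those facts.",
    "3. Synthesis: combine the kept passages, citing each one as [n].",
    "4. Strategy: decide how to answer, or state that the context is insufficient.",
)


def build_cot_prompt(question: str, chunks: List[Tuple[str, int, str]]) -> str:
    """chunks: (source, page, text) tuples from the reranker, best first."""
    context = "\n".join(
        f"[{i}] ({source}, p. {page}) {text}"
        for i, (source, page, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered context below.\n"
        "Reason through these steps before giving the final answer:\n"
        + "\n".join(COT_STEPS)
        + f"\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering chunks in the prompt gives the model a stable handle for citations, which can then be mapped back to the source file and page.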

Frontend

Framework: Streamlit
Visualization: Plotly, Pandas
Styling: Vanilla CSS (Custom Design System)
Icons: Lucide (SVG)

Infrastructure

Vector Database: Qdrant (Local / Server / Cloud)
Deployment: Docker, GitHub Actions (CI/CD)
Environment: Python 3.12

System Architecture

graph TD
    subgraph Data_Layer [Ingestion Layer]
        PDF[PDF Documents] -->|Upload| API[FastAPI Backend]
        API -->|Chunking| Split[Text Splitter]
    end
    
    subgraph Intelligence_Layer [Processing & RAG]
        Split -->|Dense + Sparse| QDR[Qdrant Vector DB]
        
        API -->|Query Rewriting| Rewriter[Query Rewriter]
        Rewriter -->|Semantic Lookup| Cache{Semantic Cache}
        Cache -->|Miss| RAG[Hybrid Retriever]
        Cache -->|Hit| LLM
        RAG -->|Native Hybrid| QDR
        QDR -->|Reranking| Rerank[Cross-Encoder]
        Rerank -->|Context| LLM["Groq (Llama 3.3 / 3.1) LLM"]
    end
    
    subgraph Presentation_Layer [UI & Analytics]
        UI[Streamlit Dashboard] -->|REST API| API
        LLM -->|Answer| UI
        API -->|Log| Metrics[Local Metrics Store]
        Metrics -->|Visualize| Dashboard[Analytics Page]
    end
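The query path in the diagram (rewrite, cache lookup, hybrid retrieval, reranking, generation) can be expressed as a small orchestration class with every stage injected as a callable. The class and parameter names are hypothetical. One behavior is an interpretation: on a cache hit, this sketch returns the stored answer directly instead of calling the LLM, since the semantic cache exists to avoid redundant LLM calls. `top_k=5` mirrors the Rerank Top-K value in the Performance & Limits table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QueryPipeline:
    """Hypothetical wiring of the diagram's query path; each stage is injected."""
    rewrite: Callable[[str], str]                    # question -> rewritten query
    cache_get: Callable[[str], Optional[str]]
    cache_put: Callable[[str, str], None]
    retrieve: Callable[[str], List[str]]             # multi-query hybrid search + fusion
    rerank: Callable[[str, List[str]], List[str]]    # cross-encoder, best first
    generate: Callable[[str, List[str]], str]        # LLM call with context
    top_k: int = 5

    def answer(self, question: str) -> str:
        query = self.rewrite(question)
        cached = self.cache_get(query)
        if cached is not None:       # cache hit: skip retrieval and the LLM call
            return cached
        candidates = self.retrieve(query)
        context = self.rerank(query, candidates)[: self.top_k]
        result = self.generate(query, context)
        self.cache_put(query, result)
        return result
```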

Performance & Limits

InvenioAI is optimized for speed and retrieval precision while maintaining low operational costs.

Core Metrics & Operational Limits

Parameter	Value	Description
Retrieval Mode	Native Hybrid	Dense (MMR) + Sparse (BM42)
Rerank Top-K	5 docs	Number of reranked chunks passed to the LLM as context
Avg. Response	~15s	Total end-to-end latency (RAG Fusion + Reranking)
Avg. Retrieval	~2s	Multi-query hybrid search & RRF fusion time
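The dense side of the hybrid search uses Maximal Marginal Relevance (MMR). MMR picks each next chunk by trading relevance to the query against similarity to chunks already selected: score(d) = lambda * sim(q, d) - (1 - lambda) * max over selected s of sim(d, s). The plain-Python sketch below assumes cosine similarity and lambda = 0.5, a common default. InvenioAI's actual MMR settings are not stated here.

```python
import math
from typing import List


def _cos(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec: List[float], doc_vecs: List[List[float]], k: int = 5,
        lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k docs, balancing relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = _cos(query_vec, doc_vecs[i])
        # Penalize similarity to the closest already-selected doc.
        redundancy = max((_cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to a plain top-k by relevance. Lower values favor more diverse chunks.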

Deployment Guide

Prerequisites

Python 3.12
Groq Cloud API key (Llama 3.3 70B / Llama 3.1 8B)
Qdrant Instance (Optional, defaults to local storage)

Execution Procedures

Step 1: Environment Setup

python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env

Step 2: Run Application

# Terminal 1: Backend API
uvicorn app.main:app --reload

# Terminal 2: Streamlit UI
streamlit run frontend/streamlit_app.py

Step 3: Docker (Production)

docker build -t invenioai .
docker run -p 7860:7860 --env-file .env invenioai

Configuration

The application is configured via .env. Key variables include:

GROQ_API_KEY: Required for LLM and Query Rewriting.
QDRANT_URL: Optional server URL (defaults to local ./qdrant_storage).
INVENIOAI_ENABLE_HYBRID_SEARCH: Toggle hybrid dense + sparse (BM42) search (Default: 1).
INVENIOAI_DELETE_UPLOADED_PDFS: Delete uploaded PDFs after indexing to free storage (Default: 0).
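A hypothetical loader for these variables is sketched below, using the defaults listed above. The README does not say which strings count as "on". This sketch assumes `1`, `true`, `yes`, and `on` (any case) enable a flag, and treats an unset or empty `QDRANT_URL` as "use local storage". `Settings` and `load_settings` are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings for enabled flags


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    qdrant_url: Optional[str]      # None -> local ./qdrant_storage
    enable_hybrid_search: bool
    delete_uploaded_pdfs: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env

    def flag(name: str, default: str) -> bool:
        return env.get(name, default).strip().lower() in TRUTHY

    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is required for the LLM and query rewriting")
    return Settings(
        groq_api_key=key,
        qdrant_url=env.get("QDRANT_URL") or None,
        enable_hybrid_search=flag("INVENIOAI_ENABLE_HYBRID_SEARCH", "1"),
        delete_uploaded_pdfs=flag("INVENIOAI_DELETE_UPLOADED_PDFS", "0"),
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real process environment.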

Author

Felix Hardyan
