
flxhrdyn/InvenioAI



InvenioAI — Advanced RAG for Document Q&A

Hybrid Search, RAG Fusion, and Chain-of-Thought (CoT) Reasoning.



Overview

Extracting precise answers from large, information-dense PDF collections is difficult. InvenioAI is an Advanced RAG system built on a hybrid retrieval architecture that combines dense semantic search with sparse lexical search.

It turns static enterprise PDFs into a searchable knowledge base: users ask complex questions and receive answers grounded in context retrieved through a multi-stage pipeline, with verifiable source citations.

Live Demo

🎬 Demo Video


Interactive demo: hybrid search retrieval and multi-stage reasoning.

Technical Features

Ingestion & Document Processing

  • Running Header/Footer Elimination: Position-based boundary analysis that dynamically cleanses repetitive noise (page titles, section lines, and page numbers) from long PDFs without losing structural content.
  • Structure-Aware Chunking: Hierarchical text splitting combining MarkdownHeaderTextSplitter (heading levels H1-H3) and RecursiveCharacterTextSplitter to naturally preserve document outlines in vector payloads.
  • Cross-Page Context Propagation: Active header state machine that automatically inherits and propagates parent headings to continuation pages, preventing context starvation during retrieval.
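The position-based header/footer cleanup and the cross-page heading propagation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes each page is already a list of Markdown text lines (for example from LlamaParse), treats the first and last `edge` lines of a page as candidate running headers/footers, and masks digits so that "Page 3" and "Page 4" count as the same line. The function names `strip_running_lines` and `propagate_headings` and the 60% repetition threshold are hypothetical.

```python
import re
from collections import Counter

HEADING = re.compile(r"^(#{1,3})\s+\S")  # Markdown H1-H3 heading line


def _normalize(line: str) -> str:
    # Mask digits so "Page 3" and "Page 4" normalize to the same running line.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages: list[list[str]], edge: int = 2,
                        min_ratio: float = 0.6) -> list[list[str]]:
    """Drop lines in the top/bottom `edge` positions that repeat on most pages."""
    counts: Counter = Counter()
    for lines in pages:
        # Count each normalized edge line at most once per page.
        counts.update({_normalize(l) for l in lines[:edge] + lines[-edge:] if l.strip()})
    threshold = max(2, int(min_ratio * len(pages)))
    running = {text for text, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in pages:
        kept = [
            line for i, line in enumerate(lines)
            if not ((i < edge or i >= len(lines) - edge) and _normalize(line) in running)
        ]
        cleaned.append(kept)
    return cleaned


def propagate_headings(pages: list[list[str]]) -> list[list[str]]:
    """Prefix each page with the ancestor headings still open from earlier pages."""
    stack: list[tuple[int, str]] = []  # (level, heading line), carried across pages
    out = []
    for lines in pages:
        first = next((l for l in lines if l.strip()), "")
        m = HEADING.match(first)
        # A page opening with an H2 inherits the open H1; a page opening
        # mid-section (no heading) inherits the whole open heading path.
        cutoff = len(m.group(1)) if m else 4
        prefix = [text for level, text in stack if level < cutoff]
        for line in lines:
            h = HEADING.match(line)
            if h:
                level = len(h.group(1))
                stack = [(lv, t) for lv, t in stack if lv < level]
                stack.append((level, line.strip()))
        out.append(prefix + lines)
    return out


pages = [
    ["Example Corp Report", "# Intro", "text a", "Page 1"],
    ["Example Corp Report", "## Scope", "text b", "Page 2"],
    ["Example Corp Report", "more scope text", "Page 3"],
]
for page in propagate_headings(strip_running_lines(pages)):
    print(page)
```

Re-emitting the inherited headings as real Markdown heading lines means a header-based splitter such as MarkdownHeaderTextSplitter would attach the parent outline to chunks from continuation pages without any extra metadata plumbing.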

Retrieval & Search

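  • Hybrid Search: Combines dense semantic retrieval (MMR) with server-side sparse vector search (BM42), fused natively in Qdrant so that both semantic and exact-term matches contribute to the final ranking.
  • RAG Fusion: Generates multiple rewrites of the user's question (Multi-Query) to capture different phrasings of the intent, then merges the per-query results with Reciprocal Rank Fusion (RRF) to improve retrieval coverage.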
  • Advanced Reranking: Utilizes Cross-Encoder models (ms-marco-MiniLM-L-12-v2) via FlashRank to re-evaluate top candidates, ensuring the most relevant context is provided to the LLM.

Logic & Intelligence

  • Chain-of-Thought (CoT) Reasoning: Implements a 4-step structured reasoning protocol (Query Deconstruction, Filtering, Synthesis, Strategy) to ensure grounded and logical answers.
  • Semantic Caching: Dual-layer caching strategy (exact match + cosine similarity > 0.90) that avoids redundant LLM API calls and returns near-instant responses for paraphrased queries.
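The dual-layer cache can be sketched as an exact-match dictionary in front of a cosine-similarity scan over stored query embeddings. This is a minimal in-memory sketch, not the project's DiskCache/Redis implementation: the class name `SemanticCache`, the key normalization, and the injected `embed` callable are hypothetical, and cosine similarity is computed in pure Python here where the Technology Stack lists NumPy. Only the 0.90 threshold comes from the description above.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.90):
        self.embed = embed          # hypothetical: any text -> dense vector function
        self.threshold = threshold  # similarity must exceed this to count as a hit
        self.exact: dict[str, str] = {}
        self.entries: list[tuple[Vector, str]] = []

    @staticmethod
    def _key(query: str) -> str:
        # Layer 1 key: case- and whitespace-insensitive exact match.
        return " ".join(query.lower().split())

    def get(self, query: str) -> Optional[str]:
        key = self._key(query)
        if key in self.exact:
            return self.exact[key]               # layer 1 hit: no embedding needed
        vec = self.embed(query)
        best_score, best_answer = 0.0, None
        for stored_vec, answer in self.entries:  # layer 2: semantic lookup
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score > self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.exact[self._key(query)] = answer
        self.entries.append((self.embed(query), answer))


# Toy embeddings standing in for a real sentence-transformers model.
toy_vectors = {
    "what is the refund policy?": [1.0, 0.0, 0.1],
    "how do refunds work?": [0.95, 0.05, 0.12],
    "who is the ceo?": [0.0, 1.0, 0.0],
}
cache = SemanticCache(embed=lambda q: toy_vectors[" ".join(q.lower().split())])
cache.put("What is the refund policy?", "answer-1")
print(cache.get("How do refunds work?"))
```

The linear scan keeps the sketch short and is adequate for a small cache; a large cache would likely need a vector index instead.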

Core System & UX

  • Async Job Orchestration: Background indexing and query execution with real-time status polling for a smooth user experience.
  • Deep Analytics Dashboard: Built-in metrics tracking for retrieval accuracy (nDCG, HitRate), latency, and API usage.
  • Minimalist UI/UX: Centered branding with 'Outfit' typography, glassmorphism aesthetics, and a streamlined Knowledge Base management interface.
  • Cloud-Ready Architecture: Ships with an all-in-one Docker configuration optimized for Hugging Face Spaces and Azure Container Apps.
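The retrieval metrics named above have standard definitions: HitRate@k is the fraction of queries whose top-k results contain at least one relevant chunk, and nDCG@k divides the discounted cumulative gain of the returned ranking by that of an ideal ranking. The sketch below uses binary relevance and the log2(rank + 1) discount; how InvenioAI labels relevance for its dashboard is not described here, so the function names and inputs are assumptions.

```python
import math


def hit_rate_at_k(results: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant doc in the top-k."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if rel & set(ranked[:k]))
    return hits / len(results) if results else 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Binary-relevance nDCG@k for a single query."""
    # enumerate starts at 0, so rank = i + 1 and the discount is log2(rank + 1).
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    # Ideal ranking: all relevant docs (up to k) placed at the top.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


runs = [["d3", "d1", "d7"], ["d9", "d8", "d2"]]
gold = [{"d1"}, {"d2"}]
print(hit_rate_at_k(runs, gold, k=2))             # 0.5
print(round(ndcg_at_k(runs[0], gold[0], k=5), 3))  # 0.631
```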

Technology Stack

Backend

  • Framework: FastAPI
  • RAG Engine: LangChain
  • PDF Parser: LlamaParse (High-fidelity Markdown extraction)
  • LLM: Llama 3.3 70B & Llama 3.1 8B (Groq Cloud)
  • Reasoning: Chain-of-Thought (CoT) structured 4-step protocol
  • Embedding Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (Dense)
  • Sparse Model: Qdrant/bm42-all-minilm-l6-v2-attentions (Sparse)
  • Reranker: FlashRank (ms-marco-MiniLM-L-12-v2 Cross-Encoder)
  • Search: Qdrant Native Hybrid (Dense + Sparse) + RAG Fusion (Multi-Query)
  • Caching: Dual-Layer Semantic Caching (DiskCache / Redis + NumPy cosine similarity)

Frontend

  • Framework: Streamlit
  • Visualization: Plotly, Pandas
  • Styling: Vanilla CSS (Custom Design System)
  • Icons: Lucide (SVG)

Infrastructure

  • Vector Database: Qdrant (Local / Server / Cloud)
  • Deployment: Docker, GitHub Actions (CI/CD)
  • Environment: Python 3.12

System Architecture

```mermaid
graph TD
    subgraph Data_Layer [Ingestion Layer]
        PDF[PDF Documents] -->|Upload| API[FastAPI Backend]
        API -->|Chunking| Split[Text Splitter]
    end

    subgraph Intelligence_Layer [Processing & RAG]
        Split -->|Dense + Sparse| QDR[Qdrant Vector DB]

        API -->|Query Rewriting| Rewriter[Query Rewriter]
        Rewriter -->|Semantic Lookup| Cache{Semantic Cache}
        Cache -->|Miss| RAG[Hybrid Retriever]
        RAG -->|Native Hybrid| QDR
        QDR -->|Reranking| Rerank[Cross-Encoder]
        Rerank -->|Context| LLM["Groq (Llama 3.1) LLM"]
    end

    subgraph Presentation_Layer [UI & Analytics]
        UI[Streamlit Dashboard] -->|REST API| API
        LLM -->|Answer| UI
        API -->|Log| Metrics[Local Metrics Store]
        Metrics -->|Visualize| Dashboard[Analytics Page]
    end

    Cache -->|Hit: cached answer| UI
```
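The query path in the diagram can be read as a short control flow: rewrite the question, consult the cache, and only on a miss run hybrid retrieval, reranking, and answer generation. The sketch below wires hypothetical stand-in callables together to show that order; `QueryPipeline` and its field names are not InvenioAI's actual API, and the real components (Groq LLM, Qdrant, FlashRank) are replaced by plain functions. The top-5 cut matches the Rerank Top-K value listed under Performance & Limits.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QueryPipeline:
    """Hypothetical orchestration of the query path shown in the diagram."""
    rewrite: Callable[[str], str]                  # stand-in for LLM query rewriting
    cache_get: Callable[[str], Optional[str]]      # semantic cache lookup
    cache_put: Callable[[str, str], None]          # semantic cache insert
    retrieve: Callable[[str], list[str]]           # stand-in for Qdrant hybrid search
    rerank: Callable[[str, list[str]], list[str]]  # stand-in for the cross-encoder, best first
    generate: Callable[[str, list[str]], str]      # stand-in for the Groq LLM call
    top_k: int = 5
    generations: int = field(default=0, init=False)  # counts answer-generation calls

    def answer(self, question: str) -> str:
        query = self.rewrite(question)
        cached = self.cache_get(query)
        if cached is not None:
            # Cache hit: skip retrieval, reranking, and answer generation.
            return cached
        candidates = self.retrieve(query)
        context = self.rerank(query, candidates)[: self.top_k]
        self.generations += 1
        result = self.generate(query, context)
        self.cache_put(query, result)
        return result


# Toy wiring with plain functions in place of the real services.
store: dict[str, str] = {}
pipe = QueryPipeline(
    rewrite=lambda q: q.strip().lower(),
    cache_get=store.get,
    cache_put=store.__setitem__,
    retrieve=lambda q: [f"chunk-{i:02d}" for i in range(20)],
    rerank=lambda q, docs: sorted(docs, reverse=True),
    generate=lambda q, ctx: f"answer to {q!r} from {len(ctx)} chunks",
)
first = pipe.answer("What is RAG Fusion?")
second = pipe.answer("  what is RAG fusion?")
print(first)             # answer to 'what is rag fusion?' from 5 chunks
print(pipe.generations)  # 1
```

In this ordering the cache sits after rewriting, so paraphrases that rewrite to similar queries can share one cached answer; a hit still pays for the rewrite step but avoids everything downstream of it.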

Performance & Limits

InvenioAI is optimized for speed and retrieval precision while maintaining low operational costs.

Core Metrics & Operational Limits

| Parameter | Value | Description |
|---|---|---|
| Retrieval Mode | Native Hybrid | Dense (MMR) + Sparse (BM42) |
| Rerank Top-K | 5 docs | Optimized context window for the LLM |
| Avg. Response | ~15 s | Total end-to-end latency (RAG Fusion + reranking) |
| Avg. Retrieval | ~2 s | Multi-query hybrid search and RRF fusion time |

Deployment Guide

Prerequisites

  • Python 3.12
  • Groq API key (Groq Cloud, for Llama 3.3 / Llama 3.1)
  • Qdrant Instance (Optional, defaults to local storage)

Execution Procedures

Step 1: Environment Setup

```bash
python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env
```

Step 2: Run Application

```bash
# Terminal 1: Backend API
uvicorn app.main:app --reload

# Terminal 2: Streamlit UI
streamlit run frontend/streamlit_app.py
```

Step 3: Docker (Production)

```bash
docker build -t invenioai .
docker run --env-file .env -p 7860:7860 invenioai
```

Configuration

The application is configured via .env. Key variables include:

  • GROQ_API_KEY: Required for LLM and Query Rewriting.
  • QDRANT_URL: Optional server URL (defaults to local ./qdrant_storage).
  • INVENIOAI_ENABLE_HYBRID_SEARCH: Enables hybrid dense + sparse (BM42) search (Default: 1).
  • INVENIOAI_DELETE_UPLOADED_PDFS: Deletes uploaded PDFs after indexing to save storage (Default: 0).
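A settings loader for the variables above might look like the following. This is a sketch, not the project's code: the `Settings` dataclass and the `_flag` helper are hypothetical, treating "1", "true", "yes", and "on" as enabled is an assumption, and it assumes the variables are already in the process environment (for example loaded by python-dotenv or passed with `docker run --env-file`). The defaults mirror the list above: hybrid search on, PDF deletion off, and local `./qdrant_storage` when `QDRANT_URL` is unset.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    # Interpret common truthy spellings; anything else counts as disabled.
    return env.get(name, default).strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    qdrant_url: Optional[str]  # None -> use local storage at qdrant_path
    qdrant_path: str
    enable_hybrid_search: bool
    delete_uploaded_pdfs: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        key = env.get("GROQ_API_KEY", "").strip()
        if not key:
            raise RuntimeError("GROQ_API_KEY is required for the LLM and query rewriting")
        return cls(
            groq_api_key=key,
            qdrant_url=env.get("QDRANT_URL") or None,
            qdrant_path="./qdrant_storage",
            enable_hybrid_search=_flag(env, "INVENIOAI_ENABLE_HYBRID_SEARCH", "1"),
            delete_uploaded_pdfs=_flag(env, "INVENIOAI_DELETE_UPLOADED_PDFS", "0"),
        )


settings = Settings.from_env({"GROQ_API_KEY": "gsk_placeholder"})
print(settings.enable_hybrid_search, settings.delete_uploaded_pdfs, settings.qdrant_url)
# True False None
```

Failing fast on a missing `GROQ_API_KEY` surfaces the one required variable at startup rather than on the first query.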

Author

Felix Hardyan

About

InvenioAI is a document Q&A system built with FastAPI (backend) and Streamlit (frontend). It runs an Advanced RAG pipeline backed by LangChain and Qdrant.
