DeepSearch Assistant

A multimodal RAG (Retrieval-Augmented Generation) desktop application for intelligent personal document search. Built with llama.cpp for universal hardware support and PyQt6 for a native desktop experience.

Find anything in your personal files — documents, images, videos, audio — using natural language.

Features

- Semantic Search: Ask questions in natural language and get accurate answers with source citations
- Multimodal: Index and search across DOCX, PDF, PPTX, XLSX, TXT, Markdown, images, video, and audio
- Three Search Modes:
  - Fast Search (<3s): Direct hybrid retrieval + LLM answer
  - Deep Search (5-15s): Query rewriting + multi-round retrieval + reranking
  - Cloud Deep (15-40s): Local retrieval + cloud LLM synthesis for complex queries
- Universal Hardware: Runs on any PC with an x86-64 CPU. GPU acceleration is optional (NVIDIA CUDA, or Intel via OpenVINO)
- Privacy-First: All data stays on your device. Cloud mode sends only PII-scrubbed summaries
- Hybrid Retrieval: Dense vectors + BM25 sparse search + Reciprocal Rank Fusion
- Auto-Indexing: A file watcher monitors folders and re-indexes changed files automatically
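
Reciprocal Rank Fusion, named in the feature list above, merges several ranked lists by summing 1 / (k + rank) for each document. The following is a minimal sketch, not the project's actual code: the function name and the sample document IDs are illustrative, and k=60 simply mirrors the rrf_k default shown in config/default.yaml.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # hypothetical dense-vector ranking
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # documents found by both retrievers rise to the top
```

Because only ranks are used, the dense and BM25 scores never need to be normalized against each other.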

Architecture Overview

User Files (.docx, .pdf, .jpg, .mp4, ...)
    |
    v
[Ingestion Pipeline] -- parsers per file type
    |                    (PyMuPDF, python-docx, PaddleOCR, faster-whisper)
    v
[Chunker] -- hierarchical chunking (1024 -> 256 -> 64 tokens)
    |
    v
[Embedding] -- sentence-transformers (all-MiniLM-L6-v2)
    |
    v
[Qdrant Vector Store] -- dense + BM25 sparse + metadata
    |
    v
[Query Pipeline] -- hybrid search -> reranker -> context builder
    |
    v
[LLM Generation] -- llama.cpp (Qwen2.5-7B GGUF) / OpenVINO (optional)
    |
    v
[PyQt6 Desktop UI] -- streaming answers + source citations
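
The Chunker stage in the diagram splits text at three levels (1024 -> 256 -> 64 tokens). As a rough sketch of that idea, assuming whitespace-separated words stand in for tokens and no overlap between chunks (the real chunker presumably uses a model tokenizer, and all names here are hypothetical):

```python
def split_tokens(tokens, size):
    """Split a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def hierarchical_chunks(text, sizes=(1024, 256, 64)):
    """Return a list of top-level chunks, each a nested tree of sub-chunks."""
    def build(tokens, level):
        node = {"size": sizes[level], "tokens": tokens, "children": []}
        if level + 1 < len(sizes):
            node["children"] = [
                build(part, level + 1)
                for part in split_tokens(tokens, sizes[level + 1])
            ]
        return node

    # Whitespace splitting is a stand-in for a real tokenizer (assumption).
    tokens = text.split()
    return [build(part, 0) for part in split_tokens(tokens, sizes[0])]


tree = hierarchical_chunks("word " * 1500)
print(len(tree), len(tree[0]["children"]), len(tree[1]["children"]))
```

Keeping the parent link lets a retriever match on a small chunk and then hand the larger parent to the LLM as context.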

System Requirements

Minimum

OS: Windows 10/11, Linux (Ubuntu 20.04+), macOS 12+
CPU: Any x86-64 processor (Intel or AMD)
RAM: 16 GB
Storage: 10 GB free (for models + index)
Python: 3.10+

Installation

1. Clone the repository

git clone https://github.com/your-username/DeepSearchAssistant.git
cd DeepSearchAssistant

2. Create virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux/macOS
source .venv/bin/activate

3. Install dependencies

# CPU-only (works everywhere)
pip install -e .

# With NVIDIA GPU support
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
pip install -e .

# With Intel OpenVINO support (optional)
pip install -e ".[openvino]"

4. Download models

python scripts/download_models.py

This downloads the required models (~5 GB for Stage 1):

Qwen2.5-7B-Instruct Q4_K_M (4.4 GB) — main LLM
all-MiniLM-L6-v2 (22 MB) — embedding model

5. Initialize vector store

python scripts/setup_qdrant.py

6. Launch the application

python -m deepsearch
# or
python src/deepsearch/app.py

Quick Start

1. Launch the application
2. Drag and drop a folder or files into the file browser panel
3. Wait for indexing to complete (progress bar shows status)
4. Ask a question in the search bar, e.g., "What was the budget for Project Alpha?"
5. View the streamed answer with source citations in the chat panel

Model Stack

Role	Model	Format	Size	Backend
Main LLM	Qwen2.5-7B-Instruct	Q4_K_M GGUF	4.4 GB	llama-cpp-python
Small LLM	Phi-3.5-mini-instruct	Q4_K_M GGUF	2.2 GB	llama-cpp-python
Embedding	all-MiniLM-L6-v2	PyTorch	22 MB	sentence-transformers
Reranker	ms-marco-MiniLM-L-6-v2	PyTorch	22 MB	sentence-transformers
VLM	LLaVA-v1.6-mistral-7B	Q4_K_M GGUF	4.1 GB	llama-cpp-python
ASR	faster-whisper-medium	CTranslate2	1.5 GB	faster-whisper
OCR	PaddleOCR PP-OCRv4	PaddlePaddle	150 MB	PaddleOCR

Search Modes

Fast Search

Queries the hybrid index directly and passes the results to the LLM. Best for simple factual questions.

"What is the project deadline?" → <3 seconds

Deep Search

The LLM rewrites the query for clarity, then the app performs multi-round retrieval with reranking.

"Compare all notes about React vs Vue from 2023" → 5-15 seconds

Cloud Deep (Optional)

Combines local retrieval, PII scrubbing, and a cloud LLM for complex synthesis. Requires an API key.
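
This README does not describe how PII scrubbing is implemented. Purely as an illustration of the idea, the sketch below redacts email addresses and phone-like numbers with regular expressions. The patterns, the placeholder tags, and the function name are all assumptions, and a production scrubber would need to cover far more (names, addresses, IDs).

```python
import re

# Deliberately crude patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub_pii(text):
    """Replace emails and phone-like digit runs with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(scrub_pii("Contact jane.doe@example.com or +1 (555) 010-9999."))
```

The phone pattern will also catch other long digit runs, so a real implementation would likely pair patterns like these with a dedicated PII detector.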

"Analyze the relationship between all Q3 budget items and the strategic plan" → 15-40 seconds

Configuration

Edit config/default.yaml to customize:

models:
  llm_device: "auto"        # auto, cpu, cuda, openvino
  n_gpu_layers: -1           # -1 = all layers on GPU, 0 = CPU only
  context_length: 4096       # LLM context window

retrieval:
  top_k_retrieval: 50        # candidates from hybrid search
  top_k_rerank: 5            # final chunks after reranking
  rrf_k: 60                  # RRF fusion constant

indexing:
  watch_folders: []          # auto-index these folders
  chunk_size: 512            # tokens per chunk
  chunk_overlap: 50          # overlap between chunks

cloud:
  enabled: false
  provider: "openai"         # openai, anthropic
  api_key: ""                # set via env var DEEPSEARCH_CLOUD_API_KEY
  confidence_threshold: 0.65 # below this, escalate to cloud
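
The comments in the cloud section suggest two behaviors: the API key comes from the DEEPSEARCH_CLOUD_API_KEY environment variable, and answers whose confidence falls below confidence_threshold are escalated to the cloud. A minimal sketch of that logic over an already-parsed config dictionary follows. The helper names are hypothetical, the precedence of the environment variable over the YAML value is an assumption, and how the app computes confidence is not stated here.

```python
import os


def resolve_api_key(cloud_cfg, env=os.environ):
    """Prefer the DEEPSEARCH_CLOUD_API_KEY env var over the YAML value (assumed)."""
    return env.get("DEEPSEARCH_CLOUD_API_KEY") or cloud_cfg.get("api_key", "")


def should_escalate(confidence, cloud_cfg, env=os.environ):
    """Escalate only if cloud mode is on, a key exists, and confidence is low."""
    if not cloud_cfg.get("enabled", False):
        return False
    if not resolve_api_key(cloud_cfg, env):
        return False
    return confidence < cloud_cfg.get("confidence_threshold", 0.65)


cloud_cfg = {"enabled": True, "provider": "openai", "api_key": "",
             "confidence_threshold": 0.65}
fake_env = {"DEEPSEARCH_CLOUD_API_KEY": "sk-example"}  # placeholder, not a real key

print(should_escalate(0.40, cloud_cfg, fake_env))
print(should_escalate(0.90, cloud_cfg, fake_env))
```

Passing the environment in as a parameter keeps the check easy to test without touching real credentials.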

Project Structure

DeepSearchAssistant/
├── config/              # YAML configuration files
├── docs/                # Architecture and implementation docs
├── scripts/             # Model download and setup utilities
├── src/deepsearch/      # Main application source
│   ├── backends/        # LLM backend abstraction (llama.cpp, OpenVINO)
│   ├── core/            # Config, resource management, device detection
│   ├── ingestion/       # File parsers, chunking, embedding pipeline
│   ├── retrieval/       # Hybrid search, reranking, query routing
│   ├── generation/      # LLM pipeline, cloud fallback, confidence scoring
│   ├── storage/         # Qdrant vectors, SQLite metadata, caching
│   └── ui/              # PyQt6 desktop interface
├── tests/               # 4-layer test suite
├── models/              # Downloaded model files (gitignored)
└── data/                # Vector store and database (gitignored)

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run unit tests
pytest -m unit

# Run integration tests (requires models)
pytest -m integration

# Run all tests
pytest

# Lint
ruff check src/

Hardware Acceleration

NVIDIA CUDA

Install llama-cpp-python with CUDA support. The app auto-detects CUDA and offloads LLM layers to GPU.
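
In llama-cpp-python, GPU offload is controlled by the n_gpu_layers argument of the Llama constructor, and the context window by n_ctx. The sketch below shows one way the models section of config/default.yaml could map onto those arguments. The function name and the model path are hypothetical, and CUDA detection is passed in as a flag because this README does not say how the app detects it.

```python
def llama_kwargs(models_cfg, cuda_available, model_path):
    """Translate the `models` config section into Llama constructor kwargs."""
    device = models_cfg.get("llm_device", "auto")
    use_gpu = device == "cuda" or (device == "auto" and cuda_available)
    return {
        "model_path": model_path,
        "n_ctx": models_cfg.get("context_length", 4096),
        # -1 offloads every layer; 0 keeps inference on the CPU.
        "n_gpu_layers": models_cfg.get("n_gpu_layers", -1) if use_gpu else 0,
    }


models_cfg = {"llm_device": "auto", "n_gpu_layers": -1, "context_length": 4096}
cpu_kwargs = llama_kwargs(models_cfg, cuda_available=False,
                          model_path="models/model.gguf")  # placeholder path
gpu_kwargs = llama_kwargs(models_cfg, cuda_available=True,
                          model_path="models/model.gguf")
print(cpu_kwargs["n_gpu_layers"], gpu_kwargs["n_gpu_layers"])

# With llama-cpp-python installed, the result could be used as:
#   from llama_cpp import Llama
#   llm = Llama(**gpu_kwargs)
```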

Intel OpenVINO (Optional)

pip install -e ".[openvino]"

When an Intel GPU or NPU is detected and OpenVINO is installed, the app automatically uses OpenVINO for accelerated inference.

CPU Only

Works out of the box on any x86-64 CPU. Expect ~5-10 tokens/second for LLM generation.

License

MIT License

Acknowledgments

llama.cpp — Universal LLM inference
Qdrant — Vector search engine
sentence-transformers — Text embeddings
OpenVINO — Intel hardware acceleration
faster-whisper — Audio transcription
PaddleOCR — Optical character recognition
