ragrep

ripgrep for your team's knowledge base. Hybrid retrieval, self-hosted, single command.

Search across your Slack, Confluence, Jira, Google Drive, Git history, Bitbucket PRs, and local files from one CLI. Dense vectors + BM25 + reranking. No SaaS, no per-seat pricing, no API quotas you didn't choose.

curl -fsSL https://ragrep.cc/install.sh | sh

ragrep "how does the auth flow work"

Why ragrep

Hybrid retrieval, not just embeddings. FAISS cosine + BM25 fused with Reciprocal Rank Fusion, then optionally reranked. The combo recalls cross-source content that pure dense or pure sparse misses.
Self-hosted, no lock-in. Your data stays on your laptop, your VM, or your Cloud Run. The index is three files (faiss.index, chunks.pkl, bm25.pkl) — back them up with cp.
Real ingestion that scales. Content-hash deduped embedding cache means re-indexing only embeds what changed. Adaptive rate-limiting learns the embedding API's actual ceiling at runtime, with checkpoint resumability for multi-hour runs.

Install

curl -fsSL https://ragrep.cc/install.sh | sh

The installer pulls uv if you don't have it, then installs ragrep with the [full] extras (FAISS, embedding providers, retrieval).

Alternatives:

# Direct via uv
uv tool install 'ragrep[full] @ git+https://github.com/fntune/ragrep'

# From source
git clone https://github.com/fntune/ragrep && cd ragrep
uv sync --extra full

Then create ~/.config/ragrep/.env (or a .env in your CWD) with your API keys — see .env.example.

30-second demo

# Search modes — defaults to hybrid
ragrep "rate limit handling"
ragrep "deploy process" -m grep         # exact substring
ragrep "sales pipeline" -m semantic     # FAISS only

# Filter
ragrep "incident" -s slack              # one source
ragrep "release notes" --after 2w       # last 2 weeks
ragrep "auth" -f author=alice           # metadata filter
ragrep "config" -n 20                   # top 20

# Output
ragrep "auth" --json                    # JSON for agents/scripts
ragrep "auth" --scores                  # include retrieval scores
ragrep "auth" --full                    # full chunk content

# Server mode (no local index needed)
export RAGREP_SERVER=http://your-server:8321
ragrep "query"

Compared to

	ragrep	LlamaIndex / LangChain	Pinecone / Vectara	ripgrep
Install	`curl ragrep.cc/install.sh \| sh`	`pip install` + write code	SaaS account	`brew install`
Hybrid retrieval	✅ FAISS + BM25 + rerank	⚠️ DIY composition	⚠️ varies	❌ exact match only
Multi-source ingest	✅ Slack, Confluence, Drive, Git, files	DIY	DIY or paid connectors	filesystem only
Self-hosted	✅	✅ (you assemble)	❌	✅
Pricing	free + your own embedding API	free + your own	per-vector / per-query	free
Best for	Search your team's knowledge fast	Building custom RAG pipelines	Production-grade managed RAG	Searching code

Data sources

Source	What it ingests	Required credentials
`slack`	Messages, threads, pins, bookmarks, file contents	`SLACK_TOKEN`
`atlassian`	Confluence pages, Jira issues + comments	`CONFLUENCE_URL`, `JIRA_URL`, `ATLASSIAN_USERNAME`, `ATLASSIAN_API_TOKEN`
`gdrive`	Docs, Sheets, Slides, PDFs	`gcloud auth application-default login`
`git`	Commit messages from local repos	— (local filesystem)
`bitbucket`	PR descriptions + comments	`BITBUCKET_OAUTH_SECRET`
`files`	PDF, DOCX, PPTX, XLSX (extracted via Gemini)	`GOOGLE_API_KEY`

ragrep scrape                       # all sources configured in config.toml
ragrep scrape --source slack,git    # subset
ragrep ingest                       # build the index (incremental)
ragrep ingest --force               # re-embed everything
ragrep stats                        # show index stats

Configure source-specific options in config.toml:

[scrape.slack]
date_cutoff = "2024-01-01"

[scrape.git]
repos = ["../myrepo", "../otherrepo"]
since = "2024-01-01"

[scrape.bitbucket]
workspace = "your-workspace"
states = ["MERGED", "OPEN"]
since = "2024-01-01"

Python SDK

from ragrep import Index, Document, EmbedModel

index = Index(
    index_dir="data/index",
    embedding_models=[
        EmbedModel(provider="voyage", model_name="voyage-code-3"),
    ],
    reranker_provider="voyage",
    reranker_model="rerank-2.5",
    dedup_threshold=0.5,
)

stats = index.ingest([
    Document(id="doc1", source="docs", content="...", title="Getting Started"),
    Document(id="doc2", source="code", content="...", title="auth.py"),
])

results = index.query("how does authentication work", top_k=5)
for r in results:
    print(r.score, r.title, r.snippet)

Multi-model embedding for better recall:

index = Index(
    "data/index",
    embedding_models=[
        EmbedModel(provider="voyage", model_name="voyage-code-3"),
        EmbedModel(provider="sentence-transformers", model_name="Qwen3-Embedding-0.6B"),
    ],
)

HTTP server

Serve search over HTTP for multi-client use, or to back an internal app:

make serve    # localhost:8321

curl "http://localhost:8321/search?q=auth+flow&mode=hybrid&n=5"
curl "http://localhost:8321/search?q=deploy&mode=grep"

# CLI against the server (no local index needed on the client)
export RAGREP_SERVER=http://localhost:8321
ragrep "query"

Set RAGREP_GCS_BUCKET to auto-download the index from GCS at server startup. Or set RAGREP_INDEX_DIR to point at a pre-positioned local index.

Embedding providers

Provider	Default model	Notes
`voyage` (default)	`voyage-code-3`	API, 1024d, code + text in same space
`sentence-transformers`	`Qwen3-Embedding-0.6B`	local, GPU/MPS recommended

Rerankers: voyage/rerank-2.5 (API) or sentence-transformers/cross-encoder/ms-marco-MiniLM-L-6-v2 (local).

Remote ingestion (Modal)

For embedding many sources in parallel without burning your laptop:

pip install modal && modal token new
modal volume create ragrep-index
modal secret create voyage-api-key VOYAGE_API_KEY=pa-...

modal run --detach modal_ingest.py
modal volume get ragrep-index index data/index

Override the volume / secret names via RAGREP_MODAL_VOLUME and RAGREP_VOYAGE_SECRET env vars before modal run. Voyage embedder has adaptive rate limiting and checkpoint resumability — multi-hour runs survive crashes.

Architecture

scrape → normalize → chunk → embed (incremental) → store (FAISS + BM25)
                                                         |
                                           query → retrieve → rerank → generate

Retrieval pipeline:

Embed query with each configured model
FAISS cosine search + BM25 per model
RRF fusion (k=60) across all result lists
Voyage / cross-encoder reranking
Jaccard dedup on content overlap

Incremental ingestion: embed_cache.pkl maps sha256(content) → vector. Only new/changed chunks hit the embedding API. First run bootstraps from existing FAISS via reconstruct_n.

src/ragrep/
  __init__.py      public API: Index, Document, EmbedModel
  index.py         SDK: ingest() and query()
  models.py        Document, Chunk, SearchResult, QueryResult
  cli.py           CLI
  server.py        FastAPI HTTP server
  ingest/
    pipeline.py    orchestrator
    scrape_*.py    source scrapers
    embed.py       providers + adaptive rate limiting + checkpointing
    store.py       FAISS + BM25 persistence
  query/
    retrieve.py    hybrid retrieval + RRF
    rerank.py      reranking providers
    generate.py    Ollama generation
  eval/
    harness.py     per-stage metrics (dense / BM25 / RRF / rerank)

Development

make check    # ruff + mypy
make test     # pytest

See CONTRIBUTING.md for setup and conventions.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
docs		docs
src/ragrep		src/ragrep
tests		tests
.env.example		.env.example
.gcloudignore		.gcloudignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
config.toml		config.toml
modal_ingest.py		modal_ingest.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ragrep

Why ragrep

Install

30-second demo

Compared to

Data sources

Python SDK

HTTP server

Embedding providers

Remote ingestion (Modal)

Architecture

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ragrep

Why ragrep

Install

30-second demo

Compared to

Data sources

Python SDK

HTTP server

Embedding providers

Remote ingestion (Modal)

Architecture

Development

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages