A high-performance Rust library and CLI suite featuring an FST-backed phrase matcher, hybrid lexical/semantic search engine, and agentic document exploration loop.
- Installation & Quick Start
- Architecture & Core Components
- CLI Subcommands & Tool Reference
- Python Extractor & Q&A Generator
- Codebase Indexing & Search Demo
- The Backstory: How Lume Connects
- Acknowledgements & Inspiration
- Rust & Cargo (v1.75+ recommended)
- Ollama running locally or accessible in your environment (defaults to the cloud-backed model `gemma4:31b-cloud`).
- Python 3.10+ with `requests` and `pypdf` installed (for PDF indexing/Q&A generation).
Build the release profile binary:
cargo build --release
The compiled binary will be located at target/release/lume.
All of Lume's capabilities are exposed to the autonomous agent as JSON-RPC tools, and each maps directly to a CLI command that a user can run manually to see the raw results.
This diagram represents the hybrid search pipeline executed by the lume_search tool:
graph TD
subgraph lume_search ["Tool: lume_search | CLI: lume search"]
User([User Prompt / Query]) --> Search[Hybrid Search Engine]
Search -->|1. BM25 Lexical Search| BM25[(BM25 Index)]
Search -->|2. Dense Semantic Embeddings| Vector[(Semantic Vector Cache)]
Search -->|3. Graph Boost| Graph[(Semantic Knowledge Graph)]
BM25 --> Hits[Merged & Scored Hits]
Vector --> Hits
Graph --> Hits
end
Hits --> Synthesis[Ollama/Cloud LLM Synthesis]
Synthesis --> Output([Coherent Response])
This diagram shows how keyterms are extracted via lume_index and later used to guide query planning during document summarization:
graph TD
subgraph Indexing ["Tool: lume_index | CLI: lume index"]
Doc[Raw Documents] --> Parse[Text Chunking]
Parse -->|If -o flag enabled| EntExt[LLM Keyterm & Entity Extraction]
EntExt -->|Build Entity Edges| SKG[(entity_graph.json)]
end
subgraph Summarization ["CLI: lume summarize"]
SKG -->|Extract Top 12 Keyterms by Freq| Prior[Keyterm Priority Prior]
Prior -->|Inject as prompt guide| Planner[LLM Search Planner]
Planner -->|Generate Guided Queries| Queries[Search Queries]
Queries -->|Execute lume_search Tool| Retrieval[Retrieve Passage Snippets]
Retrieval -->|Deduplicate & Aggregate| Context[Aggregated Context]
Context -->|Synthesize Summary| FinalSummary[Executive Summary]
end
This diagram represents the stateful tool-calling loop (lume agent) where the LLM plans and executes commands iteratively:
graph TD
User([User Question]) --> Agent[Agent Chat Loop]
Agent --> LLM{Ollama / Cloud LLM}
LLM -->|Wants to call a tool| Tool[Tool Dispatcher]
Tool -->|query| SearchTool["Tool: lume_search | CLI: lume search"]
Tool -->|dir, db| IndexTool["Tool: lume_index | CLI: lume index"]
Tool -->|seed, steer| GenTool["Tool: lume_generate | CLI: lume generate"]
SearchTool --> Result[Capture CLI Output]
IndexTool --> Result
GenTool --> Result
Result -->|Feed output back into history| Agent
LLM -->|Decides it has the answer| Answer[Return Final Response]
Answer --> Output([Coherent, Fact-Verified Answer])
The system is organized into the following core Rust and Python modules:
- FST-Backed Phrase Tagger: Performs longest-dominant-right matching using Lucene-style separator bytes. Built on Tagger and Entry in src/lib.rs.
- Hybrid Search Engine: Integrates BM25 lexical retrieval (Bm25Index), spelling correction (SpellIndex), and dense embeddings (src/hybrid.rs) with graph-steered query expansion (src/graph_search.rs) to boost matches based on Semantic Knowledge Graph connections.
- Steered Markov Chain Synthesizer: Under the hood, Lume uses a trigram MarkovChain to generate text. However, it goes beyond random walks by steering/biasing trigram transitions using FST tags, local attention feedback, and GTR-T5 semantic vector inversion (src/inversion.rs).
- Agent & Summarization Engine: Runs autonomous query planning, search exploration, and structured synthesis. Main entry points are run_agent_loop and summarize_document in src/agent.rs. Supports failure recovery via lume_not_found.
- Model Context Protocol (MCP): Implements an MCP server over HTTP transport in serve to expose indexing and search tools directly to AI agents.
- Python Document Extractor: A high-efficiency parser (lib/lume_extractor.py) that handles PDF page text extraction and generates Q&A benchmark datasets using concurrent Ollama threads.
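To make the phrase-tagging idea concrete, here is a minimal sketch of greedy left-to-right longest-match tagging. It is not Lume's implementation: a `HashMap` stands in for the FST, the names `PhraseDict` and `Tag` are hypothetical, and the overlap handling is simpler than the longest-dominant-right rule described above.

```rust
use std::collections::HashMap;

/// A tagged span: token offsets [start, end) plus the dictionary label.
#[derive(Debug, PartialEq)]
pub struct Tag {
    pub start: usize,
    pub end: usize,
    pub label: String,
}

/// Hypothetical stand-in for the FST dictionary: normalized phrase -> label.
pub struct PhraseDict {
    entries: HashMap<String, String>,
    max_len: usize, // longest phrase length, in tokens
}

impl PhraseDict {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let mut entries = HashMap::new();
        let mut max_len = 0;
        for (phrase, label) in pairs {
            let norm = phrase.to_lowercase();
            let tokens: Vec<&str> = norm.split_whitespace().collect();
            max_len = max_len.max(tokens.len());
            entries.insert(tokens.join(" "), label.to_string());
        }
        PhraseDict { entries, max_len }
    }

    /// At each position, try the longest candidate phrase first and
    /// jump past it on a match, so shorter overlapping phrases are dropped.
    pub fn tag(&self, text: &str) -> Vec<Tag> {
        let tokens: Vec<String> = text
            .split_whitespace()
            .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
            .collect();
        let mut tags = Vec::new();
        let mut i = 0;
        while i < tokens.len() {
            let longest = self.max_len.min(tokens.len() - i);
            let mut matched = false;
            for len in (1..=longest).rev() {
                let candidate = tokens[i..i + len].join(" ");
                if let Some(label) = self.entries.get(&candidate) {
                    tags.push(Tag { start: i, end: i + len, label: label.clone() });
                    i += len;
                    matched = true;
                    break;
                }
            }
            if !matched {
                i += 1;
            }
        }
        tags
    }
}
```

With a dictionary containing both "Edmond Dantes" and "Dantes", tagging "Edmond Dantes escaped" yields a single two-token span rather than two overlapping ones. An FST gives the same lookups with a much smaller memory footprint and shared prefixes, which is presumably why the real tagger uses one.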
Each tool exposed to the agent maps to a CLI subcommand. A user can run these directly to see raw search hits, index logs, or Markov-generated text.
Indexes a directory containing text, markdown, or PDF files.
# Basic lexical indexing
./target/release/lume index docs/my_documents
# Semantic indexing with dense vectors (-s) and Ollama Entity Graph extraction (-o)
./target/release/lume index -s -o docs/my_documents
- Raw Output: Prints file indexing progress, chunk counts, semantic cache updates, and entity extraction timings.
- Flags:
  - `-s, --semantic`: Enables dense vector search (requires a NUTS token).
  - `-o, --ollama-entities`: Extracts central entities and constructs `entity_graph.json`.
  - `-f, --force`: Forces re-indexing of all documents.
- Options:
  - `--db <PATH>`: Destination directory for the index metadata [default: `.lume-index`].
  - `--ollama-model <MODEL>`: Ollama model for entity extraction [default: `gemma4:2b`].
Queries the persisted index using lexical (BM25) or hybrid search:
# Basic BM25 search
./target/release/lume search "Edmond Dantes"
# Hybrid search (weighting: 0.5 BM25, 0.5 vector semantic) with spelling correction (-c)
./target/release/lume search -c -a 0.5 "Edmond Dantes"
- Raw Output: Prints raw retrieved document passages accompanied by match scores (BM25 + Semantic + SKG Boost).
- Options:
  - `-a, --alpha <VAL>`: Hybrid weight. `0.0` is lexical-only; `1.0` is semantic-only [default: `0.5`].
  - `-g, --graph <VAL>`: Entity graph boost weight [default: `0.4`]. Enables graph-steered expansion: Lume resolves entities in the query, walks one hop to their strongest neighbors in `entity_graph.json`, and boosts matching passage scores by the related-entity mass.
  - `-l, --limit <LIMIT>`: Maximum search hits [default: `10`].
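The interaction between `--alpha` and `--graph` can be sketched as a simple additive blend. This is an illustration under assumptions, not Lume's actual scoring code: the `Hit` struct, the function names, and the premise that each signal is already normalized to the range 0 to 1 are all hypothetical.

```rust
/// Per-passage signals, each assumed to be pre-normalized to [0, 1].
#[derive(Debug, Clone)]
pub struct Hit {
    pub id: u32,
    pub bm25: f32,       // lexical score
    pub semantic: f32,   // embedding similarity
    pub graph_mass: f32, // related-entity mass from the one-hop graph walk
}

/// Blend lexical and semantic scores by `alpha`, then add the graph boost.
/// alpha = 0.0 is lexical-only; alpha = 1.0 is semantic-only.
pub fn hybrid_score(hit: &Hit, alpha: f32, graph_weight: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0);
    (1.0 - alpha) * hit.bm25 + alpha * hit.semantic + graph_weight * hit.graph_mass
}

/// Rank hits by descending hybrid score and keep the top `limit`.
pub fn rank(mut hits: Vec<Hit>, alpha: f32, graph_weight: f32, limit: usize) -> Vec<(u32, f32)> {
    hits.sort_by(|a, b| {
        hybrid_score(b, alpha, graph_weight).total_cmp(&hybrid_score(a, alpha, graph_weight))
    });
    hits.iter()
        .take(limit)
        .map(|h| (h.id, hybrid_score(h, alpha, graph_weight)))
        .collect()
}
```

Under this sketch, a passage with a weak keyword match can still outrank a strong lexical hit when it is semantically close to the query and mentions entities connected to the query's entities. Setting the graph weight to zero reduces it to a plain lexical/semantic interpolation.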
Synthesizes style-faithful text based on the indexed corpus using a trigram Markov Chain:
# Generate styled text starting with Dantes and guided by concept keywords
./target/release/lume generate "Dantes" --steer "revenge,castle"
- Raw Output: Prints a block of synthesized text in the style of the indexed corpus.
- Modes:
  - Tag-Steered Mode: Biases transitions towards the `--steer` tags using co-occurrence weights from the index's posting lists.
  - Vector-Steered Inversion Mode: Automatically embeds the target seed, inverts it into its closest semantic tags, and runs multiple candidate generation rounds to find the closest cosine-similarity match to the target prompt.
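Tag-steered generation can be pictured as a trigram chain whose transition counts are multiplied by a bias for steered words. The sketch below is a simplification and rests on assumptions: `TrigramChain` and the `boost` factor are hypothetical, the bias is a flat multiplier rather than co-occurrence weights from posting lists, and the highest-weight successor is picked deterministically where a real chain would sample.

```rust
use std::collections::HashMap;

/// Trigram model: (w1, w2) -> counts of each word that followed the pair.
pub struct TrigramChain {
    next: HashMap<(String, String), HashMap<String, f32>>,
}

impl TrigramChain {
    pub fn train(corpus: &str) -> Self {
        let words: Vec<&str> = corpus.split_whitespace().collect();
        let mut next: HashMap<(String, String), HashMap<String, f32>> = HashMap::new();
        for w in words.windows(3) {
            *next
                .entry((w[0].to_string(), w[1].to_string()))
                .or_default()
                .entry(w[2].to_string())
                .or_insert(0.0) += 1.0;
        }
        TrigramChain { next }
    }

    /// Extend the two seed words by up to `max_words`. Successors listed in
    /// `steer` have their counts multiplied by `boost`; the top weight wins.
    pub fn generate(
        &self,
        w1: &str,
        w2: &str,
        steer: &[&str],
        boost: f32,
        max_words: usize,
    ) -> Vec<String> {
        let mut out = vec![w1.to_string(), w2.to_string()];
        for _ in 0..max_words {
            let key = (out[out.len() - 2].clone(), out[out.len() - 1].clone());
            let Some(cands) = self.next.get(&key) else { break };
            let best = cands
                .iter()
                .map(|(word, count)| {
                    let bias = if steer.contains(&word.as_str()) { boost } else { 1.0 };
                    (word, count * bias)
                })
                // Tie-break on the word so output does not depend on HashMap order.
                .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)));
            match best {
                Some((word, _)) => out.push(word.clone()),
                None => break,
            }
        }
        out
    }
}
```

If "sought justice" appears twice in the corpus and "sought revenge" once, the unsteered chain continues with "justice", while steering on "revenge" with a boost of 3.0 flips the choice. The vector-steered mode described above would wrap a loop like this in several candidate rounds and keep the output closest to the target embedding.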
Summarize an entire document using an agentic planning-and-retrieval loop guided by the highest-ranking nodes in the Semantic Knowledge Graph:
./target/release/lume summarize docs/my_documents/book.pdf
- How it works:
  - Reads `entity_graph.json` to identify the top 12 central concepts.
  - Passes these concepts as priors to the Ollama model.
  - Plans a series of distinct search queries targeting the key concepts.
  - Executes queries, aggregates unique passages, and synthesizes a high-level executive summary.
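The steps above can be sketched with three small helpers: picking the most frequent keyterms, injecting them into a planner prompt, and deduplicating retrieved passages. The function names, the prompt wording, and the frequency map as input are assumptions for illustration; the actual layout of `entity_graph.json` is not shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Return the `k` most frequent keyterms; ties break alphabetically
/// so the result is stable across runs.
pub fn top_keyterms(freqs: &HashMap<String, usize>, k: usize) -> Vec<String> {
    let mut items: Vec<(&String, &usize)> = freqs.iter().collect();
    items.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
    items.into_iter().take(k).map(|(term, _)| term.clone()).collect()
}

/// Build a planner prompt that injects the keyterms as a prior
/// (the wording is illustrative, not Lume's prompt).
pub fn planner_prompt(doc_name: &str, keyterms: &[String], n_queries: usize) -> String {
    format!(
        "Plan {n_queries} distinct search queries to summarize '{doc_name}'.\n\
         Prioritize these central concepts: {}.",
        keyterms.join(", ")
    )
}

/// Keep the first occurrence of each passage, preserving retrieval order.
pub fn dedupe(passages: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    passages.into_iter().filter(|p| seen.insert(p.clone())).collect()
}
```

A caller would pass `k = 12` to mirror the top-12 selection, send the prompt to the model, run each planned query through search, and feed the deduplicated passages into the final synthesis prompt.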
Spawn an autonomous agent to research and resolve a complex question by executing indexing and search tools iteratively:
./target/release/lume agent "Explain the relationship between Villefort and Mercedes"
- Structured Failure Recovery: If the agent's searches do not yield the required information, it calls a dedicated `lume_not_found` tool. The system then returns structured guidance prompting the agent to refine its query keywords or try broader or narrower variations, preventing premature halts or false answers.
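The tool-calling loop and its recovery path can be sketched as follows. This is a schematic, not the body of `run_agent_loop`: the `Step` enum, the closure-based model and tool dispatcher, the guidance text, and the `max_turns` cap are all hypothetical, and no Ollama request is made.

```rust
/// What the model asks for on each turn (hypothetical shape).
pub enum Step {
    CallTool { name: String, arg: String },
    Answer(String),
}

/// Ask the model, dispatch the requested tool, append the output to the
/// history, and repeat until the model answers or `max_turns` is reached.
pub fn run_loop<M, T>(
    question: &str,
    mut model: M,
    mut tools: T,
    max_turns: usize,
) -> Option<String>
where
    M: FnMut(&[String]) -> Step,
    T: FnMut(&str, &str) -> String,
{
    let mut history = vec![format!("user: {question}")];
    for _ in 0..max_turns {
        match model(&history) {
            Step::Answer(text) => return Some(text),
            Step::CallTool { name, arg } => {
                let output = if name == "lume_not_found" {
                    // Return guidance instead of stopping (wording is illustrative).
                    "guidance: refine keywords, or try broader and narrower query variations"
                        .to_string()
                } else {
                    tools(&name, &arg)
                };
                history.push(format!("tool {name}({arg}): {output}"));
            }
        }
    }
    None // turn budget exhausted without a final answer
}
```

Treating "not found" as just another tool result keeps the loop uniform: the guidance lands in the history like any search output, so the model sees it on its next turn and can retry with a different query.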
Start the Model Context Protocol HTTP server to connect Lume to external AI agents:
./target/release/lume serve --port 8080
Crawls a target website to extract its text/markdown representation and saves the file to the local personal search engine directory (examples/crawled/):
# Crawl a webpage
./target/release/lume crawl https://example.com
# Crawl a Hacker News story
./target/release/lume crawl https://news.ycombinator.com/item?id=8863
- How it works:
  - Local Crawling (Tokenless): If `GRUB_BASE_URL` is set to a local endpoint (such as `http://localhost:6792`, or when running locally), Lume connects to the local Grub instance and crawls without requiring any authentication or `NUTS_SERVICES_TOKEN`.
  - Remote Crawling (Authenticated): If `GRUB_BASE_URL` points to a remote endpoint (e.g. `grub.nuts.services`), Lume uses the `NUTS_SERVICES_TOKEN` environment variable to authenticate. If the token is missing, it falls back to direct HTTP GET (no JavaScript execution).
  - Hacker News Special Handling: If a Hacker News story URL is detected, Lume queries the public HN Firebase API to retrieve both the story post and its top-level discussion comments, assembling them into a clean Markdown file.
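The crawl-mode decision described above can be expressed as a small pure function. This is a sketch under assumptions: `CrawlMode`, `choose_mode`, and `hn_item_id` are hypothetical names, the local check only recognizes `localhost` and `127.0.0.1`, and an unset `GRUB_BASE_URL` is treated here as a direct GET, which the text above does not specify.

```rust
#[derive(Debug, PartialEq)]
pub enum CrawlMode {
    LocalGrub { base_url: String },
    RemoteGrub { base_url: String, token: String },
    DirectGet, // plain HTTP GET, no JavaScript execution
}

/// Assumption: "local" means a localhost or loopback host.
fn is_local(base_url: &str) -> bool {
    let host = base_url
        .trim_start_matches("http://")
        .trim_start_matches("https://");
    host.starts_with("localhost") || host.starts_with("127.0.0.1")
}

/// Pick a crawl mode from the two environment values.
/// In a real program these could come from
/// `std::env::var("GRUB_BASE_URL").ok()` and
/// `std::env::var("NUTS_SERVICES_TOKEN").ok()`.
pub fn choose_mode(grub_base_url: Option<&str>, token: Option<&str>) -> CrawlMode {
    match (grub_base_url, token) {
        (Some(url), _) if is_local(url) => CrawlMode::LocalGrub { base_url: url.to_string() },
        (Some(url), Some(tok)) if !tok.is_empty() => CrawlMode::RemoteGrub {
            base_url: url.to_string(),
            token: tok.to_string(),
        },
        // Remote endpoint without a token, or no endpoint at all (assumption).
        _ => CrawlMode::DirectGet,
    }
}

/// Extract the story id from a Hacker News item URL, if it is one.
/// The id can then be used with the public Firebase endpoint
/// https://hacker-news.firebaseio.com/v0/item/<id>.json
pub fn hn_item_id(url: &str) -> Option<u64> {
    let rest = url.strip_prefix("https://news.ycombinator.com/item?id=")?;
    rest.split('&').next()?.parse().ok()
}
```

Keeping the decision separate from the HTTP calls makes the fallback order easy to test without a network connection or a running Grub instance.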
Located at lib/lume_extractor.py, this tool can extract text and generate Q&A evaluation datasets from document chunks:
# Extract text from a PDF
python lib/lume_extractor.py pdf my_doc.pdf
# Generate a Q&A evaluation benchmark using Ollama
python lib/lume_extractor.py qna my_doc.txt output_qna.json --model gemma4:31b-cloud
Lume can index and search programming code repositories (like Lume's own Rust source files).
Index the src/ directory containing Lume's Rust modules into a separate index database folder:
./target/release/lume index --db .lume-code-index src
Query the code index for the run_agent_loop function to find raw code definitions:
./target/release/lume search --db .lume-code-index "run_agent_loop"
[1] Score: 8.4109 | Lines 700-725 (File: src/agent.rs, Line: 703)
pub fn run_agent_loop(
question: &str,
ollama_url: &str,
ollama_model: &str,
db_dir: &str,
verbose: bool,
) -> Result<(), String> {
let url = format!("{}/api/chat", ollama_url.trim_end_matches('/'));
Lume is the story of ideas moving from one person to another: a search meme carried through years of crawling systems, open-source heritage, industrial search consulting, and modern AI capability.
It all started with web crawling. Back in the early days of distributed search, Kord Campbell created Grub, a massively distributed web crawler. After installing Lucene, Kord sent an email to Eric Schmidt (then-CEO of Google), saying: "Hey, I've got this super fast distributed crawler." Schmidt replied with a classic search insight: "That's not the problem. We've got crawling figured out. Indexing is the challenge."
Decades later, that conversation has come full circle. In the age of AI, crawling is everything again. To feed frontier LLMs, you have to crawl to get the content, and you need a crawler that you can control.
But once you crawl it, where do you put it?
You can't crawl the web fresh every single time you need an answer. Web pages are a type of document memory. Unlike bot or conversational memory (like an LLM remembering that a user's parrot is blue), document memory is about capturing the precise text you just saw. Some of these pages never update, while others update every minute. You need a dedicated, extremely fast local document store to hold and index this memory.
That's when the pieces fell into place. Kord was watching LinkedIn and saw Steve Harris post about porting his zero-dependency JavaScript FST tagger to Rust (released as rust-fstguardrails). Steve had run Portaltown, a search consultancy, and had worked for Lucidworks. His background as a U.S. Marine Corps air traffic controller deeply influenced how he designed systems: a focus on safety, extreme precision, and bare-metal performance.
Kord saw Steve's post and realized: "That FST tagger is the first part of our document index."
To turn that FST tagger into a complete, lightweight search engine, Kord drew on years of shared search history. During his time consulting at Lucidworks, Kord had met OG search veterans Trey Grainger and Erik Hatcher.
Trey's work on Solr's Semantic Knowledge Graph (SKG) had always stuck with Kord. The concept seemed complex, but Erik Hatcher had delivered the ultimate "aha" moment by putting it simply:
Facets are just counts of the occurrences of something in a document. The Knowledge Graph is simply looking at those counts across all documents to perform document intersections. It is just counting the counts of things.
That was the magic of Erik Hatcher: he has always had the unique gift of taking complex technology and showing everyone how it actually works under the hood. (We throw affectionate shade at Trey for making it look complicated, and at Erik for making it look too simple!)
Understanding that primitive meant realizing a high-speed search engine didn't need millions of lines of code. It just needed to do simple things incredibly fast: FSTs for words, roaring bitmaps for set intersections, spell correction for misspellings, and additive hybrid boosting for vector context.
This hybrid design philosophy aligns with the pioneering search relevance and education work championed by Doug Turnbull, demonstrating that combining precise keyword matching, semantic embeddings, and structural graphs yields a far more reliable context for agentic search than simple vector retrieval.
Working in a continuous human-AI feedback loop, Lume's core and extended capabilities (like its stateful agent loops, MCP servers, and HTML/markdown crawling module) were constructed using state-of-the-art AI coding assistants (like Google's pair-programmer Antigravity). This collaborative process directly addresses the AI Slop Effort Problem: AI-generated code is not bad by default if it is carefully annealed, iterated, and fact-checked; what is sloppy is the quick, dismissive use of the term "slop" by software engineers who have yet to throw themselves into the deep end of human-AI pair programming.
Lume was inspired by the foundational FST-based tagging work in jsclosures/rust-fstguardrails.