LiveOS

A knowledge graph and multi-hop question-answering system. Notes — including text, audio, images, PDFs, and other documents — are automatically extracted into a structured knowledge graph. A conversational interface then answers questions over the graph using an iterative retrieval loop that traverses entity relationships across multiple notes.

What It Is

LiveOS is an AI-powered knowledge base. You write notes — plain text, voice recordings, images, PDFs, Word documents, spreadsheets — and the system:

Extracts entities, relationships, and concepts from the note using an LLM
Deduplicates entities across notes using node IDs and name normalisation
Builds a property graph where entities are nodes and LLM-extracted predicates are edges
Detects communities of related entities using the Leiden algorithm
Indexes everything into a vector store (Qdrant), a full-text search engine (Typesense), and a graph database (Kuzu)
Answers questions conversationally via an iterative retrieval loop that walks the graph, accumulates findings across hops, and synthesises a final answer
Highlights entity mentions live in notes and chat — every entity name found in ingested content is automatically underlined; clicking any mention opens an inline detail panel without leaving the page
Isolates knowledge into multiple named knowledge bases — each with its own graph, vector store, and full-text index
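
The deduplication step can be pictured as keying every extracted entity on a normalised name plus its type. The sketch below is illustrative only: `normalise_name`, `dedup_key`, `merge_entities`, and the normalisation rules themselves (accent stripping, case folding, whitespace collapsing) are assumptions, not the project's actual code.

```python
import re
import unicodedata


def normalise_name(name: str) -> str:
    """Strip accents, collapse whitespace, and fold case (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", without_accents).strip().casefold()


def dedup_key(name: str, entity_type: str) -> tuple[str, str]:
    """Hypothetical merge key: mentions with the same key share one node."""
    return (normalise_name(name), entity_type.strip().casefold())


def merge_entities(mentions: list[dict]) -> dict[tuple[str, str], dict]:
    """Keep the first mention per key and count how many mentions were merged."""
    nodes: dict[tuple[str, str], dict] = {}
    for mention in mentions:
        key = dedup_key(mention["name"], mention["type"])
        node = nodes.setdefault(key, {**mention, "mentions": 0})
        node["mentions"] += 1
    return nodes
```

Keying on the type as well as the name keeps two different things that share a name (a person and a place, say) as separate nodes.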

The system is designed to run entirely locally. All LLM inference, embedding, and reranking can run on local hardware via Ollama or LM Studio. Cloud LLM providers (Gemini, OpenAI, Anthropic, HuggingFace) are also supported and switchable via environment variables.

Quick Start

Use this path if you just want to run LiveOS on your computer.

1. Install Requirements

Install:

Docker Desktop
Python 3

On macOS, Homebrew is recommended because the setup script can use it to install missing tools.

2. Run Setup

Open a terminal in the LiveOS folder and run the command for your computer.

macOS / Linux

./setup.sh

Windows PowerShell

.\setup.ps1

The setup script downloads the required models, starts the database, search, and storage services, starts Ollama, and starts the model services.

On Apple Silicon macOS, setup.sh automatically enables the native MPS inference services with macOS launchd. That means Marlin and local-models start again after login/reboot. You do not need to run a separate model command after each restart.

On Windows and Linux, the setup scripts run Marlin and local-models as Docker containers. Docker handles restart behavior.

3. Open LiveOS

When setup finishes, open:

http://localhost:3700

Useful Commands

Check whether everything is running:

docker compose ps

On macOS, check the native model services:

./scripts/host-inference.sh status

Turn off the macOS auto-start model services:

./scripts/host-inference.sh disable

Remove LiveOS containers and downloaded models:

./teardown.sh

On Windows:

.\teardown.ps1

Screenshots


[Screenshots: Chat interface · Notes editor · Node-centred view · Node detail panel · Knowledge base manager · Runtime model settings]

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Next.js Frontend                        │
│     Notes editor · Chat interface · 3D graph visualisation      │
└────────────────────────────┬────────────────────────────────────┘
                             │ HTTP (REST, /api/v1/*)
┌────────────────────────────▼────────────────────────────────────┐
│                    FastAPI Backend (uvicorn)                    │
│                                                                 │
│          ┌─────────────────┐   ┌──────────────────┐             │
│          │ Ingestion       │   │ Chat / Retrieval │             │
│          │ Workflow        │   │ Workflow         │             │
│          │ (LangGraph)     │   │ (iterative loop) │             │
│          └────────┬────────┘   └──────────┬───────┘             │
│                   │                       │                     │
│          ┌────────▼───────────────────────▼───────┐             │
│          │              Service Layer             │             │
│          │  LLM · Embedding · Reranker · Graph    │             │
│          │  Qdrant · Typesense · Multimedia       │             │
│          └────────┬────────────────────────┬──────┘             │
└───────────────────┼────────────────────────┼────────────────────┘
                    │                        │
┌───────────────────▼──────┐   ┌─────────────▼────────────────────┐
│      Kuzu (embedded)     │   │      Docker-managed services     │
│      graph database      │   │      Qdrant · Typesense          │
│                          │   │      PostgreSQL · RustFS         │
└──────────────────────────┘   └──────────────────────────────────┘

Key design decisions

Decision	Choice	Reason
Graph database	Kuzu (embedded)	Replaces Neo4j — no container, no Cypher syntax constraints on dynamic edge types
Full-text search	Typesense	Replaces Elasticsearch — lighter, BM25 + exact match, much simpler to operate
Vector store	Qdrant	Three collections: `node_cores`, `node_relationships`, `node_isolated_contexts`
Relational DB	PostgreSQL	Stores raw notes with processing status; asyncpg for async access
Object store	RustFS	S3-compatible, Apache 2.0; stores uploaded files (audio, images, PDFs)
Embedding	`qwen3-embedding:0.6b` (local)	1024-dim; requires Qwen3 instruction prefix for queries
Reranker	`qwen3-reranker-0.6b` (local)	Cross-encoder; keeps only the top 10 candidates before they reach the LLM context window
LLM	Configurable	Ollama, LM Studio, Gemini, OpenAI, Anthropic, or HuggingFace
Multi-KB	`KBRegistry` (JSON-persisted)	Each KB has its own Kuzu graph, Qdrant collections, and Typesense collection; notes are isolated per KB in Postgres
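
The embedding row mentions that queries need a Qwen3 instruction prefix. A minimal sketch of that asymmetry follows, assuming the `Instruct: ...\nQuery:...` layout shown on the Qwen3-Embedding model card. The task sentence and the helper names `format_query` and `format_document` are illustrative; check them against the model card and the backend before relying on them.

```python
# Task description used for the query side (assumed wording).
DEFAULT_TASK = "Given a web search query, retrieve relevant passages that answer the query"


def format_query(query: str, task: str = DEFAULT_TASK) -> str:
    """Wrap a search query in the instruction prefix before embedding it."""
    return f"Instruct: {task}\nQuery:{query.strip()}"


def format_document(text: str) -> str:
    """Documents are embedded as-is; only queries carry the prefix."""
    return text.strip()
```

Skipping the prefix on the query side generally still works but tends to lower retrieval quality, so it is worth applying in exactly one place.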

Ingestion Pipeline

When a note is saved, the backend triggers a LangGraph workflow:

Note saved
    │
    ▼
[1] Multimedia node
    ├── Audio (.webm/.m4a/.mp3/.wav) → Whisper large-v3-turbo transcription
    ├── Images → Florence-2-large captioning + OCR
    ├── PDF → PyMuPDF text extraction
    └── Word/Excel → python-docx / openpyxl extraction
    │
    ▼
[2] LLM extraction (single call)
    ├── Entities (name, type, description)
    ├── Relationships (subject → predicate → object, confidence, strength)
    ├── Concepts (abstract ideas from the note)
    └── Note title (folded into same call — no second round-trip)
    │
    ▼
[3] Graph write (Kuzu)
    ├── Upsert entity nodes by normalised name + type
    ├── Write SEMANTIC_REL edges with full provenance
    ├── Predicate cleaning (strip entity name tokens from rel types)
    └── Link note node → entity nodes via REFERENCES edges
    │
    ▼
[4] Vector indexing (Qdrant)
    ├── node_cores: entity summary embeddings
    ├── node_relationships: NL relationship sentence embeddings
    └── node_isolated_contexts: per-entity context embeddings
    │
    ▼
[5] Full-text indexing (Typesense)
    └── node name, type, isolated contexts, relationship NL text
    │
    ▼
[6] Community detection (Leiden, batched)
    └── Recomputed in background after each ingestion batch
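
Step [3] mentions predicate cleaning, which strips entity-name tokens from relationship types. One way to picture it is sketched below. The function name `clean_predicate`, the upper snake case output, and the fallback when nothing is left are all assumptions made for illustration.

```python
import re


def _tokens(text: str) -> list[str]:
    """Split on non-alphanumerics and upper-case the pieces."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.upper()) if t]


def clean_predicate(predicate: str, subject: str, obj: str) -> str:
    """Drop tokens that merely repeat the subject or object name."""
    entity_tokens = set(_tokens(subject)) | set(_tokens(obj))
    kept = [t for t in _tokens(predicate) if t not in entity_tokens]
    # If stripping would leave nothing, keep the original tokens instead.
    return "_".join(kept or _tokens(predicate))
```

For example, `clean_predicate("MARIE_CURIE_WORKED_AT_SORBONNE", "Marie Curie", "Sorbonne")` returns `"WORKED_AT"`, which keeps the number of distinct predicate types from growing with every entity name.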

Extraction stats from benchmark runs (990 HotPotQA notes):

Metric	Value
Avg entities extracted per note	9.22
Total entity nodes (post-dedup)	7,284
Deduplication rate	20.2% (9,129 instances → 7,284 unique)
Total relationships written	8,238
Unique predicate types	3,168
Predicates auto-cleaned	607
Community nodes (Leiden)	1,362
Avg ingestion time per note (local LLM)	49.71 s (Gemma3:4b via Ollama)
Avg ingestion time per note (cloud LLM)	34.02 s (Gemini Flash Lite)
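
The derived figures in the table follow from the raw counts. A quick check, assuming the per-note average is taken over pre-deduplication instances (that reading reproduces the reported 9.22):

```python
notes = 990
entity_instances = 9_129   # entity mentions before deduplication
unique_entities = 7_284    # entity nodes after deduplication

avg_entities_per_note = entity_instances / notes
dedup_rate = (entity_instances - unique_entities) / entity_instances

print(f"{avg_entities_per_note:.2f} entities per note")  # 9.22 entities per note
print(f"{dedup_rate:.1%} of instances merged")           # 20.2% of instances merged
```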

Retrieval Pipeline

Chat queries go through a multi-iteration research loop, not a single vector lookup:

User query
    │
    ▼
[1] LLM query analysis
    ├── Intent classification
    ├── Entity extraction (names explicitly mentioned)
    ├── Keywords + concepts
    ├── date_filter (YYYY-MM-DD) — set when query targets a specific day
    └── period_filter (YYYY-MM) — set when query targets a whole month
    │
    ▼
[2] HYBRID RETRIEVAL (three independent result lists)
    │
    ├── entity_nodes  — exact name match via graph lookup
    │   └── Enriched from Qdrant; isolated_contexts filtered to date_filter / period_filter
    │
    ├── typesense_nodes  — BM25 keyword search (Typesense)
    │   ├── Enriched from Qdrant (description + isolated_contexts)
    │   ├── isolated_contexts filtered to matching date(s)
    │   └── Temporal post-filter applied (see below)
    │
    └── vector_nodes  — semantic search (Qdrant cosine, threshold 0.45 pre-rerank)
        ├── Date annotation appended to content: "Had coffee with John - 2026-05-10"
        └── Temporal post-filter applied (see below)
    │
    ▼
[3] TEMPORAL POST-FILTER (applied to typesense_nodes and vector_nodes)
    │
    ├── temporal_digest nodes — include only when period_filter matches period_key
    ├── community nodes      — exclude entirely for any temporal query
    └── all other nodes      — include only if isolated_contexts contain a matching
                               date string after enrichment
    │
    ▼
[4] ITERATIVE LOOP (up to 10 iterations, configurable)
    │
    ├── all_found_nodes = entity_nodes + typesense_nodes + vector_nodes
    │
    ├── Graph neighbour expansion
    │   ├── 1-hop Kuzu graph traversal from top candidates
    │   └── Qdrant NL relationship lookup for neighbour context
    │
    ├── Reranking (qwen3-reranker-0.6b, top-10)
    │
    └── LLM step
        ├── Extract FINDING from accumulated context, or
        └── Emit NEXT_QUERY for next iteration
    │
    ▼
[5] Loop exits when can_answer = True or iteration limit reached
    └── Returns best FINDING + top-6 scored context docs
    │
    ▼
[6] Response with inline note citations
    └── Model thinking exposed in collapsible dropdown (when available)

Candidate types

Type	Source	Description
`entity_match`	Entity name lookup	Nodes whose name was explicitly mentioned in the query
`keyword_match`	Typesense BM25	Nodes found by full-text keyword search
`vector_match`	Qdrant vector search	Nodes found by semantic/cosine similarity
`community_summary`	Qdrant vector search	Community rollup nodes (excluded for temporal queries)
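
The three result lists are combined into `all_found_nodes` before reranking. A small sketch of that merge is given below. The `Candidate` class, the first-seen-wins rule for duplicates, and the `temporal_query` flag are assumptions; only the candidate type names come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    candidate_type: str  # entity_match | keyword_match | vector_match | community_summary
    score: float = 0.0


def merge_candidates(entity_nodes, typesense_nodes, vector_nodes, temporal_query=False):
    """Concatenate the three lists, keeping the first candidate seen per node."""
    merged: dict[str, Candidate] = {}
    for candidate in [*entity_nodes, *typesense_nodes, *vector_nodes]:
        # Community rollups carry no dates, so temporal queries skip them.
        if temporal_query and candidate.candidate_type == "community_summary":
            continue
        merged.setdefault(candidate.node_id, candidate)
    return list(merged.values())
```

Because entity matches come first in the concatenation, a node found by both name lookup and vector search keeps its `entity_match` label in this sketch.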

Temporal filtering

When the LLM detects a date or month in the query it sets date_filter (YYYY-MM-DD) or period_filter (YYYY-MM). These are never passed to Qdrant — all searches run unrestricted and temporal filtering is applied as a post-filter on the accumulated results:

entity_nodes: always included; their isolated_contexts are filtered to the matching date so only relevant context is surfaced.
typesense_nodes / vector_nodes: filtered by the rules above. A node passes only if its enriched isolated_contexts contain at least one entry whose date suffix matches ("content - YYYY-MM-DD").
temporal_digest nodes: kept only for month queries where the node's period_key equals period_filter.
community nodes: excluded for all temporal queries (they hold aggregate, undated summaries).
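
The post-filter rules for `typesense_nodes` and `vector_nodes` can be sketched as a single predicate. This is an illustration under assumptions: nodes are plain dicts, and the `kind` field used to tell community and temporal digest nodes apart is hypothetical. `period_key`, `isolated_contexts`, and the `"content - YYYY-MM-DD"` suffix format come from the description above.

```python
from typing import Optional


def context_matches(context: str, date_filter: Optional[str], period_filter: Optional[str]) -> bool:
    """Compare the date suffix of 'content - YYYY-MM-DD' with a day or month filter."""
    _, separator, suffix = context.rpartition(" - ")
    if not separator:
        return False  # no date annotation on this context
    if date_filter:
        return suffix == date_filter
    return bool(period_filter) and suffix.startswith(f"{period_filter}-")


def passes_temporal_filter(node: dict, date_filter: Optional[str] = None,
                           period_filter: Optional[str] = None) -> bool:
    """Decide whether a keyword or vector hit survives a temporal query."""
    if not (date_filter or period_filter):
        return True  # not a temporal query, nothing to filter
    kind = node.get("kind")
    if kind == "community":
        return False  # aggregate, undated summaries
    if kind == "temporal_digest":
        return period_filter is not None and node.get("period_key") == period_filter
    contexts = node.get("isolated_contexts", [])
    return any(context_matches(c, date_filter, period_filter) for c in contexts)
```

Entity-matched nodes would bypass this predicate and only have their contexts narrowed with `context_matches`.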

The loop accumulates findings across iterations. On exhaustion (no can_answer=True), the last non-empty FINDING is returned without an additional synthesis call.
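
The control flow of the loop, including the exhaustion behaviour, can be sketched in a few lines. The callables `retrieve` and `llm_step` and the dict keys `finding`, `next_query`, and `can_answer` are stand-ins chosen for illustration; the real loop also performs graph expansion and reranking inside each iteration.

```python
from typing import Callable, Optional


def research_loop(
    question: str,
    retrieve: Callable[[str], list],
    llm_step: Callable[[str, list], dict],
    max_iterations: int = 10,
) -> Optional[str]:
    """Iterate retrieval and LLM steps; return the last non-empty finding."""
    context: list = []
    last_finding: Optional[str] = None
    current_query = question
    for _ in range(max_iterations):
        context.extend(retrieve(current_query))   # accumulate across hops
        step = llm_step(question, context)
        if step.get("finding"):
            last_finding = step["finding"]
        if step.get("can_answer"):
            break
        current_query = step.get("next_query") or current_query
    # On exhaustion there is no extra synthesis call, just the last finding.
    return last_finding
```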

If the active LLM exposes reasoning (e.g. reasoning_content from LM Studio, or <think> tags from models like Gemma4), the thinking is passed through to the frontend and shown in a collapsible "Model thinking" section above the answer.
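
For models that inline their reasoning in `<think>` tags, separating the thinking from the answer could look like the sketch below. The helper name `split_thinking` is hypothetical, and providers that return a separate `reasoning_content` field would not need this step.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(raw: str) -> tuple[str, str]:
    """Return (thinking, answer) from a raw model response."""
    thinking = "\n".join(part.strip() for part in _THINK_RE.findall(raw))
    answer = _THINK_RE.sub("", raw).strip()
    return thinking, answer
```

The frontend can then render the first element in the collapsible section and the second as the answer body.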

Infrastructure

Most services run as Docker containers:

docker compose up -d

Service	Image	Port	Purpose
PostgreSQL	`postgres:latest`	15432	Notes + processing status
RustFS	`rustfs/rustfs:latest`	9000 / 9001	File storage (S3-compatible)
Qdrant	`qdrant/qdrant:latest`	6333 / 6334	Vector search
Typesense	`typesense/typesense:27.1`	8108	Full-text search (BM25)
Backend	built from `./backend`	8700	FastAPI API server
Local Models	native macOS host (default) or Docker profile `docker-models`	8791	Florence, Whisper, reranker, PDF visual extraction
Marlin	native macOS host (default) or Docker profile `docker-models`	8790	Video visual understanding
Frontend	built from `./frontend`	3700	Next.js UI

Kuzu is embedded — no container needed. The database files live at data/kuzu/kuzu_graph. Marlin runs as a separate service because it requires transformers>=5.7, while Florence-2 image/PDF extraction stays on the local-models service's transformers 4.x stack.

On Apple Silicon macOS, ./setup.sh runs Marlin and local-models natively on the host using MPS. The Docker backend calls them via host.docker.internal:8790/8791. This is much faster than CPU inference inside Docker Desktop. Setup also enables them as a per-user launchd service so they restart automatically after login/reboot.

Manage them from the repo root with:

./scripts/host-inference.sh status
./scripts/host-inference.sh restart
./scripts/host-inference.sh disable
./scripts/host-inference.sh enable

Use ./scripts/host-inference.sh disable to turn off auto-start and stop the host services. Use ./scripts/host-inference.sh start for a temporary session-only start. To fall back to Dockerized model containers instead, run docker compose --profile docker-models up -d.

On Windows, setup.ps1 uses the Docker docker-models profile by default, so the model containers restart with Docker rather than using the macOS host-inference service.

Frontend

Built with Next.js 16 (App Router) and Tailwind CSS.

Route	Purpose
`/`	Landing page with navigation
`/notes`	Notes editor — create, edit, search, filter by ingestion status; supports file attachments and voice recording
`/chat`	Conversational interface — markdown rendering, file previews, thumbs up/down feedback, source citations
`/graph-3d`	3D graph visualisation
`/kb`	Knowledge base manager — create, rename, switch, and delete knowledge bases
`/settings`	Runtime LLM settings — switch provider, chat model, ingestion model, and server URL without restarting; also has maintenance controls to manually trigger community detection and temporal digest builds

The notes editor supports:

Plain text with Markdown preview
Entity mention highlighting — after ingestion, known entity names are scanned in the note at load time and highlighted as clickable badges in both edit and preview modes; no special markup is stored in the note
Entity autocomplete — typing a capitalised word surfaces matching entities from the knowledge graph as inline suggestions
Entity detail panel — clicking any highlighted entity name slides in a panel showing the entity's type, description, relationships, and isolated contexts, with a link to its node in the 3D graph
File attachments (images, audio, PDF, Word, Excel)
In-browser voice recording — audio is transcoded to AAC/M4A server-side on upload, ensuring playback works across all browsers including Safari
Per-note ingestion status filter (all / ingested / ingesting / saved / failed)
Auto-save on edit
Note content segmentation — image, PDF, and audio sections are visually separated with labelled dividers in preview mode
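
Server-side transcoding of browser recordings to AAC/M4A can be done by shelling out to ffmpeg. The sketch below shows one plausible invocation. The function names, the 128k bitrate, and the skip for files that are already `.m4a` are assumptions; the ffmpeg flags themselves are standard.

```python
import subprocess
from pathlib import Path


def build_transcode_command(source: Path, target: Path, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg command that re-encodes the audio track as AAC."""
    return [
        "ffmpeg", "-y",          # overwrite the target if it exists
        "-i", str(source),
        "-vn",                   # ignore any video stream
        "-c:a", "aac",
        "-b:a", bitrate,
        str(target),
    ]


def transcode_to_m4a(source: Path) -> Path:
    """Transcode a recording next to the original and return the new path."""
    if source.suffix.lower() == ".m4a":
        return source  # assumed shortcut: already in the target container
    target = source.with_suffix(".m4a")
    subprocess.run(build_transcode_command(source, target), check=True, capture_output=True)
    return target
```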

The chat interface supports:

Multi-hop answers with inline source citations
Entity highlighting in AI responses — entity names mentioned in answers are scanned and rendered as clickable badges; the same entity detail panel slides in on click
Collapsible model thinking display (for models that expose reasoning tokens)
Note preview modal with segmented content display and entity highlighting
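
Entity highlighting in both the notes editor and chat answers comes down to scanning text for known entity names at render time. The frontend is Next.js, but the idea is shown here in Python to match the other examples. `find_mentions`, the case-insensitive matching, and the longest-name-first rule are assumptions for illustration.

```python
import re


def find_mentions(text: str, entity_names: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) spans for known entity names."""
    # Longest names first so "Marie Curie" wins over "Marie" at the same position.
    names = sorted({n for n in entity_names if n.strip()}, key=len, reverse=True)
    if not names:
        return []
    alternation = "|".join(re.escape(n) for n in names)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

Because only spans are returned, the stored note stays free of markup and the UI decides how to render each span.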

LLM & Model Support

The LLM provider is set via LLM_PROVIDER in backend/.env. The same config file controls the ingestion model, chat model, embedding model, and reranker.

Supported providers:

Provider	`LLM_PROVIDER` value	Notes
Ollama	`ollama`	Default. Runs locally at `http://127.0.0.1:11434`
LM Studio	`lm_studio`	Local OpenAI-compatible server; exposes `reasoning_content` for thinking models
OpenAI	`openai`	Requires `OPENAI_API_KEY`
Google Gemini	`gemini`	Requires `GEMINI_API_KEY`
Anthropic	`anthropic`	Requires `ANTHROPIC_API_KEY`
HuggingFace	`huggingface`	Requires `HUGGINGFACE_API_KEY`

A separate INGESTION_LLM_MODEL can be set to use a different (usually smaller) model for extraction vs. chat — for example, gemma4:e4b for ingestion and gemma4:latest for chat.

An optional LLM_FALLBACK_PROVIDER kicks in if the primary provider fails.
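
The fallback behaviour amounts to a try-primary-then-fallback wrapper. A minimal sketch, in which the providers are represented as plain callables and the name `complete_with_fallback` is hypothetical:

```python
from typing import Callable, Optional


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Call the primary provider; on any error, try the fallback if configured."""
    try:
        return primary(prompt)
    except Exception:
        if fallback is None:
            raise  # no fallback configured, surface the original error
        return fallback(prompt)
```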

Runtime Model Switching

The provider, chat model, ingestion model, and server URL can be changed live from the /settings page — no server restart or .env edit required.

Changes are persisted to data/runtime_config.json and re-applied automatically on the next server start, so they survive restarts. API keys are never stored here — those remain in backend/.env.

Field	What it controls
Provider	Active LLM backend (`ollama`, `lm_studio`, `gemini`, `openai`, `anthropic`, `huggingface`)
Chat model	Model used for all chat queries (`CHAT_MODEL` — highest-priority override)
Ingestion model	Model used during note ingestion — extraction, entity reasoning (`INGESTION_MODEL`)
Server URL	Base URL for local providers (LM Studio / Ollama)

Model-only changes take effect on the next request with zero overhead. Provider or server URL changes trigger a full LLM client reinitialisation in-process.
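
The persistence and reinitialisation rules above can be sketched as follows. The JSON field names (`provider`, `chat_model`, `ingestion_model`, `server_url`) and the helper functions are assumptions; the actual layout of `data/runtime_config.json` may differ.

```python
import json
from pathlib import Path

# Only these fields are persisted; API keys stay in backend/.env.
ALLOWED_FIELDS = {"provider", "chat_model", "ingestion_model", "server_url"}


def load_runtime_config(path: Path) -> dict:
    """Read the persisted overrides, or return an empty dict on first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_runtime_config(path: Path, updates: dict) -> dict:
    """Merge allowed fields into the stored config and write it back."""
    current = load_runtime_config(path)
    current.update({k: v for k, v in updates.items() if k in ALLOWED_FIELDS})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current


def needs_client_reinit(old: dict, new: dict) -> bool:
    """Provider or server URL changes rebuild the client; model changes do not."""
    return any(old.get(k) != new.get(k) for k in ("provider", "server_url"))
```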

Benchmark Results

All benchmarks use the HotPotQA multi-hop QA dataset (100 questions, 990-note knowledge graph). HotPotQA requires bridging facts from two or more documents to answer correctly — it is a strong stress test for multi-hop retrieval.

Final Implementation — Model Comparison

Knowledge graph: 9,636 nodes / 8,238 relationships — ingested with gemma3:4b (local, Ollama). Infrastructure identical across all three runs; only the chat LLM varies.

Metric	Gemini 3.1 Flash Lite	Gemma3:4b (local)	Gemma4:e4b (local)
Inference	Google Cloud API	Ollama local	Ollama local
Exact Match (EM)	59.0%	30.0%	58.0%
Fuzzy Match	76.0%	41.0%	81.0%
Token F1	0.705	0.383	0.737
Contains expected	67.0%	36.0%	74.0%
Hard failures	24	59	19
Avg response time	18.10 s	91.66 s	212.10 s
Retrieval Recall	0.665	0.625	0.715
Retrieval Precision	0.330	0.351	0.349
Retrieval F1	0.441	0.449	0.469
Full-recall questions	42	37	52
EM at full recall	62%	24%	50%
Wall clock (100 Qs)	~30 min	~2.5 h	~8.75 h
Error count	0	0	0

Key takeaways:

The LLM is the dominant variable. The 29 pp EM spread (30% → 59%) tracks reasoning quality far more than retrieval: retrieval precision is nearly identical across all three runs (0.330–0.351), and retrieval F1 varies only from 0.441 to 0.469.
Gemma4:e4b matches cloud accuracy locally. At 58% EM vs. 59% for Gemini Flash Lite and 81% fuzzy match (best of any run), a sufficiently large local model can match cloud API quality. The cost is 212 s mean latency.
Gemini Flash Lite is the interactive production choice. At 18.10 s mean response time, 59% EM, and 76% fuzzy match, it is about 5× faster than Gemma3:4b (91.66 s) with nearly double the exact-match accuracy.
Gemma3:4b under-performs on reasoning, not retrieval. Its retrieval precision (0.351) is the highest of the three — but EM at full-recall is only 24%, versus 62% for Flash Lite. The model retrieves correctly and then fails to reason over what it retrieved.

After Optimizations — Gemma4:e4b

This run evaluates pipeline optimizations made after the Final Implementation, using Gemma4:e4b on the same HotPotQA benchmark. Improvements include refined answer synthesis, more targeted retrieval, reduced inter-query overhead, and an updated knowledge graph (fixing the Animorphs KB gap).

Metric	Final Implementation	After Optimizations	Δ
Exact Match (EM)	58.0%	62.0%	+4 pp
Fuzzy Match	81.0%	81.0%	—
Token F1	0.7366	0.7358	−0.001
Contains expected	74.0%	74.0%	—
Hard failures	19	19	—
Fuzzy-only	23	19	−4
Avg response time	212.10 s	231.17 s	+19 s
Retrieval Recall	0.715	0.610	−0.105
Retrieval Precision	0.349	0.361	+0.012
Retrieval F1	0.469	0.453	−0.016
EM at full recall (Rc=1.0)	50%	75%	+25 pp
EM at zero recall (Rc=0.0)	56%	71%	+15 pp
Wall clock (100 Qs)	~8.75 h	~6.5 h	−2.25 h

Key takeaways:

EM improves 4 pp (58% → 62%) with no change in fuzzy match — gains are at the answer precision boundary, not in semantic correctness.
Answer synthesis at full recall dramatically improved. EM at Rc=1.0 jumped 25 pp (50% → 75%), reversing the prior counter-intuitive result where excessive context hurt exact match.
Retrieval is more precise but less exhaustive. Precision improved (+0.012) while recall dropped (−0.105) — the pipeline is more targeted, surfacing fewer gold documents overall but synthesising more accurately from the ones it finds.
Wall clock cut by 26% — near-elimination of inter-query overhead reduced total wall clock from ~8.75 h to ~6.5 h.
Animorphs KB gap resolved — the one question that failed in every prior run is now correctly answered (EM ✓, Rc=1.0, 146 s).

Historical Architecture Progression

Approach	Graph DB	Search	LLM	EM	Fuzzy	Avg response
Sub Questions (Feb 2026)	Neo4j	Elasticsearch	Gemini 3 Flash Preview	62%	75%	50.5 s
Joint Approach (Mar 2026)	Neo4j	Elasticsearch	Gemini Flash Lite	61%	—	~43 s
Final Implementation (May 2026)	Kuzu	Typesense	Gemini Flash Lite	59%	76%	18.10 s
Final Implementation (May 2026)	Kuzu	Typesense	Gemma4:e4b (local)	58%	81%	212.10 s
After Optimizations (May 2026)	Kuzu	Typesense	Gemma4:e4b (local)	62%	81%	231.17 s

The move from Neo4j + Elasticsearch to Kuzu + Typesense cut mean response time from ~43 s to 18.10 s (2.4×) with near-identical accuracy (61% → 59% EM). It also removed the separate graph database container, since Kuzu is embedded, and replaced Elasticsearch with the lighter Typesense service. The later pipeline optimizations pushed Gemma4:e4b to 62% EM, the highest exact match of any fully local configuration evaluated.

Docker Deployment

This is the recommended way to run LiveOS. For most users, the automated setup scripts are the only commands needed.

The setup scripts start:

the LiveOS web app
the backend API
Postgres, RustFS, Qdrant, and Typesense
Ollama and the default local LLM/embedding models
local multimedia models for PDFs, images, audio, reranking, and video

Prerequisites

Docker Desktop
Python 3
Internet access for the first run, because models and Docker images are downloaded

The setup scripts install and use Ollama by default. You can switch to LM Studio or a cloud provider later from configuration.

If Docker is not installed, install Docker Desktop first, or let the setup script try to install it:

./setup.sh --install-docker

.\setup.ps1 -InstallDocker

On macOS this uses Homebrew if available. On Windows this uses winget. On Linux this uses Docker's official install script and may require logging out and back in before the docker command works without sudo.

Automated setup

Run the script for your platform from the repository root.

macOS / Linux

./setup.sh

Windows PowerShell

.\setup.ps1

The setup script:

creates backend/.env if it does not exist
configures Docker-friendly Ollama defaults in a newly created .env
installs Ollama when possible
starts Ollama and pulls gemma4:e4b plus qwen3-embedding:0.6b
downloads Florence, Whisper, Qwen reranker, and optional Marlin models into backend/models/
starts Docker services
on Apple Silicon macOS, enables native MPS model services with launchd
on Windows and Linux, starts the Docker docker-models profile

When the script finishes, open http://localhost:3700.

Platform Notes

Apple Silicon macOS: setup.sh automatically uses native MPS inference for Marlin and local-models. This is faster than running those models inside Docker. The services auto-start after login/reboot.

Manage them with:

./scripts/host-inference.sh status
./scripts/host-inference.sh restart
./scripts/host-inference.sh disable
./scripts/host-inference.sh enable

Windows / Linux: setup runs Marlin and local-models inside Docker by default. Docker restart policies keep them running when Docker Desktop or Docker Engine starts.

Optional Flags

Most users do not need these.

./setup.sh --install-docker --skip-ollama --skip-models --skip-compose --no-build --force-env --with-marlin

.\setup.ps1 -InstallDocker -SkipOllama -SkipModels -SkipCompose -NoBuild -ForceEnv -WithMarlin -NoDockerModels

--with-marlin / -WithMarlin downloads the larger optional Marlin video-visual model. If automatic Ollama installation is unavailable on your machine, install Ollama from ollama.com/download, then rerun the setup script.

Manual setup

Use these steps if you prefer to configure your provider and models yourself.

1. Clone the repo

git clone https://github.com/josetseph/LiveOS.git
cd LiveOS

2. Create your environment file

cp backend/.env.example backend/.env

Edit backend/.env and set your LLM provider and API keys. For Docker, leave the database, Qdrant, Typesense, and RustFS hostnames as their defaults; Docker Compose overrides them to service names that resolve inside the container network.

The frontend calls the API and file storage through same-origin routes (/api/v1, /health, and /files/...). Next.js proxies those routes to the backend and RustFS containers, so remote HTTPS deployments do not need to expose backend port 8700 or RustFS port 9000 to browsers.

If you are using a local model server on your machine, containers cannot reach it at 127.0.0.1. Use host.docker.internal instead:

Variable	Local machine value	Docker value
`LLM_BASE_URL`	`http://127.0.0.1:1234`	`http://host.docker.internal:1234`
`EMBEDDING_BASE_URL`	`http://127.0.0.1:11434`	`http://host.docker.internal:11434`
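
The rewrite in the table is mechanical: keep the scheme and port, swap the loopback host. A small sketch using only the standard library, with `to_docker_url` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def to_docker_url(url: str) -> str:
    """Point a loopback URL at the Docker host; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    netloc = "host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Note: any user:password@ prefix in the original URL is dropped here.
    return urlunsplit(parts._replace(netloc=netloc))
```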

3. Start everything

docker compose up -d

Open http://localhost:3700.

The init container runs automatically before the backend starts. It runs migrations and initializes the storage bucket, Qdrant collections, Typesense schema, and Kuzu graph. If startup fails, check it first:

docker compose logs init

If you choose to expose the backend on a separate public origin instead of using the same-origin proxy, set CORS_ORIGINS in backend/.env to include every frontend origin, for example:

CORS_ORIGINS=http://localhost:3700,https://your-domain.com
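
A comma-separated value like this has to be split into a list before it can be handed to a CORS layer. The parser below is a sketch of that step; `parse_cors_origins` is a hypothetical name and the trailing-slash trimming is an assumption.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean, ordered list."""
    origins: list[str] = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

In a FastAPI app the resulting list would typically be passed as `allow_origins` to `CORSMiddleware`.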

Uninstall / Cleanup

Use the teardown script for your platform from the repository root.

macOS / Linux

./teardown.sh

Windows PowerShell

.\teardown.ps1

The teardown script removes the LiveOS Docker containers/networks, removes local Docker images by default, deletes backend/models/, and removes the Ollama models pulled by the setup script. It asks for confirmation before doing anything.

Useful options:

./teardown.sh --remove-data --uninstall-ollama --remove-all-ollama-models -y

.\teardown.ps1 -RemoveData -UninstallOllama -RemoveAllOllamaModels -Yes

Notes:

--remove-data / -RemoveData deletes ./data, including local Postgres, RustFS, Qdrant, Typesense, and Kuzu state.
--uninstall-ollama / -UninstallOllama tries to remove the Ollama app itself.
--remove-all-ollama-models / -RemoveAllOllamaModels deletes the whole Ollama model store, including models unrelated to LiveOS.
The scripts do not delete the project folder. After teardown, delete the repository directory manually if you want to remove the source code too.

Local Setup

Use this path only if you want to run the backend and frontend directly on your machine while keeping Postgres, RustFS, Qdrant, and Typesense in Docker.

Prerequisites

Docker Desktop
Python 3.11+
Node.js 20+
ffmpeg (for server-side audio transcoding — brew install ffmpeg on macOS)
Ollama or LM Studio if you are using local models

1. Start infrastructure services

docker compose up -d postgres qdrant typesense rustfs

2. Configure backend environment

cp backend/.env.example backend/.env

Because this setup runs Python from your host machine, uncomment the localhost overrides in backend/.env for Postgres, Qdrant, Typesense, and RustFS before running initialization.

3. Backend

cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/init_local.py
uvicorn app.main:app --reload --port 8700

4. Frontend

cd frontend
npm install
npm run dev

Open http://localhost:3700.

Local Model Setup

Three models must be downloaded manually and placed in backend/models/ before starting the backend. These run locally — no cloud API needed for multimedia processing or reranking.

Required models

Directory	Hugging Face source	Purpose
`backend/models/florence-2-large/`	`microsoft/Florence-2-large`	Vision — image captioning, OCR-like descriptions, and visual PDF pages
`backend/models/whisper-large-v3-turbo/`	`openai/whisper-large-v3-turbo`	Audio — speech-to-text transcription
`backend/models/qwen3-reranker-0.6b/`	`Qwen/Qwen3-Reranker-0.6B`	Retrieval — reranking search results

Download instructions

pip install huggingface_hub

huggingface-cli download microsoft/Florence-2-large \
  --local-dir backend/models/florence-2-large

huggingface-cli download openai/whisper-large-v3-turbo \
  --local-dir backend/models/whisper-large-v3-turbo

huggingface-cli download Qwen/Qwen3-Reranker-0.6B \
  --local-dir backend/models/qwen3-reranker-0.6b

Note: Florence-2-large requires trust_remote_code=True and includes custom modeling code. Do not rename or restructure the downloaded files.

The backend reads models from the path configured by MODELS_PATH in backend/app/core/config.py, which defaults to models relative to the backend root.

LLM models

The LLM used for chat and ingestion is configured separately and served by one of the supported providers.

Ollama (local)

ollama pull gemma4:latest          # chat model
ollama pull gemma4:e4b              # ingestion model (smaller, faster)
ollama pull qwen3-embedding:0.6b   # embedding model (required)

LM Studio (local)

Open LM Studio and go to the Discover tab
Search for a model by name (e.g. google/gemma-3-4b-it) and download it
Load the model and start the local server
Set LLM_PROVIDER=lm_studio in backend/.env — or switch provider live from /settings

Models can also be downloaded from Hugging Face and loaded into LM Studio via My Models → Load from disk.

Cloud providers

Set the API key in backend/.env and configure LLM_PROVIDER:

LLM_PROVIDER=gemini
GEMINI_API_KEY=your_key_here
LLM_MODEL=gemini-2.0-flash-lite

Environment Variables

Copy backend/.env.example to backend/.env and configure:

# LLM Provider — "ollama" | "lm_studio" | "openai" | "gemini" | "anthropic" | "huggingface"
LLM_PROVIDER=ollama
LLM_MODEL=gemma4:latest
INGESTION_LLM_MODEL=gemma4:e4b

# Embedding — "ollama" | "lm_studio" | "auto"
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=qwen3-embedding:0.6b

# PDF visual extraction — native text is kept for every page; 0 means render
# every visually relevant page with Florence.
PDF_VISUAL_EXTRACTION_ENABLED=true
PDF_VISUAL_EXTRACTION_MAX_PAGES=0
INGESTION_PIPELINE_CONCURRENCY=1     # FIFO: one full note ingestion at a time
MULTIMEDIA_CONCURRENCY=1            # keep local model jobs serialized on CPU

# Cloud providers (only needed if using cloud LLMs)
GEMINI_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
HUGGINGFACE_API_KEY=

# Web fallback search (optional)
TAVILY_API_KEY=

# Retrieval tuning
RERANKER_ENABLED=true
RERANKER_TOP_K=10
MAX_LOOP_ITERATIONS=10
VECTOR_SIMILARITY_THRESHOLD=0.50
VECTOR_PRE_RERANK_THRESHOLD=0.45

# Benchmark mode (set true only when testing against external datasets)
BENCHMARK_MODE=false

Full reference: backend/app/core/config.py
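
To show how the retrieval tuning variables map onto typed settings, here is a standard-library sketch. The real `config.py` may use a different mechanism; the `RetrievalSettings` class and helper names are assumptions, while the variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    raw = os.environ.get(name, "").strip().lower()
    return default if not raw else raw in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RetrievalSettings:
    reranker_enabled: bool
    reranker_top_k: int
    max_loop_iterations: int
    vector_similarity_threshold: float
    vector_pre_rerank_threshold: float


def load_retrieval_settings() -> RetrievalSettings:
    """Read retrieval tuning values from the environment with the documented defaults."""
    env = os.environ
    return RetrievalSettings(
        reranker_enabled=_env_bool("RERANKER_ENABLED", True),
        reranker_top_k=int(env.get("RERANKER_TOP_K") or 10),
        max_loop_iterations=int(env.get("MAX_LOOP_ITERATIONS") or 10),
        vector_similarity_threshold=float(env.get("VECTOR_SIMILARITY_THRESHOLD") or 0.50),
        vector_pre_rerank_threshold=float(env.get("VECTOR_PRE_RERANK_THRESHOLD") or 0.45),
    )
```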

Running the Stack

Reset everything (start fresh)

cd backend
source venv/bin/activate
python scripts/reset_all.py

Individual reset scripts exist for each store: reset_vectors.py, reset_index.py, reset_graph.py, reset_database.py, reset_storage.py, reset_ingestion.py.

Note: If the backend is running in Docker, restart it after reset_ingestion.py to release stale Kuzu file handles: docker compose restart backend.

Benchmark evaluation

Before running the benchmark, enable benchmark mode in backend/.env:

BENCHMARK_MODE=true

Full instructions — dataset ingestion, evaluation flags, and LLM model setup — are in backend/tests/benchmark/README.md.

cd backend
source venv/bin/activate

# Ingest the benchmark notes
python tests/benchmark/prepare_dataset.py --dataset hotpotqa

# Run evaluation
python tests/benchmark/evaluate.py --dataset hotpotqa --verbose

Results are written to backend/tests/benchmark/results/ as timestamped JSON files, and to Results/ as Markdown reports.

Knowledge Bases

LiveOS supports multiple isolated knowledge bases. Each KB maintains its own:

Kuzu graph — separate database directory under data/kuzu/<slug>/
Qdrant collections — <slug>_node_cores, <slug>_node_relationships, <slug>_node_isolated_contexts
Typesense collection — <slug>_nodes
Notes — filtered by kb_id in PostgreSQL
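
The per-KB naming scheme can be captured in one small function. This is a sketch: `KBResources` and `resources_for` are hypothetical names, and the default KB may use the original unprefixed names instead.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KBResources:
    kuzu_dir: Path
    qdrant_collections: tuple[str, str, str]
    typesense_collection: str


def resources_for(slug: str, data_root: Path = Path("data")) -> KBResources:
    """Derive the store names for one knowledge base from its slug."""
    return KBResources(
        kuzu_dir=data_root / "kuzu" / slug,
        qdrant_collections=(
            f"{slug}_node_cores",
            f"{slug}_node_relationships",
            f"{slug}_node_isolated_contexts",
        ),
        typesense_collection=f"{slug}_nodes",
    )
```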

KBs are managed by KBRegistry (backend/app/services/kb_registry.py), a JSON-persisted singleton at data/kb_registry.json. The default KB always exists and cannot be deleted or renamed — it maps to the original single-KB configuration.

The active KB is selected via a ?kb=<slug> query parameter on all API endpoints. The frontend stores the active KB in localStorage and displays its name in the sidebar. All routes — notes, chat, graph — are automatically scoped to the active KB.

To manage knowledge bases, open /kb in the frontend or use the API:

# List all KBs
GET /api/v1/kb

# Create a KB
POST /api/v1/kb          { "name": "Work" }

# Rename a KB
PATCH /api/v1/kb/{id}    { "name": "New Name" }

# Delete a KB (drops all data)
DELETE /api/v1/kb/{id}
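
The same calls can be made from Python with the standard library. The sketch below only builds the requests. The base URL assumes the same-origin proxy on port 3700, and `some-kb-id` is a placeholder for an id returned by the list call.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:3700/api/v1"  # assumed: Next.js proxy in front of the backend


def kb_request(method: str, path: str = "", body: Optional[dict] = None) -> Request:
    """Build a request against the knowledge base endpoints."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{BASE_URL}/kb{path}", data=data, headers=headers, method=method)


list_kbs = kb_request("GET")
create_kb = kb_request("POST", body={"name": "Work"})
rename_kb = kb_request("PATCH", "/some-kb-id", {"name": "New Name"})
delete_kb = kb_request("DELETE", "/some-kb-id")  # drops all data for that KB
```

Each request can be sent with `urllib.request.urlopen(...)` once the stack is running.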
