Obsidian68/Engram


Engram

Give your AI agents a memory that persists, searches by meaning, and lives in plain files on your own machine.

Why Engram?

AI agents forget everything between sessions. Every conversation starts from scratch — no recall of past decisions, no accumulated knowledge, no continuity. You can wire up a database, but then you're running infrastructure and writing queries instead of building your agent.

Engram gives agents persistent memory through a REST API and an MCP server. Write a memory, search for it later by meaning, and everything is stored as readable Markdown files you control. No cloud service, no API keys, no database to manage. Point it at a directory and start storing memories.

When you write a memory, Engram checks whether you already have a similar one. If it's genuinely new, it's added. If it's a duplicate, the existing one is kept. If it adds new facts to an existing memory, the two are merged. If it's an update, the old memory is replaced — preserving its importance score. You decide how strict the deduplication is, and you can bring your own LLM to make the call when similarity is ambiguous.

Each agent gets its own namespace, so multiple agents can share the same Engram instance without stepping on each other. Search combines vector similarity with keyword matching, weighted by importance. If embeddings aren't available, CRUD still works — search returns 503 until you fix the embedding provider.

Glossary

  • vault: A directory where Engram stores memory files. Each memory is one Markdown file with YAML metadata at the top. You choose where this directory lives.
  • agent: A program or tool that reads and writes memories through Engram's API. Each agent gets its own isolated namespace within the vault.
  • agent_id: A string that identifies an agent's namespace (for example, my-agent). It becomes a subdirectory name inside the vault, so it cannot contain path separators (/, \, ..) or Windows-illegal filename characters (< > : " | ? *). The string "shared" is reserved and cannot be used as an agent_id.
  • memory: A piece of text stored in the vault. Each memory is a Markdown file with YAML frontmatter containing metadata (agent ID, importance score, timestamps, tags).
  • slug: A URL-safe identifier generated from the memory's content and agent ID. Combined with the date to form the memory's unique ID. Example: deployed-v2-to-production-my-agent-2026-05-07.
  • frontmatter: YAML metadata at the top of each memory file, between --- delimiters. Contains agent, created, id, importance, importance_updated, tags, type, and updated fields.
  • daemon: A background process that runs the Engram server without tying up your terminal. Started with engram start --daemon.
  • endpoint: A URL path that accepts HTTP requests. Engram's REST endpoints include /agents/{agent_id}/memories, /agents/{agent_id}/memories/search, /agents/{agent_id}/inject, /agents/{agent_id}/system-prompt, and /health.
  • MCP (Model Context Protocol): A protocol that lets AI tools call Engram's memory operations as tools. The MCP server runs as a separate process on port 7778 by default.

Features

  • Persistent memory — store text memories that survive across sessions, each one a human-readable Markdown file
  • Smart deduplication — every write checks for similar memories: add new ones, ignore duplicates, merge in new facts, or update existing ones with preserved importance
  • LLM-assisted decisions — when similarity is ambiguous, consult a local LLM to decide whether to add, merge, update, or ignore
  • Semantic search — find memories by meaning, not just exact keyword matches
  • Importance scoring — tag memories with priority; scores decay over time and get bumped on retrieval
  • Multi-agent isolation — each agent gets its own namespace; no overlap, no conflicts
  • Shared memories — when shared mode is on, private writes are also copied to a shared namespace that all agents can search
  • Memory injection — an endpoint that automatically loads the most relevant memories into an agent's context, with score filtering and importance updates
  • Local-first and private — no cloud, no API keys, no telemetry. Your data stays on your machine
  • Human-readable storage — every memory is a Markdown file you can read, edit, and version-control
  • Automatic indexing — memories are chunked and indexed as you write them, no manual rebuilds
  • Graceful degradation — CRUD works even without embeddings; search returns 503 until the provider is available
  • MCP server — 7 memory operations (write, search, read, delete, list, inject, get_system_prompt) available via MCP tools, running as a separate process on its own port
  • File watching — detects changes to vault files (edits from Obsidian, other tools) and re-indexes automatically

How It Works

Engram runs a local HTTP server. Agents interact with it through a REST API or MCP tools — create, read, list, delete, search, and inject memories. Each memory is stored as a Markdown file with YAML frontmatter inside a vault directory you choose. A LanceDB index handles search, combining vector embeddings with keyword matching and importance-weighted reranking.

When you create a memory, the smart write pipeline runs: the content is embedded, compared against existing memories, and a decision is made — add it as new, update an existing one, or ignore it as a duplicate. When you search, results are ranked by relevance and importance, and each result's importance score is decayed and bumped so frequently accessed memories stay fresh.
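
Ranking combines the fused vector and keyword results with an importance-weighted rerank. The sketch below illustrates one way that can work; it is an assumption, not Engram's actual code. The k constant and rerank weight mirror the documented ENGRAM_RRF_K (10) and ENGRAM_IMPORTANCE_RERANK_WEIGHT (0.3) defaults:

```python
# Illustrative reciprocal rank fusion (RRF) with an importance rerank.
# Assumed semantics, not Engram's implementation. Lists are ranked best-first.
def rrf_fuse(vector_ids, keyword_ids, importance, k=10, rerank_weight=0.3):
    scores = {}
    for ranked in (vector_ids, keyword_ids):
        for rank, mem_id in enumerate(ranked):
            # Standard RRF: each list contributes 1 / (k + rank + 1).
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank + 1)
    # Blend relevance with each memory's stored importance (0.0 to 1.0).
    fused = {
        mem_id: (1 - rerank_weight) * s + rerank_weight * importance.get(mem_id, 0.5)
        for mem_id, s in scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)
```

A memory that appears in both lists, or carries high importance, rises to the top even if neither raw rank is first.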

When shared mode is on, every private write also creates a copy in the shared namespace (agent_id shared). The private write always happens first — if it fails, no shared copy is created. Agents can then search the shared namespace alongside their own, or inject memories from both namespaces at once.

When you create a memory, it looks like this on disk:

---
agent: my-agent
created: "2026-05-06T04:39:16.923211+00:00"
id: deployed-v2-to-production-my-agent-2026-05-07
importance: 0.9
importance_updated: "2026-05-06T04:39:16.923211+00:00"
tags:
  - deploy
  - production
type: memory
updated: "2026-05-06T04:39:16.923211+00:00"
---

Deployed v2 to production on Saturday

You can open this file in any text editor, edit it directly, or put the vault directory under version control. The file watcher detects external changes and re-indexes automatically.
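
Because each memory is plain Markdown with frontmatter, external tools can read it without Engram. A minimal sketch, assuming the layout shown above; it handles only simple scalar fields, so real files with lists (like tags) need a proper YAML parser:

```python
# Minimal hand-rolled reader for the memory file layout shown above:
# a frontmatter block between "---" lines, then the body. Scalar fields only.
def read_memory_file(text):
    parts = text.split("---\n")
    # parts[0] is empty, parts[1] is the frontmatter, parts[2:] the body.
    meta = {}
    for line in parts[1].splitlines():
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "---\n".join(parts[2:]).strip()
    return meta, body
```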

Quick Start

1. Install dependencies

This command works on all platforms:

uv sync --extra dev

It creates a virtual environment and installs Engram with all dependencies. Do not create a virtual environment manually — uv manages its own .venv.

2. Create a vault directory

Engram stores memories as Markdown files inside a vault directory. Choose any location. These are examples, not required paths:

Platform Example path
macOS /Users/you/.engram/vault
Linux /home/you/.engram/vault
Windows C:\Users\you\.engram\vault

You can also point Engram at an existing Obsidian vault — any directory works.

Create the directory you chose:

# macOS / Linux
mkdir -p ~/.engram/vault
# Windows PowerShell
New-Item -ItemType Directory -Path "$env:USERPROFILE\.engram\vault" -Force
# Windows CMD
mkdir "%USERPROFILE%\.engram\vault"

3. Configure environment variables

Copy the example configuration file:

# macOS / Linux
cp .env.example .env
# Windows PowerShell
Copy-Item .env.example .env
# Windows CMD
copy .env.example .env

Then edit .env and set ENGRAM_VAULT_PATH to the directory you created:

# macOS / Linux
ENGRAM_VAULT_PATH=~/.engram/vault
# Windows
ENGRAM_VAULT_PATH=C:\Users\you\.engram\vault

ENGRAM_VAULT_PATH is the only required variable. All others have defaults.

4. Start the server

This command works on all platforms:

uv run engram start

When the server starts, you will see:

2026-05-06 14:07:35.000 | INFO     | engram.cli.cli:start:148 - Starting Engram on 127.0.0.1:7777
INFO:     Started server process [13952]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit)

The timestamp, line number, and PID vary each time. The key line is Uvicorn running on http://127.0.0.1:7777.

5. Verify the server is running

This command works on all platforms:

curl http://127.0.0.1:7777/health

Response (200):

{
  "status": "healthy",
  "version": "1.4.3",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy",
    "mcp_port": 7778,
    "shared": "disabled",
    "watcher": "healthy"
  }
}

On Windows PowerShell, use Invoke-RestMethod http://127.0.0.1:7777/health | ConvertTo-Json -Depth 5 instead.

The mcp_port component shows the MCP port number (an integer) when ENGRAM_MCP_ENABLED=true, or "disabled" when ENGRAM_MCP_ENABLED=false. The shared component shows "disabled" when ENGRAM_SHARED_MODE is false (the default), and "healthy" when enabled and the shared directory exists. Disabled components (mcp_port, shared, watcher) do not affect the overall health status.
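
Those rules can be checked client-side. A minimal sketch of the documented semantics (disabled components are skipped, and an integer mcp_port value counts as healthy):

```python
# Derive an overall status from the components map, per the rules above.
# Illustrative only; the server computes this itself.
def overall_status(components):
    for value in components.values():
        if value == "disabled":
            continue          # disabled components don't affect health
        if isinstance(value, int):
            continue          # e.g. mcp_port: 7778 means MCP is up
        if value != "healthy":
            return "unhealthy"
    return "healthy"
```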

6. Stop the server

Press Ctrl+C in the terminal running the server. If running as a daemon:

uv run engram stop

This command works on all platforms.

CLI Reference

engram start

Start the Engram server.

# Start in foreground (default)
uv run engram start

# Start on a custom host and port
uv run engram start --host 0.0.0.0 --port 9000

# Start as a background daemon
uv run engram start --daemon

Flag Default Description
--host TEXT 127.0.0.1 (from ENGRAM_HOST) Host address to bind to
--port INTEGER 7777 (from ENGRAM_PORT) Port to bind to
--daemon, -d off Run as a background daemon

In foreground mode, press Ctrl+C to stop. In daemon mode, use engram stop.

On Windows, daemon mode uses CREATE_NEW_PROCESS_GROUP and CREATE_NO_WINDOW. If it does not work as expected, use foreground mode (the default).

engram stop

Stop a running Engram daemon.

uv run engram stop

If no server is running:

No running Engram server found

Exit code: 1. On Windows, the stop command uses taskkill /F /PID instead of SIGTERM.

REST API Reference

All memory endpoints are prefixed with /agents/{agent_id}. The agent_id is a string that identifies the agent namespace (for example, my-agent). The following characters are rejected with a 400 error: path separators (/, \, ..) and Windows-illegal filename characters (< > : " | ? *). The string shared is also rejected as a reserved namespace.
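
The validation rules can be summarized in a short sketch. This mirrors the documented rules, not Engram's actual implementation:

```python
# Documented agent_id rules: no path separators, no "..", no Windows-illegal
# filename characters, and "shared" is reserved. Illustrative sketch.
ILLEGAL_CHARS = set('/\\<>:"|?*')

def validate_agent_id(agent_id):
    if agent_id == "shared":
        raise ValueError("'shared' is a reserved namespace")
    if ".." in agent_id or any(c in ILLEGAL_CHARS for c in agent_id):
        raise ValueError(f"agent_id contains illegal characters: {agent_id!r}")
    return agent_id
```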

Health Check

curl http://127.0.0.1:7777/health

This command works on all platforms. Response (200):

{
  "status": "healthy",
  "version": "1.4.3",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy",
    "mcp_port": 7778,
    "shared": "disabled",
    "watcher": "healthy"
  }
}

Response when the vault directory is missing (503):

{
  "status": "unhealthy",
  "version": "1.4.3",
  "components": {
    "vault": "unhealthy",
    "lancedb": "unhealthy",
    "embeddings": "unhealthy",
    "mcp_port": "disabled",
    "shared": "disabled",
    "watcher": "disabled"
  }
}

Write a Memory

When you write a memory, Engram checks for similar existing memories first. There are four possible outcomes:

  • added — no similar memory found, or the similarity is below the add threshold. A new memory is created. Returns 201.
  • merged — a similar memory exists and the LLM decides the incoming content adds new facts to it. The incoming content is appended to the existing memory's body. Returns 200 with the existing memory's ID.
  • updated — a similar memory exists and the LLM decides the incoming content replaces it. The old memory is deleted and a new one is created with preserved importance. Returns 200.
  • ignored — a very similar memory already exists (above the ignore threshold and ENGRAM_SIMILARITY_IGNORE_ENABLED is true). No new memory is written. Returns 200 with the existing memory's ID.

macOS / Linux:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

Windows CMD:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"Deployed v2 to production on Saturday\",\"tags\":[\"deploy\",\"production\"],\"importance\":0.9}"

Request body fields:

Field Type Required Description
content string yes Memory text (minimum 1 character)
tags string[] no List of tags (default: [])
importance float no Importance score 0.0 to 1.0 (default: 0.5)

Response for added (201):

{
  "decision": "added",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": null
}

The id is generated from the content, agent ID, and current date — your id will contain today's date. The similarity_score shows how similar the incoming content was to the best match (null for added memories with no match).
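
A hypothetical reconstruction of that ID scheme, assuming the slug is built from the leading words of the content; the exact word count and truncation rules are guesses, not Engram's confirmed behavior:

```python
# Hypothetical slug-based ID builder matching the documented example shape:
# leading content words, then agent ID, then date.
import re

def memory_id(content, agent_id, date_str, max_words=4):
    words = re.sub(r"[^a-z0-9\s-]", "", content.lower()).split()[:max_words]
    return "-".join(words + [agent_id, date_str])
```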

Response for merged (200):

{
  "decision": "merged",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": 0.72
}

Response for updated (200):

{
  "decision": "updated",
  "id": "new-slug-my-agent-2026-05-07",
  "similarity_score": 0.72
}

When the decision is "merged", the incoming content is appended to the existing memory's body. The id is the existing memory's ID — it does not change. When the decision is "updated", a new slug is generated from the incoming content, so the id differs from the original.

Response for ignored (200):

{
  "decision": "ignored",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": 0.95
}

List Memories

curl http://127.0.0.1:7777/agents/my-agent/memories

This command works on all platforms. Response (200): an array of memory objects. Returns [] if the agent has no memories.

[
  {
    "id": "deployed-v2-to-production-my-agent-2026-05-07",
    "agent": "my-agent",
    "type": "memory",
    "importance": 0.9,
    "tags": ["deploy", "production"],
    "created": "2026-05-07T12:41:49.276903+00:00",
    "updated": "2026-05-07T12:41:49.276903+00:00",
    "importance_updated": "2026-05-07T12:41:49.276903+00:00",
    "body": "Deployed v2 to production on Saturday"
  }
]

The created, updated, and importance_updated timestamps vary each time.

Read a Memory

curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Returns a single memory object in the same format as List Memories.

Response (404):

{ "detail": "Memory not found" }

Delete a Memory

curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Response (204): empty body on success.

Response (404):

{ "detail": "Memory not found" }

Search Memories

This command works on all platforms:

curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"

Query parameters:

Parameter Type Required Description
q string yes Search query (minimum 1 character)
limit integer no Maximum results to return, 1–100 (default: 10)

Response (200): an array of search result objects ranked by relevance. Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.

[
  {
    "id": "deployed-v2-to-production-my-agent-2026-05-07",
    "score": 0.3036363672126423,
    "importance": 0.85,
    "chunk": "Deployed v2 to production on Saturday",
    "agent": "my-agent",
    "created": "2026-05-07T12:41:49.276903+00:00"
  }
]

The score and importance values vary based on the query, the age of the memory, and how many times it has been retrieved.

Missing query parameter (422):

{
  "detail": [
    {
      "type": "missing",
      "loc": ["query", "q"],
      "msg": "Field required",
      "input": null
    }
  ]
}

Search unavailable (503):

{ "detail": "Search index not available" }
{ "detail": "Search unavailable: embedding provider is not configured" }

Inject Memories

The inject endpoint returns the most relevant memories for a query, filtered by a minimum score and capped by a maximum count. It is designed for auto-loading context into an agent before generating a response.

Agent-specific inject:

This command works on all platforms:

curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"

When ENGRAM_SHARED_MODE is enabled, this endpoint searches both the agent's namespace and the shared namespace, merges results by score plus importance, and returns the top matches. When shared mode is off, it searches only the agent's namespace.

Shared inject:

curl "http://127.0.0.1:7777/shared/inject?q=deployment+procedures&limit=5"

This command works on all platforms. Searches only the shared namespace regardless of shared mode setting.

Query parameters:

Parameter Type Required Description
q string yes Injection query (minimum 1 character)
limit integer no Maximum results to return, 1–100 (capped by injection_top_n)

Response (200):

{
  "memories": [
    {
      "id": "deployed-v2-to-production-my-agent-2026-05-07",
      "body": "Deployed v2 to production on Saturday",
      "importance": 0.9,
      "score": 0.3186363171447407,
      "agent": "my-agent",
      "tags": ["deploy", "production"],
      "created": "2026-05-07T12:41:49.276903+00:00"
    }
  ],
  "count": 1,
  "query": "production deploy"
}

Results are filtered by injection_min_score (default: 0.3) and capped by injection_top_n (default: 5). Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.
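
That filtering step can be sketched as follows, assuming results arrive ranked best-first; the defaults mirror injection_min_score and injection_top_n:

```python
# Documented inject filtering: drop results below the minimum score,
# then keep at most top_n. Illustrative sketch.
def filter_injection(results, min_score=0.3, top_n=5):
    kept = [r for r in results if r["score"] >= min_score]
    return kept[:top_n]
```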

Inject unavailable (503):

{ "detail": "Search index not available" }
{ "detail": "Search unavailable: embedding provider is not configured" }

Missing query parameter (422):

{
  "detail": [
    {
      "type": "missing",
      "loc": ["query", "q"],
      "msg": "Field required",
      "input": null
    }
  ]
}

Invalid agent_id (400):

{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

Error Responses

Status When
400 Agent ID or memory ID contains path separators, Windows-illegal characters, or shared
404 Memory not found (on read or delete)
422 Request body validation failed (for example, empty content)

Agent ID with illegal characters (400):

{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

Memory ID with path traversal (400):

{ "detail": "Invalid memory_id: 'test-bad..agent-2026-05-06'" }

Memory not found (404):

{ "detail": "Memory not found" }

Empty content (422):

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character",
      "input": "",
      "ctx": { "min_length": 1 }
    }
  ]
}

MCP Server

Engram includes a Model Context Protocol (MCP) server that exposes 7 memory operations as tools. It runs as a separate process on port 7778 by default, alongside the REST API on port 7777.

The MCP server is enabled by default (ENGRAM_MCP_ENABLED=true). To disable it, set ENGRAM_MCP_ENABLED=false in your .env file. The MCP port is configured with ENGRAM_MCP_PORT (default: 7778).

MCP Tools

Tool Parameters Description
memory_write content, agent_id, tags?, importance? Write a new memory
memory_search query, agent_id, limit? Search memories
memory_read agent_id, memory_id Read a single memory
memory_delete agent_id, memory_id Delete a memory
memory_list agent_id List all agent memories
memory_inject query, agent_id, limit? Inject relevant memories into context
memory_get_system_prompt agent_id Get the system prompt block for an agent

  • memory_write returns the same MemoryWriteResponse format as the REST write endpoint.
  • memory_search returns the same result format as the REST search endpoint.
  • memory_read returns the same memory object as the REST read endpoint.
  • memory_delete returns {"deleted": "memory-id"} on success or {"error": "error message"} on failure.
  • memory_list returns an array of memory objects.
  • memory_inject returns the same InjectionResponse format as the REST inject endpoint.
  • memory_get_system_prompt returns a SystemPromptResponse with agent_id, version, and system_prompt_block.

When shared_mode is enabled, memory_inject searches both the agent's namespace and the shared namespace.

System Prompt Kit

The system_prompt_kit.md file at the project root is a ready-to-use system prompt for agents connecting to Engram via MCP. It covers:

  • MEMORY MANDATE — 5 absolute rules for exclusive Engram usage
  • When to write memories (and when not to)
  • How to write effective memories
  • How to search effectively
  • Memory recall — memories are injected automatically; do not mention the memory system unless asked
  • Full reference for all 7 MCP tools with parameters and examples

Copy it into your agent's system prompt configuration to give the agent structured access to Engram's memory tools.

System Prompt Endpoint

GET /agents/{agent_id}/system-prompt

Returns a pre-rendered system prompt block for the given agent. Used by orchestrators to prepend Engram's behavioral mandate to the agent's system prompt before session start.

Response:

{
  "agent_id": "my-agent",
  "version": "1.4.3",
  "system_prompt_block": "## 0. MEMORY MANDATE\n\nEngram is the memory system for this session. It stores and retrieves\nknowledge on your behalf.\n\n1. No built-in memory tools exist. All persistent memory goes through\nEngram.\n2. Writes go through memory_write. Reads go through memory_search or\nmemory_read.\n3. If Engram is unreachable, say so — do not fall back silently.\n4. Engram decides what to keep. Your job is to write, not to judge what\nis worth storing.\n5. Do not mention the memory system, memory tools, or memory operations\nin your responses unless the user asks about memory.\n\n## 1. What Engram Is\n\n...\n\n## 7. MCP Tool Reference\n\n..."
}

The full system_prompt_block contains 8 sections: the MEMORY MANDATE (section 0), behavioral guidance (sections 1-6), and tool reference for all 7 MCP tools (section 7). The complete text is in system_prompt_kit.md at the project root. Identical output is available via the memory_get_system_prompt MCP tool.

OpenClaw Bridge

Engram includes a TypeScript plugin for OpenClaw-compatible agents. The bridge auto-recalls relevant memories before each LLM call, auto-captures conversations when a turn ends, and blocks the agent's native memory tools so all memory flows through Engram.

Setup

The bridge lives in the openclaw-bridge/ directory. Build it before use:

# All platforms — requires Node.js
cd openclaw-bridge
npm install
npm run build

Verify the build:

npm test

This produces dist/index.js. See docs/connect-openclaw.md for the 9-step connection guide.

What the Bridge Does

  1. Auto-recall (before_prompt_build) — before each LLM call, queries Engram for memories matching the prompt, wraps them in <engram_recalled> tags, and prepends them as context (capped at 1500 characters)

  2. Auto-capture (agent_end) — after a turn ends, extracts all user, assistant, and tool messages, serializes structured content blocks, strips <engram_recalled> blocks, and writes the exchange to Engram with ["auto-capture"] tags

  3. Native memory blocking (before_tool_call) — blocks 6 native OpenClaw memory tools (memory_search, memory_get, memory_add, memory_delete, memory_list, memory_flush)

Both hooks require agentId to be set in the OpenClaw context — if it is undefined, the hooks return early without calling Engram.

Configuration

The bridge reads restEndpoint from the OpenClaw plugin config (openclaw.plugin.json). Default: http://localhost:7777.

Smart Write Pipeline

When you write a memory, the smart write pipeline runs automatically. You don't configure it separately — it's built into the write endpoint. Here's how it decides what to do:

  1. Embed the incoming content
  2. Find similar — search the index for the top 3 most similar memories for the same agent. When shared mode is on, also search the shared namespace
  3. Threshold check:
    • Similarity below SIMILARITY_ADD_THRESHOLD (default 0.25) → add as new
    • Similarity at or above SIMILARITY_IGNORE_THRESHOLD (default 0.85) and SIMILARITY_IGNORE_ENABLED is true → ignore as duplicate
    • Similarity at or above SIMILARITY_IGNORE_THRESHOLD (default 0.85) and SIMILARITY_IGNORE_ENABLED is false (default) → consult LLM
    • Similarity between the add threshold and the ignore threshold → consult LLM
  4. LLM consultation — send the incoming content and similar memories to a local Ollama model, which decides add, merge, update, or ignore
  5. Execute — add a new memory, merge the incoming content into an existing one (appending with \n\n), update the existing one (preserving its importance), or do nothing

If the LLM is unreachable, the pipeline falls back to "add" — keeping your data is always preferred over losing it.
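
The threshold logic above, expressed as a sketch; "consult_llm" stands in for the Ollama call, and the thresholds mirror the documented defaults:

```python
# Smart-write threshold check as documented. A similarity of None means
# no similar memory was found.
def write_decision(similarity, ignore_enabled=False,
                   add_threshold=0.25, ignore_threshold=0.85):
    if similarity is None or similarity < add_threshold:
        return "add"
    if similarity >= ignore_threshold and ignore_enabled:
        return "ignore"
    # Ambiguous similarity, or high similarity with auto-ignore disabled.
    return "consult_llm"
```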

Auto-ignore toggle

By default, high-similarity memories are sent to the LLM for a decision rather than auto-ignored. Set ENGRAM_SIMILARITY_IGNORE_ENABLED=true to restore the old behavior where scores at or above the ignore threshold are automatically discarded without LLM consultation.

Merge decision

When the LLM decides "merge", the incoming content is appended to the existing memory's body with a double-newline separator. The same memory ID, importance score, importance_updated timestamp, and created timestamp are preserved. Only the updated timestamp changes. This is distinct from "update", which replaces the content entirely and generates a new slug.

Shared Mode

When ENGRAM_SHARED_MODE is enabled, every private write also creates a shared copy with agent: "shared" and a new slug. The private write always happens first — if it fails, no shared copy is created. Shared copy failures log a warning but never fail the operation.

Agents can access shared memories in two ways:

  1. Inject endpoint (/agents/{agent_id}/inject) — when shared mode is on, automatically searches both the agent's namespace and the shared namespace
  2. Shared inject endpoint (/shared/inject) — searches only the shared namespace directly

The string "shared" is a reserved agent_id. Attempting to use it directly in a write, read, or delete request returns a 400 error.

Importance Scoring

Every memory has an importance score between 0.0 and 1.0. You set it when you create a memory (default: 0.5). The score changes in two ways:

  • Decay — importance decreases over time based on a half-life (default: 7 days). A memory that hasn't been accessed in 7 days has its importance halved.
  • Retrieval bump — every time a memory appears in search or inject results, its importance is bumped by the hit increment (default: 0.05), then clamped to 1.0.

Decay is lazy — it's only calculated when a memory is retrieved, not on a schedule. This means importance stays accurate without any background jobs.

When the smart write pipeline updates a memory, the old memory's importance is preserved on the new one.
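
The decay-and-bump rule can be written as a small formula. The half-life and increment mirror the documented defaults; the exact order of operations inside Engram is an assumption here:

```python
# Lazy importance refresh on retrieval: exponential decay by elapsed time,
# then a hit bump, clamped to 1.0. Illustrative sketch of the documented rule.
def refresh_importance(importance, days_since_access,
                       halflife_days=7.0, hit_increment=0.05):
    decayed = importance * 0.5 ** (days_since_access / halflife_days)
    return min(decayed + hit_increment, 1.0)
```

For example, a memory with importance 0.8 untouched for one half-life decays to 0.4 and comes back from a search at 0.45.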

File Watcher

Engram watches the vault directory for external changes (edits from Obsidian, other tools, or direct file manipulation). When a file is created, modified, or deleted, the watcher re-indexes the affected memory. This keeps the search index in sync with the vault even when changes happen outside Engram's API.

The file watcher is enabled by default (ENGRAM_WATCHER_ENABLED=true). To disable it, set ENGRAM_WATCHER_ENABLED=false in your .env file.

Self-write suppression: when Engram's API writes a file, it registers the write with the watcher so it can ignore its own change event. This prevents redundant re-indexing.

On startup, Engram also scans the vault for files that changed since the last indexed time and re-indexes them. If no previous scan timestamp exists, it performs a full reindex.

Embedding Providers

Search requires an embedding provider to vectorize memories. Engram supports two providers:

  1. Ollama (default) — runs locally at http://localhost:11434 using the nomic-embed-text model. Start Ollama before Engram: ollama serve, then pull the model: ollama pull nomic-embed-text.

  2. fastembed — runs in-process with no external service. Uses the BAAI/bge-small-en-v1.5 model. Fallback only; set ENGRAM_EMBEDDING_PROVIDER=fastembed to use it directly.

When Ollama is unavailable and ENGRAM_EMBEDDING_AUTOFALLBACK=true (the default), Engram automatically falls back to fastembed. If both providers fail, the server starts without search — CRUD still works, search returns 503.

On Windows, the onnxruntime dependency that fastembed requires may fail to load. If you see a 503 error from search, start Ollama and let Engram use it as the embedding provider instead.

Configuration

All configuration uses environment variables with the ENGRAM_ prefix. Set them directly or via a .env file in the working directory.

Required:

Variable Default Description
ENGRAM_VAULT_PATH (none) Path to the vault directory where memory files are stored

Optional:

Variable Default Description
ENGRAM_HOST 127.0.0.1 Server bind address
ENGRAM_PORT 7777 Server bind port
ENGRAM_IMPORTANCE_INITIAL_SCORE 0.5 Default importance score for new memories
ENGRAM_LOG_LEVEL INFO Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
ENGRAM_LOG_FILE ~/.engram/logs/engram.log Path to the log file
ENGRAM_LOG_ROTATION 10 MB Log rotation size threshold
ENGRAM_LOG_RETENTION 7 days Log retention period
ENGRAM_STATE_FILE ~/.engram/state.json Path to the PID state file (used by start and stop)
ENGRAM_EMBEDDING_PROVIDER ollama Embedding provider: ollama or fastembed
ENGRAM_EMBEDDING_MODEL nomic-embed-text Embedding model name (provider-specific)
ENGRAM_EMBEDDING_AUTOFALLBACK true Auto-fallback to fastembed if Ollama is unavailable
ENGRAM_CHUNK_MAX_TOKENS 512 Maximum tokens per chunk for semantic chunking
ENGRAM_CHUNK_OVERLAP_TOKENS 50 Overlap tokens between adjacent chunks
ENGRAM_RRF_K 10 RRF constant for hybrid search fusion
ENGRAM_IMPORTANCE_RERANK_WEIGHT 0.3 Weight for importance score in reranking (0.0 to 1.0)
ENGRAM_INDEX_PATH ~/.engram/index Path to the LanceDB index directory
ENGRAM_SIMILARITY_ADD_THRESHOLD 0.25 Below this similarity, always add as new memory
ENGRAM_SIMILARITY_IGNORE_THRESHOLD 0.85 At or above this similarity, treat as duplicate (if ignore enabled)
ENGRAM_SIMILARITY_IGNORE_ENABLED false Whether to auto-ignore duplicates above ignore threshold (default: off — ambiguous memories go to the LLM)
ENGRAM_IMPORTANCE_DECAY_HALFLIFE 7.0 Half-life in days for importance decay
ENGRAM_IMPORTANCE_HIT_INCREMENT 0.05 Importance bump on each search retrieval
ENGRAM_LLM_MODEL llama3 Ollama model name for smart write LLM consultation
ENGRAM_LLM_HOST http://localhost:11434 Ollama host URL for smart write LLM consultation
ENGRAM_MCP_ENABLED true Enable MCP server (separate process)
ENGRAM_MCP_PORT 7778 Port for the standalone MCP server
ENGRAM_WATCHER_ENABLED true Enable file watcher for automatic vault sync
ENGRAM_WATCHER_DEBOUNCE_MS 2000 Debounce time in milliseconds for file watcher events
ENGRAM_SHARED_MODE false Enable shared mode — private writes are also copied to shared namespace
ENGRAM_INJECTION_MIN_SCORE 0.3 Minimum search score for inject endpoint results
ENGRAM_INJECTION_TOP_N 5 Maximum results returned by inject endpoint
ENGRAM_EMBEDDING_CACHE_SIZE 1024 LRU cache size for embedding vectors
ENGRAM_FTS_REBUILD_INTERVAL 50 Number of adds before FTS index rebuild
ENGRAM_SEARCH_CACHE_TTL 30 TTL in seconds for search result cache
ENGRAM_VAULT_CACHE_SIZE 512 LRU cache size for vault read/list operations

Reserved (accepted but unused):

Variable Default Note
ENGRAM_OBSIDIAN_MODE true No effect in current version
ENGRAM_MCP_PATH /mcp MCP is now a standalone server — this path is not used

The .env.example file in the repository root contains all variables with their defaults.

End-to-End Walkthrough

This walkthrough creates a memory, reads it, searches for it, injects it, and deletes it. Use the my-agent agent ID throughout.

Step 1: Start the server

uv run engram start

Step 2: Create a memory

POST requests with JSON bodies require different quoting on Windows CMD. See Write a Memory for the Windows CMD variant.

macOS / Linux:

curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

The response includes decision and id fields. Your id will contain today's date:

{
  "decision": "added",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": null
}
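
The id appears to combine a slug of the content's leading words, the agent_id, and the creation date. The following is a hypothetical reconstruction for illustration; the four-word cap and the slug rules are guesses, not Engram's exact logic:

```python
import re
from datetime import date

def make_memory_id(content, agent_id, created=None, max_words=4):
    """Slugify the first few words of the content, then append agent and date."""
    words = re.sub(r"[^a-z0-9\s-]", "", content.lower()).split()[:max_words]
    day = (created or date.today()).isoformat()
    return f"{'-'.join(words)}-{agent_id}-{day}"

make_memory_id("Deployed v2 to production on Saturday", "my-agent", date(2026, 5, 7))
```

A scheme like this would also explain the known limitation that a same-day, same-agent write producing the same slug silently overwrites the earlier memory.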

Step 3: Read the memory

Use the id from step 2. Your id will contain today's date:

curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Response:

{
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "agent": "my-agent",
  "type": "memory",
  "importance": 0.9,
  "tags": ["deploy", "production"],
  "created": "2026-05-06T04:39:16.923211+00:00",
  "updated": "2026-05-06T04:39:16.923211+00:00",
  "importance_updated": "2026-05-06T04:39:16.923211+00:00",
  "body": "Deployed v2 to production on Saturday"
}

The created, updated, and importance_updated timestamps vary each time.

Step 4: Search for the memory

This command works on all platforms:

curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"

The search returns ranked results with relevance scores. The score and importance values will differ from this example:

[
  {
    "id": "deployed-v2-to-production-my-agent-2026-05-07",
    "score": 0.3036363672126423,
    "importance": 0.85,
    "chunk": "Deployed v2 to production on Saturday",
    "agent": "my-agent",
    "created": "2026-05-07T12:41:49.276903+00:00"
  }
]

Step 5: Inject relevant memories

This command works on all platforms:

curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"

Response:

{
  "memories": [
    {
      "id": "deployed-v2-to-production-my-agent-2026-05-07",
      "body": "Deployed v2 to production on Saturday",
      "importance": 0.9,
      "score": 0.3486183356155048,
      "agent": "my-agent",
      "tags": ["deploy", "production"],
      "created": "2026-05-07T12:41:49.276903+00:00"
    }
  ],
  "count": 1,
  "query": "production deploy"
}

The importance and score values will differ from this example — they change with each retrieval.
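
An agent typically pastes these results into its prompt. Below is a minimal sketch of turning an inject response into a context block; the formatting is entirely up to you and nothing here is prescribed by Engram:

```python
def format_context(inject_response):
    """Render inject results as a bulleted context block for an agent prompt."""
    lines = [f"Relevant memories for query '{inject_response['query']}':"]
    for mem in inject_response["memories"]:
        tags = ", ".join(mem.get("tags", []))
        lines.append(f"- {mem['body']} (importance {mem['importance']}, tags: {tags})")
    return "\n".join(lines)

resp = {
    "memories": [{"body": "Deployed v2 to production on Saturday",
                  "importance": 0.9, "tags": ["deploy", "production"]}],
    "count": 1,
    "query": "production deploy",
}
block = format_context(resp)
```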

Step 6: List all memories

curl http://127.0.0.1:7777/agents/my-agent/memories

Returns an array containing the memory from step 2. This command works on all platforms.

Step 7: Delete the memory

Use the id from step 2. Your id will contain today's date:

curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

Returns 204 with an empty body. This command works on all platforms.

Step 8: Verify deletion

curl http://127.0.0.1:7777/agents/my-agent/memories

Returns []. This command works on all platforms.

Step 9: Stop the server

Press Ctrl+C in the terminal running the server, or:

uv run engram stop

This command works on all platforms.
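
The walkthrough can also be scripted. Here is a minimal client sketch using only the Python standard library, with the endpoint layout taken from the steps above; `agent_url` and `create_memory` are illustrative helpers, and the POST requires a live server:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:7777"

def agent_url(agent_id, *parts):
    """Build an agent-scoped endpoint URL, e.g. /agents/my-agent/memories."""
    return "/".join([BASE, "agents", agent_id, *parts])

def create_memory(agent_id, content, tags=(), importance=0.5):
    """POST a memory and return the decoded JSON response."""
    body = json.dumps({"content": content, "tags": list(tags),
                       "importance": importance}).encode()
    req = request.Request(agent_url(agent_id, "memories"), data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, create_memory("my-agent", "Deployed v2 to production on Saturday", ["deploy", "production"], 0.9) mirrors step 2.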

Troubleshooting

ValidationError: vault_path field required

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
vault_path
  Field required

The ENGRAM_VAULT_PATH environment variable is not set. Set it before starting the server.

# macOS / Linux
export ENGRAM_VAULT_PATH="$HOME/.engram/vault"
# Windows PowerShell
$env:ENGRAM_VAULT_PATH = "$env:USERPROFILE\.engram\vault"
# Windows CMD
set ENGRAM_VAULT_PATH=%USERPROFILE%\.engram\vault

Or edit the .env file and set ENGRAM_VAULT_PATH to the path you chose for your vault directory.

No running Engram server found

No running Engram server found

The engram stop command cannot find a running server. Either the server was never started, or it crashed without cleaning up its state file. If a stale state file exists, engram start removes it automatically before starting.

Port 7777 already in use

ERROR:    [Errno 98] Address already in use

On Windows, the error message is:

ERROR:    [WinError 10048] Only one usage of each socket address is permitted

Another process is using port 7777. Use a different port:

uv run engram start --port 8080

Or find and stop the process using port 7777:

# macOS / Linux
lsof -i :7777
kill <PID>
# Windows PowerShell
Get-NetTCPConnection -LocalPort 7777 | Select-Object OwningProcess
Stop-Process -Id <PID>
# Windows CMD
netstat -ano | findstr :7777
taskkill /PID <PID> /F

Empty content rejected with 422

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "content"],
      "msg": "String should have at least 1 character",
      "input": "",
      "ctx": { "min_length": 1 }
    }
  ]
}

The content field is required and must be at least 1 character. Provide non-empty content in the request body.

agent_id contains illegal characters (400)

{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

The agent_id contains characters that are not allowed. Allowed characters are letters, digits, hyphens, underscores, and dots. Path separators (/, \, ..) and Windows-illegal filename characters (< > : " | ? *) are rejected. The string "shared" is also reserved.

agent_id with path traversal (400)

{ "detail": "Invalid agent_id: '../hack'" }

The agent_id contains path traversal characters (.., /, \).

Daemon fails to start on first attempt

Daemon failed to start on 127.0.0.1:7777. Process may have exited (PID 12345).

On the first run, the server may need more than a few seconds to initialize (embedding model downloads, index creation). The daemon timeout is 30 seconds. If it still fails, try running in foreground mode first to see startup logs:

uv run engram start

If foreground mode works, the daemon should work on subsequent attempts since model files are cached.

Search returns 503

{ "detail": "Search unavailable: embedding provider is not configured" }

Neither Ollama nor fastembed could be loaded. On Windows, this is typically caused by the onnxruntime DLL failing to load. The server starts without search, but CRUD operations still work. To resolve:

  • Start Ollama: ollama serve (then pull the model: ollama pull nomic-embed-text)
  • Or set ENGRAM_EMBEDDING_PROVIDER=fastembed in your .env file (may require Visual C++ Redistributable on Windows)

You may also see:

{ "detail": "Search index not available" }

The search index has not been initialized. This means the server started without embedding support. See the resolution steps above.

Smart write always adds (never deduplicates)

If Ollama is not running, the LLM consultation falls back to "add". With ENGRAM_SIMILARITY_IGNORE_ENABLED at its default of false, nothing is auto-ignored: every write above the add threshold goes to the LLM, which falls back to "add" when Ollama is unavailable. With it set to true, memories with similarity at or above ENGRAM_SIMILARITY_IGNORE_THRESHOLD (default 0.85) are still deduplicated without the LLM; only the ambiguous zone between 0.25 and 0.85 falls back to "add" instead of consulting the LLM.
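
Sketched as code, the decision flow looks like this, with defaults from the configuration table; it is a simplification of the real pipeline, and `llm` stands in for the Ollama consultation:

```python
def smart_write_decision(similarity, ignore_enabled=False, llm=None,
                         add_threshold=0.25, ignore_threshold=0.85):
    """Decide what to do with an incoming memory given its best similarity."""
    if similarity < add_threshold:
        return "add"                      # clearly new
    if ignore_enabled and similarity >= ignore_threshold:
        return "ignore"                   # clear duplicate, auto-ignored
    if llm is None:
        return "add"                      # Ollama unavailable: fall back to add
    return llm(similarity)                # ambiguous zone: ask the LLM
```

For example, smart_write_decision(0.9) returns "add" with the defaults, while smart_write_decision(0.9, ignore_enabled=True) returns "ignore".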

To enable LLM-assisted decisions in the ambiguous zone:

  1. Install Ollama: see ollama.com
  2. Pull a model: ollama pull llama3
  3. Start Ollama: ollama serve
  4. If Ollama runs on a non-default host, set ENGRAM_LLM_HOST in your .env file

Environment variable changes not taking effect

The get_settings() function caches configuration on first call. If you change environment variables after starting the server, restart Engram for changes to take effect:

Press Ctrl+C to stop the server, then start it again:

uv run engram start
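
The effect is the same as wrapping environment reads in functools.lru_cache. Here is a sketch of why a restart is needed; `get_settings` here is a stand-in, not Engram's actual module:

```python
import os
from functools import lru_cache

@lru_cache(maxsize=1)
def get_settings():
    """Read configuration once; later env changes are invisible until restart."""
    return {"llm_model": os.environ.get("ENGRAM_LLM_MODEL", "llama3")}

os.environ["ENGRAM_LLM_MODEL"] = "llama3"
first = get_settings()
os.environ["ENGRAM_LLM_MODEL"] = "mistral"   # changed after the first call...
second = get_settings()                      # ...but the cached value wins
```

Restarting the process clears the cache, which is why the fix is simply to stop and start the server.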

Known Runtime Warnings

When running uv sync --extra dev, you may see output like:

Resolved 121 packages in 10ms
Installed 8 packages in 9.49s

The exact package count and time vary. This is normal: uv is resolving and installing dependencies. No action required.

When running engram start, you may see a deprecation warning from FastAPI:

DeprecationWarning: on_event is deprecated, use lifespan event handlers instead.

This is caused by the file watcher startup/shutdown hooks using FastAPI's deprecated on_event API. It does not affect functionality. A migration to the lifespan API is planned for a future version.

When running engram start for the first time, Engram creates the engram subdirectory inside your vault path. This is expected — the health check verifies this directory exists.

If you created a virtual environment manually before running uv sync, you may see:

warning: `VIRTUAL_ENV=venv` does not match the project environment path `.venv` and will be ignored

This is harmless. uv run uses its own .venv and ignores the manual environment. You can delete your manually created virtual environment directory.

Known Limitations

  • GET /shared/memories/search deferred — only agent-scoped search exists; the shared inject endpoint provides an alternative
  • Vector dimension is determined at first table creation (auto-detected from embedding model)
  • FTS index is created lazily on first search, not proactively on every add
  • Index sync failures on write/delete are logged as warnings but don't fail the request
  • Evals runner/sweep depend on running server with search endpoints
  • fastembed cannot import on some Windows machines (onnxruntime DLL issue) — server degrades gracefully, search returns 503
  • Module-level imports of ollama and fastembed mean both must be installed
  • Same-day same-agent duplicate slug silently overwrites
  • get_settings() uses lru_cache — stale after env var changes
  • ENGRAM_OBSIDIAN_MODE accepted but unused
  • LLM consultation requires Ollama running with the configured model
  • Concurrent searches on the same memory can cause lost importance updates (no optimistic locking)
  • Watcher uses deprecated FastAPI on_event lifecycle hooks (lifespan refactor is future scope)
  • Blocking sync I/O in async watcher loop (file reads in _handle_add_or_update); acceptable for v1.4
  • ENGRAM_MCP_PATH config field accepted but unused — MCP is now a standalone server, not a mounted sub-app
  • Shared mode writes use content-based slug reconstruction, not provenance tracking
  • Shared inject endpoint does not validate agent_id (it uses hardcoded "shared" namespace)
  • shared agent_id is reserved and cannot be used directly via the REST API
  • Ollama LLM client singleton ignores host parameter after first creation
  • Result cache is an unbounded dict (TTL + mutation-based eviction; acceptable for v1.4.3)
  • Vault caches use threading.Lock for safety (GIL provides additional protection for single ops)
  • Unbounded memory growth in bridge cache if calls stop (expired entries only evicted on call)
  • _build_system_prompt(agent_id, version) parameters are unused — the prompt is identical for all agents
  • Importance updates on search/inject return current importance; updated values are visible on the next request
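
The lost-update entry above is the classic read-modify-write race on the importance score. It is simulated sequentially here for illustration; no real concurrency is needed to show the effect:

```python
def read_modify_write_race(initial, increment=0.05):
    """Two concurrent searches both read the same importance, then both write."""
    a_read = initial                 # search A reads
    b_read = initial                 # search B reads before A writes back
    a_write = a_read + increment     # A writes its bump
    b_write = b_read + increment     # B's write overwrites A's bump
    return b_write

final = read_modify_write_race(0.5)  # one bump is lost: 0.55, not 0.60
```

With optimistic locking (or a compare-and-swap on the stored value), the second write would be retried and both bumps would survive.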

Evaluation Results

Retrieval evaluation across versions (23 queries, golden set):

| Metric | v1.1 | v1.2 | v1.3 | v1.4 |
| --- | --- | --- | --- | --- |
| P@1 | 0.3478 | 0.6957 | 0.6522 | 0.8696 |
| R@5 | 1.0 | 1.0 | 1.0 | 1.0 |
| MRR@10 | 0.5841 | 0.8152 | 0.7877 | 0.9239 |
| Latency@10 | 5324 ms | 18561 ms | 18843 ms | 19123 ms |

v1.4 shows major gains in precision and MRR over v1.3, driven by the inject endpoint's score filtering and the hybrid search improvements. Recall remains perfect at 1.0 across all versions. Latency is essentially flat from v1.2 onward because importance updates dominate the cost.
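
For reference, P@1 and MRR@10 can be computed like this; a sketch over (ranked ids, golden id) pairs, not the actual evals runner:

```python
def precision_at_1(results):
    """Fraction of queries whose top-ranked id is the golden id."""
    hits = sum(1 for ranked, golden in results if ranked and ranked[0] == golden)
    return hits / len(results)

def mrr_at_10(results):
    """Mean reciprocal rank of the golden id within the top 10 results."""
    total = 0.0
    for ranked, golden in results:
        for rank, mem_id in enumerate(ranked[:10], start=1):
            if mem_id == golden:
                total += 1.0 / rank
                break
    return total / len(results)

sample = [(["a", "b"], "a"), (["b", "a"], "a")]
```

On the sample, the golden id is top-ranked for one of two queries (P@1 = 0.5) and sits at ranks 1 and 2 (MRR@10 = 0.75).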

What's New

See CHANGELOG.md for the full history.

v1.4.3 adds a "merge" decision to the smart write pipeline (LLM can now decide to append incoming content to an existing memory), makes auto-ignore opt-in via ENGRAM_SIMILARITY_IGNORE_ENABLED (default off — ambiguous memories go to the LLM instead of being silently discarded), lowers similarity thresholds to reduce false deduplication, and fixes OpenClaw bridge auto-capture to include all messages, handle structured content, and include tool role messages.

v1.4.2 adds a caching layer for speed optimization (Ollama client singleton, LRU embedding cache, FTS rebuild interval, TTL search result cache, vault read/list cache), converts all REST handlers and MCP tools to async with run_in_executor, defers importance updates to background tasks, fixes the system prompt to use auto-recall guidance instead of instructing models to call memory_inject before every response, and adds a TTL cache to the OpenClaw bridge auto-recall hook.

v1.4 adds an inject endpoint for auto-loading relevant memories into agent context, a memory_inject MCP tool, shared mode for cross-agent memory sharing (private-first writes to a shared namespace), per-file write locking for thread safety, and two new configuration fields (injection_min_score, injection_top_n).

v1.3 adds an MCP server with 5 tools for agent memory access, a system prompt kit for configuring agent behavior, file watching for vault synchronization, and a startup vault scan.

v1.2 adds smart write deduplication with LLM consultation, importance scoring with time-based decay and retrieval bumps, configurable similarity thresholds, and 6 new environment variables for the intelligence features.

v1.1 adds semantic search with LanceDB, embedding providers (Ollama and fastembed), semantic chunking with configurable overlap, importance-weighted reranking, and a health endpoint that reports component status. Search works alongside CRUD — if embeddings aren't available, CRUD still works and search returns 503.

Development

uv sync --extra dev
uv run pytest --cov=engram -v

542 tests pass with over 90% coverage. The test run time varies by machine.

Lint and format:

uv run ruff check src/ tests/ evals/
uv run ruff format src/ tests/ evals/