
SimpleMem

Efficient Lifelong Memory for LLM Agents — Text & Multimodal

Store, compress, and retrieve long-term memories with semantic lossless compression. Now with multimodal support for text, image, audio & video. Works across Claude, Cursor, LM Studio, and more.

Works with any AI platform that supports MCP or Python integration

Claude Desktop · Cursor · LM Studio · Cherry Studio · PyPI Package · + Any MCP Client

🔥 News

  • [05/14/2026] 🧬 EvolveMem (v3.0) — Self-Evolving Memory via AutoResearch! The retrieval infrastructure itself now self-evolves through LLM-driven closed-loop diagnosis. On LoCoMo, EvolveMem outperforms the strongest baseline by +25.7% relative; on MemBench, by +18.9% relative. The system discovers entirely new retrieval dimensions not present in the original design. View EvolveMem →
  • [04/02/2026] 🧠 Omni-SimpleMem (v2.0) — Multimodal Memory is Here! SimpleMem now supports text, image, audio & video memory. Achieving new SOTA on LoCoMo (F1=0.613, +47%) and Mem-Gallery (F1=0.810, +51%) over previous best. View Omni-SimpleMem →
  • [02/09/2026] 🚀 Cross-Session Memory — Outperforming Claude-Mem by 64%! View Cross-Session Documentation →
  • [01/20/2026] 📦 SimpleMem is now available on PyPI! Install via pip install simplemem. View Package Usage Guide →
  • [01/14/2026] 🎉 SimpleMem MCP Server is LIVE! Cloud-hosted at mcp.simplemem.cloud. View MCP Documentation →
  • [01/05/2026] SimpleMem paper was released on arXiv!

🚀 Quick Start

🧠 Understanding the Basic Workflow

At a high level, SimpleMem works as a long-term memory system for LLM-based agents. The workflow consists of three simple steps:

  1. Store information – Dialogues or facts are processed and converted into structured, atomic memories.
  2. Index memory – Stored memories are organized using semantic embeddings and structured metadata.
  3. Retrieve relevant memory – When a query is made, SimpleMem retrieves the most relevant stored information based on meaning rather than keywords.

This design allows LLM agents to maintain context, recall past information efficiently, and avoid repeatedly processing redundant history.

🎓 Basic Usage

SimpleMem provides a unified entry point via simplemem_router. The default mode="auto" automatically detects which backend to use based on what you call — no manual configuration needed:

import simplemem_router as simplemem

mem = simplemem.create()  # mode="auto" — backend chosen by first call

The first method you call determines the backend:

| First call | Backend selected | Why |
|---|---|---|
| add_dialogue() | Text (SimpleMem) | Dialogue-based API → text mode |
| add_text() / add_image() / add_audio() / add_video() | Omni (Omni-SimpleMem) | Multimodal API → omni mode |

📝 Auto → Text (pure text input)

import simplemem_router as simplemem

mem = simplemem.create()  # auto mode

# add_dialogue() → text backend auto-selected
mem.add_dialogue(
    "Alice",
    "Bob, let's meet at Starbucks tomorrow at 2pm",
    "2025-11-15T14:30:00",
)
mem.add_dialogue(
    "Bob",
    "Sure, I'll bring the market analysis report",
    "2025-11-15T14:31:00",
)
mem.finalize()

answer = mem.ask("When and where will Alice and Bob meet?")
# → "16 November 2025 at 2:00 PM at Starbucks"

🧠 Auto → Omni (multimodal input)

import simplemem_router as simplemem

mem = simplemem.create()  # auto mode

# add_image() → omni backend auto-selected
mem.add_text(
    "User loves hiking in the Rocky Mountains.",
    tags=["session_id:D1"],
)
mem.add_image("photo.jpg", tags=["session_id:D1"])
mem.add_audio("voice_note.wav", tags=["session_id:D1"])

result = mem.query("What does the user enjoy?", top_k=5)
for item in result.items:
    print(item["summary"])

mem.close()

💡 Tip: Auto mode picks the lightest backend that fits your data. You can still use mode="text" or mode="omni" explicitly if you prefer.
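
For example, to pin a backend explicitly:

import simplemem_router as simplemem

mem_text = simplemem.create(mode="text")  # force the text backend (SimpleMem)
mem_omni = simplemem.create(mode="omni")  # force the multimodal backend (Omni-SimpleMem)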


🚄 Advanced: Parallel Processing

For large-scale dialogue processing, enable parallel mode:

import simplemem_router as simplemem

mem = simplemem.create(
    mode="text",
    clear_db=True,
    enable_parallel_processing=True,  # ⚡ Parallel memory building
    max_parallel_workers=8,
    enable_parallel_retrieval=True,   # 🔍 Parallel query execution
    max_retrieval_workers=4
)

💡 Pro Tip: Parallel processing significantly reduces latency for batch operations!


🌟 Overview

SimpleMem is a family of efficient memory frameworks — SimpleMem for text and Omni-SimpleMem for multimodal (text, image, audio, video) — built on semantic lossless compression to address the fundamental challenge of efficient long-term memory for LLM agents. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through a three-stage pipeline:

  1. 🔍 Semantic Structured Compression: distills unstructured interactions into compact, multi-view indexed memory units.
  2. 🗂️ Online Semantic Synthesis: an intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy.
  3. 🎯 Intent-Aware Retrieval Planning: infers search intent to dynamically determine retrieval scope and construct precise context efficiently.

For multimodal memory, see Omni-SimpleMem below.

🏆 Performance Comparison

Performance vs Efficiency Trade-off

SimpleMem achieves superior F1 score (43.24%) with minimal token cost (~550), occupying the ideal top-left position.

Speed Comparison Demo

SimpleMem vs. Baseline: Real-time speed comparison demonstration

LoCoMo-10 Benchmark Results (GPT-4.1-mini)

| Model | ⏱️ Construction Time | 🔎 Retrieval Time | ⚡ Total Time | 🎯 Average F1 |
|---|---|---|---|---|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| SimpleMem | 92.6s | 388.3s | 480.9s | 43.24% |

📈 Results

📊 Benchmark Results (LoCoMo)

🏆 Cross-Session Memory Comparison

| System | LoCoMo Score |
|---|---|
| SimpleMem | 48 |
| Claude-Mem | 29.3 |

SimpleMem outperforms Claude-Mem by +64% relative.

🔬 High-Capability Models (GPT-4.1-mini)

| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|---|---|---|---|
| MultiHop | 43.46% | 30.14% | +43.8% |
| Temporal | 58.62% | 48.91% | +19.9% |
| SingleHop | 51.12% | 41.3% | +23.8% |

⚙️ Efficient Models (Qwen2.5-1.5B)

| Metric | SimpleMem | Mem0 | Notes |
|---|---|---|---|
| Average F1 | 25.23% | 23.77% | Remains competitive even with a 99× smaller model |

🧬 EvolveMem Results

  • 🏆 0.543 F1 on LoCoMo with GPT-4o (+25.7% over SimpleMem)
  • 🏆 0.572 F1 on LoCoMo with GPT-5.1 (+36.8% over SimpleMem)
  • 🏆 71.4% Acc on MemBench with GPT-5.1 (+11.0% over best baseline; +18.9% relative with GPT-4o)
  • 🧬 Self-evolving: 7 autonomous rounds

🧠 Omni-SimpleMem Results

  • 🏆 0.613 F1 on LoCoMo (+47% over prev. SOTA)
  • 🏆 0.810 F1 on Mem-Gallery (+51% over prev. SOTA)
  • 3.5x faster retrieval throughput
  • 🧠 4 modalities: Text · Image · Audio · Video

📝 SimpleMem: Text Memory

1️⃣ Semantic Structured Compression

SimpleMem applies an implicit semantic density gating mechanism integrated into the LLM generation process to filter redundant interaction content. The system reformulates raw dialogue streams into compact memory units — self-contained facts with resolved coreferences and absolute timestamps. Each unit is indexed through three complementary representations for flexible retrieval:

| 🔍 Layer | 📊 Type | 🎯 Purpose | 🛠️ Implementation |
|---|---|---|---|
| Semantic | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| Lexical | Sparse | Exact term matching | BM25-style keyword index |
| Symbolic | Metadata | Structured filtering | Timestamps, entities, persons |

✨ Example Transformation:

- Input:  "He'll meet Bob tomorrow at 2pm"  [❌ relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"  [✅ absolute, atomic]
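
To make the three-layer index concrete, here is an illustrative memory-unit sketch carrying all three views; the MemoryUnit class and its fields are invented for this example and are not SimpleMem's actual schema:

from dataclasses import dataclass, field

@dataclass
class MemoryUnit:
    # Illustrative only; not SimpleMem's real data model.
    text: str                  # self-contained fact with resolved coreferences
    embedding: list[float]     # semantic layer: 1024-d dense vector
    keywords: set[str]         # lexical layer: terms for BM25-style matching
    metadata: dict = field(default_factory=dict)  # symbolic layer: timestamps, entities, persons

unit = MemoryUnit(
    text="Alice will meet Bob at Starbucks on 2025-11-16T14:00:00",
    embedding=[0.0] * 1024,    # placeholder; produced by the embedding model in practice
    keywords={"alice", "bob", "starbucks", "meet"},
    metadata={"timestamp": "2025-11-16T14:00:00", "persons": ["Alice", "Bob"]},
)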

2️⃣ Online Semantic Synthesis

Unlike traditional systems that rely on asynchronous background maintenance, SimpleMem performs synthesis on-the-fly during the write phase. Related memory units are synthesized into higher-level abstract representations within the current session scope, allowing repetitive or structurally similar experiences to be denoised and compressed immediately.

✨ Example Synthesis:

- Fragment 1: "User wants coffee"
- Fragment 2: "User prefers oat milk"
- Fragment 3: "User likes it hot"
+ Consolidated: "User prefers hot coffee with oat milk"

This proactive synthesis ensures the memory topology remains compact and free of redundant fragmentation.
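
A minimal sketch of this write-phase behavior, assuming hypothetical is_related() and synthesize() helpers in place of SimpleMem's embedding-similarity check and LLM consolidation step:

def is_related(a: str, b: str) -> bool:
    # Stand-in for an embedding-similarity check.
    return bool(set(a.lower().split()) & set(b.lower().split()))

def synthesize(fragments: list[str]) -> str:
    # Stand-in for the LLM call that merges fragments into one abstract unit.
    return " / ".join(fragments)

def write(memory: list[str], new_fact: str) -> None:
    # Merge the incoming fact with related units at write time instead of appending blindly.
    related = [m for m in memory if is_related(m, new_fact)]
    if related:
        for m in related:
            memory.remove(m)
        memory.append(synthesize(related + [new_fact]))  # one compact unit replaces the fragments
    else:
        memory.append(new_fact)

memory: list[str] = []
for fact in ["User wants coffee", "User prefers oat milk", "User likes it hot"]:
    write(memory, fact)
# memory now holds a single consolidated unit rather than three fragments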


3️⃣ Intent-Aware Retrieval Planning

Instead of fixed-depth retrieval, SimpleMem leverages the reasoning capabilities of the LLM to generate a comprehensive retrieval plan. Given a query, the planning module infers latent search intent to dynamically determine retrieval scope and depth:

$$\{ q_{\text{sem}}, q_{\text{lex}}, q_{\text{sym}}, d \} \sim \mathcal{P}(q, H)$$

The system then executes parallel multi-view retrieval across semantic, lexical, and symbolic indexes, and merges results through ID-based deduplication:

🔹 Simple Queries

  • Direct fact lookup via single memory unit
  • Minimal retrieval depth
  • Fast response time

🔸 Complex Queries

  • Aggregation across multiple events
  • Expanded retrieval depth
  • Comprehensive coverage

📈 Result: 43.24% F1 score with 30× fewer tokens than full-context methods.
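
To make the plan-then-merge flow concrete, here is a sketch under stated assumptions: plan_query() stands in for the LLM planner, and the search_* stubs stand in for the semantic, lexical, and symbolic indexes (none of these names are SimpleMem's real API):

from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the planner and the three index views (illustrative only).
def plan_query(query, history):
    return {"sem": query, "lex": query.split(), "sym": {"person": "Alice"}, "depth": 5}

def search_semantic(q, depth): return [{"id": 1, "text": "fact A"}]
def search_lexical(q, depth):  return [{"id": 1, "text": "fact A"}, {"id": 2, "text": "fact B"}]
def search_symbolic(q, depth): return [{"id": 3, "text": "fact C"}]

def retrieve(query, history):
    plan = plan_query(query, history)   # infer per-view sub-queries and retrieval depth d
    with ThreadPoolExecutor() as pool:  # execute all three views in parallel
        futures = [
            pool.submit(search_semantic, plan["sem"], plan["depth"]),
            pool.submit(search_lexical, plan["lex"], plan["depth"]),
            pool.submit(search_symbolic, plan["sym"], plan["depth"]),
        ]
        results = [f.result() for f in futures]
    seen, merged = set(), []            # ID-based deduplication across views
    for hits in results:
        for hit in hits:
            if hit["id"] not in seen:
                seen.add(hit["id"])
                merged.append(hit)
    return merged

print(retrieve("Where will Alice meet Bob?", []))  # facts A, B, C with no duplicates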


🧠 Omni-SimpleMem: Multimodal Memory

NEW — SimpleMem now handles text, image, audio & video.

Omni-SimpleMem extends SimpleMem to unified multimodal memory — supporting text, image, audio, and video experiences with state-of-the-art accuracy across all five LLM backbones tested.

Built on three principles: Selective Ingestion (entropy-driven filtering for each modality), Progressive Retrieval (hybrid FAISS + BM25 search with pyramid token-budget expansion), and Knowledge Graph Augmentation (multi-hop cross-modal reasoning).
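
As an illustration of the Progressive Retrieval idea only (not Omni-SimpleMem's actual implementation), the sketch below combines a FAISS dense index with BM25 and grows the candidate count until a token budget is met; it assumes the faiss-cpu, rank_bm25, and numpy packages:

import numpy as np
import faiss
from rank_bm25 import BM25Okapi

docs = ["user loves hiking", "photo of the Rocky Mountains", "voice note about trail plans"]
dim = 8
embs = np.random.rand(len(docs), dim).astype("float32")  # placeholder embeddings

dense = faiss.IndexFlatIP(dim)                 # dense (FAISS) side
dense.add(embs)
sparse = BM25Okapi([d.split() for d in docs])  # sparse (BM25) side

def hybrid_search(query_emb, query_terms, k):
    _, ids = dense.search(query_emb.reshape(1, -1), k)
    bm25_top = np.argsort(sparse.get_scores(query_terms))[::-1][:k]
    seen, hits = set(), []
    for i in list(ids[0]) + list(bm25_top):    # merge candidates, dense hits first
        if i >= 0 and i not in seen:           # faiss pads with -1 when k exceeds corpus size
            seen.add(i)
            hits.append(docs[i])
    return hits

budget, k = 12, 1
while True:                                    # pyramid expansion of the candidate budget
    hits = hybrid_search(embs[0], "hiking mountains".split(), k)
    if sum(len(h.split()) for h in hits) >= budget or k >= len(docs):
        break
    k *= 2
print(hits)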

📖 Full documentation, benchmarks, and architecture details: Omni-SimpleMem →


🧬 EvolveMem: Self-Evolving Memory

EvolveMem (v3.0) makes the retrieval infrastructure itself a first-class optimization target. While SimpleMem and Omni-SimpleMem keep retrieval configurations frozen, EvolveMem autonomously evolves its retrieval policy through an LLM-driven closed-loop:

Evaluate → Diagnose failures → Propose config changes → Guard against regression → Repeat

This self-evolution constitutes an AutoResearch process: the system conducts iterative research cycles on its own architecture, discovering new retrieval dimensions (query decomposition, entity-swap, answer verification) that were not in the original design.
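
Schematically, the closed loop can be written as below, with evaluate(), diagnose(), and propose() as stand-ins for EvolveMem's LLM-driven components; only the control flow and the regression guard are meant to be faithful:

import random

def evaluate(config, benchmark): return random.random()           # stand-in: score a config on the benchmark
def diagnose(config, benchmark): return ["missed multi-hop case"] # stand-in: collect failure cases
def propose(config, failures):   return {**config, "round": config.get("round", 0) + 1}  # stand-in: LLM-proposed change

def evolve(config, benchmark, rounds=7):  # the results above report 7 autonomous rounds
    best = evaluate(config, benchmark)
    for _ in range(rounds):
        candidate = propose(config, diagnose(config, benchmark))
        score = evaluate(candidate, benchmark)
        if score > best:                  # guard: keep only non-regressing changes
            config, best = candidate, score
    return config

best_config = evolve({"retriever": "hybrid"}, benchmark="LoCoMo")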

| Benchmark | Backbone | EvolveMem | Best Baseline | Relative Gain |
|---|---|---|---|---|
| LoCoMo (F1) | GPT-4o | 0.543 | 0.432 (SimpleMem) | +25.7% |
| LoCoMo (F1) | GPT-5.1 | 0.572 | 0.418 (SimpleMem) | +36.8% |
| MemBench (Acc) | GPT-4o | 67.9% | 57.1% | +18.9% |
| MemBench (Acc) | GPT-5.1 | 71.4% | 64.3% | +11.0% |

📖 Full documentation, architecture, and usage: EvolveMem →


📦 Installation

📝 Notes for First-Time Users

  • Ensure you are using Python 3.10 in your active environment, not just installed globally.
  • An OpenAI-compatible API key must be configured before running any memory construction or retrieval, otherwise initialization may fail.
  • When using non-OpenAI providers (e.g., Qwen or Azure OpenAI), verify both the model name and OPENAI_BASE_URL in config.py.
  • For large dialogue datasets, enabling parallel processing can significantly reduce memory construction time.

📋 Requirements

  • 🐍 Python 3.10
  • 🔑 OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)

🛠️ Setup

# 📥 Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem

# 📦 Install dependencies
pip install -r requirements.txt

# ⚙️ Configure API settings
cp config.py.example config.py
# Edit config.py with your API key and preferences

⚙️ Configuration Example

# config.py
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None  # or custom endpoint for Qwen/Azure

LLM_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"  # State-of-the-art retrieval
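
For a non-OpenAI provider, the same fields change; for example, Qwen via DashScope's OpenAI-compatible endpoint might look like the following (illustrative values; verify the base URL and model name against your provider's documentation, as the notes above advise):

# config.py — illustrative values for a non-OpenAI provider
OPENAI_API_KEY = "your-dashscope-api-key"
OPENAI_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"  # Qwen OpenAI-compatible endpoint

LLM_MODEL = "qwen-plus"  # use the model name your provider actually exposes
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"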

🐳 Run with Docker

The MCP Server can be run in Docker for a consistent, isolated environment. Data (LanceDB and user DB) is persisted in a host volume.

Prerequisites

  • Docker and Docker Compose (the docker compose plugin) installed on the host.
Quick run

# From the repository root
docker compose up -d

Data is stored in ./data on the host (created automatically).

Custom configuration

  1. Copy the environment template and edit it:
    cp .env.example .env
    # Edit .env: set JWT_SECRET_KEY, ENCRYPTION_KEY, LLM_PROVIDER, model URLs, etc.
  2. Run with the env file:
    docker compose --env-file .env up -d

Using Ollama on the host

When LLM_PROVIDER=ollama and Ollama runs on your machine (not in Docker), set in .env:

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434/v1

On Linux, host.docker.internal is enabled automatically via the Compose file.

Useful commands

docker compose logs -f simplemem   # Follow logs
docker compose down                 # Stop and remove containers

📖 For self-hosting the MCP server (Docker or bare metal), see MCP Documentation.


🔌 MCP Server (text memory)

SimpleMem is available as a cloud-hosted memory service via the Model Context Protocol (MCP), enabling seamless integration with AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients.

🌐 Cloud Service: mcp.simplemem.cloud — or self-host the MCP server locally using Docker.

Key Features

| Feature | Description |
|---|---|
| Streamable HTTP | MCP 2025-03-26 protocol with JSON-RPC 2.0 |
| Multi-tenant Isolation | Per-user data tables with token authentication |
| Hybrid Retrieval | Semantic search + keyword matching + metadata filtering |
| Production Optimized | Faster response times with OpenRouter integration |

Quick Configuration

{
  "mcpServers": {
    "simplemem": {
      "url": "https://mcp.simplemem.cloud/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}

📖 For detailed setup instructions and self-hosting guide, see MCP Documentation


📊 Evaluation

🧪 Run Benchmark Tests

# 🎯 Full LoCoMo benchmark
python test_locomo10.py

# 📉 Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5

# 💾 Custom output file
python test_locomo10.py --result-file my_results.json

🔬 Reproduce Paper Results

Use the exact configurations in config.py:

  • 🚀 High-capability: GPT-4.1-mini, Qwen3-Plus
  • ⚙️ Efficient: Qwen2.5-1.5B, Qwen2.5-3B
  • 🔍 Embedding: Qwen3-Embedding-0.6B (1024-d)

📝 Citation

If you use SimpleMem in your research, please cite:

@article{simplemem2026,
  title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
  author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2601.02553},
  year={2026},
  url={https://arxiv.org/abs/2601.02553}
}

@article{evolvemem2026,
  title={EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents},
  author={Liu, Jiaqi and Ye, Xinyu and Xia, Peng and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2605.13941},
  year={2026},
  url={https://arxiv.org/abs/2605.13941}
}

@article{omnisimplemem2026,
  title={Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory},
  author={Liu, Jiaqi and Ling, Zipeng and Qiu, Shi and Liu, Yanqing and Han, Siwei and Xia, Peng and Tu, Haoqin and Zheng, Zeyu and Xie, Cihang and Fleming, Charles and Ding, Mingyu and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2604.01007},
  year={2026},
  url={https://arxiv.org/abs/2604.01007}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

We would like to thank the following projects and teams:

  • 🔍 Embedding Model: Qwen3-Embedding - State-of-the-art retrieval performance
  • 🗄️ Vector Database: LanceDB - High-performance columnar storage
  • 📊 Benchmark: LoCoMo - Long-context memory evaluation framework