AI Agent Memory Tools — Comparison

Agents forget by default. This page maps tools whose main job is to persist and retrieve context across sessions: dedicated memory layers, temporal graphs, MCP servers for coding assistants, and framework memory subsystems. Updated 21 June 2026 after a seed-plus-discovery sweep; benchmark numbers are cited with conditions, not ranked.

Mem0

What it is

Mem0 is a framework-agnostic memory layer: it extracts facts from conversations, stores them (vector search plus entity linking and BM25 in recent versions), and injects relevant memories on later turns. It ships as Python/JS SDKs, a self-hosted Docker stack, or a managed platform at mem0.ai.
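
The extract, store, inject loop described above can be sketched with the standard library alone. This is a toy illustration, not Mem0's API: `ToyMemory`, its `add`/`search` methods, and `build_prompt` are hypothetical names, and the pronoun filter and word-overlap scoring are crude stand-ins for LLM-based fact extraction and vector/BM25 search.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a stand-in for embeddings or BM25 terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyMemory:
    """Hypothetical per-user memory store (not Mem0's real interface)."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def add(self, user_id: str, message: str) -> None:
        # Assumption: a real system would call an LLM to extract facts.
        # Here a sentence is kept as a "fact" if it mentions "I" or "my".
        for sentence in re.split(r"(?<=[.!?])\s+", message.strip()):
            if tokenize(sentence) & {"i", "my"}:
                self._facts[user_id].append(sentence)

    def search(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        # Rank stored facts by word overlap with the query; drop non-matches.
        query_terms = tokenize(query)
        scored = [(len(query_terms & tokenize(f)), f) for f in self._facts[user_id]]
        scored = [(score, f) for score, f in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [f for _, f in scored[:limit]]


def build_prompt(memory: ToyMemory, user_id: str, question: str) -> str:
    """Inject retrieved memories ahead of the new user turn."""
    recalled = memory.search(user_id, question)
    context = "\n".join(f"- {fact}" for fact in recalled) or "- (none)"
    return f"Known about user:\n{context}\n\nUser: {question}"


memory = ToyMemory()
memory.add("alice", "I am vegetarian. The weather is nice. My office is in Lisbon.")
prompt = build_prompt(memory, "alice", "Suggest a vegetarian lunch near my office")
print(prompt)
```

The sentence about the weather is never stored, and only facts that overlap with the new question are injected. A production layer replaces both heuristics with model calls, which is where the external LLM and embedding dependency comes from.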

Pros

Large community (~59k GitHub stars as of June 2026) and broad integrations (LangChain, CrewAI, LlamaIndex, Vercel AI SDK, CLI, agent signup flow) per README
Apache 2.0 OSS plus optional managed cloud; self-hosted server with API keys (docs)
April 2026 algorithm refresh: Mem0 reports 91.6% on LoCoMo and 94.8% on LongMemEval with ~7k tokens per query on their production stack (README benchmark table); an open evaluation framework is provided for reproducing the numbers
YC-backed; third-party comparisons cite a $24M Series A (October 2025) and SOC 2 / HIPAA coverage on the managed tier (vectorize.io comparison)

Cons / limitations

Requires an external LLM and embedding model for extraction and search (defaults include OpenAI models; other providers configurable) (docs)
Managed advanced features (graph mode, scale) sit behind paid plans; pricing not verified line-by-line here
Benchmark scores are vendor-reported on their own stack; independent LongMemEval numbers cited by competitors differ (e.g. Vectorize cites ~49% in one independent eval for Mem0 vs higher self-reported figures)

Best fit

Teams that want the fastest path to per-user memory in an existing chatbot or agent stack without adopting a full agent runtime. Less ideal if bi-temporal “what was true when?” queries are the core requirement (Graphiti/Zep is stronger there).

MemPalace

What it is

MemPalace is a local-first memory system for AI assistants. It stores verbatim conversation and project text, indexes it with semantic search (ChromaDB by default, pluggable backends), and exposes 33+ MCP tools plus CLI workflows (`mine`, `search`, `wake-up`). Optional temporal knowledge-graph layer on SQLite.

Pros

No API key required for core retrieval path; embeddings run locally (README)
MIT license; PyPI package and Docker image; explicit impostor-site warning in official repo
Strong retrieval benchmarks on published harness: 96.6% R@5 on LongMemEval (raw semantic, no LLM) and reproducible scripts in `benchmarks/` (BENCHMARKS.md)
Hooks for Claude Code, Codex, and Cursor to auto-save sessions before compaction

Cons / limitations

Verbatim storage can grow large; hybrid/rerank modes may call an external LLM for the last mile
Palace metaphor and MCP surface have a learning curve vs a single `add/search` API
README deliberately avoids head-to-head QA accuracy comparisons with Mem0/Zep because metrics differ (retrieval recall vs end-to-end QA)

Best fit

Developers who want offline or self-hosted coding-assistant memory with auditable benchmarks. Wrong choice if you need a managed multi-tenant SaaS with compliance paperwork out of the box.

Supermemory

What it is

Supermemory is a memory and context engine: it extracts user facts, maintains profiles, runs hybrid RAG+memory search, and offers connectors (Google Drive, Gmail, Notion, GitHub, etc.). Available as cloud API, MCP server, plugins (Claude Code, Cursor, OpenClaw), and Supermemory local (single binary, `localhost:6767`).

Pros

One API covers memory extraction, profiles, document ingestion, and hybrid search (README)
Local install path: `curl -fsSL https://supermemory.ai/install | bash` with Ollama support for fully offline use (self-hosting docs)
Publishes MemoryBench for comparing providers; claims #1 on LongMemEval (81.6%), LoCoMo, and ConvoMem on their leaderboard (self-reported, June 2026 README)
First-party integrations including `@supermemory/tools/mastra` wrapper

Cons / limitations

License for the full product stack was not verified as pure OSS in this pass; hosted platform is the primary commercial path despite a local binary
Benchmark leadership claims are vendor-published; competitors dispute comparability of LoCoMo vs LongMemEval scoring (Vectorize Hindsight article)
Enterprise-only features may apply for some deployment modes (local vs enterprise docs distinguish tiers)

Best fit

Product teams wanting memory + connectors + profiles in one vendor, with a credible local prototype path. Less ideal if you require a simple Apache-licensed library with no hosted upsell.

Graphiti / Zep

What it is

Graphiti (getzep/graphiti) is an open-source temporal context graph engine: episodes become entities and relationships with validity intervals, so facts can be superseded without erasing history. Zep is the managed platform built on Graphiti for production context retrieval (Graphiti README). Paper: Zep: A Temporal Knowledge Graph Architecture for Agent Memory (January 2025).
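
To make the idea of superseding facts without erasing history concrete, here is a minimal sketch using only the standard library. It is not Graphiti's schema or API: the `Fact` fields, `TemporalStore`, and its methods are hypothetical, and the example assumes an append-only list where validity intervals are derived at query time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date   # valid time: when it became true in the world
    recorded_on: date  # transaction time: when the system learned it


class TemporalStore:
    """Hypothetical append-only store; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def record(self, fact: Fact) -> None:
        self._facts.append(fact)

    def value_at(self, subject: str, predicate: str, when: date,
                 known_by: Optional[date] = None) -> Optional[str]:
        """Value true on `when`, using only facts recorded by `known_by` if given."""
        candidates = [
            f for f in self._facts
            if f.subject == subject and f.predicate == predicate
            and f.valid_from <= when
            and (known_by is None or f.recorded_on <= known_by)
        ]
        if not candidates:
            return None
        # The latest-starting fact wins; later recordings break ties.
        return max(candidates, key=lambda f: (f.valid_from, f.recorded_on)).value

    def history(self, subject: str, predicate: str):
        """List of (value, valid_from, valid_to); valid_to is None while still true."""
        facts = sorted(
            (f for f in self._facts if f.subject == subject and f.predicate == predicate),
            key=lambda f: f.valid_from,
        )
        following = facts[1:] + [None]
        return [
            (cur.value, cur.valid_from, nxt.valid_from if nxt else None)
            for cur, nxt in zip(facts, following)
        ]


store = TemporalStore()
store.record(Fact("acme", "plan", "free", date(2025, 1, 1), date(2025, 1, 1)))
# The upgrade happened on 1 June but was only recorded on 10 June.
store.record(Fact("acme", "plan", "pro", date(2025, 6, 1), date(2025, 6, 10)))

print(store.value_at("acme", "plan", date(2025, 6, 5)))
print(store.value_at("acme", "plan", date(2025, 6, 5), known_by=date(2025, 6, 7)))
print(store.history("acme", "plan"))
```

The two `value_at` calls ask about the same day but differ in what the system had been told by then, which is the distinction between valid time and transaction time.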

Pros

First-class bi-temporal model (valid time vs transaction time) for “what did we believe on date X?” (Graphiti docs)
Graphiti is self-hostable with Neo4j, FalkorDB, or Neptune backends; Python/TS/Go SDKs
Zep reports 63.8% on LongMemEval (GPT-4o) and sub-200ms cloud retrieval in vendor materials (Vectorize comparison table); Graphiti ~24k stars
Strong fit for evolving user state (CRM, support, compliance) where facts change

Cons / limitations

Full Zep user/thread storage and SLAs are managed-only; Graphiti alone requires you to build auth, tenancy, and ops
Needs graph DB ops expertise for self-hosted Graphiti
Requires LLM calls for episode ingestion and entity extraction

Best fit

Applications where facts change over time and you must query history, not just similarity. Overkill for a static FAQ bot; use Mem0 or LangMem for simpler profile memory.

Hindsight

What it is

Hindsight (MIT, ~16.7k stars) is an agent memory server from Vectorize.io. It organizes memories into world facts, experiences, and mental models, with three operations: `retain`, `recall`, and `reflect`. Retrieval merges semantic, keyword, graph, and temporal strategies.
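
How the four strategies are merged is not described in the material surveyed here. One common way to combine ranked lists from different retrievers is reciprocal rank fusion (RRF), so the sketch below assumes that technique purely for illustration; it is not taken from Hindsight's code. The memory IDs and per-strategy rankings are made-up sample data, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# Hypothetical top results from each strategy, best first.
rankings = {
    "semantic": ["m3", "m1", "m7"],
    "keyword": ["m1", "m3"],
    "graph": ["m1", "m9"],
    "temporal": ["m7", "m1"],
}
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

An item that appears in several lists (here `m1`) rises to the top even when no single strategy ranks it first everywhere, which is the usual motivation for multi-strategy recall.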

Pros

Docker one-liner or embedded Python (`hindsight-all`); optional Oracle AI Database for enterprise (README)
LLM wrapper path: two lines to wrap an existing client for automatic retain/recall
Vendor claims state-of-the-art LongMemEval performance; README states scores were independently reproduced by Virginia Tech Sanghani Center and The Washington Post (other vendor scores listed as self-reported)
MIT license; managed Hindsight Cloud optional

Cons / limitations

Requires LLM API for retain/reflect pipelines (OpenAI default in quickstart)
Heavier than a flat vector store; may be overkill for simple chat personalization
Fortune 500 production use is cited by the vendor only; details of the independent benchmark reproduction were not verified here beyond the README claim

Best fit

Long-running autonomous or employee-style agents that must learn from feedback and reflect on past work, not just recall chat snippets. Not the lightest option for a weekend MCP experiment.

Mastra Memory

What it is

Mastra is a TypeScript agent framework; `@mastra/memory` adds conversation history, working memory (structured user fields), semantic recall, and Observational Memory (background agents compress old turns into dense observations) (Memory docs). Storage via LibSQL, Postgres, etc.

Pros

Memory is a first-class package with thread/resource scoping and multi-agent isolation rules (docs)
Observational Memory targets context-window limits without losing long-horizon notes
Integrates with Mastra Studio tracing for debugging what entered context
Supermemory ships an official Mastra wrapper (`@supermemory/tools/mastra`)

Cons / limitations

Memory features are coupled to Mastra; not a drop-in layer for arbitrary stacks (contrast Mem0)
Requires configuring a Mastra storage provider
Benchmarks as a standalone memory product were not found; compare as framework subsystem

Best fit

Teams already building on Mastra who want opinionated memory processors in one stack. Wrong choice if you only need a memory API inside LangChain or raw Express.

iai-mcp (iai-pme)

What it is

iai-mcp (iai-pme: personal memory engine) is a local MCP server that captures sessions verbatim, clusters them into a personal map, and injects relevant context at session start. Per the README, it uses custom storage, community detection, and a hyperdimensional substrate rather than a thin Chroma wrapper.

Pros

Fully local: no API key, account, or telemetry for the memory engine (README)
Ships reproducible benchmark harness; author reports daily personal use
MCP-native for Claude Code and other hosts

Cons / limitations

Smaller community (~282 GitHub stars vs tens of thousands for MemPalace/Mem0) as of June 2026
Single-maintainer feel; status/limitations section in README should be read before production bets
License and latest release cadence not re-fetched in this pass (verify on repo)

Best fit

Privacy-maximalists running MCP coding assistants who want a bespoke local engine with published micro-benchmarks. Not the default pick for team-wide managed memory.

agentmemory

What it is

agentmemory (package/site: agent-memory.dev) is an MCP-native persistent memory server for coding agents, built on the iii engine. It targets Claude Code, Cursor, Copilot CLI, Codex, Windsurf, and other MCP clients with hybrid search, lifecycle, and knowledge-graph features (README).

Pros

Very active project: ~23.5k+ stars, Apache 2.0, 49 releases (latest v0.9.27 June 2026 per GitHub metadata)
One MCP server shared across tools; skills/hooks ecosystem
Implements Karpathy-style LLM wiki patterns with confidence scoring and graph-backed retrieval (vendor README)

Cons / limitations

Focused on developer/coding-agent workflows, not general customer-support chatbots
Requires running the agentmemory server (Node/iii stack); not a three-line pip install
Independent benchmark reproduction not verified in this pass (project claims benchmark leadership on site)

Best fit

Teams living in MCP coding agents who want shared memory across Cursor, Claude Code, and CLI tools. Skip if you need enterprise temporal KG semantics.

Letta

What it is

Letta (formerly MemGPT research) is an agent runtime with OS-inspired memory: agents use tools to read/write memory blocks across tiers (core vs archival/recall). Apache 2.0, ~23k stars. Letta Code CLI adds terminal agents with skills and “dreaming”/sleep-time compute.
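
The tiered, tool-driven pattern can be sketched as a small class whose methods play the role of tools the model calls. This is an illustration under stated assumptions, not Letta's API: `TieredMemory`, `core_append`, `core_evict`, `archival_search`, and the character budget are all hypothetical, and the hard-coded calls at the bottom stand in for decisions a model would make.

```python
class TieredMemory:
    """Hypothetical two-tier memory: small in-context core, unbounded archive."""

    def __init__(self, core_limit_chars: int = 40) -> None:
        self.core_limit_chars = core_limit_chars
        self.core: list[str] = []      # always rendered into the prompt
        self.archival: list[str] = []  # searched on demand

    def core_size(self) -> int:
        return sum(len(entry) for entry in self.core)

    def core_append(self, entry: str) -> str:
        """Tool: add to core memory, or report that the agent must evict first."""
        if self.core_size() + len(entry) > self.core_limit_chars:
            return "error: core memory full; evict an entry first"
        self.core.append(entry)
        return "ok"

    def core_evict(self, index: int) -> str:
        """Tool: move one core entry to archival storage (nothing is deleted)."""
        self.archival.append(self.core.pop(index))
        return "ok"

    def archival_search(self, query: str) -> list[str]:
        """Tool: naive substring match standing in for embedding search."""
        needle = query.lower()
        return [entry for entry in self.archival if needle in entry.lower()]


# Simulated tool calls; in a real runtime the model would choose these.
mem = TieredMemory(core_limit_chars=40)
r1 = mem.core_append("User name: Dana")
r2 = mem.core_append("Prefers Rust over Go")
r3 = mem.core_append("Deadline is Friday")   # rejected: would exceed the budget
mem.core_evict(0)                            # the agent chooses what to move out
r4 = mem.core_append("Deadline is Friday")
print(r3, r4, mem.core, mem.archival_search("dana"))
```

The key design point is that the store never evicts on its own: a full core returns an error, and the agent decides which entry to move to the archive.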

Pros

Agents self-edit memory via tools (MemGPT pattern) rather than passive extraction (Letta docs)
Self-hostable Docker server (`letta/letta`) with REST API and Python/TS SDKs
Felicis $10M seed cited in comparisons; strong research lineage (UC Berkeley MemGPT paper)

Cons / limitations

You adopt the Letta runtime, not a pluggable memory library (Mem0 vs Letta comparison)
Published LongMemEval scores were not found in primary docs (competitors note “not published”)
Managed Letta Cloud pricing was not verified in this pass (third parties cite $20–200/mo; verify at letta.com)

Best fit

Long-horizon autonomous agents where the model should decide what to remember and evict. Poor fit if you only need to bolt memory onto an existing LangGraph graph.

LangMem

What it is

LangMem (MIT) provides memory primitives and tools for LangGraph agents: `create_manage_memory_tool`, `create_search_memory_tool`, plus background memory managers. Stores via LangGraph’s `BaseStore` (Postgres, Redis, in-memory) (README).

Pros

Native to LangGraph Platform deployments; no separate memory microservice required
Supports hot-path (agent-managed) and background extraction modes
MIT license; small, focused scope (~1.5k stars)

Cons / limitations

LangGraph lock-in for the integrated path; not framework-agnostic
No bi-temporal graph; episodic memory noted as evolving in launch materials
Smaller ecosystem than Mem0 or Letta

Best fit

Agents already on LangGraph/LangChain who want memory tools without operating a second vendor. Wrong choice for a Rust or Mastra-only stack.

Cognee

What it is

Cognee (~18.4k stars) is an open-source AI memory platform: ingest documents and events, run an Extract → Cognify → Load pipeline into a self-hosted knowledge graph plus vectors, then `remember` / `recall` / `forget` / `improve` (README). MCP and Claude Code plugin available.

Pros

Graph-native memory with ontology generation; multimodal ingestion paths (docs)
Self-host (Docker, Railway, Fly.io, etc.) or Cognee Cloud; Python SDK and CLI
$7.5M seed and 70+ production deployments claimed on vendor blog (Cognee guide)
Research paper: Optimizing the Interface Between Knowledge Graphs and LLMs (2025)

Cons / limitations

Heavier operational footprint than Mem0; LLM API required for cognify pipeline
MCP UI path expects Docker (README note)
Open-core vs cloud feature split should be checked on current pricing page

Best fit

Teams building institutional / company-brain agents that must unify documents and sessions in an evolving graph. Overkill for single-user coding MCP memory.

Comparison table

Tool	License/Model	Architecture	Self-host?	Best fit
Mem0	Apache 2.0 + managed cloud	Vector + entity/BM25 hybrid extraction	Yes (Docker server)	Fast per-user memory in any framework
MemPalace	MIT	Verbatim text + vector (pluggable backend) + optional temporal KG	Yes (local default)	Offline coding-assistant memory with published retrieval benchmarks
Supermemory	OSS local binary + hosted platform	Memory engine + profiles + hybrid RAG	Yes (`supermemory-server` local)	Memory + connectors + profiles in one API
Graphiti / Zep	Graphiti OSS; Zep managed	Temporal knowledge graph	Graphiti yes; Zep cloud-first	Facts that change over time; audit history
Hindsight	MIT + cloud	World / experience / mental model + multi-strategy recall	Yes (Docker / embedded)	Reflective agents that learn from outcomes
Mastra Memory	Apache 2.0 (framework)	Thread history + working memory + semantic recall + observational compression	Yes (with Mastra storage)	Mastra-native agents needing long threads
iai-mcp	Verify on repo	Local MCP; custom HD / clustering store	Yes (local only)	Solo dev MCP hosts, maximum privacy
agentmemory	Apache 2.0	MCP server; hybrid + graph (iii engine)	Yes	Shared memory across coding MCP clients
Letta	Apache 2.0 + cloud	Tiered self-editing memory blocks in agent runtime	Yes (Docker)	Autonomous agents managing their own memory
LangMem	MIT	LangGraph store + agent memory tools	Yes (Postgres/Redis backends)	LangGraph agents without extra service
Cognee	OSS + Cognee Cloud	ECL pipeline → knowledge graph + vectors	Yes	Company-brain / multi-source institutional memory

How to choose

Drop-in layer vs full runtime. Mem0, Supermemory, and Hindsight expose memory as a service or library you attach to existing code. Letta and Mastra Memory expect you to live inside their agent harness. LangMem sits in between for LangGraph only.

Where data must live. MemPalace, iai-mcp, agentmemory (self-hosted), Graphiti, and Cognee can keep data on your machine or VPC. Supermemory local and Mem0 OSS do too, but many teams default to vendor clouds for speed.

Hardest requirement. If time and contradiction dominate (CRM, compliance, “when did status change?”), prioritize Graphiti/Zep or Cognee. If coding-session continuity dominates, prioritize MemPalace, agentmemory, or iai-mcp MCP paths. If agent self-improvement dominates, look at Letta or Hindsight’s `reflect`.

Budget. Count LLM calls for extraction, reflection, and reranking, not just storage. MemPalace’s core retrieval path avoids LLM tax; Mem0, Hindsight, Cognee, and Graphiti typically invoke models on write paths.
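
A back-of-the-envelope calculation shows why write-path model calls matter. Every number below is a placeholder chosen for easy arithmetic, not a measurement of any tool on this page; `MemoryCostProfile` and `monthly_llm_cost` are hypothetical helpers to fill in with your own figures.

```python
from dataclasses import dataclass


@dataclass
class MemoryCostProfile:
    """Hypothetical per-operation LLM usage; replace with measured values."""
    llm_calls_per_write: float
    llm_calls_per_read: float
    tokens_per_call: int


def monthly_llm_cost(profile: MemoryCostProfile, writes_per_month: int,
                     reads_per_month: int, usd_per_million_tokens: float) -> float:
    """Estimated monthly LLM spend in USD for memory operations only."""
    calls = (profile.llm_calls_per_write * writes_per_month
             + profile.llm_calls_per_read * reads_per_month)
    tokens = calls * profile.tokens_per_call
    return tokens / 1_000_000 * usd_per_million_tokens


# Placeholder profiles: one extracts with an LLM on every write, one never calls an LLM.
extracting = MemoryCostProfile(llm_calls_per_write=2, llm_calls_per_read=0, tokens_per_call=1_500)
retrieval_only = MemoryCostProfile(llm_calls_per_write=0, llm_calls_per_read=0, tokens_per_call=0)

extracting_cost = monthly_llm_cost(extracting, 100_000, 300_000, usd_per_million_tokens=1.0)
retrieval_cost = monthly_llm_cost(retrieval_only, 100_000, 300_000, usd_per_million_tokens=1.0)
print(extracting_cost, retrieval_cost)
```

With these placeholder inputs, 100,000 writes at two calls of 1,500 tokens each come to 300 million tokens, so the write path alone dominates the bill before storage is counted.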

Benchmarks. LoCoMo and LongMemEval measure conversational recall under specific splits and graders; retrieval R@k ≠ end-to-end answer accuracy. Treat vendor leaderboard posts as claims until reproduced on MemoryBench or project harnesses.
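
The gap between retrieval recall and answer accuracy is easy to see on a toy example. The sketch below uses one common definition of recall@k (a question counts as a hit if any gold evidence item appears in the top k) and exact-match answer scoring; real benchmarks may define and grade both differently. The session IDs and answers are invented sample data.

```python
def recall_at_k(retrieved: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Share of questions with at least one gold evidence ID in the top k results."""
    hits = sum(
        1 for ranked, relevant in zip(retrieved, gold)
        if relevant & set(ranked[:k])
    )
    return hits / len(gold)


def answer_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of questions whose final answer matches (case-insensitive exact match)."""
    correct = sum(
        p.strip().lower() == e.strip().lower()
        for p, e in zip(predicted, expected)
    )
    return correct / len(expected)


# Two questions: ranked retrieval results, gold evidence, and final answers.
retrieved = [["s4", "s1", "s9"], ["s2", "s8", "s3"]]
gold = [{"s1"}, {"s3"}]
predicted = ["Lisbon", "2023"]
expected = ["Lisbon", "2024"]

r_at_3 = recall_at_k(retrieved, gold, k=3)
r_at_1 = recall_at_k(retrieved, gold, k=1)
accuracy = answer_accuracy(predicted, expected)
print(r_at_3, r_at_1, accuracy)
```

Here the right evidence is retrieved for both questions at k=3, yet only one final answer is correct, so a high R@k score on its own says nothing definite about end-to-end QA accuracy.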

Also surveyed

Discovered but excluded from full sections: LangGraph/LlamaIndex Memory (composable buffers inside larger frameworks), Dakera (Rust on-device MCP engine, 44MB binary), teddashh/mcp-memory-server (4-layer MCP audit trail, infra-heavy), Pieces MCP (proprietary long-term coding telemetry), TrueMem/Memori (commercial or paper-stage layers). Bare vector DBs (Pinecone, Qdrant, Chroma alone) were skipped per the inclusion bar; MemPalace and Mem0 can use them as backends. No stale entries were removed from the seed list this run; iai-mcp remains small but active. Four tools were added beyond the seeds: agentmemory, Letta, LangMem, Cognee.