Agents forget by default. This page maps tools whose main job is to persist and retrieve context across sessions: dedicated memory layers, temporal graphs, MCP servers for coding assistants, and framework memory subsystems. Updated 21 June 2026 after a seed-plus-discovery sweep; benchmark numbers are cited with conditions, not ranked.
Mem0
What it is
Mem0 is a framework-agnostic memory layer: it extracts facts from conversations, stores them (vector search plus entity linking and BM25 in recent versions), and injects relevant memories on later turns. It ships as Python/JS SDKs, a self-hosted Docker stack, or a managed platform at mem0.ai.
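The extract, store, inject loop described above can be sketched without any external service. This is a minimal illustration and not Mem0's API: `ToyMemory`, `extract_facts`, `build_prompt`, and the keyword-overlap scoring are hypothetical stand-ins for LLM extraction and vector/BM25 search.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for embeddings or BM25 terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_facts(message: str) -> list[str]:
    """Hypothetical extractor: keep sentences that start with 'I' or 'My'.

    A real memory layer would call an LLM here.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]", message) if s.strip()]
    return [s for s in sentences if re.match(r"(i|my)\b", s.lower())]


class ToyMemory:
    """Per-user fact store with keyword-overlap retrieval (illustrative only)."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def add(self, message: str, user_id: str) -> list[str]:
        facts = extract_facts(message)
        for fact in facts:
            if fact not in self._facts[user_id]:  # skip exact duplicates
                self._facts[user_id].append(fact)
        return facts

    def search(self, query: str, user_id: str, limit: int = 3) -> list[str]:
        query_tokens = tokenize(query)
        scored = [(len(query_tokens & tokenize(f)), f) for f in self._facts[user_id]]
        relevant = [(score, f) for score, f in scored if score > 0]
        relevant.sort(key=lambda pair: -pair[0])  # highest overlap first
        return [f for _, f in relevant[:limit]]


def build_prompt(memory: ToyMemory, user_id: str, question: str) -> str:
    """Inject retrieved memories ahead of the user's question."""
    recalled = memory.search(question, user_id)
    context = "\n".join(f"- {m}" for m in recalled) or "- (none)"
    return f"Known about this user:\n{context}\n\nUser: {question}"


memory = ToyMemory()
memory.add("I am vegetarian. The weather is nice. My dog is called Rex.", user_id="u1")
print(build_prompt(memory, "u1", "What is my dog called?"))
```

The point of the sketch is the shape of the loop: writes pass through an extraction step, reads are scoped by user, and retrieved facts are placed in the prompt before the model sees the new turn.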
Pros
- Large community (~59k GitHub stars as of June 2026) and broad integrations (LangChain, CrewAI, LlamaIndex, Vercel AI SDK, CLI, agent signup flow) per README
- Apache 2.0 OSS plus optional managed cloud; self-hosted server with API keys (docs)
- April 2026 algorithm refresh: Mem0 reports 91.6% on LoCoMo and 94.8% on LongMemEval with ~7k tokens per query on their production stack (README benchmark table); open evaluation framework to reproduce
- YC-backed; a $24M Series A (October 2025) is cited in third-party comparisons; SOC 2 / HIPAA on managed tier (vectorize.io comparison)
Cons / limitations
- Requires an external LLM and embedding model for extraction and search (defaults include OpenAI models; other providers configurable) (docs)
- Managed advanced features (graph mode, scale) sit behind paid plans; pricing not verified line-by-line here
- Benchmark scores are vendor-reported on their own stack; independent LongMemEval numbers cited by competitors differ (e.g. Vectorize cites ~49% in one independent eval for Mem0 vs higher self-reported figures)
Best fit
Teams that want the fastest path to per-user memory in an existing chatbot or agent stack without adopting a full agent runtime. Less ideal if bi-temporal “what was true when?” queries are the core requirement (Graphiti/Zep is stronger there).
MemPalace
What it is
MemPalace is a local-first memory system for AI assistants. It stores verbatim conversation and project text, indexes it with semantic search (ChromaDB by default, pluggable backends), and exposes 33+ MCP tools plus CLI workflows (`mine`, `search`, `wake-up`). Optional temporal knowledge-graph layer on SQLite.
Pros
- No API key required for core retrieval path; embeddings run locally (README)
- MIT license; PyPI package and Docker image; explicit impostor-site warning in official repo
- Strong retrieval benchmarks on published harness: 96.6% R@5 on LongMemEval (raw semantic, no LLM) and reproducible scripts in `benchmarks/` (BENCHMARKS.md)
- Hooks for Claude Code, Codex, and Cursor to auto-save sessions before compaction
Cons / limitations
- Verbatim storage can grow large; hybrid/rerank modes may call an external LLM for the last mile
- Palace metaphor and MCP surface have a learning curve vs a single `add/search` API
- README deliberately avoids head-to-head QA accuracy comparisons with Mem0/Zep because metrics differ (retrieval recall vs end-to-end QA)
Best fit
Developers who want offline or self-hosted coding-assistant memory with auditable benchmarks. Wrong choice if you need a managed multi-tenant SaaS with compliance paperwork out of the box.
Supermemory
What it is
Supermemory is a memory and context engine: it extracts user facts, maintains profiles, runs hybrid RAG+memory search, and offers connectors (Google Drive, Gmail, Notion, GitHub, etc.). Available as cloud API, MCP server, plugins (Claude Code, Cursor, OpenClaw), and Supermemory local (single binary, `localhost:6767`).
Pros
- One API covers memory extraction, profiles, document ingestion, and hybrid search (README)
- Local install path: `curl -fsSL https://supermemory.ai/install | bash` with Ollama support for fully offline use (self-hosting docs)
- Publishes MemoryBench for comparing providers; claims #1 on LongMemEval (81.6%), LoCoMo, and ConvoMem on their leaderboard (self-reported, June 2026 README)
- First-party integrations including `@supermemory/tools/mastra` wrapper
Cons / limitations
- License for the full product stack was not verified as pure OSS in this pass; hosted platform is the primary commercial path despite a local binary
- Benchmark leadership claims are vendor-published; competitors dispute comparability of LoCoMo vs LongMemEval scoring (Vectorize Hindsight article)
- Enterprise-only features may apply for some deployment modes (local vs enterprise docs distinguish tiers)
Best fit
Product teams wanting memory + connectors + profiles in one vendor, with a credible local prototype path. Less ideal if you require a simple Apache-licensed library with no hosted upsell.
Graphiti / Zep
What it is
Graphiti (getzep/graphiti) is an open-source temporal context graph engine: episodes become entities and relationships with validity intervals, so facts can be superseded without erasing history. Zep is the managed platform built on Graphiti for production context retrieval (Graphiti README). Paper: Zep: A Temporal Knowledge Graph Architecture for Agent Memory (January 2025).
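To make "superseded without erasing history" concrete, here is a small sketch of a bi-temporal fact log. It is not Graphiti's schema or API: `Fact`, `TemporalStore`, and `as_of` are hypothetical names, and validity intervals are derived at query time from an append-only list rather than stored on graph edges.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date   # valid time: when the fact became true in the world
    recorded_on: date  # transaction time: when the system learned it


class TemporalStore:
    """Append-only log; nothing is overwritten or deleted."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def record(self, fact: Fact) -> None:
        self.facts.append(fact)

    def as_of(self, subject: str, predicate: str,
              valid_at: date, known_by: date) -> Optional[str]:
        """What did we believe on `known_by` about the state at `valid_at`?"""
        candidates = [
            f for f in self.facts
            if f.subject == subject and f.predicate == predicate
            and f.recorded_on <= known_by   # only facts we had learned by then
            and f.valid_from <= valid_at    # only facts already in effect
        ]
        if not candidates:
            return None
        # The most recent valid_from wins; later recordings break ties.
        latest = max(candidates, key=lambda f: (f.valid_from, f.recorded_on))
        return latest.value


store = TemporalStore()
store.record(Fact("acme", "plan", "free", date(2025, 1, 1), date(2025, 1, 1)))
store.record(Fact("acme", "plan", "pro", date(2025, 3, 1), date(2025, 3, 5)))

# The upgrade took effect on 1 March but was only recorded on 5 March.
print(store.as_of("acme", "plan", date(2025, 3, 2), known_by=date(2025, 3, 3)))
print(store.as_of("acme", "plan", date(2025, 3, 2), known_by=date(2025, 6, 1)))
```

The two queries ask about the same day in the world but differ in what the system knew at the time, which is the distinction a plain similarity store cannot express.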
Pros
- First-class bi-temporal model (valid time vs transaction time) for “what did we believe on date X?” (Graphiti docs)
- Graphiti is self-hostable with Neo4j, FalkorDB, or Neptune backends; Python/TS/Go SDKs
- Zep reports 63.8% on LongMemEval (GPT-4o) and sub-200ms cloud retrieval in vendor materials (Vectorize comparison table); Graphiti ~24k stars
- Strong fit for evolving user state (CRM, support, compliance) where facts change
Cons / limitations
- Full Zep user/thread storage and SLAs are managed-only; Graphiti alone requires you to build auth, tenancy, and ops
- Needs graph DB ops expertise for self-hosted Graphiti
- Requires LLM calls for episode ingestion and entity extraction
Best fit
Applications where facts change over time and you must query history, not just similarity. Overkill for a static FAQ bot; use Mem0 or LangMem for simpler profile memory.
Hindsight
What it is
Hindsight (MIT, ~16.7k stars) is an agent memory server from Vectorize.io. It organizes memories into world facts, experiences, and mental models, with three operations: `retain`, `recall`, and `reflect`. Retrieval merges semantic, keyword, graph, and temporal strategies.
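How Hindsight actually merges its four strategies is not described here, so the following is a stand-in. Reciprocal rank fusion (RRF) is one common way to combine ranked lists from independent retrievers; the function name, the memory IDs, and the constant `k = 60` are assumptions for illustration, not Hindsight's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one list sorted by RRF score.

    Each strategy contributes 1 / (k + rank) for every ID it returns,
    so items ranked well by several strategies rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-strategy results for one query.
rankings = {
    "semantic": ["m1", "m2", "m3"],
    "keyword": ["m2", "m1"],
    "graph": ["m2", "m4"],
    "temporal": ["m3", "m2"],
}
fused = reciprocal_rank_fusion(rankings)
print([memory_id for memory_id, _ in fused])
```

RRF needs only ranks, not comparable scores, which is why it is a frequent choice when the underlying strategies (vector similarity, keyword match, graph traversal, recency) produce numbers on different scales.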
Pros
- Docker one-liner or embedded Python (`hindsight-all`); optional Oracle AI Database for enterprise (README)
- LLM wrapper path: two lines to wrap an existing client for automatic retain/recall
- Vendor claims state-of-the-art LongMemEval performance; README states scores were independently reproduced by Virginia Tech Sanghani Center and The Washington Post (other vendor scores listed as self-reported)
- MIT license; managed Hindsight Cloud optional
Cons / limitations
- Requires LLM API for retain/reflect pipelines (OpenAI default in quickstart)
- Heavier than a flat vector store; may be overkill for simple chat personalization
- Fortune 500 production use is cited by the vendor only; the independent benchmark reproduction was not verified here beyond the README claim
Best fit
Long-running autonomous or employee-style agents that must learn from feedback and reflect on past work, not just recall chat snippets. Not the lightest option for a weekend MCP experiment.
Mastra Memory
What it is
Mastra is a TypeScript agent framework; `@mastra/memory` adds conversation history, working memory (structured user fields), semantic recall, and Observational Memory (background agents compress old turns into dense observations) (Memory docs). Storage via LibSQL, Postgres, etc.
Pros
- Memory is a first-class package with thread/resource scoping and multi-agent isolation rules (docs)
- Observational Memory targets context-window limits without losing long-horizon notes
- Integrates with Mastra Studio tracing for debugging what entered context
- Supermemory ships an official Mastra wrapper (`@supermemory/tools/mastra`)
Cons / limitations
- Memory features are coupled to Mastra; not a drop-in layer for arbitrary stacks (contrast Mem0)
- Requires configuring a Mastra storage provider
- Benchmarks as a standalone memory product were not found; compare as framework subsystem
Best fit
Teams already building on Mastra who want opinionated memory processors in one stack. Wrong choice if you only need a memory API inside LangChain or raw Express.
iai-mcp (iai-pme)
What it is
iai-mcp (iai-pme: personal memory engine) is a local MCP server that captures sessions verbatim, clusters them into a personal map, and injects relevant context at session start. Custom storage, community detection, and hyperdimensional substrate (not a thin Chroma wrapper) per README.
Pros
- Fully local: no API key, account, or telemetry for the memory engine (README)
- Ships reproducible benchmark harness; author reports daily personal use
- MCP-native for Claude Code and other hosts
Cons / limitations
- Smaller community (~282 GitHub stars vs tens of thousands for MemPalace/Mem0) as of June 2026
- Single-maintainer feel; status/limitations section in README should be read before production bets
- License and latest release cadence not re-fetched in this pass (verify on repo)
Best fit
Privacy-maximalists running MCP coding assistants who want a bespoke local engine with published micro-benchmarks. Not the default pick for team-wide managed memory.
agentmemory
What it is
agentmemory (package/site: agent-memory.dev) is an MCP-native persistent memory server for coding agents, built on the iii engine. It targets Claude Code, Cursor, Copilot CLI, Codex, Windsurf, and other MCP clients with hybrid search, lifecycle, and knowledge-graph features (README).
Pros
- Very active project: ~23.5k+ stars, Apache 2.0, 49 releases (latest v0.9.27 June 2026 per GitHub metadata)
- One MCP server shared across tools; skills/hooks ecosystem
- Implements Karpathy-style LLM wiki patterns with confidence scoring and graph-backed retrieval (vendor README)
Cons / limitations
- Focused on developer/coding-agent workflows, not general customer-support chatbots
- Requires running the agentmemory server (Node/iii stack); not a three-line pip install
- Independent benchmark reproduction not verified in this pass (project claims benchmark leadership on site)
Best fit
Teams living in MCP coding agents who want shared memory across Cursor, Claude Code, and CLI tools. Skip if you need enterprise temporal KG semantics.
Letta
What it is
Letta (formerly the MemGPT research project) is an agent runtime with OS-inspired memory: agents use tools to read/write memory blocks across tiers (core vs archival/recall). Apache 2.0, ~23k stars. Letta Code CLI adds terminal agents with skills and “dreaming”/sleep-time compute.
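The tiered, self-editing pattern can be illustrated with a toy model. This is not Letta's SDK: `TieredMemory`, its method names, the block names, and the character limit are hypothetical. The idea shown is that the core tier is small and always in context, so the agent's tool call fails when a block is full and the agent must decide what to move to the archival tier.

```python
class TieredMemory:
    """Toy core/archival memory that an agent would edit through tool calls."""

    def __init__(self, core_limit_chars: int = 200) -> None:
        # Core blocks are always rendered into the prompt (block names are illustrative).
        self.core: dict[str, str] = {"persona": "", "human": ""}
        # Archival passages live outside the context window and must be searched.
        self.archival: list[str] = []
        self.core_limit_chars = core_limit_chars

    def core_append(self, block: str, text: str) -> str:
        candidate = (self.core[block] + "\n" + text).strip()
        if len(candidate) > self.core_limit_chars:
            # Return an error string so the model can react, as a tool result would.
            return (f"ERROR: block '{block}' would exceed "
                    f"{self.core_limit_chars} chars; evict or summarize first")
        self.core[block] = candidate
        return "OK"

    def evict_to_archival(self, block: str) -> str:
        if self.core[block]:
            self.archival.append(self.core[block])
            self.core[block] = ""
        return "OK"

    def archival_search(self, query: str) -> list[str]:
        needle = query.lower()
        return [p for p in self.archival if needle in p.lower()]

    def render_context(self) -> str:
        return "\n".join(f"<{name}>\n{body}\n</{name}>"
                         for name, body in self.core.items())


mem = TieredMemory(core_limit_chars=40)
first = mem.core_append("human", "Name: Ada. Prefers Rust.")
second = mem.core_append("human", "Works on compilers at night.")  # over the limit
mem.evict_to_archival("human")
third = mem.core_append("human", "Works on compilers at night.")
print(first, "|", second, "|", third)
print(mem.archival_search("rust"))
```

In a real runtime the three calls would be issued by the model as tool invocations; here they are called directly to show the state transitions.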
Pros
- Agents self-edit memory via tools (MemGPT pattern) rather than passive extraction (Letta docs)
- Self-hostable Docker server (`letta/letta`) with REST API and Python/TS SDKs
- Felicis $10M seed cited in comparisons; strong research lineage (UC Berkeley MemGPT paper)
Cons / limitations
- You adopt the Letta runtime, not a pluggable memory library (Mem0 vs Letta comparison)
- Published LongMemEval scores were not found in primary docs (competitors note “not published”)
- Managed Letta Cloud pricing was not verified here ($20–200/mo cited by third parties; verify at letta.com)
Best fit
Long-horizon autonomous agents where the model should decide what to remember and evict. Poor fit if you only need to bolt memory onto an existing LangGraph graph.
LangMem
What it is
LangMem (MIT) provides memory primitives and tools for LangGraph agents: `create_manage_memory_tool`, `create_search_memory_tool`, plus background memory managers. Stores via LangGraph’s `BaseStore` (Postgres, Redis, in-memory) (README).
Pros
- Native to LangGraph Platform deployments; no separate memory microservice required
- Supports hot-path (agent-managed) and background extraction modes
- MIT license; small, focused scope (~1.5k stars)
Cons / limitations
- LangGraph lock-in for the integrated path; not framework-agnostic
- No bi-temporal graph; episodic memory noted as evolving in launch materials
- Smaller ecosystem than Mem0 or Letta
Best fit
Agents already on LangGraph/LangChain who want memory tools without operating a second vendor. Wrong choice for a Rust or Mastra-only stack.
Cognee
What it is
Cognee (~18.4k stars) is an open-source AI memory platform: ingest documents and events, run an Extract → Cognify → Load pipeline into a self-hosted knowledge graph plus vectors, then `remember` / `recall` / `forget` / `improve` (README). MCP and Claude Code plugin available.
Pros
- Graph-native memory with ontology generation; multimodal ingestion paths (docs)
- Self-host (Docker, Railway, Fly.io, etc.) or Cognee Cloud; Python SDK and CLI
- $7.5M seed and 70+ production deployments claimed on vendor blog (Cognee guide)
- Research paper: Optimizing the Interface Between Knowledge Graphs and LLMs (2025)
Cons / limitations
- Heavier operational footprint than Mem0; LLM API required for cognify pipeline
- MCP UI path expects Docker (README note)
- Open-core vs cloud feature split should be checked on current pricing page
Best fit
Teams building institutional / company-brain agents that must unify documents and sessions in an evolving graph. Overkill for single-user coding MCP memory.
Comparison table
| Tool | License/Model | Architecture | Self-host? | Best fit |
|---|---|---|---|---|
| Mem0 | Apache 2.0 + managed cloud | Vector + entity/BM25 hybrid extraction | Yes (Docker server) | Fast per-user memory in any framework |
| MemPalace | MIT | Verbatim text + vector (pluggable backend) + optional temporal KG | Yes (local default) | Offline coding-assistant memory with published retrieval benchmarks |
| Supermemory | Local binary + hosted platform (OSS license not verified) | Memory engine + profiles + hybrid RAG | Yes (`supermemory-server` local) | Memory + connectors + profiles in one API |
| Graphiti / Zep | Graphiti OSS; Zep managed | Temporal knowledge graph | Graphiti yes; Zep cloud-first | Facts that change over time; audit history |
| Hindsight | MIT + cloud | World / experience / mental model + multi-strategy recall | Yes (Docker / embedded) | Reflective agents that learn from outcomes |
| Mastra Memory | Apache 2.0 (framework) | Thread history + working memory + semantic recall + observational compression | Yes (with Mastra storage) | Mastra-native agents needing long threads |
| iai-mcp | Verify on repo | Local MCP; custom HD / clustering store | Yes (local only) | Solo dev MCP hosts, maximum privacy |
| agentmemory | Apache 2.0 | MCP server; hybrid + graph (iii engine) | Yes | Shared memory across coding MCP clients |
| Letta | Apache 2.0 + cloud | Tiered self-editing memory blocks in agent runtime | Yes (Docker) | Autonomous agents managing their own memory |
| LangMem | MIT | LangGraph store + agent memory tools | Yes (Postgres/Redis backends) | LangGraph agents without extra service |
| Cognee | OSS + Cognee Cloud | ECL pipeline → knowledge graph + vectors | Yes | Company-brain / multi-source institutional memory |
How to choose
Drop-in layer vs full runtime. Mem0, Supermemory, and Hindsight expose memory as a service or library you attach to existing code. Letta and Mastra Memory expect you to live inside their agent harness. LangMem sits in between for LangGraph only.
Where data must live. MemPalace, iai-mcp, agentmemory (self-hosted), Graphiti, and Cognee can keep data on your machine or VPC. Supermemory local and Mem0 OSS do too, but many teams default to vendor clouds for speed.
Hardest requirement. If time and contradiction dominate (CRM, compliance, “when did status change?”), prioritize Graphiti/Zep or Cognee. If coding-session continuity dominates, prioritize MemPalace, agentmemory, or iai-mcp MCP paths. If agent self-improvement dominates, look at Letta or Hindsight’s `reflect`.
Budget. Count LLM calls for extraction, reflection, and reranking, not just storage. MemPalace’s core retrieval path avoids LLM tax; Mem0, Hindsight, Cognee, and Graphiti typically invoke models on write paths.
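A back-of-envelope model makes the LLM-call cost visible. Every number below is a placeholder, not a measured figure for any tool listed here, and `MemoryCostModel` is a hypothetical helper; substitute your own call counts, token sizes, and provider prices.

```python
from dataclasses import dataclass


@dataclass
class MemoryCostModel:
    llm_calls_per_write: float     # e.g. extraction or entity linking per stored turn
    llm_calls_per_read: float      # e.g. reranking or reflection per query
    tokens_per_call: int           # average prompt + completion tokens (assumed)
    usd_per_million_tokens: float  # blended price (assumed)

    def monthly_usd(self, writes: int, reads: int) -> float:
        calls = writes * self.llm_calls_per_write + reads * self.llm_calls_per_read
        return calls * self.tokens_per_call * self.usd_per_million_tokens / 1_000_000


# Placeholder profiles: one LLM call per write vs a purely local retrieval path.
extract_on_write = MemoryCostModel(1, 0, tokens_per_call=2000, usd_per_million_tokens=0.5)
local_retrieval = MemoryCostModel(0, 0, tokens_per_call=0, usd_per_million_tokens=0.5)

writes, reads = 100_000, 300_000
print(extract_on_write.monthly_usd(writes, reads))
print(local_retrieval.monthly_usd(writes, reads))
```

The storage bill does not appear in this model at all, which is the point: for write-heavy agents the extraction calls can dominate.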
Benchmarks. LoCoMo and LongMemEval measure conversational recall under specific splits and graders; retrieval R@k ≠ end-to-end answer accuracy. Treat vendor leaderboard posts as claims until reproduced on MemoryBench or project harnesses.
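The gap between retrieval recall and answer accuracy is easy to show with a toy evaluation. The data below is invented for illustration and the exact-match grader is a simplification; published benchmarks use their own splits and graders.

```python
def recall_at_k(retrieved: list[list[str]], gold: list[str], k: int) -> float:
    """Fraction of questions whose gold evidence ID appears in the top-k results."""
    hits = sum(1 for docs, g in zip(retrieved, gold) if g in docs[:k])
    return hits / len(gold)


def answer_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of questions answered correctly (case-insensitive exact match)."""
    correct = sum(
        1 for p, e in zip(predicted, expected)
        if p.strip().lower() == e.strip().lower()
    )
    return correct / len(expected)


# Four invented questions: evidence is always retrieved, answers are not always right.
gold_evidence = ["s12", "s40", "s07", "s33"]
retrieved = [
    ["s12", "s02", "s19"],
    ["s05", "s40", "s41"],
    ["s07", "s08", "s09"],
    ["s30", "s31", "s33"],
]
expected_answers = ["Lisbon", "2019", "a border collie", "vegetarian"]
model_answers = ["Lisbon", "2021", "A border collie", "vegan"]

print(recall_at_k(retrieved, gold_evidence, k=3))
print(answer_accuracy(model_answers, expected_answers))
```

Here the retriever scores perfectly at k = 3 while the end-to-end system answers half the questions correctly, so the two numbers cannot be placed on the same leaderboard axis.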
Also surveyed
Discovered but excluded from full sections: LangGraph/LlamaIndex Memory (composable buffers inside larger frameworks), Dakera (Rust on-device MCP engine, 44MB binary), teddashh/mcp-memory-server (4-layer MCP audit trail, infra-heavy), Pieces MCP (proprietary long-term coding telemetry), TrueMem/Memori (commercial or paper-stage layers). Bare vector DBs (Pinecone, Qdrant, Chroma alone) were skipped per inclusion bar; MemPalace and Mem0 can use them as backends. No stale entries were removed from the seed list in this run; iai-mcp remains small but active. Four tools were added beyond the seeds: agentmemory, Letta, LangMem, and Cognee.