Introduction
Large language models have a short memory. Ask ChatGPT about a conversation you had yesterday and it usually draws a blank. As AI assistants become more agentic — taking actions on your behalf, planning your week, or writing code — long‑term memory becomes essential. That’s where MemoryOS comes in. Released by the BAI‑LAB research group, MemoryOS aims to be a memory operating system for AI agents, providing modular storage for short‑term, mid‑term and long‑term memories. It quietly debuted its MemoryOS‑MCP server in mid‑June 2025 and has been iterating rapidly. While mainstream tech sites have yet to cover it, the project is attracting attention from developers who want their agents to remember who you are, what you’ve done and why.
This post examines what MemoryOS is, what recent updates reveal, and why persistent memory could reshape AI. We’ll also consider the ethical questions raised by AIs that remember your life as well as (or better than) you do.
What We Discovered
The release of MemoryOS‑MCP
On June 15, 2025, BAI‑LAB open‑sourced MemoryOS‑MCP, a server that allows agent clients to integrate long‑term memory through a Model Context Protocol (MCP) interface. In essence, MCP acts like a bridge between the agent and its memory modules, enabling structured reads and writes. The initial release introduced three memory layers:
- Short‑term memory — a buffer of recent question–answer pairs, akin to working memory in humans.
- Mid‑term memory — session histories with heat tracking, storing the context of tasks and marking which topics are “hot” or likely to be referenced again.
- Long‑term knowledge — persistent facts and user profiles saved across sessions, stored as JSON files on disk.
These modules aim to solve a common problem in language models: context windows fill up quickly, and agents forget earlier interactions. By offloading information into structured memory and retrieving it when needed, MemoryOS promises continuity across chats and tasks.
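A minimal sketch of the three tiers may make the division concrete. The class and method names below are invented for illustration and are not MemoryOS’s actual API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class LayeredMemory:
    """Hypothetical sketch of a three-tier memory store."""
    short_term: list = field(default_factory=list)  # recent Q-A pairs
    mid_term: dict = field(default_factory=dict)    # session -> history + heat
    long_term: dict = field(default_factory=dict)   # persistent user facts

    def add_turn(self, session_id, question, answer):
        # Every turn lands in the short-term buffer...
        self.short_term.append((question, answer))
        # ...and is also folded into the session's mid-term record.
        session = self.mid_term.setdefault(session_id, {"history": [], "heat": 0})
        session["history"].append((question, answer))
        session["heat"] += 1  # crude stand-in for "hot topic" tracking

    def remember(self, key, value):
        # Long-term facts persist independently of any one session.
        self.long_term[key] = {"value": value, "ts": time.time()}

mem = LayeredMemory()
mem.add_turn("s1", "Any food allergies?", "Yes, peanuts.")
mem.remember("allergy", "peanuts")
```

The point of the layering is that each tier has a different lifetime and eviction policy, so the agent can consult cheap recent context first and fall back to durable facts only when needed.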
Rapid iterations: July updates
Since open‑sourcing, the project has seen a burst of development. According to the repository’s “News” section, updates between July 7 and July 15 brought significant improvements:
- Performance boost. On July 7, the implementation was upgraded to be five times faster thanks to parallelization and support for R1 models such as DeepSeek‑R1 and Qwen 3. Faster memory operations mean agents can recall and write memories without slowing down conversation.
- New embeddings and vector stores. MemoryOS added support for BGE‑M3 and Qwen 3 embeddings and introduced the ChromaDB vector database. Vector stores allow semantic search over thousands of memory chunks, enabling the agent to find relevant facts even if the exact words differ.
- Docker and playground. The July 15 update integrated Docker for easier deployment and launched a MemoryOS playground at baijia.online, available by invitation. The playground lets users visualize and tweak the memory system through a web interface, hinting at an eventual cloud offering.
- Evaluation and customization. On July 9 and July 8, the team reported evaluation on the LoCoMo dataset and added a similarity_threshold parameter to control retrieval strictness. This lets developers tune how similar a query must be to stored memories for the agent to recall it, balancing precision and recall.
These updates show that MemoryOS is not a static concept but a rapidly evolving platform. The team is incorporating new embeddings as they emerge and optimizing for speed — both critical for real‑world deployment.
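The role of the similarity_threshold parameter can be illustrated with a toy cosine‑similarity search. The two‑dimensional vectors and threshold value below are invented for illustration; MemoryOS retrieves over learned embeddings such as BGE‑M3:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memory, similarity_threshold=0.7):
    """Return (score, text) pairs that clear the threshold, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memory]
    return sorted([(s, t) for s, t in scored if s >= similarity_threshold],
                  reverse=True)

# Toy memory: each item is (text, embedding vector).
memory = [
    ("likes Thai food", [0.9, 0.1]),
    ("allergic to peanuts", [0.2, 0.95]),
]
hits = retrieve([0.85, 0.2], memory, similarity_threshold=0.7)
# Only the semantically close item survives the cutoff.
```

Raising the threshold trades recall for precision: fewer, more relevant memories reach the model’s context window.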
How MemoryOS works under the hood
MemoryOS structures memory as a set of tables in a local database (currently SQLite or a vector store like Chroma). Each memory item contains metadata such as timestamp, relevance score and tags. When a user interacts with the agent, the MCP server receives the request and determines which memories to update or retrieve. Long‑term facts may be indexed by keywords or embeddings so that semantically related information can be found even if the wording differs.
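As a rough sketch, such a table could look like the following. The schema and column names here are assumptions for illustration; the project’s actual layout may differ:

```python
import sqlite3
import time

# In-memory database for the sketch; a real deployment would use a local file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory_items (
        id        INTEGER PRIMARY KEY,
        content   TEXT NOT NULL,
        ts        REAL NOT NULL,      -- creation timestamp
        relevance REAL DEFAULT 0.0,   -- retrieval/decay score
        tags      TEXT                -- comma-separated labels
    )
""")
conn.execute(
    "INSERT INTO memory_items (content, ts, relevance, tags) VALUES (?, ?, ?, ?)",
    ("User is allergic to peanuts", time.time(), 0.9, "profile,health"),
)
# A memory lookup might rank stored items by their relevance score:
rows = conn.execute(
    "SELECT content FROM memory_items ORDER BY relevance DESC"
).fetchall()
```

Keyword or embedding indexes would then sit on top of a table like this, so a query about “nut allergies” can still find the stored fact.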
Because MemoryOS is modular, developers can choose which memory layers to enable. For instance, a customer service bot might use only mid‑term and long‑term memory to track a conversation across days, while a coding agent may rely on short‑term memory for local context and vector search for API documentation. The project’s README lists support for multiple model providers — OpenAI, Anthropic, DeepSeek, Qwen and vLLM — and tools like LangGraph and PandasAI. This means MemoryOS can be integrated into existing agent frameworks with minimal code changes.
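One way to picture that modularity is a recall function that consults only the layers a given agent has enabled. Everything below is a simplified stand‑in rather than the project’s real interface:

```python
def recall(query, layers, enabled=("short_term", "mid_term", "long_term")):
    """Search only the enabled layers, shortest-lived first."""
    results = []
    for name in enabled:
        store = layers.get(name, [])
        results.extend(item for item in store if query.lower() in item.lower())
    return results

layers = {
    "short_term": ["User asked about flight times"],
    "mid_term": ["Ticket #42: refund pending"],
    "long_term": ["Customer prefers email contact"],
}
# A customer service bot might skip the short-term buffer entirely:
hits = recall("ticket", layers, enabled=("mid_term", "long_term"))
```

A coding agent would invert the choice, enabling short‑term context plus a vector store over API documentation while leaving the other tiers off.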
Why It Could Matter
Implications for users
From a user perspective, long‑term memory transforms an AI assistant from a fancy autocomplete into something approaching a digital aide. Imagine an agent that remembers your favourite restaurants, your daughter’s birthday and the fact that you’re allergic to peanuts. When you ask it to plan a dinner, it won’t suggest a peanut‑heavy Thai dish because it knows about the allergy. MemoryOS makes this possible by persisting user preferences and surfacing facts across sessions. It also means you no longer have to repeat yourself — a common frustration with current chatbots. However, this convenience raises questions about privacy and consent. Who owns the memories stored by the agent? How are they encrypted, and who can access them? These are debates society has only begun to explore.
Implications for developers
Developers stand to gain a ready‑made memory layer that they can plug into any agent. Instead of reinventing vector search, caching and persistence, they can call MemoryOS‑MCP via the same protocol used by GitHub’s agent mode. The project’s support for various embeddings allows developers to experiment with different retrieval algorithms without rewriting their code. The introduction of a similarity_threshold parameter also gives developers fine control over recall, making it easier to tune the agent’s behavior. In practice, this could shorten development cycles for complex agents such as personalized tutors, research assistants or game AIs.
Implications for businesses
Companies exploring AI assistants for customer support, sales or operations often hit a wall when the model forgets previous interactions. MemoryOS offers a solution: maintain an index of past conversations and user data that can be referenced to provide consistent, personalized responses. The ability to self‑host via Docker and integrate with vector databases like ChromaDB means businesses can keep their memory data on their own servers, meeting regulatory requirements. Over time, such persistent memory could enable more advanced workflows, such as agents that automatically triage support tickets based on historical outcomes or suggest upsell opportunities based on prior purchases. However, companies must implement robust data governance to avoid storing sensitive information that could be misused.
Societal and ethical considerations
Long‑term memory in AI raises deep ethical questions. If your agent stores years of conversations, does it have a duty to forget on request? What happens if the memory is subpoenaed in a legal case? MemoryOS provides the technical mechanism, but it doesn’t solve the normative questions of memory rights, consent, and the psychological impact of delegating memory to a machine. The project’s open‑source nature invites researchers and ethicists to audit and improve the system, but it will also invite scrutiny from regulators. As one Reddit user in a memory upgrade discussion noted, ChatGPT’s cross‑chat memory “takes a snapshot at the moment the new chat begins and does not dynamically update”. MemoryOS’s persistent layers are even more durable; clear policies will be needed to manage them.
Web & Social Clues
While MemoryOS has yet to trend on X.com or mainstream Reddit, there are signs of grassroots interest:
- Community testing guides. A user known as u/Sea‑Brilliant7877 on r/ChatGPT posted a detailed guide on how OpenAI’s memory system works. The guide explained that new chats access prior conversations as they existed at the moment the chat began and described a snapshot‑based model. MemoryOS takes this idea further by introducing mid‑term and long‑term layers.
- Tutorial videos. Independent AI engineer Gao Dalie posted a tutorial (originally on LinkedIn, now mirrored on other platforms) explaining that MemoryOS MCP can support short‑term, mid‑term and long‑term memories stored as JSON files. He noted that the system uses a FIFO (First In, First Out) approach to manage memory and integrates with retrieval‑augmented generation.
- GitHub stars. The MemoryOS repository has attracted hundreds of stars and forks, indicating developer curiosity. Comments praise its modularity and ask for support for additional embeddings. The maintainers have responded by quickly adding features like BGE‑M3 and Qwen 3.
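The FIFO eviction and JSON persistence mentioned in the tutorial can be sketched as follows; the file name, capacity and function are assumptions for illustration:

```python
import json
import os
import tempfile
from collections import deque

MAX_TURNS = 3  # illustrative capacity for the short-term buffer

def add_and_persist(buffer, turn, path):
    """Append a turn; once full, evict the oldest (first in, first out)."""
    buffer.append(turn)
    if len(buffer) > MAX_TURNS:
        buffer.popleft()
    with open(path, "w") as f:  # persist the current buffer as a JSON file
        json.dump(list(buffer), f)

buf = deque()
path = os.path.join(tempfile.gettempdir(), "short_term.json")
for i in range(5):
    add_and_persist(buf, f"turn-{i}", path)
# buf now holds the three newest turns; the two oldest were evicted
```

In a fuller design, evicted turns would not simply vanish: they could be summarized and promoted into the mid‑term layer before being dropped from the buffer.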
Trend Connections
MemoryOS sits at the nexus of several trends:
- Retrieval‑augmented generation (RAG). Models like GPT‑4o can fetch documents from an external knowledge base to supplement their training data. MemoryOS provides a structured, personal knowledge base that agents can query via embeddings and similarity thresholds.
- Agent frameworks and MCP. By adhering to the Model Context Protocol, MemoryOS can integrate with any MCP‑enabled agent. This synergy with GitHub’s agent mode and local MCP server facilitates a modular agent ecosystem.
- Personalization vs. privacy. The drive to make AI assistants more helpful pushes toward personalization. MemoryOS enables that, but also underscores the need for privacy protections, encryption and the ability to delete memories on demand.
- Edge computing. As models and memory systems become more efficient, running them on local devices becomes feasible. Combined with on‑device LLMs, MemoryOS could power offline personal assistants.
Key Takeaways
- MemoryOS‑MCP is an open‑source server that lets AI agents read and write long‑term, mid‑term and short‑term memories via the Model Context Protocol.
- Recent July updates introduced five‑times faster performance, support for new embeddings (BGE‑M3, Qwen 3) and vector databases like ChromaDB, plus a Dockerized deployment and a playground site.
- MemoryOS structures memory into layers and works with multiple model providers and agent frameworks, making it easy to integrate into custom agents.
- Long‑term memory could make AI assistants more personal and useful, but raises privacy and ethical questions about who controls the data and how it’s used.
- Early interest from developers and community testers suggests a growing appetite for agentic systems that remember; as the ecosystem evolves, expect debates over memory rights, persistence and safety.