Memory Architecture for Your AI Agent
How we built a production AI agent with three memory layers — short-term conversation context, working session scratchpads, and persistent long-term memory — so it actually remembers who you are.
The Big Picture
Every user message flows through a 6-step pipeline that assembles context from multiple memory sources before calling the LLM.
Three Layers of Memory
Just as human cognition does, an AI agent needs different memory types for different purposes: immediate recall, an active workspace, and permanent knowledge.
Short-Term Memory
Conversation Context
Immediate chat buffer. Stores the current session's message pairs (user/assistant) loaded into the LLM context window on every turn.
```python
# Load the most recent message pairs for this session into the context window
history = get_session_history(session_id, max=20)
for msg in history:
    messages.append({"role": msg.role, "content": msg.content})
```

Working Memory
Session Scratchpad
Temporary in-flight reasoning. Tool results, intermediate state, and task context that exist only within the current session.
```python
# Tool results stay in working memory for the current session only
tool_result = await execute_tool(name, args)
tool_results.append({
    "tool_use_id": id,
    "content": tool_result,
})
```

Long-Term Memory
Persistent Storage
Durable memory that survives restarts. mem0.ai automatically extracts and stores important facts, preferences, and decisions.
```python
# Async background storage after every response
await add_memory([
    {"role": "user", "content": user_msg},
    {"role": "assistant", "content": response},
], user_id)
```
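The add_memory and search_memory helpers used throughout this post can be thin wrappers around the mem0 client. A minimal sketch, assuming the open-source mem0ai Python package; the wrapper names match this post, but the defaults and the exact return shape of search vary between mem0 versions:

```python
# Hedged sketch: thin wrappers over the mem0 client, not the project's
# actual code. Return shapes are version-dependent assumptions.
from mem0 import Memory

memory = Memory()

async def add_memory(messages: list[dict], user_id: str) -> None:
    # mem0 extracts salient facts and preferences from the exchange
    # and persists them for this user
    memory.add(messages, user_id=user_id)

async def search_memory(query: str, user_id: str) -> list:
    # Fetch stored facts relevant to the current message
    response = memory.search(query, user_id=user_id)
    # Newer mem0 versions wrap hits in {"results": [...]}
    return response["results"] if isinstance(response, dict) else response
```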
Long-Term Memory: File-Backed Persistence

Two file types power the long-term memory system: curated decisions and daily raw events.
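The raw-events side can be as simple as appending JSON lines to a dated log file. A minimal sketch; the directory layout and the append_daily_event name are assumptions for illustration, not the project's actual code:

```python
import json
from datetime import date, datetime
from pathlib import Path

def append_daily_event(event: dict, log_dir: str = "memory/logs") -> None:
    """Append one raw event as a JSON line to today's log file."""
    path = Path(log_dir) / f"{date.today().isoformat()}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": datetime.now().isoformat(), **event}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```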
Profile & Context Files
Structured context files loaded into every system prompt. They define the agent's identity, describe the user, and configure the environment (see the loading sketch after this list).
SOUL.md
Agent persona, behavior guidelines, and operational limits
USER.md
User preferences, identity, and communication style
MEMORY.md
Persistent decisions, facts, dates, and preferences
HEARTBEAT.md
Operational schedules, periodic checks, and automated checklists
TOOLS.md
Environment specifics, voice IDs, device names, and tooling config
Cron Jobs
Background sessions, sub-agents, and scheduled operations
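A minimal sketch of how these files can be concatenated into the system prompt on every turn. The memory/ directory and the load_profile_context name are assumptions for illustration:

```python
from pathlib import Path

PROFILE_FILES = ["SOUL.md", "USER.md", "MEMORY.md", "HEARTBEAT.md", "TOOLS.md"]

def load_profile_context(base_dir: str = "memory") -> str:
    """Concatenate whichever profile files exist into one prompt block."""
    parts = []
    for name in PROFILE_FILES:
        path = Path(base_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(parts)
```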
RAG Pipeline: Semantic Search Over Chat History
Every message is embedded and stored in a vector store. When a user asks a question, the agent retrieves semantically similar past messages to build context.
```python
async def retrieve(query, chat_id):
    # 1. Embed the query
    embedding = await embed(query)
    # 2. Search vector store
    results = vectorstore.search(embedding, top_k=10)
    # 3. Filter by score threshold
    return [r for r in results if r.score >= 0.3]
```

```python
async def index_message(msg, chat_id):
    # 1. Create embedding
    embedding = await embed(msg.text)
    # 2. Store with metadata
    vectorstore.upsert(
        id=msg.id,
        embedding=embedding,
        metadata={
            "chat_id": chat_id,
            "user": msg.user_name,
            "timestamp": msg.date,
        },
    )
```
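Both functions assume an embed() helper and a vectorstore handle. A hedged sketch of what those could look like, using the OpenAI embeddings API and a toy in-memory store; the model choice and store are assumptions, and the real bot may use a hosted vector database instead:

```python
from dataclasses import dataclass

import numpy as np
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def embed(text: str) -> list[float]:
    # Model choice is an assumption, not the project's actual config
    resp = await client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

@dataclass
class SearchResult:
    id: str
    score: float
    metadata: dict

class InMemoryVectorStore:
    """Toy stand-in for a real vector database."""

    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, id, embedding, metadata):
        self._rows[id] = (np.asarray(embedding, dtype=float), metadata)

    def search(self, embedding, top_k=10):
        # Rank stored vectors by cosine similarity to the query
        q = np.asarray(embedding, dtype=float)
        q /= np.linalg.norm(q)
        hits = [
            SearchResult(rid, float(vec @ q / np.linalg.norm(vec)), meta)
            for rid, (vec, meta) in self._rows.items()
        ]
        return sorted(hits, key=lambda r: r.score, reverse=True)[:top_k]

vectorstore = InMemoryVectorStore()
```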
Context Engineering: The Full Pipeline

On every user message, the agent assembles its context window from six sources, then calls the LLM with the complete picture.
```python
async def process_message(user_message, context):
    system_parts = [generate_system_prompt(context)]

    # ① Retrieve memories (mem0)
    if is_memory_enabled():
        memories = await search_memory(user_message, context.user_id)
        if memories:
            system_parts.append(format_memories(memories))

    # ② Retrieve RAG context (vector search)
    if config.rag.enabled and should_use_rag(user_message):
        rag_results = await retrieve(user_message, chat_id=context.chat_id)
        if rag_results:
            system_parts.append(build_context_string(rag_results))

    # ③ Build combined system prompt
    system_prompt = "\n\n".join(system_parts)

    # ④ Load conversation history (short-term)
    messages = get_session_history(context.session_id, max=20)
    messages.append({"role": "user", "content": user_message})

    # ⑤ Call LLM with all tools
    response = client.messages.create(
        model="claude-opus-4-6",
        system=system_prompt,
        messages=messages,
        tools=get_all_tools(),  # MCP + Telegram + built-in
    )

    # ⑥ Store new memories (async background)
    if is_memory_enabled():
        asyncio.create_task(store_memories(user_message, response, context))
```
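Step ⑥'s store_memories runs fire-and-forget, so a slow memory write never delays the reply. A hedged sketch of what it might do, assuming the Anthropic SDK's content-block response shape and the add_memory wrapper from earlier; the error handling is an assumption:

```python
import logging

async def store_memories(user_message, response, context):
    try:
        # Collect the assistant's text blocks from the Anthropic response
        assistant_text = "".join(
            block.text for block in response.content if block.type == "text"
        )
        await add_memory(
            [
                {"role": "user", "content": user_message},
                {"role": "assistant", "content": assistant_text},
            ],
            context.user_id,
        )
    except Exception:
        # Memory writes are best-effort; never let them break the reply path
        logging.exception("memory store failed")
```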
Tech Stack

Explore the Full Source Code
The complete Telegram ClawdBot, with memory, RAG, MCP tools, and a scheduler, is open source. Clone it, deploy it, and build your own AI agent with production-grade memory.