TECHNICAL GUIDE • 2025

Memory Architecture for Your AI Agent

How we built a production AI agent with three memory layers — short-term conversation context, working session scratchpads, and persistent long-term memory — so it actually remembers who you are.

3 Memory Layers · 6 Context Files · RAG (Vector Search) · MCP (Tool Protocol)

Memory Systems · Context Engineering · Agentic AI · RAG · MCP · Telegram Bot

The Big Picture

Every user message flows through a 6-step pipeline that assembles context from multiple memory sources before calling the LLM.

At the hub sits the AI Agent Core, wired to each memory source:

- Short-Term: Chat History (SQLite)
- Working: Session Scratchpad
- Long-Term: mem0.ai Cloud
- RAG Pipeline: Vector Embeddings
- Profile Context: SOUL / USER / MEMORY.md
- Cron & Tasks: APScheduler Jobs

Three Layers of Memory

Much like human cognition, an AI agent needs different memory types for different purposes: immediate recall, an active workspace, and permanent knowledge.

Short-Term Memory

Conversation Context

Immediate chat buffer. Stores the current session's message pairs (user/assistant) loaded into the LLM context window on every turn.

- SQLite-backed session history
- Configurable max history (default 20)
- Per-user, per-chat isolation
- Cleared on the /reset command
conversation_context.py
history = get_session_history(session_id, max=20)
for msg in history:
    messages.append({"role": msg.role, "content": msg.content})
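
Under the hood, the session store can be a single SQLite table. A minimal sketch, assuming a hypothetical messages table keyed by session_id (not the bot's actual schema):

import sqlite3

def get_session_history(session_id, max=20):
    # Load the newest `max` messages for this session, oldest first
    conn = sqlite3.connect("sessions.db")
    rows = conn.execute(
        "SELECT role, content FROM messages"
        " WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, max),
    ).fetchall()
    conn.close()
    return [{"role": r, "content": c} for r, c in reversed(rows)]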

Working Memory

Session Scratchpad

Temporary in-flight reasoning. Tool results, intermediate state, and task context that exist only within the current session.

- Tool call results & intermediate state
- Active task tracking
- Session-scoped — lost on restart
- Multi-step reasoning chains
session_scratchpad.py
# Tool results stay in working memory (session-scoped)
tool_result = await execute_tool(name, args)
tool_results.append({
    "tool_use_id": tool_use_id,
    "content": tool_result
})
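
Because the scratchpad is deliberately ephemeral, it can be a plain per-session object. A hypothetical sketch (the class and field names are assumptions, not the bot's actual code):

from dataclasses import dataclass, field

@dataclass
class SessionScratchpad:
    # Session-scoped working memory: lost on restart, by design
    tool_results: list = field(default_factory=list)
    active_task: str | None = None

    def record_tool_result(self, tool_use_id, result):
        # Keep intermediate tool output available for later reasoning steps
        self.tool_results.append({"tool_use_id": tool_use_id, "content": result})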

Long-Term Memory

Persistent Storage

Durable memory that survives restarts. mem0.ai automatically extracts and stores important facts, preferences, and decisions.

- mem0.ai cloud-backed storage
- Auto-extraction from conversations
- Semantic search retrieval
- User-scoped memory banks
persistent_storage.py
# Async background storage after every response
await add_memory([
    {"role": "user", "content": user_msg},
    {"role": "assistant", "content": response}
], user_id)
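
The add_memory / search_memory wrappers map naturally onto the mem0 Python SDK. A sketch assuming the hosted MemoryClient (a synchronous client, pushed off the event loop):

import asyncio
import os
from mem0 import MemoryClient

client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

async def add_memory(messages, user_id):
    # mem0 extracts durable facts and preferences from the raw exchange
    return await asyncio.to_thread(client.add, messages, user_id=user_id)

async def search_memory(query, user_id):
    # Semantic retrieval scoped to this user's memory bank
    return await asyncio.to_thread(client.search, query, user_id=user_id)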

Long-Term Memory: File-Backed Persistence

Two file types power the long-term memory system — curated decisions and daily raw events.

MEMORY.md (manually curated)
① Key Decisions
- User prefers concise responses
- Timezone: IST (UTC+5:30)
- Project: ClawdBot on Telegram
② Recurring Preferences
- Daily standup at 10 AM
- Markdown for all reports
- Always confirm before deleting
memory/YYYY-MM-DD.md (auto-generated daily)
① Raw Events
09:15 — User asked about RAG config
09:22 — Searched memory for "vector DB"
09:30 — Tool call: send_message
② Running Logs
10:00 — Cron: daily_summary triggered
10:01 — Retrieved 12 memories
10:02 — Generated summary → sent
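
Writing the daily file is the simplest part of the system. A hypothetical logger matching the format shown above (the helper name and directory default are assumptions):

from datetime import datetime
from pathlib import Path

def log_event(text, memory_dir="memory"):
    # Append a timestamped raw event to memory/YYYY-MM-DD.md
    now = datetime.now()
    path = Path(memory_dir) / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{now:%H:%M} — {text}\n")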

Profile & Context Files

Structured context files loaded into every system prompt. They define the agent's identity, describe the user, and configure the environment; a sketch of a loader follows the cards below.

SOUL.md

Agent persona, behavior guidelines, and operational limits

How should you behave?

USER.md

User preferences, identity, and communication style

How should I address you?

MEMORY.md

Persistent decisions, facts, dates, and preferences

What did we decide last week?

HEARTBEAT.md

Operational schedules, periodic checks, and automated checklists

Scheduled tasks?

TOOLS.md

Environment specifics, voice IDs, device names, and tooling config

ElevenLabs voice ID?

Cron Jobs

Background sessions, sub-agents, and scheduled operations

Background tasks?
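
Loading these files can be a straightforward concatenation into the system prompt. A minimal sketch, assuming the file names from the cards above live next to the bot (the CONTEXT_FILES list and load_profile_context helper are hypothetical):

from pathlib import Path

CONTEXT_FILES = ["SOUL.md", "USER.md", "MEMORY.md", "HEARTBEAT.md", "TOOLS.md"]

def load_profile_context(base_dir="."):
    # Concatenate whichever context files exist into one prompt section
    parts = []
    for name in CONTEXT_FILES:
        path = Path(base_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)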

RAG Pipeline: Semantic Search Over Chat History

Every message is embedded into a vector store. When a user asks a question, the agent retrieves semantically similar past messages to build context.

User Message → Embed (OpenAI) → Vector Search → Score Filter ≥ 0.3 → Top-K Results
src/rag/retriever.py
async def retrieve(query, chat_id):
    # 1. Embed the query
    embedding = await embed(query)
    # 2. Search vector store
    results = vectorstore.search(
        embedding, top_k=10
    )
    # 3. Filter by score threshold
    return [
        r for r in results
        if r.score >= 0.3
    ]
src/rag/indexer.py
async def index_message(msg, chat_id):
    # 1. Create embedding
    embedding = await embed(msg.text)
    # 2. Store with metadata
    vectorstore.upsert(
        id=msg.id,
        embedding=embedding,
        metadata={
            "chat_id": chat_id,
            "user": msg.user_name,
            "timestamp": msg.date
        }
    )
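
Both snippets lean on an embed() helper that the source doesn't show. With the OpenAI SDK it could look like this; the model name is an assumption:

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def embed(text):
    # One embedding per message or query
    resp = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return resp.data[0].embedding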

Context Engineering: The Full Pipeline

On every user message, the agent assembles its context window from six sources — then calls the LLM with the complete picture.

1
Retrieve Memories
mem0.ai semantic search
2
Retrieve RAG Docs
Vector similarity search
3
Build System Prompt
Context engineering
4
Call LLM
Claude Opus 4.6
5
Execute Tools
MCP + Telegram tools
6
Store Memories
Async background task
src/agent/agent.py — process_message()
async def process_message(user_message, context):
    system_parts = [generate_system_prompt(context)]

    # ① Retrieve memories (mem0)
    if is_memory_enabled():
        memories = await search_memory(user_message, context.user_id)
        if memories:
            system_parts.append(format_memories(memories))

    # ② Retrieve RAG context (vector search)
    if config.rag.enabled and should_use_rag(user_message):
        rag_response = await retrieve(user_message, chat_id=context.chat_id)
        if rag_response.results:
            system_parts.append(build_context_string(rag_response.results))

    # ③ Build combined system prompt
    system_prompt = "\n\n".join(system_parts)

    # ④ Load conversation history (short-term)
    messages = get_session_history(context.session_id, max=20)
    messages.append({"role": "user", "content": user_message})

    # ⑤ Call LLM with all tools
    response = client.messages.create(
        model="claude-opus-4-6",
        system=system_prompt,
        messages=messages,
        tools=get_all_tools()  # MCP + Telegram + built-in
    )

    # ⑥ Store new memories (async background)
    if is_memory_enabled():
        asyncio.create_task(store_memories(user_message, response, context))

    return response
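
The formatting helpers referenced in steps ① and ② aren't shown in the source. One plausible reading, with the result shapes entirely assumed:

def format_memories(memories):
    # Label retrieved memories so the model treats them as background
    lines = [f"- {m['memory']}" for m in memories]
    return "Relevant long-term memories:\n" + "\n".join(lines)

def build_context_string(results):
    # Render RAG hits with the metadata stored by the indexer
    lines = [
        f"- [{r.metadata['user']} @ {r.metadata['timestamp']}] {r.text}"
        for r in results
    ]
    return "Related past messages:\n" + "\n".join(lines)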

Tech Stack

- Claude Opus 4.6: Core LLM
- mem0.ai: Long-Term Memory
- OpenAI Embeddings: RAG Vectors
- SQLite: Session Store
- MCP Protocol: Tool Integration
- APScheduler: Cron & Tasks
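
For the cron layer, APScheduler's AsyncIOScheduler covers jobs like the daily summary seen in the running logs above. A sketch; the job body and timezone are assumptions:

from apscheduler.schedulers.asyncio import AsyncIOScheduler

scheduler = AsyncIOScheduler(timezone="Asia/Kolkata")  # IST, per USER.md

async def daily_summary():
    # Hypothetical job: retrieve memories, generate a summary, send it
    ...

scheduler.add_job(daily_summary, "cron", hour=10, minute=0)  # 10 AM standup
scheduler.start()  # requires a running asyncio event loop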

Explore the Full Source Code

The complete Telegram ClawdBot — with memory, RAG, MCP tools, and scheduler — is open source. Clone it, deploy it, and build your own AI agent with production-grade memory.