TECHNICAL GUIDE • 2025

Memory Architecture for Your AI Agent

How we built a production AI agent with three memory layers — short-term conversation context, working session scratchpads, and persistent long-term memory — so it actually remembers who you are.

3 Memory Layers · 6 Context Files · RAG (Vector Search) · MCP (Tool Protocol)

Memory Systems · Context Engineering · Agentic AI · RAG · MCP · Telegram Bot

The Big Picture

Every user message flows through a 6-step pipeline that assembles context from multiple memory sources before calling the LLM.

At the hub sits the AI Agent Core, wired to each memory source:

- Short-Term: Chat History (SQLite)
- Working: Session Scratchpad
- Long-Term: mem0.ai Cloud
- RAG Pipeline: Vector Embeddings
- Profile Context: SOUL / USER / MEMORY.md
- Cron & Tasks: APScheduler Jobs

Three Layers of Memory

Much like human cognition, an AI agent needs different memory types for different purposes: immediate recall, an active workspace, and permanent knowledge.

Short-Term Memory

Conversation Context

Immediate chat buffer. Stores the current session's message pairs (user/assistant) loaded into the LLM context window on every turn.

- SQLite-backed session history
- Configurable max history (default 20)
- Per-user, per-chat isolation
- Cleared on the /reset command
conversation_context.py
history = get_session_history(session_id, max=20)
for msg in history:
    messages.append({"role": msg.role, "content": msg.content})
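
Under the hood, the session store can be a single SQLite table. A minimal sketch, assuming a hypothetical messages table keyed by session_id (not the bot's actual schema):

import sqlite3

def get_session_history(session_id, max=20):
    # Load the newest `max` messages for this session, oldest first
    conn = sqlite3.connect("sessions.db")
    rows = conn.execute(
        "SELECT role, content FROM messages"
        " WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, max),
    ).fetchall()
    conn.close()
    return [{"role": r, "content": c} for r, c in reversed(rows)]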

Working Memory

Session Scratchpad

Temporary in-flight reasoning. Tool results, intermediate state, and task context that exist only within the current session.

- Tool call results & intermediate state
- Active task tracking
- Session-scoped — lost on restart
- Multi-step reasoning chains
session_scratchpad.py
# Tool results stay in working memory (session-scoped)
tool_result = await execute_tool(name, args)
tool_results.append({
    "tool_use_id": tool_use_id,
    "content": tool_result
})
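
Because the scratchpad is deliberately ephemeral, it can be a plain per-session object. A hypothetical sketch (the class and field names are assumptions, not the bot's actual code):

from dataclasses import dataclass, field

@dataclass
class SessionScratchpad:
    # Session-scoped working memory: lost on restart, by design
    tool_results: list = field(default_factory=list)
    active_task: str | None = None

    def record_tool_result(self, tool_use_id, result):
        # Keep intermediate tool output available for later reasoning steps
        self.tool_results.append({"tool_use_id": tool_use_id, "content": result})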

Long-Term Memory

Persistent Storage

Durable memory that survives restarts. mem0.ai automatically extracts and stores important facts, preferences, and decisions.

- mem0.ai cloud-backed storage
- Auto-extraction from conversations
- Semantic search retrieval
- User-scoped memory banks
persistent_storage.py
# Async background storage after every response
await add_memory([
    {"role": "user", "content": user_msg},
    {"role": "assistant", "content": response}
], user_id)
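
The add_memory / search_memory wrappers map naturally onto the mem0 Python SDK. A sketch assuming the hosted MemoryClient (a synchronous client, pushed off the event loop):

import asyncio
import os
from mem0 import MemoryClient

client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

async def add_memory(messages, user_id):
    # mem0 extracts durable facts and preferences from the raw exchange
    return await asyncio.to_thread(client.add, messages, user_id=user_id)

async def search_memory(query, user_id):
    # Semantic retrieval scoped to this user's memory bank
    return await asyncio.to_thread(client.search, query, user_id=user_id)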

Long-Term Memory: File-Backed Persistence

Two file types power the long-term memory system — curated decisions and daily raw events.

MEMORY.md (manually curated)
① Key Decisions
- User prefers concise responses
- Timezone: IST (UTC+5:30)
- Project: ClawdBot on Telegram
② Recurring Preferences
- Daily standup at 10 AM
- Markdown for all reports
- Always confirm before deleting
memory/YYYY-MM-DD.md (auto-generated daily)
① Raw Events
09:15 — User asked about RAG config
09:22 — Searched memory for "vector DB"
09:30 — Tool call: send_message
② Running Logs
10:00 — Cron: daily_summary triggered
10:01 — Retrieved 12 memories
10:02 — Generated summary → sent
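
Writing the daily file is the simplest part of the system. A hypothetical logger matching the format shown above (the helper name and directory default are assumptions):

from datetime import datetime
from pathlib import Path

def log_event(text, memory_dir="memory"):
    # Append a timestamped raw event to memory/YYYY-MM-DD.md
    now = datetime.now()
    path = Path(memory_dir) / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{now:%H:%M} — {text}\n")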

Profile & Context Files

Structured context files loaded into every system prompt. They define the agent's identity, describe the user, and configure the environment; a sketch of a loader follows the cards below.

SOUL.md

Agent persona, behavior guidelines, and operational limits

How should you behave?

USER.md

User preferences, identity, and communication style

How should I address you?

MEMORY.md

Persistent decisions, facts, dates, and preferences

What did we decide last week?

HEARTBEAT.md

Operational schedules, periodic checks, and automated checklists

Scheduled tasks?

TOOLS.md

Environment specifics, voice IDs, device names, and tooling config

ElevenLabs voice ID?

Cron Jobs

Background sessions, sub-agents, and scheduled operations

Background tasks?
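
Loading these files can be a straightforward concatenation into the system prompt. A minimal sketch, assuming the file names from the cards above live next to the bot (the CONTEXT_FILES list and load_profile_context helper are hypothetical):

from pathlib import Path

CONTEXT_FILES = ["SOUL.md", "USER.md", "MEMORY.md", "HEARTBEAT.md", "TOOLS.md"]

def load_profile_context(base_dir="."):
    # Concatenate whichever context files exist into one prompt section
    parts = []
    for name in CONTEXT_FILES:
        path = Path(base_dir) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)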

RAG Pipeline: Semantic Search Over Chat History

Every message is embedded into a vector store. When a user asks a question, the agent retrieves semantically similar past messages to build context.

User Message → Embed (OpenAI) → Vector Search → Score Filter ≥ 0.3 → Top-K Results
src/rag/retriever.py
async def retrieve(query, chat_id):
    # 1. Embed the query
    embedding = await embed(query)
    # 2. Search vector store
    results = vectorstore.search(
        embedding, top_k=10
    )
    # 3. Filter by score threshold
    return [
        r for r in results
        if r.score >= 0.3
    ]
src/rag/indexer.py
async def index_message(msg, chat_id):
    # 1. Create embedding
    embedding = await embed(msg.text)
    # 2. Store with metadata
    vectorstore.upsert(
        id=msg.id,
        embedding=embedding,
        metadata={
            "chat_id": chat_id,
            "user": msg.user_name,
            "timestamp": msg.date
        }
    )
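
Both snippets lean on an embed() helper that the source doesn't show. With the OpenAI SDK it could look like this; the model name is an assumption:

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def embed(text):
    # One embedding per message or query
    resp = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return resp.data[0].embedding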

Context Engineering: The Full Pipeline

On every user message, the agent assembles its context window from six sources — then calls the LLM with the complete picture.

1
Retrieve Memories
mem0.ai semantic search
2
Retrieve RAG Docs
Vector similarity search
3
Build System Prompt
Context engineering
4
Call LLM
Claude Opus 4.6
5
Execute Tools
MCP + Telegram tools
6
Store Memories
Async background task
src/agent/agent.py — process_message()
async def process_message(user_message, context):
    system_parts = [generate_system_prompt(context)]

    # ① Retrieve memories (mem0)
    if is_memory_enabled():
        memories = await search_memory(user_message, context.user_id)
        if memories:
            system_parts.append(format_memories(memories))

    # ② Retrieve RAG context (vector search)
    if config.rag.enabled and should_use_rag(user_message):
        rag_response = await retrieve(user_message, chat_id=context.chat_id)
        if rag_response.results:
            system_parts.append(build_context_string(rag_response.results))

    # ③ Build combined system prompt
    system_prompt = "\n\n".join(system_parts)

    # ④ Load conversation history (short-term)
    messages = get_session_history(context.session_id, max=20)
    messages.append({"role": "user", "content": user_message})

    # ⑤ Call LLM with all tools
    response = client.messages.create(
        model="claude-opus-4-6",
        system=system_prompt,
        messages=messages,
        tools=get_all_tools()  # MCP + Telegram + built-in
    )

    # ⑥ Store new memories (async background)
    if is_memory_enabled():
        asyncio.create_task(store_memories(user_message, response, context))

    return response
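
The formatting helpers referenced in steps ① and ② aren't shown in the source. One plausible reading, with the result shapes entirely assumed:

def format_memories(memories):
    # Label retrieved memories so the model treats them as background
    lines = [f"- {m['memory']}" for m in memories]
    return "Relevant long-term memories:\n" + "\n".join(lines)

def build_context_string(results):
    # Render RAG hits with the metadata stored by the indexer
    lines = [
        f"- [{r.metadata['user']} @ {r.metadata['timestamp']}] {r.text}"
        for r in results
    ]
    return "Related past messages:\n" + "\n".join(lines)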

Tech Stack

- Claude Opus 4.6: Core LLM
- mem0.ai: Long-Term Memory
- OpenAI Embeddings: RAG Vectors
- SQLite: Session Store
- MCP Protocol: Tool Integration
- APScheduler: Cron & Tasks
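
For the cron layer, APScheduler's AsyncIOScheduler covers jobs like the daily summary seen in the running logs above. A sketch; the job body and timezone are assumptions:

from apscheduler.schedulers.asyncio import AsyncIOScheduler

scheduler = AsyncIOScheduler(timezone="Asia/Kolkata")  # IST, per USER.md

async def daily_summary():
    # Hypothetical job: retrieve memories, generate a summary, send it
    ...

scheduler.add_job(daily_summary, "cron", hour=10, minute=0)  # 10 AM standup
scheduler.start()  # requires a running asyncio event loop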

Explore the Full Source Code

The complete Telegram ClawdBot — with memory, RAG, MCP tools, and scheduler — is open source. Clone it, deploy it, and build your own AI agent with production-grade memory.