Vol. I · Issue 01 · Spring 2026 · Builder live · $49 USD · One-time

AI Agent Memory: How to Make Your Agent Remember

Most AI agents forget everything when the chat closes. Here's the three-tier memory architecture - frozen profile, searchable history, and a semantic index - that makes an agent actually remember.

Most "AI agents" forget everything the moment the chat window closes. You tell the agent your rules on Monday, and on Tuesday it asks the same clarifying questions as if you'd never met. The fix is a real memory system: a small frozen profile the agent reads on every wake-up, a searchable archive of every past session, and a semantic index for the big stuff like books, transcripts, and your own customer data. Get those three layers right and the agent stops being a clever stranger and starts being something that compounds.

This is how AI agent memory works in practice, with the specific limits and pitfalls that bite people. Everything here assumes Hermes as the runtime, though the same architecture ports to OpenClaw.

The three tiers of AI agent memory

Memory is not one thing. It's three, and each tier trades cost against scope.

  • Tier 1 - frozen profile. Two small markdown files the agent loads at the start of every session: memory.md (the agent's working notes about your projects and decisions) and user.md (who you are, your preferences, your rules). These are read on every wake. They are also size-capped: memory.md tops out around 2,200 characters and user.md around 1,375. Those are hard limits, not suggestions. Cram verbose preferences in here and you'll blow the cap or starve the context window.
  • Tier 2 - searchable history. Every session you've ever had, stored in SQLite with full-text search. The agent doesn't load this on wake; it queries it when a question needs history. This is your cheap, always-on archive.
  • Tier 3 - semantic / pluggable. The big external memory: a vector index for large or growing corpora, or a local fact store. This is where books, video transcripts, PDFs, and voice-of-customer data live. You plug in what fits - a local holographic store for privacy, or Pinecone when scale matters.

The rule of thumb: identity and rules go in Tier 1, raw conversation history in Tier 2, and anything large or semantic in Tier 3. Putting verbose content in Tier 1 is the single most common mistake, and it silently degrades every response.

Tier 1: keep the frozen profile ruthless

Because memory.md and user.md load on every single turn, every character costs you. Treat them like a business card, not a binder.

Good user.md content:

# User
- Name: Sam, solo founder, B2B SaaS (invoicing)
- Timezone: America/New_York
- Hard rules: never send email autonomously - draft only.
  Never push to main without asking.
- Tone: terse. Skip preamble. Lead with the answer.

That's it. The agent now knows your rules on every wake without you repeating them. Anything longer than a few lines belongs in Tier 2 or in a skill.
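If you want a guard against profile bloat, a small script can report how much room each file has left. This is a minimal sketch, not part of any runtime: the function name `check_profile` is hypothetical, and the caps are simply the approximate figures quoted in this article, so confirm them against your own setup.

```python
from pathlib import Path

# Caps as quoted in this article (approximate); verify against your runtime.
CAPS = {"memory.md": 2200, "user.md": 1375}


def check_profile(profile_dir: str) -> dict[str, int]:
    """Return characters remaining per file (negative means over the cap)."""
    remaining = {}
    for name, cap in CAPS.items():
        path = Path(profile_dir) / name
        # A missing file counts as empty, so the full cap is still available.
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        remaining[name] = cap - len(text)
    return remaining
```

Running `check_profile` on the profile directory before saving an edit makes the overflow visible instead of silent.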

Tier 2: the searchable archive you already have

You don't have to build Tier 2. Every conversation is already logged to SQLite with full-text search. The leverage is in *seeding* a richer memory store from history you already generated.
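To make "SQLite with full-text search" concrete, here is a rough sketch of what a history query can look like using SQLite's FTS5 extension from Python. The table name, columns, and sample rows are hypothetical, invented for illustration; the runtime's real session database will have its own schema. It also assumes your Python build ships SQLite with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE sessions USING fts5(session_date UNINDEXED, content)"
)
conn.executemany(
    "INSERT INTO sessions (session_date, content) VALUES (?, ?)",
    [
        ("2026-03-02", "Decided to draft emails only, never send autonomously."),
        ("2026-03-03", "Discussed invoice reminder copy for the SaaS app."),
    ],
)


def search_history(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Return (session_date, snippet) rows, best match first."""
    return conn.execute(
        # snippet(): column 1 is `content`; matches are wrapped in [ ].
        "SELECT session_date, snippet(sessions, 1, '[', ']', '...', 8) "
        "FROM sessions WHERE sessions MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


hits = search_history(conn, "emails")
```

The point of the design is that nothing is loaded up front: the archive costs no tokens until a question actually needs history, and then only the matching snippets enter the context.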

A local fact store (no cloud, no embedding cost) can be bootstrapped from your existing chats with one prompt:

Read all previous sessions as well as user.md and memory.md,
then seed the holographic fact store.

That single instruction pulls a dozen-plus structured facts about you out of past conversations instead of starting blank. It's the difference between an agent that "knows" you on day one versus day ninety.

Tier 3: build a knowledge base from your own work artifacts

Here's the non-obvious move that beats any documentation effort: your existing work artifacts contain richer signal about you than anything you'd sit down and write on purpose.

Session logs, sent emails, and video transcripts are full of how you actually think, where you get stuck, and which decisions you reversed. You can ingest them into a personal knowledge base - an Obsidian-style wiki with cross-linked entries for goals, projects, voice, values, and recurring frustrations.

The raw sources worth ingesting:

  • Agent session logs (your prompts reveal voice; the responses reveal project state)
  • Gmail export via Google Takeout - deselect everything, select Mail only, download the .mbox
  • Video transcripts (.srt files) - hours of unfiltered natural speech, the richest voice source you have
  • Old project logs so you can ask "what was I thinking when I built this?" and get an answer from the actual record

One prompt to build it:

I have raw data in ~/personal-kb/raw/. Build a knowledge base
in ~/personal-kb/wiki/ in Obsidian markdown. Create cross-linked
files for: goals, projects, working style, voice, frustrations,
values. Clean the sources as needed. Keep all data local.

Then connect it back as Tier 3 memory so the agent searches your history before answering context-dependent questions.
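The prompt leaves source cleaning to the agent, but some of it is mechanical enough to do yourself first. As one illustration, here is a minimal sketch that strips cue numbers and timing lines from an .srt transcript so only the spoken text remains. The function name `srt_to_text` is hypothetical, and the sketch assumes standard SRT timing lines; it will also drop any spoken line that consists only of digits.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop blank lines, cue numbers, and timing lines; join the speech."""
    spoken = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        spoken.append(line)
    return " ".join(spoken)
```

Pre-cleaning like this keeps timestamps out of the wiki and means fewer tokens spent on noise during ingestion.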

A real example of what this surfaces: one builder's knowledge base pulled the exact quote *"way before you waste another hour of my life, double-check everything for stupid mistakes."* That's not a preference you'd think to write down. But now the agent knows the actual frustration - wasted iteration time - and adjusts.

One safety note that matters: keep raw sources local and .gitignore them. The generated wiki will contain email addresses, company names, and personal detail. Treat it as confidential. (More on locking agents down in our guide to AI agent security.)

When to reach for a vector database

For a handful of files, a local Obsidian-style index is fine. But it reloads the index file on every call, so token cost grows with file length. Once your corpus gets large or keeps growing, move to Pinecone โ€” it scales far better.

Pick the embedding model for the job, not by reflex:

  • Plain text? Use multilingual E5 large in Pinecone. It's excellent and avoids the free-tier call limits you'll hit on multimodal models (roughly 1,000 calls/day on upload *and* query).
  • Audio, video, images, diagrams? Use a true multimodal embedding model so the agent can return the actual page-82 diagram, not just text about it.

Two pitfalls that cost real money:

  • Images aren't embedded unless you say so. Explicitly request graphics ("vectorize with graphics, show no more than 2-3") or your RAG stays text-only and silently drops every diagram.
  • Dedup on re-scrape. If you run a daily refresh job to keep the knowledge base current, you must say *"check if this is already scraped; if so, skip it - don't re-upsert."* Leave that out and the job re-uploads your entire corpus every single run. You can offload that refresh to a scheduled job so it runs whether your laptop is open or not (see cron automation for AI agents).
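The dedup rule is easy to picture in code. The sketch below is one possible approach, assuming you identify documents by a hash of their content; `content_id`, `upsert_new`, and the `upsert` callback are hypothetical names, and the callback stands in for whatever write call your vector store actually provides.

```python
import hashlib
from typing import Callable, Iterable


def content_id(text: str) -> str:
    """Stable ID derived from the content itself."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_new(
    docs: Iterable[str],
    seen: set[str],
    upsert: Callable[[str, str], None],
) -> int:
    """Upsert only documents whose hash is not in `seen`; return the count."""
    added = 0
    for text in docs:
        doc_id = content_id(text)
        if doc_id in seen:
            continue  # already scraped: skip it, don't re-upsert
        upsert(doc_id, text)
        seen.add(doc_id)
        added += 1
    return added
```

For a daily job, `seen` would need to persist between runs (for example in a small local file); otherwise every run starts with an empty set and re-uploads everything anyway.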

A morning routine that uses memory well

The cleanest pattern that ties memory together: split heavy reasoning from fast delivery. A 6am job (cheap model) reads all your history, logs, and notes, then writes a synthesized brief to a dream.md file. A 7am job reads that file, adds today's weather and calendar, and delivers it. The expensive reasoning gets pre-computed once; any cheap job can consume the result. That pre-computed file is itself a form of memory - readable by any model, any time.
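Stripped of the model calls, the two-stage pattern is just one job writing a file and another reading it. Here is a minimal sketch under that assumption; `write_brief` and `deliver` are hypothetical names, and the summary lines, weather, and calendar entries are passed in as plain strings rather than fetched from any real service.

```python
from pathlib import Path


def write_brief(summary_lines: list[str], path: Path) -> None:
    """Stage 1 (the 6am job): persist the synthesized brief."""
    body = "\n".join(f"- {line}" for line in summary_lines)
    path.write_text(f"# Brief\n{body}\n", encoding="utf-8")


def deliver(path: Path, weather: str, calendar: list[str]) -> str:
    """Stage 2 (the 7am job): read the brief and append today's details."""
    brief = path.read_text(encoding="utf-8")
    today = [f"Weather: {weather}"] + [f"Event: {event}" for event in calendar]
    return brief + "\n".join(today) + "\n"
```

Because the only contract between the stages is the file, either side can be swapped for a different model or script without touching the other.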

FAQ

What is AI agent memory?

It's the system that lets an agent retain information across sessions instead of starting fresh each time. In practice it's three layers: a small frozen profile read on every wake, a searchable archive of past sessions, and a semantic index for large knowledge like documents and transcripts.

How is agent memory different from RAG?

RAG (retrieval-augmented generation) is one layer of memory - the semantic Tier 3 where you vectorize documents and retrieve relevant chunks. Full agent memory also includes the frozen identity profile and the full-text history of past conversations, which RAG alone doesn't cover.

How much can the agent's core memory hold?

The frozen profile files are deliberately small - roughly 2,200 characters for the agent's working notes and 1,375 for the user profile. They load on every turn, so they're kept tiny. Everything verbose belongs in the searchable archive or a vector index.

Can I build agent memory from data I already have?

Yes, and you should. Session logs, sent emails, and video transcripts carry richer signal than documentation you'd write on purpose. One ingestion prompt can turn them into a cross-linked knowledge base the agent queries before answering.

Start with a memory system that compounds

A private agent that remembers your rules, your projects, and your voice is the whole point - and it's the layer most people skip. The OpenClawCrew starter kits ship with the three-tier memory architecture pre-wired, so your agent knows you from day one instead of day ninety. Grab a kit for $49, or have us set the whole thing up - memory, knowledge base ingestion, and all - done-for-you.

Next, give that memory something to act on: see how AI agent skills turn proven procedures into reusable capabilities, and how AI agent optimization makes the whole system get measurably better every week.