Agent Memory

How an AI agent remembers across turns and sessions. Four kinds of memory, three places vendors oversell it, and the data-governance question nobody wants to answer.

The Technical Definition

Agent memory is how an AI agent stores and retrieves information across turns within a session and across sessions over time. It’s typically split into four categories.

Short-term memory is the model’s context window — the tokens it can see right now. It’s bounded (usually 200K to 2M tokens in 2026) and resets every session.

Long-term memory is information persisted in an external store — most commonly a vector database, sometimes a structured database — that the agent retrieves from when relevant. This is what lets an agent “remember” your name across conversations.

Episodic memory stores specific past interactions — “the customer asked about pricing on March 14th and I quoted X.” It’s used for personalization and continuity.

Procedural memory stores skills and learned behaviors — patterns the agent has observed working and tries to repeat. In current systems this often shows up as cached prompt templates, reusable sub-agent configurations, or learned tool-use sequences.
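One way to picture the four categories is as four fields on a single object. This is a minimal sketch, not any vendor's design: the `AgentMemory` class, its field names, and the sample values are hypothetical, and plain in-process lists and dicts stand in for the context window and the external stores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentMemory:
    # Short-term: the turns currently visible in the context window.
    short_term: List[str] = field(default_factory=list)
    # Long-term: persisted facts (a real system would use an external store).
    long_term: Dict[str, str] = field(default_factory=dict)
    # Episodic: records of specific past interactions.
    episodic: List[str] = field(default_factory=list)
    # Procedural: reusable prompt templates or tool-use sequences.
    procedural: Dict[str, str] = field(default_factory=dict)

    def end_session(self) -> None:
        # Only the context window resets; everything else persists.
        self.short_term.clear()


mem = AgentMemory()
mem.short_term.append("User: Hi, I'm Dana. What does the Pro plan cost?")
mem.long_term["name"] = "Dana"
mem.episodic.append("Asked about Pro plan pricing")
mem.procedural["pricing_reply"] = "Quote the {plan} plan, then offer a demo."

mem.end_session()
print(mem.short_term)         # the session's turns are gone
print(mem.long_term["name"])  # the persisted fact survives
```

The point of the sketch is the asymmetry: ending a session empties one field and leaves the other three untouched, which is why each needs its own storage and retention decision.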

What This Actually Means for Your Business

Every vendor pitching an “agent that learns your business” is selling some configuration of these four. The pitch usually conflates them on purpose, because the boring answer (“we put your past chats in a vector database and retrieve from it”) doesn’t sell as well as “the agent learns.”

Three places this oversell shows up.

The first is the “infinite memory” claim. There is no such thing. Long-term memory is bounded by what you’ve stored, retrieval quality is bounded by your embedding model, and at inference time the agent sees only what fits in the context window after retrieval. If the relevant memory isn’t retrieved, the agent behaves as if it never existed. Vendors call the retrieval layer “personalization.” When retrieval misses, your team experiences it as “the agent forgot what we talked about yesterday.”
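The context-window ceiling can be sketched in a few lines. The example below assumes a crude word-count stand-in for a tokenizer and a hypothetical `pack_context` helper; the scores, memory texts, and the 12-token budget are made up for illustration.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def pack_context(scored: List[Tuple[float, str]], budget: int) -> Tuple[List[str], List[str]]:
    """Greedily keep the highest-scoring memories that fit within `budget` tokens."""
    kept: List[str] = []
    dropped: List[str] = []
    used = 0
    for _score, text in sorted(scored, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
        else:
            dropped.append(text)
    return kept, dropped


# Hypothetical retrieval results: (similarity score, memory text).
retrieved = [
    (0.91, "Customer prefers invoices by email"),
    (0.88, "Customer asked about annual pricing last week"),
    (0.52, "Customer mentioned a renewal deadline in June"),
]

kept, dropped = pack_context(retrieved, budget=12)
print("visible to the agent:", kept)
print("stored but invisible:", dropped)
```

Everything in `dropped` is still in the store. The agent simply never sees it on this turn, which is what "forgetting" looks like from the user's side.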

The second is the “self-improving agent” claim. Procedural memory in current systems is mostly templated, not learned. The agent isn’t getting smarter the way a junior analyst gets smarter. It’s accumulating examples that get retrieved into its context. That’s useful — but it’s caching, not learning. If you stop curating the examples, the improvement stops.

The third — and this is the one that gets escalated to legal six months in — is the data-governance problem. Long-term and episodic memory store your customer conversations, your internal data, sometimes your PII, in a vector database the vendor manages. Who has access? How long is it retained? What happens if a customer requests deletion under GDPR or CCPA? Can you audit what the agent “remembers” about a specific person? Most vendor contracts handle this poorly because most enterprise buyers don’t ask until it’s too late.

Reality Check

What the vendor says: “Our agent has persistent memory and learns from every interaction.”

What that means in practice: Your conversations get embedded into a vector database. Future conversations retrieve the closest matches and stuff them into the context window. The “learning” is retrieval. If you delete a customer’s data, you need a deletion path through that database — and most vendors don’t have one until you ask.
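The whole loop described above fits in a short sketch. It is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, `ToyMemory` is a hypothetical in-memory list standing in for a vector database, and the customer IDs and texts are invented.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy embedding: word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemory:
    def __init__(self) -> None:
        # Each row keeps the owner's ID next to the text and its vector.
        self.rows: List[Tuple[str, str, Counter]] = []

    def remember(self, customer_id: str, text: str) -> None:
        self.rows.append((customer_id, text, embed(text)))

    def recall(self, query: str, k: int = 2) -> List[str]:
        # Returns the k closest rows, however weak the match is.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [text for _owner, text, _vec in ranked[:k]]

    def forget_customer(self, customer_id: str) -> int:
        # The deletion path: drop every row tagged with this customer.
        before = len(self.rows)
        self.rows = [row for row in self.rows if row[0] != customer_id]
        return before - len(self.rows)


memory = ToyMemory()
memory.remember("cust-1", "quoted the enterprise pricing tier")
memory.remember("cust-2", "asked how to reset a password")
memory.remember("cust-1", "prefers contact by email")

question = "what pricing did we quote"
top = memory.recall(question, k=1)
prompt = "Relevant memory:\n" + "\n".join(top) + "\n\nUser: " + question
print(prompt)

removed = memory.forget_customer("cust-1")
print("rows deleted:", removed)
```

Note that `forget_customer` only works because every row was tagged with an owner at write time. A store that holds embeddings without that tag has no cheap way to answer a deletion request.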

What Operators Actually Do

Teams running agents in production treat memory as a data system, not a feature. They define what gets stored (specific facts, preferences, past actions), what doesn’t (sensitive PII, regulated data, anything you can’t audit), and how long things live before expiring. They build deletion paths up front, not after the first compliance request.
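A storage policy of this kind might look like the sketch below. The category names, retention periods, and the `GovernedStore` class are all hypothetical; the idea is only that the allowlist and the expiry live in code, not in a policy document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical allowlist: only these categories may be stored, each with a lifetime.
RETENTION = {
    "preference": timedelta(days=365),
    "past_action": timedelta(days=90),
}


@dataclass
class MemoryRecord:
    subject_id: str
    category: str
    text: str
    expires_at: datetime


class GovernedStore:
    def __init__(self) -> None:
        self.records: List[MemoryRecord] = []

    def write(self, subject_id: str, category: str, text: str, now: datetime) -> bool:
        ttl = RETENTION.get(category)
        if ttl is None:
            return False  # Category is not on the allowlist, so nothing is stored.
        self.records.append(MemoryRecord(subject_id, category, text, now + ttl))
        return True

    def purge_expired(self, now: datetime) -> int:
        before = len(self.records)
        self.records = [r for r in self.records if r.expires_at > now]
        return before - len(self.records)


start = datetime(2026, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
store = GovernedStore()
stored_pref = store.write("cust-1", "preference", "prefers email", start)
stored_id = store.write("cust-1", "government_id", "(redacted)", start)
stored_action = store.write("cust-1", "past_action", "upgraded plan", start)

expired = store.purge_expired(start + timedelta(days=120))
print(stored_pref, stored_id, stored_action, expired)
```

Refusing the write is a deliberate choice here: data that never enters the store needs no retention rule and no deletion path.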

They also separate memory by trust level. The agent’s memory of a customer’s account number gets one treatment. Its memory of a casual preference gets another. Mixing them in a single vector database — which is what most off-the-shelf agent platforms do by default — creates audit problems you’ll be untangling for a year.
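Separation by trust level can be sketched as a router in front of two stores. The field names, the two-level `Trust` enum, and the `PartitionedMemory` class are assumptions for illustration; a production system would likely back each level with a separate database and access policy.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Trust(Enum):
    SENSITIVE = "sensitive"
    CASUAL = "casual"


# Hypothetical classification of memory fields.
FIELD_TRUST: Dict[str, Trust] = {
    "account_number": Trust.SENSITIVE,
    "tone_preference": Trust.CASUAL,
}


class PartitionedMemory:
    def __init__(self) -> None:
        # One store per trust level instead of a single shared index.
        self.stores: Dict[Trust, List[Tuple[str, str]]] = {level: [] for level in Trust}
        self.audit_log: List[str] = []

    def write(self, field_name: str, value: str) -> Trust:
        # Unclassified fields default to the strictest level.
        level = FIELD_TRUST.get(field_name, Trust.SENSITIVE)
        self.stores[level].append((field_name, value))
        if level is Trust.SENSITIVE:
            self.audit_log.append(f"write {field_name}")
        return level

    def read(self, level: Trust, reader: str) -> List[Tuple[str, str]]:
        if level is Trust.SENSITIVE:
            self.audit_log.append(f"read by {reader}")
        return list(self.stores[level])


mem = PartitionedMemory()
a = mem.write("account_number", "(redacted)")
b = mem.write("tone_preference", "informal")
c = mem.write("unlabeled_field", "something")

casual = mem.read(Trust.CASUAL, reader="agent")
sensitive = mem.read(Trust.SENSITIVE, reader="billing-tool")
print(mem.audit_log)
```

Because sensitive reads and writes pass through one narrow path, the audit trail covers exactly the data an auditor will ask about, and casual preferences stay out of it.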

Smart deployments also benchmark retrieval quality. The question isn’t “does the agent have memory?” It’s “when something is in memory, does the agent actually retrieve it correctly?” If the relevant prior context comes back on only 60% of the queries that need it, the agent is going to feel forgetful no matter how much you’ve stored.
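A retrieval benchmark can start as a labeled set of queries paired with the memory each one should surface. The sketch below assumes a hypothetical `hit_rate_at_k` helper and a canned `fake_retrieve` function standing in for the real retriever; the query and memory IDs are placeholders.

```python
from typing import Callable, Dict, List, Tuple


def hit_rate_at_k(
    retrieve: Callable[[str], List[str]],
    labeled: List[Tuple[str, str]],
    k: int,
) -> float:
    """Fraction of queries whose expected memory appears in the top k results."""
    if not labeled:
        return 0.0
    hits = sum(1 for query, expected in labeled if expected in retrieve(query)[:k])
    return hits / len(labeled)


# Canned results standing in for a real retriever (IDs are placeholders).
CANNED: Dict[str, List[str]] = {
    "q1": ["m1", "m9"],
    "q2": ["m7", "m2"],
    "q3": ["m8", "m4"],
    "q4": ["m4", "m6"],
    "q5": ["m3"],
}


def fake_retrieve(query: str) -> List[str]:
    return CANNED.get(query, [])


# Each pair is (query, the memory ID that should come back).
labeled = [("q1", "m1"), ("q2", "m2"), ("q3", "m3"), ("q4", "m4"), ("q5", "m5")]

rate_at_2 = hit_rate_at_k(fake_retrieve, labeled, k=2)
rate_at_1 = hit_rate_at_k(fake_retrieve, labeled, k=1)
print(rate_at_2, rate_at_1)
```

Rerunning the same labeled set as the store grows shows whether retrieval quality is holding or drifting, and comparing different values of `k` shows how much depends on how many results reach the context window.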

The Questions to Ask

Which memory types does this agent use, and where is each one stored? Short-term, long-term, episodic, procedural — they have different cost, latency, and compliance profiles. Vendors who answer “it just remembers” don’t know the answer.
What’s the deletion path? A customer asks for their data to be removed. How do you find every place their information is stored — including every embedding in every vector database — and remove it? Get this in writing.
How do you measure retrieval quality? Storing memory is easy. Retrieving the right memory at the right time is hard. What’s the hit rate, and how do you monitor it over time as the memory store grows?
