Memory & Context
Memory is what separates a useful agent from a frustrating one. Without it, every conversation starts from zero. With it, an agent learns your preferences, remembers what it worked on yesterday, and builds up knowledge over time.
The four types of memory
| Type | Lives in | Persists? | Best for |
|---|---|---|---|
| In-context (working) | Context window | Session only | Current task state |
| Persistent (long-term) | Database | Yes | User preferences, learned facts |
| Episodic (history) | Log / database | Yes | Past interactions, decisions |
| Semantic (knowledge) | Vector database | Yes | Searchable reference knowledge |
1. In-context memory (working memory)
Everything the model can see right now: the current conversation, shared documents, recent tool outputs, and any injected facts. This lives in the context window and disappears when the session ends.
The context window has a hard size limit, measured in tokens. Once a conversation exceeds it, something has to be cut: harnesses typically truncate or drop the oldest content, and some APIs reject the request instead. A long, unmanaged conversation therefore gradually loses its own early context.
Analogy: What is currently on your desk. Everything is in reach, but space is limited.
Practical implication: For long tasks, the harness must actively manage what stays in context. Summarization, compression, and selective retrieval are all strategies for keeping the most important information visible.
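One way a harness might keep the most important information visible is sketched below: keep the newest messages that fit a token budget and replace everything older with a single summary. The names `estimate_tokens`, `fit_to_budget`, and `naive_summary` are hypothetical, and the 4-characters-per-token heuristic is an assumption; a real harness would use the model's tokenizer and ask a model to write the summary.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(
    messages: list[str],
    budget: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Keep the newest messages that fit; replace the rest with one summary."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent turns are considered first.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    dropped = messages[: len(messages) - len(kept)]
    if dropped:
        # The summary also costs tokens; a real harness would budget for it.
        kept.insert(0, summarize(dropped))
    return kept


def naive_summary(dropped: list[str]) -> str:
    # Placeholder: stands in for a model-generated summary.
    return f"[summary of {len(dropped)} earlier messages]"


history = ["a" * 400, "b" * 400, "c" * 40, "d" * 40]
trimmed = fit_to_budget(history, budget=100, summarize=naive_summary)
print(trimmed[0])  # [summary of 2 earlier messages]
```

Trimming from the oldest end is only one policy. A harness could just as well pin certain messages (the task statement, for example) and summarize the middle.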
2. Persistent memory (long-term storage)
Information the agent saves between sessions: facts it has learned about you, past decisions, stated preferences, recurring patterns. Stored in a database and retrieved when relevant.
Analogy: Your notes and files. They are not on your desk right now, but you can look them up.
Example stored facts:

    user_preferences:
      - Prefers bullet points over paragraphs in summaries
      - Uses metric units
      - Timezone: America/Chicago
    project_context:
      - Current sprint ends Friday
      - Primary language: TypeScript
      - Deploy target: AWS Lambda

3. Episodic memory (interaction history)
A log of past interactions: what tasks were completed, what was decided, what worked, what did not. It allows the agent to reason about its own history and avoid repeating mistakes.
Analogy: Your work journal, a dated record of what you did and decided.
Example use: An agent that has tried and failed to reach a contact three times via email can check its episodic memory, recognize the pattern, and suggest trying a different channel.
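The email example could be sketched roughly as follows. This is a minimal in-memory illustration, not a prescribed design: `Episode`, `EpisodeLog`, `suggest_channel`, and the task label format are all hypothetical names, and a real agent would persist the log in a database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Episode:
    timestamp: datetime
    task: str       # hypothetical label, e.g. "contact:alice"
    channel: str    # e.g. "email"
    succeeded: bool


class EpisodeLog:
    """Append-only, in-memory log of what was tried and how it went."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, task: str, channel: str, succeeded: bool) -> None:
        self._episodes.append(
            Episode(datetime.now(timezone.utc), task, channel, succeeded)
        )

    def consecutive_failures(self, task: str, channel: str) -> int:
        """Count the most recent unbroken run of failures for task + channel."""
        count = 0
        for ep in reversed(self._episodes):
            if ep.task != task or ep.channel != channel:
                continue
            if ep.succeeded:
                break
            count += 1
        return count


def suggest_channel(
    log: EpisodeLog,
    task: str,
    preferred: str,
    fallback: str,
    max_failures: int = 3,
) -> str:
    """Switch to the fallback once the preferred channel keeps failing."""
    if log.consecutive_failures(task, preferred) >= max_failures:
        return fallback
    return preferred


log = EpisodeLog()
for _ in range(3):
    log.record("contact:alice", "email", succeeded=False)

print(suggest_channel(log, "contact:alice", "email", "phone"))  # phone
```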
4. Semantic memory (knowledge base)
Structured knowledge the agent can search: documentation, company policies, product catalogs, FAQs, research papers. Usually stored in a vector database, which enables search by meaning rather than exact keyword match.
Analogy: Your reference library. You search it when you need to look something up.
Why vector search? A keyword search for "vacation time" might miss a policy document that says "annual leave entitlement." A vector search can find both because an embedding model maps phrases with similar meanings to nearby vectors, so the match is made on semantic similarity rather than shared words.
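To make "nearby vectors" concrete, here is a small sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors below are hand-made for illustration only; they are not output from any real embedding model, which would produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors standing in for real embeddings (assumption).
toy_embeddings = {
    "vacation time": [0.9, 0.1, 0.0],
    "annual leave entitlement": [0.8, 0.2, 0.1],
    "expense report deadline": [0.1, 0.9, 0.2],
}

query = toy_embeddings["vacation time"]
# Rank the other phrases by similarity to the query, most similar first.
ranked = sorted(
    (text for text in toy_embeddings if text != "vacation time"),
    key=lambda text: cosine_similarity(query, toy_embeddings[text]),
    reverse=True,
)
print(ranked[0])  # annual leave entitlement
```

The two phrases share no words, yet the leave policy ranks first because its vector points in nearly the same direction as the query's.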
How retrieval works
Agents do not load all their memory into context at once; that would exhaust the context window immediately. Instead they use retrieval-augmented generation (RAG):

    User sends message or agent starts a step
        ↓
    Agent generates a search query from the current context
        ↓
    Search runs against memory store (vector DB, SQL, or both)
        ↓
    Top-N most relevant results retrieved
        ↓
    Results injected into context alongside the original message
        ↓
    Model generates response with full relevant context available

This is why a well-configured agent can feel like it "remembers" something from three months ago. It is not holding the full history in memory; it is storing key facts and surfacing them on demand.
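The retrieval steps can be sketched in a few lines. Everything here is an illustrative assumption: `word_overlap_search` is a crude stand-in for a vector database (it ranks by shared words, not by meaning), the query is simply the user's message, and `build_prompt` shows one possible way to inject results next to the original message.

```python
from typing import Callable

# A search function takes (query, top_n) and returns matching entries.
SearchFn = Callable[[str, int], list[str]]


def word_overlap_search(store: list[str]) -> SearchFn:
    """Stand-in for a vector DB: rank entries by shared lowercase words."""

    def search(query: str, top_n: int) -> list[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc) for doc in store
        ]
        hits = [item for item in scored if item[0] > 0]
        hits.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in hits[:top_n]]

    return search


def build_prompt(message: str, search: SearchFn, top_n: int = 2) -> str:
    """Retrieve the top-N entries and inject them alongside the message."""
    # Simplest possible query (assumption): reuse the message itself.
    results = search(message, top_n)
    memory_block = "\n".join(f"- {r}" for r in results)
    if not memory_block:
        memory_block = "- (nothing relevant found)"
    return f"Relevant memory:\n{memory_block}\n\nUser message:\n{message}"


store = [
    "User timezone is America/Chicago",
    "Current sprint ends Friday",
    "Deploy target is AWS Lambda",
]
prompt = build_prompt("when does the sprint end", word_overlap_search(store))
print(prompt)
```

Only the sprint entry reaches the prompt; the other two stored facts stay out of context until a message makes them relevant.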
Retrieval quality matters
The usefulness of retrieval depends on:
- Chunking strategy: how documents are split before indexing. Chunks too small lose context; chunks too large dilute relevance.
- Embedding model: the model used to convert text into vectors. Better embeddings yield better semantic matches.
- Reranking: a second-pass model that re-scores retrieved results for relevance before injecting them.
- Metadata filtering: filtering by date, source, or category before semantic search to narrow the candidate pool.
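Two of these factors, chunking and metadata filtering, are easy to illustrate. The sketch below splits text into fixed-size word windows that overlap, so a sentence cut at a boundary still appears whole in a neighboring chunk, and then narrows the candidate pool by metadata before any semantic scoring would run. The `Chunk` fields, function names, and the word-based sizes are assumptions for illustration; production systems often chunk by tokens or by document structure.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical metadata field
    year: int    # hypothetical metadata field


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def filter_chunks(chunks: list[Chunk], source: str, min_year: int) -> list[Chunk]:
    """Metadata filter applied before semantic search."""
    return [c for c in chunks if c.source == source and c.year >= min_year]


pieces = chunk_words("one two three four five six seven", size=4, overlap=2)
print(pieces)
# ['one two three four', 'three four five six', 'five six seven']

indexed = [Chunk(p, "handbook", 2024) for p in pieces]
indexed.append(Chunk("old policy text", "handbook", 2019))
candidates = filter_chunks(indexed, source="handbook", min_year=2023)
print(len(candidates))  # 3
```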
Memory in practice
Setting up an agent with useful memory

    # Pseudocode: agent with persistent + semantic memory
    # (memory, vector_db, model, and build_context are placeholders)
    def run_agent(user_message, user_id):
        # 1. Retrieve relevant persistent facts about this user
        user_facts = memory.get_user_facts(user_id)
        # 2. Retrieve relevant knowledge base entries
        kb_results = vector_db.search(user_message, top_k=5)
        # 3. Retrieve recent episodic context
        recent_history = memory.get_recent_episodes(user_id, n=3)
        # 4. Build context for the model
        context = build_context(user_facts, kb_results, recent_history)
        # 5. Run the model with enriched context
        response = model.generate(context + user_message)
        # 6. Store this interaction as a new episode
        memory.store_episode(user_id, user_message, response)
        return response

What to store in persistent memory
Store things that are true across sessions and change infrequently:
- User preferences and communication style
- Project conventions and terminology
- Decisions that have been made and should not be revisited
- Frequently referenced facts (timezone, team size, tech stack)
Do not store everything; that degrades retrieval quality. Be selective about what is worth remembering long-term.
When setting up an agent, give it relevant context upfront rather than waiting for it to learn over time. Paste in your style guide, product glossary, or team conventions as initial persistent memory. The agent will use this immediately and you will get better results from the first interaction.
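Seeding persistent memory up front might look like the sketch below, which uses Python's built-in `sqlite3` module as the database. The table layout and the `open_memory`, `remember`, and `recall` helpers are hypothetical; the seeded values are taken from the example stored facts earlier on this page.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small fact store. ':memory:' is for demonstration."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               user_id  TEXT NOT NULL,
               category TEXT NOT NULL,
               key      TEXT NOT NULL,
               value    TEXT NOT NULL,
               PRIMARY KEY (user_id, category, key)
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, user_id: str, category: str,
             key: str, value: str) -> None:
    # Upsert: a newer value for the same key replaces the old one.
    conn.execute(
        "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
        (user_id, category, key, value),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, user_id: str, category: str) -> dict[str, str]:
    rows = conn.execute(
        "SELECT key, value FROM facts"
        " WHERE user_id = ? AND category = ? ORDER BY key",
        (user_id, category),
    )
    return dict(rows.fetchall())


# Seed the store before the first conversation instead of waiting to learn.
seed = {
    "user_preferences": {"units": "metric", "timezone": "America/Chicago"},
    "project_context": {"primary_language": "TypeScript",
                        "deploy_target": "AWS Lambda"},
}
conn = open_memory()
for category, facts in seed.items():
    for key, value in facts.items():
        remember(conn, "user-1", category, key, value)

prefs = recall(conn, "user-1", "user_preferences")
print(prefs)  # {'timezone': 'America/Chicago', 'units': 'metric'}
```

Keying facts by user, category, and key keeps the store small and makes updates overwrite stale values rather than accumulate, which fits the advice above to be selective.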
Context window limits
Be aware of the practical limits (exact figures depend on the model version and change over time):
- Claude: Up to 200k tokens (~150,000 words) in context
- GPT-4o: Up to 128k tokens
- Gemini 1.5/2.0: Up to 1M tokens (excellent for very long documents)
Even with large context windows, flooding the context with everything degrades output quality. The model attends to everything in context, and noise hurts signal. Selective retrieval outperforms "stuff everything in" even when the window is large enough to hold everything.