Add short-term conversation memory, long-term JSON persistence (saving data to a file so it survives between sessions), and context window management (keeping the conversation from getting too long for the AI to handle). ~200 lines of Python.
An agent with two memory systems: short-term (the conversation history that exists while the program is running) and long-term (a JSON file on disk that stores facts and survives between sessions). Plus a context window manager — when the conversation history gets too long for Claude to handle in one call, this automatically summarizes the older messages to make room for new ones.
Short-term memory is the conversation history — the messages list we've been building since Day 1. It's in RAM. When the session ends, it's gone.
Long-term memory is a persistent store of facts the agent has learned. When the agent discovers something important — a user preference, a key fact, a result from a previous task — it writes it to a file or database. Next session, it loads that store and starts with that knowledge.
Together, they give the agent both context within a session and continuity across sessions.
```python
import json
from datetime import datetime
from pathlib import Path

import anthropic

client = anthropic.Anthropic()
MEMORY_FILE = "agent_memory.json"
MAX_MESSAGES = 20  # summarize when history exceeds this

# ── Long-term memory ───────────────────────────────
def load_memory() -> dict:
    if Path(MEMORY_FILE).exists():
        return json.loads(Path(MEMORY_FILE).read_text())
    return {"facts": [], "preferences": {}, "task_history": []}

def save_memory(memory: dict):
    Path(MEMORY_FILE).write_text(json.dumps(memory, indent=2))

def remember_fact(fact: str, category: str = "general") -> str:
    memory = load_memory()
    entry = {
        "fact": fact,
        "category": category,
        "timestamp": datetime.now().isoformat(),
    }
    memory["facts"].append(entry)
    save_memory(memory)
    return f"Remembered: '{fact}' (category: {category})"

def recall_facts(query: str = "") -> str:
    memory = load_memory()
    facts = memory["facts"]
    if query:
        facts = [f for f in facts
                 if query.lower() in f["fact"].lower()
                 or query.lower() in f["category"].lower()]
    if not facts:
        return "No facts found."
    return json.dumps(facts, indent=2)

# ── Context window management ──────────────────────
def summarize_old_messages(messages: list) -> list:
    """When history is long, summarize old messages into a single message."""
    if len(messages) <= MAX_MESSAGES:
        return messages
    # Keep the most recent 10 messages; summarize the rest
    recent = messages[-10:]
    old = messages[:-10]
    # Ask Claude to summarize the old messages
    summary_text = "\n".join(
        f"{m['role']}: {str(m['content'])[:200]}" for m in old
    )
    summary_resp = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for summarization
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation history in 3-5 sentences:\n{summary_text}"
        }]
    )
    summary = summary_resp.content[0].text
    # Replace the old messages with a single compact summary message
    summary_message = {
        "role": "user",
        "content": f"[Earlier conversation summary]: {summary}"
    }
    return [summary_message] + recent

# ── Tools (extends Day 2) ──────────────────────────
TOOLS = [
    {"name": "remember_fact",
     "description": "Store a fact in long-term memory for future sessions. "
                    "Use for important info the user shares.",
     "input_schema": {"type": "object",
                      "properties": {"fact": {"type": "string"},
                                     "category": {"type": "string"}},
                      "required": ["fact"]}},
    {"name": "recall_facts",
     "description": "Retrieve facts from long-term memory. Use at session "
                    "start or when you need to recall past information.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": []}},
]

def execute_tool(name, inp):
    if name == "remember_fact":
        return remember_fact(inp["fact"], inp.get("category", "general"))
    elif name == "recall_facts":
        return recall_facts(inp.get("query", ""))
    raise ValueError(f"Unknown tool: {name}")

# ── Agent with memory ──────────────────────────────
class MemoryAgent:
    def __init__(self):
        self.messages = []
        self.memory = load_memory()
        # Inject existing long-term memory as context
        if self.memory["facts"]:
            facts_str = "\n".join(f"- {f['fact']}" for f in self.memory["facts"])
            print(f"Loaded {len(self.memory['facts'])} facts from long-term memory.")
            self._system_context = f"You know these facts from previous sessions:\n{facts_str}"
        else:
            self._system_context = "No previous session memory."

    def chat(self, user_input: str, max_steps=10):
        self.messages.append({"role": "user", "content": user_input})
        self.messages = summarize_old_messages(self.messages)
        for _ in range(max_steps):
            resp = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                system=self._system_context,
                tools=TOOLS,
                messages=self.messages,
            )
            if resp.stop_reason == "end_turn":
                answer = resp.content[0].text
                self.messages.append({"role": "assistant", "content": answer})
                return answer
            # Otherwise Claude asked for tools: run them and report results
            results = []
            for b in resp.content:
                if b.type == "tool_use":
                    result = execute_tool(b.name, b.input)
                    results.append({"type": "tool_result",
                                    "tool_use_id": b.id,
                                    "content": str(result)})
            self.messages += [
                {"role": "assistant", "content": resp.content},
                {"role": "user", "content": results},
            ]
        return "Max steps reached."

# ── Demo: multi-session memory ─────────────────────
if __name__ == "__main__":
    agent = MemoryAgent()
    # Session 1: tell the agent something
    r1 = agent.chat("My name is Alex and I prefer Python over JavaScript.")
    print("Response 1:", r1)
    # Session 2 (a new agent instance simulates a new session)
    agent2 = MemoryAgent()
    r2 = agent2.chat("What's my name and language preference?")
    print("Response 2:", r2)
    # The agent should answer correctly from long-term memory!
```
Run the demo: the first session stores the preference; the second session (the fresh agent2 instance, which simulates starting the program again) loads it from disk and answers correctly. That's long-term memory working.
Every message you add to the history costs tokens. Claude's context window is 200K tokens, but at $3 per million input tokens for Sonnet, a 100K-token history costs about $0.30 in input tokens on every exchange. For production agents running hundreds of tasks per day, this adds up.
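The back-of-envelope arithmetic is worth making explicit (the rate below is the $3/million input-token figure quoted above; adjust it for your actual model and pricing):

```python
RATE_PER_INPUT_TOKEN = 3 / 1_000_000  # $3 per million input tokens (assumed Sonnet rate)

def exchange_cost(history_tokens: int) -> float:
    """Input cost of one API call that sends the full history."""
    return history_tokens * RATE_PER_INPUT_TOKEN

# A 100K-token history costs $0.30 in input on every single exchange...
print(f"${exchange_cost(100_000):.2f}")
# ...so 200 such exchanges in a day is already $60, before output tokens.
print(f"${exchange_cost(100_000) * 200:.2f}")
```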
The summarization approach above keeps costs manageable: once you have more than 20 messages, summarize the oldest ones into a single compact message. You preserve the gist of the conversation without paying for every word.
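The bookkeeping in that step is independent of the API call, so it can be sketched as a pure function with the summarizer passed in as a callable. The names here (`compact_history`, `KEEP_RECENT`) are illustrative, not from the listing above:

```python
MAX_MESSAGES = 20   # same threshold as the listing
KEEP_RECENT = 10    # messages kept verbatim

def compact_history(messages: list, summarize) -> list:
    """Pure version of the trimming step: `summarize` is any callable
    that turns a list of old messages into one summary string."""
    if len(messages) <= MAX_MESSAGES:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = summarize(old)
    return [{"role": "user",
             "content": f"[Earlier conversation summary]: {summary}"}] + recent

# With a stub summarizer, 25 messages collapse to 1 summary + 10 recent = 11
msgs = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
compacted = compact_history(msgs, lambda old: f"{len(old)} older messages")
print(len(compacted))  # 11
```

Separating the trimming logic from the API call also makes it easy to unit-test the threshold behavior without spending tokens.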
More sophisticated approaches: For production, consider semantic memory — using embeddings to retrieve only the relevant facts for the current task, rather than loading everything. We use a simpler JSON approach here because it works for 80% of use cases and doesn't require a vector database.
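To make the idea concrete without a vector database, here's a toy sketch of relevance-based recall. The word-overlap scorer below is a stand-in for cosine similarity over real embeddings (which is what a production system would use); every name here is hypothetical:

```python
def overlap_score(query: str, fact: str) -> float:
    """Toy relevance score: fraction of query words present in the fact.
    A real system would use cosine similarity of embedding vectors."""
    q_words = set(query.lower().split())
    f_words = set(fact.lower().split())
    return len(q_words & f_words) / max(len(q_words), 1)

def recall_relevant(facts: list, query: str, top_k: int = 2) -> list:
    """Return only the top_k most relevant facts instead of loading all of them."""
    return sorted(facts,
                  key=lambda f: overlap_score(query, f),
                  reverse=True)[:top_k]

facts = [
    "Alex prefers Python over JavaScript",
    "The deploy script lives in scripts/deploy.sh",
    "Alex works in the UTC-5 timezone",
]
print(recall_relevant(facts, "what language does Alex prefer"))
```

The payoff is the same as with real embeddings: instead of injecting every stored fact into the system prompt, you inject only the handful relevant to the current task, which keeps the context small as memory grows.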
- A `forget_fact(fact_id)` tool that removes a specific memory entry
- A `set_preference(key, value)` tool that stores user preferences separately from facts

Tomorrow: Multi-agent systems. One agent directing others. This is where things get interesting.
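If you want a starting point for those two extensions, here's one possible sketch. It reuses the same JSON layout as the listing above; treating `fact_id` as the fact's index in the list is an assumption made here, not something the original specifies:

```python
import json
from pathlib import Path

MEMORY_FILE = "agent_memory.json"

def load_memory() -> dict:
    # Same shape as the main listing
    if Path(MEMORY_FILE).exists():
        return json.loads(Path(MEMORY_FILE).read_text())
    return {"facts": [], "preferences": {}, "task_history": []}

def save_memory(memory: dict):
    Path(MEMORY_FILE).write_text(json.dumps(memory, indent=2))

def forget_fact(fact_id: int) -> str:
    """Remove a fact by its index in the facts list (our choice of id)."""
    memory = load_memory()
    if not 0 <= fact_id < len(memory["facts"]):
        return f"No fact with id {fact_id}."
    removed = memory["facts"].pop(fact_id)
    save_memory(memory)
    return f"Forgot: '{removed['fact']}'"

def set_preference(key: str, value: str) -> str:
    """Store preferences in their own dict, separate from facts."""
    memory = load_memory()
    memory["preferences"][key] = value
    save_memory(memory)
    return f"Preference set: {key} = {value}"
```

You'd still need to register both as tools in `TOOLS` and dispatch them in `execute_tool`, following the same pattern as `remember_fact`.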
Before moving on, make sure you can answer these without looking: