LangChain Memory Giving Your AI Application
Every time you call an AI model through the API, the model starts with a blank slate. It has no idea who you are, what you discussed earlier, or what task you were working on. For single-question tools this is fine. For chatbots, tutors, assistants, or any application where the conversation spans multiple turns, this statelessness is a serious limitation. LangChain's Memory system gives your application the ability to store and retrieve conversation history, making coherent multi-turn interactions possible.
The Goldfish Problem
A goldfish supposedly forgets everything every few seconds (this is actually a myth, but it serves as a useful mental model). An AI model without memory is like a goldfish — every question you ask, it starts fresh with no recollection of anything before it. Memory components give your application the ability to remember — like upgrading the goldfish's brain to hold a real conversation.
WITHOUT MEMORY: User: "My name is Priya." AI: "Nice to meet you, Priya!" User: "What is my name?" AI: "I don't know your name. Could you tell me?" ← Forgot! WITH MEMORY: User: "My name is Priya." AI: "Nice to meet you, Priya!" User: "What is my name?" AI: "Your name is Priya." ← Remembered!
How Memory Works in LangChain
Memory in LangChain is not magic. It works by storing the conversation history and injecting the relevant parts back into the prompt before each model call. The model appears to "remember" because it can see the previous messages in the prompt it receives.
Turn 1: ┌─────────────────────────────────────────┐ │ Prompt sent to model: │ │ system: "You are a helpful assistant" │ │ human: "My name is Priya" │ └─────────────────────────────────────────┘ Response: "Nice to meet you, Priya!" Memory stores: [Human: "My name is Priya", AI: "Nice to meet you, Priya!"] Turn 2: ┌─────────────────────────────────────────────────────┐ │ Prompt sent to model: │ │ system: "You are a helpful assistant" │ │ human: "My name is Priya" ← from memory│ │ ai: "Nice to meet you, Priya!" ← from memory│ │ human: "What is my name?" ← new message│ └─────────────────────────────────────────────────────┘ Response: "Your name is Priya."
The model sees all past messages in its context window and uses them to formulate a relevant response. Memory manages the storage, retrieval, and injection of this history automatically.
The Modern Approach: Managing History Manually
In newer versions of LangChain (0.3.x and beyond), the recommended approach stores conversation history as a plain Python list and passes it into your chain using MessagesPlaceholder. This approach is simple, transparent, and gives you full control.
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
("system", "You are a friendly assistant named Maya."),
MessagesPlaceholder(variable_name="history"),
("human", "{input}")
])
chain = prompt | model | parser
# Store conversation history
history = []
def chat(user_input: str) -> str:
# Call the chain with current history
response = chain.invoke({
"history": history,
"input": user_input
})
# Save the new turn to history
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
return response
# Have a multi-turn conversation
print(chat("Hi! My name is Rahul."))
print(chat("I work as a data scientist."))
print(chat("What is my name and what do I do?"))
The third question returns both the name and occupation because the history list contains both previous turns. The chain receives the growing history list on every call.
Diagram: Memory Injection Pattern
Turn 1:
history = []
input = "My name is Rahul"
↓
Prompt: [System] + [] + [Human: "My name is Rahul"]
↓ model ↓
Response: "Nice to meet you, Rahul!"
history = [HumanMsg("My name is Rahul"), AIMsg("Nice to meet you, Rahul!")]
Turn 2:
history = [HumanMsg(...), AIMsg(...)]
input = "What is my name?"
↓
Prompt: [System] + [HumanMsg, AIMsg] + [Human: "What is my name?"]
↓ model ↓
Response: "Your name is Rahul."
history = [HumanMsg, AIMsg, HumanMsg("What is my name?"), AIMsg("Your name is Rahul.")]
The Memory Growth Problem
Storing every message forever creates two problems. First, conversations grow until they exceed the model's context window limit. Second, sending thousands of tokens of old conversation history on every call gets expensive.
You need a strategy to manage history growth. LangChain provides several built-in solutions.
Strategy 1: Trim to Last N Messages
Keep only the most recent messages. This is simple and predictable. The model loses older context but always processes a manageable amount of text.
from langchain_core.messages import trim_messages
def chat_with_trim(user_input: str) -> str:
# Keep only the last 10 messages (5 turns)
trimmed_history = trim_messages(
history,
max_tokens=2000,
token_counter=model,
strategy="last",
include_system=False
)
response = chain.invoke({
"history": trimmed_history,
"input": user_input
})
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
return response
Strategy 2: Summarize Old History
Instead of discarding old messages, compress them into a summary. The model can still reference information from early in the conversation, just in condensed form.
def summarize_old_history(messages: list) -> str:
"""Summarize a list of messages into a short paragraph."""
summary_prompt = ChatPromptTemplate.from_messages([
("system", "Summarize this conversation history in 2-3 sentences. Preserve key facts."),
("human", "{messages}")
])
summary_chain = summary_prompt | model | parser
text = "\n".join([f"{m.type}: {m.content}" for m in messages])
return summary_chain.invoke({"messages": text})
def chat_with_summary(user_input: str) -> str:
global history, summary
# When history gets long, summarize older messages
if len(history) > 20:
# Summarize the oldest 10 messages
old_messages = history[:10]
summary = summarize_old_history(old_messages)
history = history[10:] # Keep only recent messages
# Combine summary + recent history
context = []
if summary:
context.append(SystemMessage(content=f"Earlier in this conversation: {summary}"))
context.extend(history)
response = chain.invoke({"history": context, "input": user_input})
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
return response
Persisting Memory Across Sessions
In-memory history lists vanish when your application restarts. For real applications where users return for multiple sessions, you need to save and load conversation history from a database or file.
Simple File-Based Persistence
import json
from pathlib import Path
from langchain_core.messages import HumanMessage, AIMessage
def save_history(history: list, user_id: str):
"""Save conversation history to a JSON file."""
data = [
{"type": m.type, "content": m.content}
for m in history
]
Path(f"history_{user_id}.json").write_text(json.dumps(data))
def load_history(user_id: str) -> list:
"""Load conversation history from a JSON file."""
path = Path(f"history_{user_id}.json")
if not path.exists():
return []
data = json.loads(path.read_text())
messages = []
for item in data:
if item["type"] == "human":
messages.append(HumanMessage(content=item["content"]))
elif item["type"] == "ai":
messages.append(AIMessage(content=item["content"]))
return messages
# Usage
user_id = "user_123"
history = load_history(user_id)
def chat_persistent(user_input: str) -> str:
response = chain.invoke({"history": history, "input": user_input})
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
save_history(history, user_id)
return response
Database-Based Persistence (Production)
For production applications, LangChain integrates with databases like Redis, PostgreSQL, and MongoDB for conversation storage. Install the appropriate integration package:
pip install langchain-community
from langchain_community.chat_message_histories import RedisChatMessageHistory
# Store history in Redis (fast, persistent, supports multiple users)
history_store = RedisChatMessageHistory(
session_id="user_123",
url="redis://localhost:6379"
)
# history_store.messages gives the full history
# history_store.add_user_message() adds a human message
# history_store.add_ai_message() adds an AI response
# history_store.clear() clears the history for this session
Multi-User Memory Management
Applications serving multiple users need separate history for each user. Never mix histories. Use a session identifier (user ID, session token) as the key for each user's history.
# Dictionary to hold history for each user
user_histories = {}
def get_history(user_id: str) -> list:
if user_id not in user_histories:
user_histories[user_id] = []
return user_histories[user_id]
def chat_multi_user(user_id: str, user_input: str) -> str:
history = get_history(user_id)
response = chain.invoke({
"history": history,
"input": user_input
})
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
return response
# Each user gets their own separate memory
print(chat_multi_user("user_001", "My favorite color is blue."))
print(chat_multi_user("user_002", "My favorite color is red."))
print(chat_multi_user("user_001", "What is my favorite color?")) # Returns blue
print(chat_multi_user("user_002", "What is my favorite color?")) # Returns red
Memory Types Comparison
Memory Approach Pros Cons Best For ────────────────────────────────────────────────────────────────────────────── Full history list Simple, accurate Can exceed context Short sessions Trimmed (last N msgs) Predictable cost Loses old context Long sessions Summarized history Preserves key facts Slight inaccuracy Very long sessions Database-backed Persists forever Needs infrastructure Multi-session apps
What to Store in Memory vs System Prompt
Not everything needs to go into the conversation history. Some information belongs in the system message because it never changes and should always be present. Other information belongs in the history because it arose during the conversation.
System Prompt (fixed, always present): - The assistant's name and persona - The application's domain and purpose - Hard rules the assistant must follow - Default language and tone Conversation History (dynamic, grows over time): - Things the user said in previous turns - Facts the user shared about themselves - Decisions made during the conversation - Previous questions and answers
Debugging Memory Issues
When memory behaves unexpectedly, print the history list and the full prompt before sending it to the model. This reveals whether the history is being stored correctly and whether it is being injected into the right position in the prompt.
def debug_chat(user_input: str) -> str:
print(f"History has {len(history)} messages")
for msg in history:
print(f" {msg.type}: {msg.content[:50]}...")
# See the full formatted prompt
formatted = prompt.format_messages(history=history, input=user_input)
print(f"\nFull prompt ({len(formatted)} messages):")
for msg in formatted:
print(f" {msg.type}: {msg.content[:80]}...")
response = chain.invoke({"history": history, "input": user_input})
history.append(HumanMessage(content=user_input))
history.append(AIMessage(content=response))
return response
Summary
Memory gives your AI application the ability to maintain context across multiple turns by storing conversation history and injecting it into each prompt. The modern LangChain approach uses a plain Python list with MessagesPlaceholder. History growth is managed by trimming old messages or summarizing them. Persistence across sessions requires saving history to files or databases. Multi-user applications need separate history per user identified by a session key.
