LangChain Memory Giving Your AI Application

Every time you call an AI model through the API, the model starts with a blank slate. It has no idea who you are, what you discussed earlier, or what task you were working on. For single-question tools this is fine. For chatbots, tutors, assistants, or any application where the conversation spans multiple turns, this statelessness is a serious limitation. LangChain's Memory system gives your application the ability to store and retrieve conversation history, making coherent multi-turn interactions possible.

The Goldfish Problem

A goldfish supposedly forgets everything every few seconds (this is actually a myth, but it serves as a useful mental model). An AI model without memory is like a goldfish — every question you ask, it starts fresh with no recollection of anything before it. Memory components give your application the ability to remember — like upgrading the goldfish's brain to hold a real conversation.

WITHOUT MEMORY:
User:  "My name is Priya."
AI:    "Nice to meet you, Priya!"
User:  "What is my name?"
AI:    "I don't know your name. Could you tell me?"  ← Forgot!

WITH MEMORY:
User:  "My name is Priya."
AI:    "Nice to meet you, Priya!"
User:  "What is my name?"
AI:    "Your name is Priya."  ← Remembered!

How Memory Works in LangChain

Memory in LangChain is not magic. It works by storing the conversation history and injecting the relevant parts back into the prompt before each model call. The model appears to "remember" because it can see the previous messages in the prompt it receives.

Turn 1:
┌─────────────────────────────────────────┐
│ Prompt sent to model:                   │
│   system: "You are a helpful assistant" │
│   human:  "My name is Priya"            │
└─────────────────────────────────────────┘
Response: "Nice to meet you, Priya!"

Memory stores: [Human: "My name is Priya", AI: "Nice to meet you, Priya!"]

Turn 2:
┌─────────────────────────────────────────────────────┐
│ Prompt sent to model:                               │
│   system: "You are a helpful assistant"             │
│   human:  "My name is Priya"           ← from memory│
│   ai:     "Nice to meet you, Priya!"   ← from memory│
│   human:  "What is my name?"           ← new message│
└─────────────────────────────────────────────────────┘
Response: "Your name is Priya."

The model sees all past messages in its context window and uses them to formulate a relevant response. Memory manages the storage, retrieval, and injection of this history automatically.

The Modern Approach: Managing History Manually

In newer versions of LangChain (0.3.x and beyond), the recommended approach stores conversation history as a plain Python list and passes it into your chain using MessagesPlaceholder. This approach is simple, transparent, and gives you full control.

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
parser = StrOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly assistant named Maya."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | model | parser

# Store conversation history
history = []

def chat(user_input: str) -> str:
    # Call the chain with current history
    response = chain.invoke({
        "history": history,
        "input": user_input
    })

    # Save the new turn to history
    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))

    return response

# Have a multi-turn conversation
print(chat("Hi! My name is Rahul."))
print(chat("I work as a data scientist."))
print(chat("What is my name and what do I do?"))

The third question returns both the name and occupation because the history list contains both previous turns. The chain receives the growing history list on every call.

Diagram: Memory Injection Pattern

Turn 1:
  history = []
  input = "My name is Rahul"
  ↓
  Prompt: [System] + [] + [Human: "My name is Rahul"]
  ↓ model ↓
  Response: "Nice to meet you, Rahul!"
  history = [HumanMsg("My name is Rahul"), AIMsg("Nice to meet you, Rahul!")]

Turn 2:
  history = [HumanMsg(...), AIMsg(...)]
  input = "What is my name?"
  ↓
  Prompt: [System] + [HumanMsg, AIMsg] + [Human: "What is my name?"]
  ↓ model ↓
  Response: "Your name is Rahul."
  history = [HumanMsg, AIMsg, HumanMsg("What is my name?"), AIMsg("Your name is Rahul.")]

The Memory Growth Problem

Storing every message forever creates two problems. First, conversations grow until they exceed the model's context window limit. Second, sending thousands of tokens of old conversation history on every call gets expensive.

You need a strategy to manage history growth. LangChain provides several built-in solutions.

Strategy 1: Trim to Last N Messages

Keep only the most recent messages. This is simple and predictable. The model loses older context but always processes a manageable amount of text.

from langchain_core.messages import trim_messages

def chat_with_trim(user_input: str) -> str:
    # Keep only the last 10 messages (5 turns)
    trimmed_history = trim_messages(
        history,
        max_tokens=2000,
        token_counter=model,
        strategy="last",
        include_system=False
    )

    response = chain.invoke({
        "history": trimmed_history,
        "input": user_input
    })

    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))
    return response

Strategy 2: Summarize Old History

Instead of discarding old messages, compress them into a summary. The model can still reference information from early in the conversation, just in condensed form.

def summarize_old_history(messages: list) -> str:
    """Summarize a list of messages into a short paragraph."""
    summary_prompt = ChatPromptTemplate.from_messages([
        ("system", "Summarize this conversation history in 2-3 sentences. Preserve key facts."),
        ("human", "{messages}")
    ])
    summary_chain = summary_prompt | model | parser

    text = "\n".join([f"{m.type}: {m.content}" for m in messages])
    return summary_chain.invoke({"messages": text})

def chat_with_summary(user_input: str) -> str:
    global history, summary

    # When history gets long, summarize older messages
    if len(history) > 20:
        # Summarize the oldest 10 messages
        old_messages = history[:10]
        summary = summarize_old_history(old_messages)
        history = history[10:]  # Keep only recent messages

    # Combine summary + recent history
    context = []
    if summary:
        context.append(SystemMessage(content=f"Earlier in this conversation: {summary}"))
    context.extend(history)

    response = chain.invoke({"history": context, "input": user_input})
    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))
    return response

Persisting Memory Across Sessions

In-memory history lists vanish when your application restarts. For real applications where users return for multiple sessions, you need to save and load conversation history from a database or file.

Simple File-Based Persistence

import json
from pathlib import Path
from langchain_core.messages import HumanMessage, AIMessage

def save_history(history: list, user_id: str):
    """Save conversation history to a JSON file."""
    data = [
        {"type": m.type, "content": m.content}
        for m in history
    ]
    Path(f"history_{user_id}.json").write_text(json.dumps(data))

def load_history(user_id: str) -> list:
    """Load conversation history from a JSON file."""
    path = Path(f"history_{user_id}.json")
    if not path.exists():
        return []

    data = json.loads(path.read_text())
    messages = []
    for item in data:
        if item["type"] == "human":
            messages.append(HumanMessage(content=item["content"]))
        elif item["type"] == "ai":
            messages.append(AIMessage(content=item["content"]))
    return messages

# Usage
user_id = "user_123"
history = load_history(user_id)

def chat_persistent(user_input: str) -> str:
    response = chain.invoke({"history": history, "input": user_input})
    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))
    save_history(history, user_id)
    return response

Database-Based Persistence (Production)

For production applications, LangChain integrates with databases like Redis, PostgreSQL, and MongoDB for conversation storage. Install the appropriate integration package:

pip install langchain-community
from langchain_community.chat_message_histories import RedisChatMessageHistory

# Store history in Redis (fast, persistent, supports multiple users)
history_store = RedisChatMessageHistory(
    session_id="user_123",
    url="redis://localhost:6379"
)

# history_store.messages gives the full history
# history_store.add_user_message() adds a human message
# history_store.add_ai_message() adds an AI response
# history_store.clear() clears the history for this session

Multi-User Memory Management

Applications serving multiple users need separate history for each user. Never mix histories. Use a session identifier (user ID, session token) as the key for each user's history.

# Dictionary to hold history for each user
user_histories = {}

def get_history(user_id: str) -> list:
    if user_id not in user_histories:
        user_histories[user_id] = []
    return user_histories[user_id]

def chat_multi_user(user_id: str, user_input: str) -> str:
    history = get_history(user_id)

    response = chain.invoke({
        "history": history,
        "input": user_input
    })

    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))
    return response

# Each user gets their own separate memory
print(chat_multi_user("user_001", "My favorite color is blue."))
print(chat_multi_user("user_002", "My favorite color is red."))
print(chat_multi_user("user_001", "What is my favorite color?"))  # Returns blue
print(chat_multi_user("user_002", "What is my favorite color?"))  # Returns red

Memory Types Comparison

Memory Approach          Pros                Cons               Best For
──────────────────────────────────────────────────────────────────────────────
Full history list        Simple, accurate    Can exceed context  Short sessions
Trimmed (last N msgs)    Predictable cost    Loses old context   Long sessions
Summarized history       Preserves key facts Slight inaccuracy   Very long sessions
Database-backed          Persists forever    Needs infrastructure Multi-session apps

What to Store in Memory vs System Prompt

Not everything needs to go into the conversation history. Some information belongs in the system message because it never changes and should always be present. Other information belongs in the history because it arose during the conversation.

System Prompt (fixed, always present):
  - The assistant's name and persona
  - The application's domain and purpose
  - Hard rules the assistant must follow
  - Default language and tone

Conversation History (dynamic, grows over time):
  - Things the user said in previous turns
  - Facts the user shared about themselves
  - Decisions made during the conversation
  - Previous questions and answers

Debugging Memory Issues

When memory behaves unexpectedly, print the history list and the full prompt before sending it to the model. This reveals whether the history is being stored correctly and whether it is being injected into the right position in the prompt.

def debug_chat(user_input: str) -> str:
    print(f"History has {len(history)} messages")
    for msg in history:
        print(f"  {msg.type}: {msg.content[:50]}...")

    # See the full formatted prompt
    formatted = prompt.format_messages(history=history, input=user_input)
    print(f"\nFull prompt ({len(formatted)} messages):")
    for msg in formatted:
        print(f"  {msg.type}: {msg.content[:80]}...")

    response = chain.invoke({"history": history, "input": user_input})
    history.append(HumanMessage(content=user_input))
    history.append(AIMessage(content=response))
    return response

Summary

Memory gives your AI application the ability to maintain context across multiple turns by storing conversation history and injecting it into each prompt. The modern LangChain approach uses a plain Python list with MessagesPlaceholder. History growth is managed by trimming old messages or summarizing them. Persistence across sessions requires saving history to files or databases. Multi-user applications need separate history per user identified by a session key.

Leave a Comment