LangChain Callbacks, Logging, and Monitoring
You built a working LangChain application. Now you need to know what it is doing at runtime. How long does each step take? Which prompts are being sent to the model? How many tokens are being used per request? When something goes wrong, which step failed? LangChain's Callbacks system answers all of these questions by letting you hook into every event that occurs during chain execution — without modifying your application logic.
The Airport Security Camera Analogy
Airport security cameras record every gate, corridor, and checkpoint without interfering with passenger flow. Passengers walk through normally. The cameras silently capture everything. If something goes wrong, staff review the footage to understand exactly what happened. LangChain Callbacks work the same way — they observe chain execution silently, recording events, timings, and data, without changing the chain's behavior.
Chain execution with callbacks:
chain.invoke(input)
│
▼ on_chain_start → Callback records: input, timestamp, chain name
│
▼ on_llm_start → Callback records: prompt text, model name
│
▼ on_llm_end → Callback records: response, tokens used, duration
│
▼ on_chain_end → Callback records: output, total duration
│
▼ Result returned to your code
Built-In Callbacks: StdOutCallbackHandler
The simplest callback prints events to the terminal as they happen. It is roughly the equivalent of verbose=True, but as a reusable object you can attach to any chain.
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import StdOutCallbackHandler

load_dotenv()  # Load API keys from the .env file

model = ChatOpenAI(model="gpt-3.5-turbo")
prompt = ChatPromptTemplate.from_messages([("human", "{question}")])
parser = StrOutputParser()
chain = prompt | model | parser

# Attach callback to this invocation
result = chain.invoke(
    {"question": "What is photosynthesis?"},
    config={"callbacks": [StdOutCallbackHandler()]}
)
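The terminal shows when each chain component starts and finishes, along with agent actions and tool output where those apply. This is the fastest way to see which step ran before unexpected behavior appeared. The handler does not report timings or token counts; for those, use a custom handler like the one in the next section.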
Building a Custom Callback Handler
The real power of callbacks comes from writing your own. Subclass BaseCallbackHandler and override the event methods you care about. Each method receives relevant data about the event.
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any
import time
import json


class DetailedLogger(BaseCallbackHandler):
    """Logs all LangChain events to a file for later analysis."""

    def __init__(self, log_file: str = "langchain_log.jsonl"):
        self.log_file = log_file
        self.start_times = {}  # run_id -> start timestamp

    def _write_log(self, event: str, data: dict):
        """Append a JSON log entry to the log file."""
        entry = {"event": event, "timestamp": time.time(), **data}
        with open(self.log_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def on_llm_start(self, serialized: dict, prompts: list, **kwargs):
        """Called when the LLM starts processing."""
        run_id = str(kwargs.get("run_id", "unknown"))
        self.start_times[run_id] = time.time()
        prompt_length = sum(len(p) for p in prompts)
        self._write_log("llm_start", {
            # serialized can be None in some versions, so guard the lookup
            "model": (serialized or {}).get("name", "unknown"),
            "prompt_length": prompt_length
        })
        print(f"[LOG] LLM call started. Prompt: {prompt_length} chars")

    def on_llm_end(self, response: LLMResult, **kwargs):
        """Called when the LLM finishes processing."""
        run_id = str(kwargs.get("run_id", "unknown"))
        duration = time.time() - self.start_times.pop(run_id, time.time())
        # Extract token usage from the response (may be absent, e.g. when streaming)
        usage = {}
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {}) or {}
        self._write_log("llm_end", {
            "duration_seconds": round(duration, 3),
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0)
        })
        print(f"[LOG] LLM call finished in {duration:.2f}s. "
              f"Tokens: {usage.get('total_tokens', '?')}")

    def on_llm_error(self, error: BaseException, **kwargs):
        """Called when the LLM raises an error."""
        # Drop the start time so failed runs do not accumulate in memory
        self.start_times.pop(str(kwargs.get("run_id", "unknown")), None)
        self._write_log("llm_error", {"error": str(error)})
        print(f"[LOG] LLM ERROR: {error}")

    def on_chain_start(self, serialized: dict, inputs: Any, **kwargs):
        """Called when a chain starts."""
        print(f"[LOG] Chain '{(serialized or {}).get('name', 'unknown')}' started")

    def on_chain_end(self, outputs: Any, **kwargs):
        """Called when a chain finishes."""
        print("[LOG] Chain finished")

    def on_tool_start(self, serialized: dict, input_str: str, **kwargs):
        """Called when a tool starts executing."""
        print(f"[LOG] Tool '{(serialized or {}).get('name', 'unknown')}' called with: {input_str[:100]}")

    def on_tool_end(self, output: Any, **kwargs):
        """Called when a tool finishes."""
        # The output is not always a plain string, so convert before slicing
        print(f"[LOG] Tool returned: {str(output)[:100]}")


# Use the custom callback
logger = DetailedLogger("my_app_log.jsonl")
result = chain.invoke(
    {"question": "Explain gravity briefly."},
    config={"callbacks": [logger]}
)
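Once a handler has written a JSONL log, the file can be analyzed with the standard library alone. The sketch below assumes the entry layout used by the logger above (an "event" field plus "total_tokens" and "duration_seconds" on llm_end entries); summarize_log is a hypothetical helper name, not part of LangChain.

```python
import json


def summarize_log(path: str) -> dict:
    """Aggregate llm_end and llm_error entries from a JSONL log file."""
    summary = {"calls": 0, "errors": 0, "total_tokens": 0, "total_seconds": 0.0}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines
            entry = json.loads(line)
            event = entry.get("event")
            if event == "llm_end":
                summary["calls"] += 1
                summary["total_tokens"] += entry.get("total_tokens", 0)
                summary["total_seconds"] += entry.get("duration_seconds", 0.0)
            elif event == "llm_error":
                summary["errors"] += 1
    return summary
```

Calling summarize_log("my_app_log.jsonl") after a few runs gives a quick view of how many LLM calls completed, how many failed, and how many tokens and seconds they consumed in total.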
Token Usage Tracking for Cost Management
Every token sent to and received from a paid AI API costs money. A chain that works fine in development can become expensive in production if it sends unexpectedly large prompts. Token tracking reveals exactly where tokens are being spent.
from langchain_community.callbacks import get_openai_callback

# cb accumulates token counts and cost for OpenAI calls made inside the block
with get_openai_callback() as cb:
    result = chain.invoke({"question": "Summarize the water cycle."})

print(f"Prompt tokens: {cb.prompt_tokens}")
print(f"Completion tokens: {cb.completion_tokens}")
print(f"Total tokens: {cb.total_tokens}")
print(f"Estimated cost: ${cb.total_cost:.6f}")
Use this during development to profile your chains before deploying. At the same per-token price, a chain that uses 3,000 tokens per request costs 30 times as much as one that uses 100 tokens.
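The arithmetic behind that comparison can be sketched in a few lines. The prices here are made-up placeholders, not real rates for any model, and estimate_cost is a hypothetical helper; substitute the current prices from your provider.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_1k: float,
    completion_price_per_1k: float,
) -> float:
    """Estimate the cost of one request from token counts and per-1,000-token prices."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder prices (assumed for illustration only)
PROMPT_PRICE = 0.001
COMPLETION_PRICE = 0.002

large = estimate_cost(2500, 500, PROMPT_PRICE, COMPLETION_PRICE)
small = estimate_cost(80, 20, PROMPT_PRICE, COMPLETION_PRICE)
print(f"Large request: ${large:.6f}")
print(f"Small request: ${small:.6f}")
```

Because prompt and completion tokens are often priced differently, the ratio between two requests depends on how their tokens split between the two, not only on the totals.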
Integrating with LangSmith
LangSmith is the official observability platform for LangChain applications. Add these three lines to your .env file and every chain run appears in your LangSmith dashboard automatically — no code changes needed.
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=my-langchain-app
LangSmith shows a timeline of each step, records all inputs and outputs, tracks latency and token usage, and lets you compare runs side by side. It offers a free tier for individual developers and is invaluable for debugging complex agent runs.
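Because tracing is driven entirely by environment variables, a missing or misspelled variable fails silently: the chain runs but nothing reaches the dashboard. A small startup check can catch this. The sketch below looks for the three variable names listed above; missing_tracing_vars is a hypothetical helper, not a LangSmith API.

```python
import os

# The three variables named in the .env example
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env=None) -> list:
    """Return the names of tracing variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_tracing_vars()
if missing:
    print(f"Tracing is not fully configured. Missing: {', '.join(missing)}")
else:
    print("Tracing variables are set.")
```

Run the check after load_dotenv() so that values from the .env file are already in the environment.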
Callback Event Reference
Event Method          When It Fires
──────────────────────────────────────────────────────────
on_chain_start        Chain begins executing
on_chain_end          Chain finishes successfully
on_chain_error        Chain raises an exception
on_llm_start          LLM call begins
on_llm_new_token      Each streaming token (streaming only)
on_llm_end            LLM call finishes
on_llm_error          LLM call fails
on_tool_start         Tool begins executing
on_tool_end           Tool finishes
on_tool_error         Tool raises an exception
on_retriever_start    Retriever begins search
on_retriever_end      Retriever returns results
on_agent_action       Agent decides to call a tool
on_agent_finish       Agent produces final answer
Summary
Callbacks are hooks that fire at every event during chain execution without modifying the chain itself. StdOutCallbackHandler provides instant visibility during development. Custom handlers extend BaseCallbackHandler and override the event methods you need. get_openai_callback tracks token usage and estimated API costs. LangSmith activates automatically via environment variables and provides a full production monitoring dashboard.
