LangChain Callbacks, Logging, and Monitoring

You built a working LangChain application. Now you need to know what it is doing at runtime. How long does each step take? Which prompts are being sent to the model? How many tokens are being used per request? When something goes wrong, which step failed? LangChain's Callbacks system answers all of these questions by letting you hook into every event that occurs during chain execution — without modifying your application logic.

The Airport Security Camera Analogy

Airport security cameras record every gate, corridor, and checkpoint without interfering with passenger flow. Passengers walk through normally. The cameras silently capture everything. If something goes wrong, staff review the footage to understand exactly what happened. LangChain Callbacks work the same way — they observe chain execution silently, recording events, timings, and data, without changing the chain's behavior.

Chain execution with callbacks:

chain.invoke(input)
       │
       ▼ on_chain_start → Callback records: input, timestamp, chain name
       │
       ▼ on_llm_start → Callback records: prompt text, model name
       │
       ▼ on_llm_end → Callback records: response, tokens used, duration
       │
       ▼ on_chain_end → Callback records: output, total duration
       │
       ▼ Result returned to your code
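
The flow above can be mimicked in plain Python to show why observing a run does not change its result. This is only a sketch of the hook pattern, not LangChain's implementation: RecordingHandler and run_with_hooks are hypothetical names invented for the illustration.

```python
import time
from typing import Any, Callable


class RecordingHandler:
    """Hypothetical observer: stores (event, payload) pairs and never alters data."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []

    def on_event(self, name: str, payload: Any) -> None:
        self.events.append((name, payload))


def run_with_hooks(
    step: Callable[[str], str], text: str, handlers: list[RecordingHandler]
) -> str:
    """Run one step, notifying every handler before and after it."""
    for handler in handlers:
        handler.on_event("on_chain_start", text)
    started = time.perf_counter()
    output = step(text)  # the step itself knows nothing about the handlers
    elapsed = time.perf_counter() - started
    for handler in handlers:
        handler.on_event("on_chain_end", {"output": output, "seconds": elapsed})
    return output


handler = RecordingHandler()
observed = run_with_hooks(str.upper, "hello", [handler])
unobserved = run_with_hooks(str.upper, "hello", [])

print(observed == unobserved)                # same result with or without handlers
print([name for name, _ in handler.events])  # event names in the order they fired
```

The handlers receive copies of the input and output references but the return value comes straight from the step, which is the property the camera analogy describes.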

Built-In Callbacks: StdOutCallbackHandler

The simplest callback prints events to the terminal as they happen. It is the equivalent of verbose=True but as a reusable object you can attach to any chain.

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import StdOutCallbackHandler

load_dotenv()

model = ChatOpenAI(model="gpt-3.5-turbo")
prompt = ChatPromptTemplate.from_messages([("human", "{question}")])
parser = StrOutputParser()
chain = prompt | model | parser

# Attach callback to this invocation
result = chain.invoke(
    {"question": "What is photosynthesis?"},
    config={"callbacks": [StdOutCallbackHandler()]}
)

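The terminal shows which chain components start and finish as the run proceeds, along with agent and tool output when those are involved. This is a quick way to confirm that a chain executes the steps you expect. StdOutCallbackHandler does not print timings or token counts; for those, use a custom handler.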

Building a Custom Callback Handler

The real power of callbacks comes from writing your own. Subclass BaseCallbackHandler and override the event methods you care about. Each method receives relevant data about the event.

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any
import time
import json

class DetailedLogger(BaseCallbackHandler):
    """Logs all LangChain events to a file for later analysis."""

    def __init__(self, log_file: str = "langchain_log.jsonl"):
        self.log_file = log_file
        self.start_times = {}

    def _write_log(self, event: str, data: dict):
        """Append a JSON log entry to the log file."""
        entry = {"event": event, "timestamp": time.time(), **data}
        with open(self.log_file, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def on_llm_start(self, serialized: dict, prompts: list, **kwargs):
        """Called when the LLM starts processing."""
        run_id = str(kwargs.get("run_id", "unknown"))
        self.start_times[run_id] = time.time()
        self._write_log("llm_start", {
            "model": serialized.get("name", "unknown"),
            "prompt_length": sum(len(p) for p in prompts)
        })
        print(f"[LOG] LLM call started. Prompt: {sum(len(p) for p in prompts)} chars")

    def on_llm_end(self, response: LLMResult, **kwargs):
        """Called when the LLM finishes processing."""
        run_id = str(kwargs.get("run_id", "unknown"))
        duration = time.time() - self.start_times.pop(run_id, time.time())

        # Extract token usage from the response
        usage = {}
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})

        self._write_log("llm_end", {
            "duration_seconds": round(duration, 3),
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0)
        })
        print(f"[LOG] LLM call finished in {duration:.2f}s. "
              f"Tokens: {usage.get('total_tokens', '?')}")

    def on_llm_error(self, error: Exception, **kwargs):
        """Called when the LLM raises an error."""
        self._write_log("llm_error", {"error": str(error)})
        print(f"[LOG] LLM ERROR: {error}")

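    def on_chain_start(self, serialized: dict, inputs: dict, **kwargs):
        """Called when a chain starts."""
        # In recent LangChain releases, serialized can be None for LCEL runnables
        # and the component name arrives as a keyword argument instead.
        name = kwargs.get("name") or (serialized or {}).get("name", "unknown")
        print(f"[LOG] Chain '{name}' started")

    def on_chain_end(self, outputs: Any, **kwargs):
        """Called when a chain finishes."""
        print("[LOG] Chain finished")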

    def on_tool_start(self, serialized: dict, input_str: str, **kwargs):
        """Called when a tool starts executing."""
        name = (serialized or {}).get("name", "unknown")  # guard against None
        print(f"[LOG] Tool '{name}' called with: {input_str[:100]}")

    def on_tool_end(self, output: Any, **kwargs):
        """Called when a tool finishes."""
        # The output is not always a plain string, so convert before slicing.
        print(f"[LOG] Tool returned: {str(output)[:100]}")

# Use the custom callback
logger = DetailedLogger("my_app_log.jsonl")
result = chain.invoke(
    {"question": "Explain gravity briefly."},
    config={"callbacks": [logger]}
)
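
Once the handler has written some entries, the JSONL file can be summarized with the standard library alone. The sketch below assumes the field names that DetailedLogger writes (event, duration_seconds, total_tokens); summarize_llm_log is a hypothetical helper, and the sample entries are made up so the example runs without calling a model.

```python
import json
import os
import tempfile


def summarize_llm_log(path: str) -> dict:
    """Count LLM calls and errors, and total up tokens and latency."""
    calls = 0
    errors = 0
    total_tokens = 0
    total_seconds = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry.get("event") == "llm_end":
                calls += 1
                total_tokens += entry.get("total_tokens", 0)
                total_seconds += entry.get("duration_seconds", 0.0)
            elif entry.get("event") == "llm_error":
                errors += 1
    average = total_seconds / calls if calls else 0.0
    return {
        "calls": calls,
        "errors": errors,
        "total_tokens": total_tokens,
        "avg_seconds": round(average, 3),
    }


# Made-up sample entries in the shape DetailedLogger produces.
sample = [
    {"event": "llm_start", "timestamp": 1.0, "model": "unknown", "prompt_length": 40},
    {"event": "llm_end", "timestamp": 2.0, "duration_seconds": 1.0,
     "prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    {"event": "llm_end", "timestamp": 4.0, "duration_seconds": 2.0,
     "prompt_tokens": 30, "completion_tokens": 40, "total_tokens": 70},
    {"event": "llm_error", "timestamp": 5.0, "error": "timeout"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_log.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for entry in sample:
            f.write(json.dumps(entry) + "\n")
    summary = summarize_llm_log(path)

print(summary)
```

Pointing summarize_llm_log at my_app_log.jsonl instead of the temporary file would give the same kind of summary for real runs.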

Token Usage Tracking for Cost Management

Every token sent to and received from a paid AI API costs money. A chain that works fine in development can become expensive in production if it sends unexpectedly large prompts. Token tracking reveals exactly where tokens are being spent.

from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    result = chain.invoke({"question": "Summarize the water cycle."})

print(f"Prompt tokens:     {cb.prompt_tokens}")
print(f"Completion tokens: {cb.completion_tokens}")
print(f"Total tokens:      {cb.total_tokens}")
print(f"Estimated cost:    ${cb.total_cost:.6f}")

Use this during development to profile your chains before deploying. At the same per-token price and the same mix of prompt and completion tokens, a chain that uses 3,000 tokens per request costs 30 times as much as one that uses 100 tokens.
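
The arithmetic behind that comparison can be sketched as follows. The prices here are placeholders chosen for the illustration, not real rates for any model, and estimate_cost is a hypothetical helper; substitute the current prices from your provider.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_1k: float,
    completion_price_per_1k: float,
) -> float:
    """Estimate the cost of one request from its token counts."""
    return (
        prompt_tokens / 1000 * prompt_price_per_1k
        + completion_tokens / 1000 * completion_price_per_1k
    )


# PLACEHOLDER prices per 1,000 tokens (assumptions, not real rates).
PROMPT_PRICE = 0.001
COMPLETION_PRICE = 0.002

# Two requests with the same 80/20 split of prompt and completion tokens.
small = estimate_cost(80, 20, PROMPT_PRICE, COMPLETION_PRICE)      # 100 tokens
large = estimate_cost(2400, 600, PROMPT_PRICE, COMPLETION_PRICE)   # 3,000 tokens

ratio = large / small
print(f"Small request: ${small:.6f}")
print(f"Large request: ${large:.6f}")
print(f"Ratio: {ratio:.1f}x")
```

Because prompt and completion tokens are usually priced differently, the ratio only equals the ratio of total tokens when the split between the two stays the same.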

Integrating with LangSmith

LangSmith is the official observability platform for LangChain applications. Add these three lines to your .env file and, provided the file is loaded before your chains run (for example with load_dotenv()), every chain run appears in your LangSmith dashboard automatically, with no changes to your chain code.

LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=my-langchain-app

LangSmith shows a timeline of each step, records all inputs and outputs, tracks latency and token usage, and lets you compare runs side by side. It offers a free tier for individual developers and is invaluable for debugging complex agent runs.
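
If runs do not show up in the dashboard, a common cause is that the variables were never loaded into the process. A small check like the one below can confirm that before you start debugging anything else. It is a sketch: missing_tracing_vars is a hypothetical helper, and it only checks that the three variables from this lesson are present and non-empty, not that the API key is valid.

```python
import os
from typing import Mapping

# The three variables from the .env example in this lesson.
EXPECTED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of expected variables that are unset or empty."""
    return [name for name in EXPECTED_VARS if not env.get(name)]


# Example with a stand-in environment that lacks the API key.
example_env = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_PROJECT": "my-langchain-app",
}
print(missing_tracing_vars(example_env))
```

Calling missing_tracing_vars() with no argument after load_dotenv() checks the real process environment.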

Callback Event Reference

Event Method              When It Fires
──────────────────────────────────────────────────────────
on_chain_start            Chain begins executing
on_chain_end              Chain finishes successfully
on_chain_error            Chain raises an exception
on_llm_start              LLM call begins
on_llm_new_token          Each streaming token (streaming only)
on_llm_end                LLM call finishes
on_llm_error              LLM call fails
on_tool_start             Tool begins executing
on_tool_end               Tool finishes
on_tool_error             Tool raises an exception
on_retriever_start        Retriever begins search
on_retriever_end          Retriever returns results
on_agent_action           Agent decides to call a tool
on_agent_finish           Agent produces final answer

Summary

Callbacks are hooks that fire at every event during chain execution without modifying the chain itself. StdOutCallbackHandler provides instant visibility during development. Custom handlers extend BaseCallbackHandler and override the event methods you need. get_openai_callback tracks token usage and estimated API costs. LangSmith activates automatically via environment variables and provides a full production monitoring dashboard.
