Understanding Large Language Models (LLMs)
An AI Agent needs a brain — and that brain is a Large Language Model (LLM). LLMs are the core technology behind modern AI Agents. Without understanding what an LLM is and how it works, building effective agents would be like driving a car without knowing how the engine works.
What is a Large Language Model?
A Large Language Model is an AI system that has been trained on massive amounts of text data — books, websites, articles, code, and more — to understand and generate human language.
The word "large" refers to two things:
- Large data: Trained on hundreds of billions of words from the internet and books
- Large parameters: Contains billions of internal numerical values that store "learned knowledge"
Simple Analogy
Think of an LLM like a student who has read millions of books and can now answer questions, write essays, summarise documents, write code, and hold conversations — all because of everything it read during training.
How Does an LLM Work? (Simplified)
An LLM works by predicting the most likely next word (or token) given everything that has been written so far. It does this billions of times, very quickly, to generate a full response.
Step-by-Step: How an LLM Generates Text
```
Input (Prompt): "The capital of France is"

LLM thinks:  What token most likely comes after "The capital of France is"?
             Based on training data: "Paris" (very high probability)

Output: "Paris"
```
This is repeated token by token until a complete, coherent response is built:
"The capital of France is Paris, one of the most visited cities in the world, known for the Eiffel Tower and the Louvre Museum."
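This token-by-token loop can be sketched with a toy model. The probability table below is invented purely for illustration — a real LLM computes these probabilities from billions of learned parameters:

```python
# Toy next-token table: maps a context to candidate next tokens with
# probabilities. These numbers are made up for the demo — a real LLM
# derives them from its parameters at inference time.
NEXT_TOKEN_PROBS = {
    "The capital of France is": [("Paris", 0.95), ("Lyon", 0.05)],
    "The capital of France is Paris": [(",", 0.90), (".", 0.10)],
    "The capital of France is Paris,": [("one", 1.00)],
}

def generate(prompt: str, max_tokens: int = 3) -> str:
    """Greedy decoding: repeatedly append the most likely next token."""
    text = prompt
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(text)
        if not candidates:
            break  # the toy table has no continuation for this context
        best_token, _ = max(candidates, key=lambda pair: pair[1])
        separator = "" if best_token in {",", "."} else " "
        text += separator + best_token
    return text

print(generate("The capital of France is"))
# "The capital of France is Paris, one"
```

Real models usually sample from the probability distribution rather than always taking the top token — that is what the temperature parameter (covered later) controls.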
Key LLMs Used in AI Agents
| LLM Name | Created By | Popular Use |
|---|---|---|
| GPT-4 / GPT-4o | OpenAI | Most widely used in agent development |
| Claude 3 | Anthropic | Strong reasoning, long context |
| Gemini 1.5 | Google DeepMind | Multimodal (text + image + video) |
| Llama 3 | Meta | Open-source, can run locally |
| Mistral | Mistral AI | Fast, lightweight, open-source |
Tokens — The Language of LLMs
LLMs do not read words — they read tokens. A token is a small chunk of text, usually a word or part of a word.
Token Examples
| Text | Approximate Tokens |
|---|---|
| "Hello" | 1 token |
| "Hello world" | 2 tokens |
| "Artificial Intelligence" | 3 tokens |
| 1 page of text (500 words) | ≈ 670 tokens |
| 1 book (80,000 words) | ≈ 107,000 tokens |
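As a rough sketch, these estimates follow a common rule of thumb of about 1.33 tokens per English word (equivalently, 1 token ≈ ¾ of a word). Exact counts require a real tokenizer such as OpenAI's tiktoken; this tiny estimator only captures the rule of thumb:

```python
def estimate_tokens(word_count: int) -> int:
    """Rule-of-thumb estimate: roughly 4 tokens per 3 English words."""
    return round(word_count * 4 / 3)

print(estimate_tokens(500))     # ≈ 667 tokens for one page
print(estimate_tokens(80_000))  # ≈ 107,000 tokens for one book
```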
This matters because LLMs have a context window — the maximum number of tokens they can process at once. GPT-4 Turbo and GPT-4o support up to 128,000 tokens; Claude 3 supports up to 200,000 tokens.
The Context Window
The context window is like an LLM's working memory — everything it can "see" and use when generating a response. This includes:
- The system instructions (what role the AI plays)
- The entire conversation history
- Any tool results fed back to the LLM
- Documents or data provided as input
When building AI Agents, managing the context window carefully is crucial — running out of context means the agent loses earlier parts of the conversation.
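One simple management strategy can be sketched as follows: keep the system message and drop the oldest conversation turns until everything fits a token budget. This is a minimal sketch — the word-based token estimate is a stand-in for a real tokenizer count, and production agents often summarise old turns instead of discarding them:

```python
# Minimal sketch of context-window management: keep the system message and
# drop the oldest turns until the whole history fits a token budget.

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * 4 / 3)  # rough: ~1.33 tokens per word

def trim_history(messages: list, budget: int) -> list:
    system, turns = messages[0], list(messages[1:])
    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)
    while turns and total([system, *turns]) > budget:
        turns.pop(0)  # discard the oldest turn first
    return [system, *turns]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first question " * 40},  # a long old turn
    {"role": "user", "content": "What is the capital of France?"},
]
trimmed = trim_history(history, budget=60)
print(len(trimmed))  # the long old turn was dropped
```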
What Can an LLM Do?
LLMs are remarkably capable and serve as the core reasoning engine of agents. They can:
| Capability | Example |
|---|---|
| Understand natural language | Interpret ambiguous, complex questions |
| Reason step-by-step | Solve a maths problem by thinking aloud |
| Write and fix code | Generate Python code from a description |
| Summarise long content | Condense a 50-page PDF into 5 bullet points |
| Translate languages | Convert English to Hindi, French, etc. |
| Decide which tool to call | Choose between search, calculator, or database |
| Format structured output | Return a JSON object with specific fields |
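The last capability — structured output — is worth a small sketch, because agents depend on it to pass data between steps. The reply string below is hand-written for the demo rather than produced by a model:

```python
import json

# Sketch: validate that an LLM's "structured output" reply is real JSON
# and contains the fields we asked for.
llm_reply = '{"city": "Paris", "country": "France", "landmark": "Eiffel Tower"}'

def parse_structured(reply: str, required_fields: set) -> dict:
    data = json.loads(reply)  # raises ValueError if the model returned invalid JSON
    missing = required_fields - data.keys()
    if missing:
        raise ValueError(f"reply missing fields: {sorted(missing)}")
    return data

record = parse_structured(llm_reply, {"city", "country"})
print(record["city"])  # Paris
```

Validating the reply like this matters because an LLM can occasionally return malformed JSON or omit a field — agent code should fail loudly (or retry) rather than silently pass bad data downstream.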
What LLMs Cannot Do (by Themselves)
Understanding the limitations of LLMs helps explain why agents are built around them with additional tools:
- Cannot access the internet — LLMs only know what was in their training data
- Training cutoff — Knowledge stops at a fixed date; anything published after training is unknown to the model
- Cannot run code — Unless given a code execution tool
- Cannot remember past conversations — Every new conversation starts fresh (unless memory is added)
- Sometimes "hallucinate" — Can generate plausible-sounding but incorrect information
This is exactly why AI Agents add tools, memory, and external data sources on top of the LLM.
How AI Agents Use LLMs
In an AI Agent, the LLM is called multiple times during a single task. Each time, it is given a prompt that includes:
1. A system message: "You are a helpful assistant with access to web search."
2. The conversation history: User: "What's the latest news about AI in India?"
3. Available tools: `web_search(query)`, `summarise_text(text)`
4. Instructions on how to respond: "Think step by step. If you need information, call a tool first."
The LLM then responds with either a tool call or a final answer — and the agent framework handles the rest.
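That loop can be sketched as follows. Everything here is a stub invented for illustration — `call_llm` stands in for a real chat-completion call, and the tool and message formats are simplified, not a real framework's API:

```python
# Sketch of an agent loop: call the LLM, run any tool it requests, feed the
# result back, and repeat until the LLM returns a final answer.

def web_search(query: str) -> str:
    return f"Top result for '{query}'"  # stub tool

TOOLS = {"web_search": web_search}

def call_llm(messages: list) -> dict:
    # Stub: pretend the model asks for a search first, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": "AI news India"}}
    return {"answer": "Here is the latest AI news from India."}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = call_llm(messages)
        if "answer" in reply:
            return reply["answer"]  # final answer: the loop ends
        result = TOOLS[reply["tool"]](**reply["args"])  # run the requested tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What's the latest news about AI in India?"))
```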
LLM Parameters That Matter for Agents
Temperature
Controls how creative or predictable the LLM's responses are.
| Temperature | Behaviour | Best For |
|---|---|---|
| 0.0 | Very deterministic, same answer every time | Data extraction, code generation |
| 0.5 | Balanced — thoughtful but slightly varied | General agent reasoning |
| 1.0 | More creative, varied responses | Creative writing, brainstorming |
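Under the hood, temperature divides the model's raw next-token scores (logits) before the softmax: lower values sharpen the distribution toward the top token, higher values flatten it. A small sketch with invented logits shows the effect:

```python
import math

def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # invented scores, e.g. for "Paris", "Lyon", "Nice"
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 1.5)
print(round(cold[0], 3))  # near 1.0: almost deterministic
print(round(hot[0], 3))   # much flatter: other tokens get real probability
```

In practice a temperature of exactly 0 is usually implemented as greedy decoding (always pick the top token) rather than a literal division by zero.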
Max Tokens
The maximum length of the LLM's response. For agents that need to explain their reasoning and call tools, setting this high enough is important.
Model Choice
Different models have different strengths. For agents that need heavy reasoning and tool use, GPT-4o or Claude 3.5 Sonnet are currently the top choices.
Calling an LLM in Python (Basic Example)
```python
import openai

client = openai.OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ],
    temperature=0.3,
    max_tokens=500,
)

print(response.choices[0].message.content)
```
This simple call is the foundation of every AI Agent — the agent just calls this function repeatedly with updated context until the task is done.
Summary
A Large Language Model is the reasoning brain of an AI Agent. It processes text as tokens, generates responses by predicting the most likely next tokens, and can understand language, reason, write code, and decide which tools to use. While powerful, LLMs have limitations — they have no internet access, no real-time knowledge, and no memory — which is why agents extend them with tools, memory systems, and external data. Understanding LLMs is the foundation for building intelligent agents.
