Generative AI Large Language Models
Large Language Models — commonly called LLMs — are the most widely used type of generative AI today. They power tools like ChatGPT, Claude, Gemini, and dozens of other AI assistants. An LLM is a model trained on massive amounts of text data, capable of understanding and generating human language with remarkable accuracy.
What Makes a Language Model "Large"
The word "large" in Large Language Model refers to two things: the size of the training data and the number of parameters inside the model.
- Training data: Modern LLMs train on trillions of words — books, websites, code repositories, scientific papers, and more.
- Parameters: These are the internal numerical values the model uses to store what it has learned. GPT-3 has 175 billion parameters. Larger models have even more.
More parameters generally allow a model to capture more subtle patterns in language, leading to more coherent and accurate outputs.
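Since parameter counts are the headline number, it helps to see what they imply in storage terms. The sketch below is back-of-the-envelope arithmetic, assuming 2 bytes per parameter (as in 16-bit floats); real deployments add optimizer state, activations, and other overheads on top of the weights themselves.

```python
# Rough memory footprint of storing model weights alone
# (a back-of-the-envelope sketch; real precisions and overheads vary).
def param_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Gigabytes needed to store the weights at the given precision."""
    return num_params * bytes_per_param / 1e9

# GPT-3 scale: 175 billion parameters in 16-bit floats
print(param_memory_gb(175_000_000_000))  # 350.0 GB just for the weights
```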
How LLMs Understand Language
LLMs do not understand language the way humans do. They work with numbers. Text is first split into pieces called tokens (whole words or fragments of words), each token is assigned a numeric ID, and each ID is mapped to a list of numbers called a vector or embedding.
These vectors capture the meaning and relationships between words mathematically. Words with similar meanings end up as vectors that are numerically close to each other.
Word → Token → Vector (simplified)
────────────────────────────────────
"King"  → [0.8, 0.1, 0.9, 0.3 ...]
"Queen" → [0.7, 0.9, 0.8, 0.3 ...]
"Apple" → [0.1, 0.2, 0.1, 0.9 ...]

"King" and "Queen" vectors are close → similar meaning
"Apple" vector is far from both     → different concept
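The toy vectors above can be compared with cosine similarity, a standard measure of how closely two embeddings point in the same direction. The numbers below are invented for illustration, not taken from a real model:

```python
import math

# Toy embeddings (hypothetical values, not from a real model) showing
# that related words map to nearby vectors.
embeddings = {
    "King":  [0.8, 0.1, 0.9, 0.3],
    "Queen": [0.7, 0.9, 0.8, 0.3],
    "Apple": [0.1, 0.2, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine_similarity(embeddings["King"], embeddings["Queen"]))  # higher
print(cosine_similarity(embeddings["King"], embeddings["Apple"]))  # lower
```

Real models use vectors with hundreds or thousands of dimensions, but the principle is the same: similarity in meaning becomes proximity in vector space.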
The Three Core Tasks an LLM Performs
1. Completion
Given a piece of text, the model completes it. This is the most fundamental behavior and underlies almost everything an LLM does.
Input:  "The capital of France is"
Output: "Paris"
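Under the hood, completion means assigning a probability to every candidate next token and choosing one. A minimal sketch with invented probabilities, using the simplest strategy (greedy decoding, which always picks the most likely token):

```python
# Toy next-token distribution for the prompt "The capital of France is".
# These probabilities are invented for illustration.
next_token_probs = {
    "Paris": 0.92,
    "Lyon": 0.03,
    "a": 0.02,
    "the": 0.01,
}

# Greedy decoding: pick the single most probable token.
completion = max(next_token_probs, key=next_token_probs.get)
print(f"The capital of France is {completion}")  # The capital of France is Paris
```

Repeating this step, feeding each chosen token back in as new input, is how a model generates whole paragraphs one token at a time.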
2. Instruction Following
When trained on instruction-response pairs (a process called instruction tuning), the model learns to follow directions and complete tasks rather than just predicting the next word.
Input: "Write a one-sentence summary of the water cycle."
Output: "The water cycle is the continuous process in which water
evaporates from surfaces, rises as vapor, condenses into
clouds, and falls back as precipitation."
3. Conversation
With additional training on multi-turn dialogue data, the model can hold a conversation — remembering context from earlier in the chat and building on it.
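Many chat interfaces represent such a conversation as a list of role-tagged messages that is fed back to the model on every turn; this is how the model "remembers" earlier context. A sketch of that idea (the exact message format varies by provider):

```python
# A common way to represent multi-turn chat: a list of role-tagged
# messages. The whole list is resent to the model each turn, which is
# how earlier context stays visible. Format here is illustrative.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the water cycle?"},
    {"role": "assistant", "content": "It is the continuous movement of water."},
    {"role": "user", "content": "Summarize that in one sentence."},
]

# Before each new response, the turns are flattened into a single prompt:
prompt = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
print(prompt)
```

Note that the model itself is stateless: all "memory" lives in the message list the application resends with every request.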
Popular LLMs and Their Creators
| Model Name | Creator | Notable Use |
|---|---|---|
| GPT-4 | OpenAI | ChatGPT, Copilot, enterprise tools |
| Claude | Anthropic | Safe, long-context conversation and analysis |
| Gemini | Google DeepMind | Google Search, Workspace, Android assistant |
| Llama 3 | Meta AI | Open-source, runs locally, research and custom apps |
| Mistral | Mistral AI | Efficient open-source model, enterprise use |
| Command R | Cohere | Enterprise search, retrieval-augmented generation |
Context Window — The Model's Working Memory
An LLM can only process a limited amount of text at one time. This limit is called the context window. Everything the model uses to generate a response — including the prompt, previous conversation turns, and any documents provided — must fit within this window.
Context Window (example: 128,000 tokens)
──────────────────────────────────────────────────────────
│ System instructions │ Chat history │ Documents │ Prompt │
└────────────────────────────────────────────────────────┘
↑ Everything here is visible to the model
Anything outside the window → model cannot see or use it
Early models had context windows of 2,000–4,000 tokens. Modern models support 128,000 tokens and beyond — equivalent to an entire book.
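When a conversation grows past the window, something has to be dropped. A common simple strategy is to discard the oldest turns first; the sketch below pretends one word equals one token, which a real tokenizer does not:

```python
# Sketch of fitting chat history into a fixed context window by dropping
# the oldest turns first (one simple strategy; real systems vary).
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def fit_to_window(turns: list[str], window: int) -> list[str]:
    kept, used = [], 0
    for turn in reversed(turns):       # newest turns get priority
        cost = count_tokens(turn)
        if used + cost > window:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order

history = ["turn one is old", "turn two", "turn three is the newest"]
print(fit_to_window(history, window=8))  # oldest turn is dropped
```

This is why long chats can "forget" their beginning: the earliest turns simply no longer fit inside the window.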
Open-Source vs Closed Models
| Type | What It Means | Example | Best For |
|---|---|---|---|
| Closed / Proprietary | Weights not public, accessed via API | GPT-4, Claude, Gemini | Production apps, ease of use |
| Open-Source | Weights available to download and run | Llama 3, Mistral, Phi-3 | Privacy, customization, local deployment |
LLM Strengths and Limitations
Strengths
- Generates fluent, human-like text
- Understands and follows complex instructions
- Works across many languages
- Handles a wide range of tasks without task-specific retraining
- Can reason through multi-step problems
Limitations
- Hallucination: Sometimes generates confident but incorrect information
- Knowledge cutoff: Training data ends at a fixed date; the model does not know about events after it
- No real-time access: Cannot browse the internet unless a tool is connected
- Context limit: Cannot process documents longer than its context window
- Bias: Can reflect biases present in training data
LLM Architecture at a Glance
Input Text (Prompt)
│
▼
Tokenizer → Converts text into numbers
│
▼
Embedding Layer → Maps tokens to vectors
│
▼
Transformer Layers (many) → Apply attention, learn relationships
│
▼
Output Layer → Predicts next token probabilities
│
▼
Sampler → Picks next token based on probabilities + temperature
│
▼
Output Text (Response)
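The final stage in the diagram, the sampler, can be sketched as a temperature-scaled softmax followed by a weighted random draw. Lower temperature sharpens the distribution toward the top token; higher temperature flattens it. The logit values below are invented for illustration:

```python
import math
import random

# Sketch of the sampling step: logits → temperature-scaled softmax →
# weighted random draw. Logit values here are invented.
def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())                        # subtract max for stability
    exp = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = {tok: e / total for tok, e in exp.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

logits = {"Paris": 5.0, "Lyon": 2.0, "Berlin": 1.0}
print(sample_next_token(logits, temperature=0.7))  # almost always "Paris"
```

Because the draw is random, the same prompt can produce different outputs across runs, which is why chat tools expose a temperature setting to trade consistency against variety.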
The Transformer layers are the heart of every modern LLM. The next topic covers the Transformer architecture and its key mechanism — attention — in detail.
