Generative AI Large Language Models

Large Language Models — commonly called LLMs — are the most widely used type of generative AI today. They power tools like ChatGPT, Claude, Gemini, and dozens of other AI assistants. An LLM is a model trained on massive amounts of text data, capable of understanding and generating human language with remarkable fluency.

What Makes a Language Model "Large"

The word "large" in Large Language Model refers to two things: the size of the training data and the number of parameters inside the model.

  • Training data: Modern LLMs train on trillions of words — books, websites, code repositories, scientific papers, and more.
  • Parameters: These are the internal numerical values the model uses to store what it has learned. GPT-3 has 175 billion parameters. Larger models have even more.

More parameters generally allow a model to capture more subtle patterns in language, leading to more coherent and accurate outputs.

How LLMs Understand Language

LLMs do not understand language the way humans do. They work with numbers. Every word or piece of a word is converted into a unit called a token, which the model represents as an integer ID, and each token is then mapped to a list of numbers called a vector or embedding.

These vectors capture the meaning and relationships between words mathematically. Words with similar meanings end up as vectors that are numerically close to each other.

Word → Token → Vector (simplified)
────────────────────────────────────
"King"   →  [0.8, 0.1, 0.9, 0.3 ...]
"Queen"  →  [0.7, 0.9, 0.8, 0.3 ...]
"Apple"  →  [0.1, 0.2, 0.1, 0.9 ...]

"King" and "Queen" vectors are close → similar meaning
"Apple" vector is far from both → different concept
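This closeness can be measured with cosine similarity, a standard way to compare embedding vectors. The sketch below uses the four illustrative values from the diagram above (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy 4-dimensional embeddings taken from the diagram above (illustrative
# values only; real models use much higher-dimensional vectors).
embeddings = {
    "King":  [0.8, 0.1, 0.9, 0.3],
    "Queen": [0.7, 0.9, 0.8, 0.3],
    "Apple": [0.1, 0.2, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["King"], embeddings["Queen"]))  # ~0.82 (close)
print(cosine_similarity(embeddings["King"], embeddings["Apple"]))  # ~0.40 (far)
```

Even with these toy numbers, "King" and "Queen" score noticeably higher than "King" and "Apple", which is exactly the property embeddings are trained to have.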

The Three Core Tasks an LLM Performs

1. Completion

Given a piece of text, the model completes it. This is the most fundamental behavior and underlies almost everything an LLM does.

Input:  "The capital of France is"
Output: "Paris"
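Under the hood, completion is repeated next-token prediction. The toy sketch below hand-codes the probability tables (a real LLM computes them with billions of parameters) just to show the decoding loop:

```python
# Hypothetical next-token probability tables; a real model computes these.
next_token_probs = {
    "The capital of France is": {"Paris": 0.92, "Lyon": 0.05, "beautiful": 0.03},
    "The capital of France is Paris": {"<end>": 0.90, ".": 0.10},
}

def complete(prompt, max_tokens=5):
    """Greedy decoding: repeatedly append the most probable next token."""
    text = prompt
    for _ in range(max_tokens):
        probs = next_token_probs.get(text)
        if probs is None:
            break
        token = max(probs, key=probs.get)  # pick the highest-probability token
        if token == "<end>":               # model signals it is done
            break
        text += " " + token
    return text

print(complete("The capital of France is"))  # → "The capital of France is Paris"
```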

2. Instruction Following

When trained on instruction-response pairs (a process called instruction tuning), the model learns to follow directions and complete tasks rather than just predicting the next word.

Input:  "Write a one-sentence summary of the water cycle."
Output: "The water cycle is the continuous process in which water
         evaporates from surfaces, rises as vapor, condenses into
         clouds, and falls back as precipitation."

3. Conversation

With additional training on multi-turn dialogue data, the model can hold a conversation — remembering context from earlier in the chat and building on it.
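In practice, "remembering" the conversation usually means the full history is re-sent to the model on every turn as a list of role-tagged messages. The field names below follow the common "role"/"content" convention used by many chat APIs; exact details vary by provider:

```python
# A sketch of multi-turn conversation state: the whole list is resent each
# turn, so earlier context (like the user's name) stays visible to the model.
conversation = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user",      "content": "What is my name?"},
]

def render_prompt(messages):
    """Flatten the history into one text block the model reads each turn."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

print(render_prompt(conversation))
```

Because the earlier turn "My name is Ada." is included in the rendered prompt, the model can answer the final question — and this is also why long chats eventually run into the context-window limit discussed below.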

Popular LLMs and Their Creators

Model Name   Creator           Notable Use
──────────   ───────────────   ──────────────────────────────────────────────────
GPT-4        OpenAI            ChatGPT, Copilot, enterprise tools
Claude       Anthropic         Safe, long-context conversation and analysis
Gemini       Google DeepMind   Google Search, Workspace, Android assistant
LLaMA 3      Meta AI           Open-source, runs locally, research and custom apps
Mistral      Mistral AI        Efficient open-source model, enterprise use
Command R    Cohere            Enterprise search, retrieval-augmented generation

Context Window — The Model's Working Memory

An LLM can only process a limited amount of text at one time. This limit is called the context window. Everything the model uses to generate a response — including the prompt, previous conversation turns, and any documents provided — must fit within this window.

Context Window (example: 128,000 tokens)
┌────────────────────────────────────────────────────────┐
│ System instructions │ Chat history │ Documents │ Prompt │
└────────────────────────────────────────────────────────┘
         ↑ Everything here is visible to the model

Anything outside the window → model cannot see or use it

Early models had context windows of 2,000–4,000 tokens. Modern models support 128,000 tokens and beyond — roughly the length of an entire novel.
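Budgeting against the window is simple arithmetic. The sketch below uses the common rough heuristic that one token is about four characters of English text (actual counts depend on the model's tokenizer):

```python
# A sketch of context-window budgeting under the ~4 characters/token heuristic.
CONTEXT_WINDOW = 128_000  # tokens, matching the example above

def estimate_tokens(text):
    """Very rough estimate; a real tokenizer gives exact counts."""
    return max(1, len(text) // 4)

def fits_in_window(system, history, documents, prompt, reserve_for_output=1_000):
    """Everything the model reads, plus room for its reply, must fit."""
    used = sum(estimate_tokens(part) for part in (system, history, documents, prompt))
    return used + reserve_for_output <= CONTEXT_WINDOW

# A 600,000-character document is ~150,000 tokens: it alone overflows the window.
print(fits_in_window("Be concise.", "user: hi", "A" * 600_000, "Summarize."))  # False
```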

Open-Source vs Closed Models

Type                   What It Means                           Example                   Best For
────────────────────   ─────────────────────────────────────   ───────────────────────   ────────────────────────────────────────
Closed / Proprietary   Weights not public, accessed via API    GPT-4, Claude, Gemini     Production apps, ease of use
Open-Source            Weights available to download and run   LLaMA 3, Mistral, Phi-3   Privacy, customization, local deployment

LLM Strengths and Limitations

Strengths

  • Generates fluent, human-like text
  • Understands and follows complex instructions
  • Works across many languages
  • Handles a wide range of tasks without task-specific retraining
  • Can reason through multi-step problems

Limitations

  • Hallucination: Sometimes generates confident but incorrect information
  • Knowledge cutoff: Training data has an end date; the model does not know recent events
  • No real-time access: Cannot browse the internet unless a tool is connected
  • Context limit: Cannot process documents longer than its context window
  • Bias: Can reflect biases present in training data

LLM Architecture at a Glance

Input Text (Prompt)
        │
        ▼
  Tokenizer → Converts text into numbers
        │
        ▼
  Embedding Layer → Maps tokens to vectors
        │
        ▼
  Transformer Layers (many) → Apply attention, learn relationships
        │
        ▼
  Output Layer → Predicts next token probabilities
        │
        ▼
  Sampler → Picks next token based on probabilities + temperature
        │
        ▼
  Output Text (Response)
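The sampler step deserves a closer look, since temperature is the knob most users can actually turn. A minimal sketch with toy logits (a real model produces one logit per token across a vocabulary of tens of thousands):

```python
import math
import random

# Toy logits for three candidate next tokens (illustrative values only).
logits = {"Paris": 4.0, "Lyon": 1.5, "beautiful": 0.5}

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities. Lower temperature sharpens the
    distribution toward the top token; higher temperature flattens it."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    biggest = max(scaled.values())
    exps = {tok: math.exp(v - biggest) for tok, v in scaled.items()}  # numerically stable
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_token(logits, temperature=1.0):
    probs = softmax_with_temperature(logits, temperature)
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

# Near-zero temperature is effectively greedy: the top logit almost always wins.
print(sample_token(logits, temperature=0.01))  # → "Paris"
```

At temperature 1.0 the model occasionally picks "Lyon" or "beautiful", which is why the same prompt can yield different responses on different runs.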

The Transformer layers are the heart of every modern LLM. The next topic covers the Transformer architecture and its key mechanism — attention — in detail.
