Generative AI Large Language Models

Large Language Models — commonly called LLMs — are the most widely used type of generative AI today. They power tools like ChatGPT, Claude, Gemini, and dozens of other AI assistants. An LLM is a model trained on massive amounts of text data, capable of understanding and generating human language with remarkable fluency.

What Makes a Language Model "Large"

The word "large" in Large Language Model refers to two things: the size of the training data and the number of parameters inside the model.

  • Training data: Modern LLMs train on trillions of words — books, websites, code repositories, scientific papers, and more.
  • Parameters: These are the internal numerical values the model uses to store what it has learned. GPT-3 has 175 billion parameters. Larger models have even more.

More parameters generally allow a model to capture more subtle patterns in language, leading to more coherent and accurate outputs.

How LLMs Understand Language

LLMs do not understand language the way humans do. They work with numbers. Every word or piece of a word is converted into a unit called a token, which the model represents as an integer ID, and each token is then mapped to a list of numbers called a vector or embedding.

These vectors capture the meaning and relationships between words mathematically. Words with similar meanings end up as vectors that are numerically close to each other.

Word → Token → Vector (simplified)
────────────────────────────────────
"King"   →  [0.8, 0.1, 0.9, 0.3 ...]
"Queen"  →  [0.7, 0.9, 0.8, 0.3 ...]
"Apple"  →  [0.1, 0.2, 0.1, 0.9 ...]

"King" and "Queen" vectors are close → similar meaning
"Apple" vector is far from both → different concept
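This closeness can be measured with cosine similarity, a standard way to compare embedding vectors. The sketch below uses the four illustrative values from the diagram above (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy 4-dimensional embeddings taken from the diagram above (illustrative
# values only; real models use much higher-dimensional vectors).
embeddings = {
    "King":  [0.8, 0.1, 0.9, 0.3],
    "Queen": [0.7, 0.9, 0.8, 0.3],
    "Apple": [0.1, 0.2, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["King"], embeddings["Queen"]))  # ~0.82 (close)
print(cosine_similarity(embeddings["King"], embeddings["Apple"]))  # ~0.40 (far)
```

Even with these toy numbers, "King" and "Queen" score noticeably higher than "King" and "Apple", which is exactly the property embeddings are trained to have.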

The Three Core Tasks an LLM Performs

1. Completion

Given a piece of text, the model completes it. This is the most fundamental behavior and underlies almost everything an LLM does.

Input:  "The capital of France is"
Output: "Paris"
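Under the hood, completion is repeated next-token prediction. The toy sketch below hand-codes the probability tables (a real LLM computes them with billions of parameters) just to show the decoding loop:

```python
# Hypothetical next-token probability tables; a real model computes these.
next_token_probs = {
    "The capital of France is": {"Paris": 0.92, "Lyon": 0.05, "beautiful": 0.03},
    "The capital of France is Paris": {"<end>": 0.90, ".": 0.10},
}

def complete(prompt, max_tokens=5):
    """Greedy decoding: repeatedly append the most probable next token."""
    text = prompt
    for _ in range(max_tokens):
        probs = next_token_probs.get(text)
        if probs is None:
            break
        token = max(probs, key=probs.get)  # pick the highest-probability token
        if token == "<end>":               # model signals it is done
            break
        text += " " + token
    return text

print(complete("The capital of France is"))  # → "The capital of France is Paris"
```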

2. Instruction Following

When trained on instruction-response pairs (a process called instruction tuning), the model learns to follow directions and complete tasks rather than just predicting the next word.

Input:  "Write a one-sentence summary of the water cycle."
Output: "The water cycle is the continuous process in which water
         evaporates from surfaces, rises as vapor, condenses into
         clouds, and falls back as precipitation."

3. Conversation

With additional training on multi-turn dialogue data, the model can hold a conversation — remembering context from earlier in the chat and building on it.
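In practice, "remembering" the conversation usually means the full history is re-sent to the model on every turn as a list of role-tagged messages. The field names below follow the common "role"/"content" convention used by many chat APIs; exact details vary by provider:

```python
# A sketch of multi-turn conversation state: the whole list is resent each
# turn, so earlier context (like the user's name) stays visible to the model.
conversation = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user",      "content": "What is my name?"},
]

def render_prompt(messages):
    """Flatten the history into one text block the model reads each turn."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

print(render_prompt(conversation))
```

Because the earlier turn "My name is Ada." is included in the rendered prompt, the model can answer the final question — and this is also why long chats eventually run into the context-window limit discussed below.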

Popular LLMs and Their Creators

Model Name   Creator           Notable Use
──────────   ───────────────   ──────────────────────────────────────────────────
GPT-4        OpenAI            ChatGPT, Copilot, enterprise tools
Claude       Anthropic         Safe, long-context conversation and analysis
Gemini       Google DeepMind   Google Search, Workspace, Android assistant
LLaMA 3      Meta AI           Open-source, runs locally, research and custom apps
Mistral      Mistral AI        Efficient open-source model, enterprise use
Command R    Cohere            Enterprise search, retrieval-augmented generation

Context Window — The Model's Working Memory

An LLM can only process a limited amount of text at one time. This limit is called the context window. Everything the model uses to generate a response — including the prompt, previous conversation turns, and any documents provided — must fit within this window.

Context Window (example: 128,000 tokens)
┌────────────────────────────────────────────────────────┐
│ System instructions │ Chat history │ Documents │ Prompt │
└────────────────────────────────────────────────────────┘
         ↑ Everything here is visible to the model

Anything outside the window → model cannot see or use it

Early models had context windows of 2,000–4,000 tokens. Modern models support 128,000 tokens and beyond — roughly the length of an entire novel.
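Budgeting against the window is simple arithmetic. The sketch below uses the common rough heuristic that one token is about four characters of English text (actual counts depend on the model's tokenizer):

```python
# A sketch of context-window budgeting under the ~4 characters/token heuristic.
CONTEXT_WINDOW = 128_000  # tokens, matching the example above

def estimate_tokens(text):
    """Very rough estimate; a real tokenizer gives exact counts."""
    return max(1, len(text) // 4)

def fits_in_window(system, history, documents, prompt, reserve_for_output=1_000):
    """Everything the model reads, plus room for its reply, must fit."""
    used = sum(estimate_tokens(part) for part in (system, history, documents, prompt))
    return used + reserve_for_output <= CONTEXT_WINDOW

# A 600,000-character document is ~150,000 tokens: it alone overflows the window.
print(fits_in_window("Be concise.", "user: hi", "A" * 600_000, "Summarize."))  # False
```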

Open-Source vs Closed Models

Type                   What It Means                           Example                   Best For
────────────────────   ─────────────────────────────────────   ───────────────────────   ────────────────────────────────────────
Closed / Proprietary   Weights not public, accessed via API    GPT-4, Claude, Gemini     Production apps, ease of use
Open-Source            Weights available to download and run   LLaMA 3, Mistral, Phi-3   Privacy, customization, local deployment

LLM Strengths and Limitations

Strengths

  • Generates fluent, human-like text
  • Understands and follows complex instructions
  • Works across many languages
  • Handles a wide range of tasks without task-specific retraining
  • Can reason through multi-step problems

Limitations

  • Hallucination: Sometimes generates confident but incorrect information
  • Knowledge cutoff: Training data has an end date; the model does not know recent events
  • No real-time access: Cannot browse the internet unless a tool is connected
  • Context limit: Cannot process documents longer than its context window
  • Bias: Can reflect biases present in training data

LLM Architecture at a Glance

Input Text (Prompt)
        │
        ▼
  Tokenizer → Converts text into numbers
        │
        ▼
  Embedding Layer → Maps tokens to vectors
        │
        ▼
  Transformer Layers (many) → Apply attention, learn relationships
        │
        ▼
  Output Layer → Predicts next token probabilities
        │
        ▼
  Sampler → Picks next token based on probabilities + temperature
        │
        ▼
  Output Text (Response)
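The sampler step deserves a closer look, since temperature is the knob most users can actually turn. A minimal sketch with toy logits (a real model produces one logit per token across a vocabulary of tens of thousands):

```python
import math
import random

# Toy logits for three candidate next tokens (illustrative values only).
logits = {"Paris": 4.0, "Lyon": 1.5, "beautiful": 0.5}

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities. Lower temperature sharpens the
    distribution toward the top token; higher temperature flattens it."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    biggest = max(scaled.values())
    exps = {tok: math.exp(v - biggest) for tok, v in scaled.items()}  # numerically stable
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_token(logits, temperature=1.0):
    probs = softmax_with_temperature(logits, temperature)
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

# Near-zero temperature is effectively greedy: the top logit almost always wins.
print(sample_token(logits, temperature=0.01))  # → "Paris"
```

At temperature 1.0 the model occasionally picks "Lyon" or "beautiful", which is why the same prompt can yield different responses on different runs.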

The Transformer layers are the heart of every modern LLM. The next topic covers the Transformer architecture and its key mechanism — attention — in detail.
