GenAI Types of Models

Generative AI is not a single technology. It is a family of different model architectures, each designed to generate a specific type of content. Understanding the main model types helps in choosing the right approach for any given task.

The Four Main Types of Generative AI Models

1. Large Language Models (LLMs)

Large Language Models generate text. They learn from vast amounts of written content — books, websites, articles, code — and produce human-like text in response to a prompt.

How it works: The model reads a sequence of words and predicts the most probable next word. It repeats this process until the full response is complete.

Examples: GPT-4, Claude, Gemini, LLaMA, Mistral

Used for: Writing, summarization, translation, question answering, coding assistance, chatbots
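The "predict the next word, repeat" loop described above can be sketched with a toy bigram model. This is purely illustrative: the corpus, the word-count table, and the greedy choice are stand-ins, since real LLMs use neural networks over tokens rather than word-frequency counts.

```python
from collections import Counter, defaultdict

# Hypothetical tiny corpus (real LLMs train on vast text collections).
corpus = "the cat sat on the mat the cat ate the food".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most probable next word seen after `word`."""
    return following[word].most_common(1)[0][0]

def generate(start, length):
    """Repeat next-word prediction until the response is complete."""
    words = [start]
    for _ in range(length - 1):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(generate("the", 5))  # greedy chain of most-probable next words
```

The core loop is the same in a real LLM: the difference is that `predict_next` is a neural network producing a probability over an entire vocabulary, and sampling strategies replace the greedy pick.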

2. Diffusion Models

Diffusion models generate images (and increasingly audio and video). They work by learning to remove noise step by step, turning pure random noise into a clear image.

How it works: During training, the model sees clean images slowly turned into pure noise. It learns to reverse this process. At generation time, it starts from random noise and gradually "denoises" it into a sharp image matching the prompt.

Examples: Stable Diffusion, DALL·E 3, Midjourney, Adobe Firefly

Used for: Art generation, product mockups, photo editing, visual design

Diffusion Process (Image Generation)
─────────────────────────────────────
Random Noise ──▶ Partial Image ──▶ Clearer Image ──▶ Final Image
[■■■■■■■■■■]   [▓▒░■▓▒░■▓▒]   [rough shape]    [photo of a dog]
   Start             ▲                                  End
                 Model applies
                 learned pattern
                 to reduce noise
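The denoising loop in the diagram can be sketched in a few lines. This is a deliberately simplified analogy: the `target` vector stands in for the clean image the model has learned to estimate, and the linear nudge replaces the neural denoiser a real diffusion model applies at each step.

```python
import random

random.seed(0)

target = [0.8, 0.2, 0.5]                   # stands in for the clean image
x = [random.gauss(0, 1) for _ in target]   # start from pure random noise

def denoise_step(x, target, strength=0.3):
    """One step: nudge each value toward the learned clean estimate."""
    return [xi + strength * (ti - xi) for xi, ti in zip(x, target)]

for step in range(20):                     # gradual denoising
    x = denoise_step(x, target)

# After many small steps, the noise has been shaped into the target.
error = max(abs(xi - ti) for xi, ti in zip(x, target))
```

The key idea the sketch preserves is that generation is iterative: many small denoising steps, each removing a little noise, rather than one jump from noise to image.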

3. Generative Adversarial Networks (GANs)

GANs were one of the earliest breakthroughs in generative AI. They use two competing neural networks — a generator and a discriminator — that train against each other.

How it works:

  • The generator creates fake images
  • The discriminator tries to tell real images from fake ones
  • Over time, the generator gets so good that the discriminator cannot tell the difference

GAN Structure
─────────────────────────────────────────────
Random Input ──▶ GENERATOR ──▶ Fake Image ──┐
                                             │
                                    DISCRIMINATOR ──▶ Real or Fake?
                                             │
Real Images ─────────────────────────────────┘

Examples: StyleGAN, DeepFake models, face generation systems

Used for: Synthetic face generation, image-to-image translation, data augmentation
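The adversarial loop can be sketched on one-dimensional "data". Everything here is a stand-in: `REAL_MEAN` plays the role of the real image distribution, `theta` is the generator's single parameter, and `d_mid` is the discriminator's decision boundary. Real GANs use neural networks for both players and train them by gradient descent.

```python
import random

random.seed(0)

REAL_MEAN = 4.0   # where the "real" samples cluster

theta = 0.0       # generator parameter: mean of the fakes it produces
d_mid = 2.0       # discriminator parameter: its real-vs-fake boundary

lr = 0.05
for _ in range(500):
    real = random.gauss(REAL_MEAN, 0.1)   # a real sample
    fake = random.gauss(theta, 0.1)       # the generator creates a fake

    # Discriminator: move the boundary between real and fake samples.
    d_mid += lr * ((real + fake) / 2 - d_mid)

    # Generator: move its output toward the "real" side of the boundary.
    theta += lr * (d_mid - theta)

# At equilibrium the fakes sit on top of the real data,
# so the discriminator can no longer separate them.
```

The sketch captures the competition: each player's update depends on the other, and training ends when the generator's output is indistinguishable from real data.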

4. Variational Autoencoders (VAEs)

VAEs compress data into a compact representation and then reconstruct it. They are useful for generating variations of existing content.

How it works: The model encodes an input (like an image) into a small set of numbers, then decodes those numbers back into a new image. By slightly changing the numbers in the middle, new variations of the original image appear.

Examples: Used inside image editing tools and in combination with other models

Used for: Image compression, face morphing, drug molecule generation, anomaly detection

VAE Structure
──────────────────────────────────────────────────
Input Image ──▶ ENCODER ──▶ [small number set] ──▶ DECODER ──▶ Reconstructed Image
                                    │
                                    ▼
                              Change numbers slightly
                                    │
                                    ▼
                           DECODER ──▶ New Variation of Image
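The encode, perturb, decode cycle above can be sketched with a hand-written encoder and decoder. This toy version does no learning: the "image" is four numbers, the encoder simply averages each half into one latent number, and the decoder repeats each latent number back out. A real VAE learns both functions as neural networks.

```python
def encode(image):
    """Compress a 4-pixel image into 2 latent numbers (half-averages)."""
    return [(image[0] + image[1]) / 2, (image[2] + image[3]) / 2]

def decode(latent):
    """Expand 2 latent numbers back into a 4-pixel image."""
    return [latent[0], latent[0], latent[1], latent[1]]

image = [0.9, 0.7, 0.1, 0.3]
latent = encode(image)            # the compact representation
reconstructed = decode(latent)    # close to the original input

# Slightly change the latent numbers to get a new variation.
variation = decode([z + 0.05 for z in latent])
```

The point the sketch preserves is that small moves in the compact latent space produce coherent variations of the original, which is why VAEs are used for morphing and generating data variations.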

Newer Additions: Transformer-Based Multimodal Models

Modern generative AI often combines multiple model types into one. These multimodal models can accept text, image, and audio as input and generate any combination of those as output.

Examples: GPT-4o (text + image), Gemini Ultra, Claude 3 Opus

Used for: Analyzing images while responding in text, generating images from detailed descriptions, processing documents with mixed content

Quick Comparison Table

Model Type       | Main Output          | Key Strength                              | Example
-----------------|----------------------|-------------------------------------------|------------------------
LLM              | Text                 | Language understanding and generation     | ChatGPT, Claude
Diffusion Model  | Image                | Photorealistic image generation           | Stable Diffusion
GAN              | Image / Video        | Highly realistic synthetic media          | StyleGAN
VAE              | Image / Data         | Compact encoding and variation generation | Used inside image tools
Multimodal       | Text + Image + Audio | Cross-format understanding and generation | GPT-4o, Gemini

Which Model Type Should Be Used?

Choosing the right model type depends on the task at hand:

  • For writing, summarizing, or coding → use an LLM
  • For creating images from text → use a Diffusion Model
  • For realistic face or video synthesis → use a GAN
  • For generating data variations → use a VAE
  • For tasks involving both text and images → use a Multimodal Model

Large Language Models are the most widely used type today, and they form the foundation of most generative AI applications. The next topic explores LLMs in detail.
