Deep Learning Autoencoders

An autoencoder is a neural network trained to compress data into a compact form and then reconstruct the original data from that compact form. It learns to capture the most important features of data — automatically — without any labeled examples. This makes autoencoders powerful tools for compression, denoising, anomaly detection, and generative modeling.

The Core Idea: Compress and Reconstruct

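Think of an autoencoder like a zip-file system: you compress a file to save space, then unzip it to get the original back. An autoencoder learns to do this compression by itself, directly from the data. Unlike a zip file, though, the compression is lossy: the reconstruction is a close approximation of the input, not an exact copy.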

Autoencoder Architecture Diagram

INPUT          ENCODER        BOTTLENECK        DECODER        OUTPUT
(original)   (compress)     (compact code)    (reconstruct)  (rebuilt)

[784 pixels]→ [256] → [128] → [32 dims] → [128] → [256] → [784 pixels]
     ↑                             ↑                              ↑
  Original                   Latent Space                 Reconstructed
  Image                      (learned compact             Image
                              representation)

The network has two halves:

Encoder — compresses the input into a small, dense code called the latent vector
Decoder — takes the latent vector and reconstructs the original input
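The layer sizes in the diagram can be sketched as a simple forward pass. This is a minimal NumPy sketch, assuming fully connected layers, ReLU activations, a sigmoid output, and random untrained weights. The helper names (init_layers, encode, decode) are illustrative and do not come from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

ENCODER_SIZES = [784, 256, 128, 32]   # input -> bottleneck, as in the diagram
DECODER_SIZES = [32, 128, 256, 784]   # bottleneck -> output

def init_layers(sizes):
    """Create (weights, bias) pairs for consecutive layer sizes (random, untrained)."""
    return [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes, sizes[1:])]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x, layers):
    """Compress a batch of inputs into latent vectors."""
    for w, b in layers:
        x = relu(x @ w + b)
    return x

def decode(z, layers):
    """Rebuild inputs from latent vectors; the sigmoid keeps outputs in (0, 1)."""
    for w, b in layers[:-1]:
        z = relu(z @ w + b)
    w, b = layers[-1]
    return sigmoid(z @ w + b)

encoder_layers = init_layers(ENCODER_SIZES)
decoder_layers = init_layers(DECODER_SIZES)

images = rng.random((5, 784))            # 5 stand-in flattened 28x28 images
latent = encode(images, encoder_layers)
rebuilt = decode(latent, decoder_layers)

print(latent.shape)    # (5, 32)
print(rebuilt.shape)   # (5, 784)
```

Because the weights are untrained, the rebuilt images are meaningless here. The sketch only shows how the data shrinks to 32 numbers and expands back to 784.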

Training an Autoencoder

An autoencoder is self-supervised — it uses the input itself as the target output. No labels are required.

Training goal:
  Input image → Encoder → Bottleneck → Decoder → Reconstructed image
  Minimize the difference between input and reconstructed output

Loss = Reconstruction Error
     = distance between original pixels and reconstructed pixels

If the reconstruction is near-perfect → Loss is low → bottleneck learned useful compression
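One common choice for the "distance" above is mean squared error (MSE). The lesson does not name a specific loss, so treat MSE as an assumption. The sketch below computes it for a tiny four-pixel example with made-up values.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

original = [0.0, 0.5, 1.0, 0.5]
good_guess = [0.0, 0.5, 0.9, 0.5]   # close to the original -> low loss
bad_guess = [1.0, 0.0, 0.0, 1.0]    # far from the original -> high loss

print(mse(original, good_guess))
print(mse(original, bad_guess))
```

The first value is roughly 0.0025 and the second is 0.625, so the better reconstruction gets the lower loss.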

The bottleneck forces the network to be selective. It cannot store everything — it must learn which features matter most. The result is a compact, meaningful representation of the data.
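To make the training goal concrete, here is a minimal sketch of a training loop, assuming a purely linear autoencoder (no activations), mean squared error, and plain gradient descent on synthetic data. The sizes, learning rate, and variable names are illustrative choices, not values from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples with 8 features that depend on only 2 hidden factors.
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
data = factors @ mixing

n_features, n_latent = 8, 2
w_enc = rng.normal(0.0, 0.1, size=(n_features, n_latent))   # encoder weights
w_dec = rng.normal(0.0, 0.1, size=(n_latent, n_features))   # decoder weights
learning_rate = 0.01

losses = []
for step in range(2000):
    latent = data @ w_enc          # encode
    rebuilt = latent @ w_dec       # decode
    error = rebuilt - data         # the input is its own target
    losses.append(np.mean(error ** 2))

    # Gradients of the mean squared error with respect to each weight matrix.
    grad_out = 2.0 * error / error.size
    grad_dec = latent.T @ grad_out
    grad_enc = data.T @ (grad_out @ w_dec.T)

    w_dec -= learning_rate * grad_dec
    w_enc -= learning_rate * grad_enc

print(f"loss at start: {losses[0]:.4f}")
print(f"loss at end:   {losses[-1]:.4f}")
```

The toy data has only two underlying factors, so a 2-dimensional bottleneck can represent it well and the loss should fall as training proceeds. Real data rarely compresses this cleanly.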

What the Bottleneck Learns

Example: Handwritten Digit Images

Input: 784 pixel values (28×28 image of digit "3")
Bottleneck: 2-dimensional latent code

Latent space map (2D plot):
   Dimension 2
     ↑
     │    ○ ○ ○          ← all "3" digits cluster here
     │         △ △       ← all "7" digits cluster here
     │    * *            ← all "1" digits cluster here
     │
     └──────────────────→ Dimension 1

Digits with similar shapes sit close together in latent space.

The autoencoder discovers the structure of the data — with no labels — by learning to compress and reconstruct it.

Key Use Cases

1. Denoising

A denoising autoencoder is trained to reconstruct clean images from corrupted inputs. You deliberately add noise to training images, and the autoencoder learns to remove it.

Training:
  Noisy input → Autoencoder → Clean output
  Loss = distance between output and the original clean image

At inference:
  Noisy photo → Autoencoder → Restored clean photo

Applications include photo restoration, audio denoising, and cleaning up corrupted scans of documents.
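The training setup for denoising comes down to how the input/target pairs are built. This is a minimal sketch, assuming Gaussian noise and pixel values in [0, 1]. The function name add_gaussian_noise and the noise level of 0.2 are illustrative choices.

```python
import random

def add_gaussian_noise(pixels, noise_std, rng):
    """Return a noisy copy of `pixels`, clipped back into the range [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in pixels]

rng = random.Random(0)

# Stand-in data: 3 random "images" of 784 pixels each.
clean_images = [[rng.random() for _ in range(784)] for _ in range(3)]

# Each training pair is (noisy input, clean target): the network sees the
# corrupted version but is scored against the original.
training_pairs = [(add_gaussian_noise(img, 0.2, rng), img) for img in clean_images]

noisy, target = training_pairs[0]
print(len(noisy), len(target))
print(noisy != target)
```

The key design point is that the loss compares the output with the clean target, never with the noisy input. Otherwise the network would learn to reproduce the noise.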

2. Anomaly Detection

Train an autoencoder on normal examples only. It learns to compress and reconstruct normal data very well. When an anomalous input arrives, the autoencoder struggles to reconstruct it, because the network never learned how to encode that kind of data. A high reconstruction error is therefore a signal that the input is unusual.

Autoencoder trained on: normal factory sensor readings

Normal input:  Reconstruction error = 0.02 (very low)  → Normal
Faulty input:  Reconstruction error = 0.87 (very high) → Anomaly!

Applications:
  → Detecting defective products on assembly lines
  → Spotting fraudulent transactions
  → Flagging unusual network traffic
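Turning reconstruction errors into a Normal/Anomaly decision requires a cutoff. One simple heuristic is sketched below: set the threshold a few standard deviations above the mean error measured on normal data. The error values, the 3-standard-deviation rule, and the function names are illustrative assumptions.

```python
import statistics

def fit_threshold(normal_errors, n_std=3.0):
    """Pick a cutoff a few standard deviations above the mean error on normal data."""
    return statistics.mean(normal_errors) + n_std * statistics.stdev(normal_errors)

def is_anomaly(error, threshold):
    """Flag an input whose reconstruction error exceeds the cutoff."""
    return error > threshold

# Hypothetical reconstruction errors measured on held-out normal sensor readings.
normal_errors = [0.02, 0.03, 0.025, 0.018, 0.022, 0.027, 0.021, 0.024]
threshold = fit_threshold(normal_errors)

print(is_anomaly(0.02, threshold))   # False
print(is_anomaly(0.87, threshold))   # True
```

In practice the threshold is usually tuned on validation data to balance missed anomalies against false alarms.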

3. Dimensionality Reduction

The bottleneck code is a compressed version of the input that retains its most important features. This compact representation is useful for visualization, clustering, and speeding up other machine learning models.

Original data: 10,000 features per sample (slow to train on, hard to inspect)
After autoencoder bottleneck: 50 features per sample (compact)

The 50-feature version:
  → Trains other models faster
  → Can be clustered directly, or plotted after a further reduction to 2 or 3 dimensions
  → Still captures the key patterns

4. Data Generation (Variational Autoencoder)

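A Variational Autoencoder (VAE) extends the standard autoencoder to generate new data. Instead of mapping each input to a single latent point, the encoder produces a probability distribution (a mean and a variance for each latent dimension). Training adds a regularization term, a KL divergence, that keeps these distributions close to a standard normal prior. After training, points sampled from that prior decode into new, realistic outputs.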

Standard Autoencoder:
  Input → Encoder → Single point in latent space → Decoder → Reconstruction

Variational Autoencoder (VAE):
  Input → Encoder → Distribution (mean + variance) → Sample point → Decoder → Reconstruction

Generate a new face:
  Sample a random point from the prior over latent space (typically a standard normal distribution)
  → Decoder → A new, realistic human face never seen before
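The sampling step can be sketched in a few lines. This is a minimal sketch, assuming the encoder outputs a mean and a log-variance per latent dimension (a common parameterization; the lesson only says "mean + variance") and a standard normal prior. The function names and example values are hypothetical.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    """Reparameterization: z = mean + std * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL divergence from N(mean, exp(log_var)) to N(0, 1), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))

rng = random.Random(0)

# Hypothetical encoder output for one input image.
mean, log_var = [0.5, -1.0], [0.0, -2.0]
z = sample_latent(mean, log_var, rng)    # point passed to the decoder during training

# To generate new data, sample from the prior N(0, 1) instead of encoding an input.
z_new = [rng.gauss(0.0, 1.0) for _ in mean]

print(len(z), len(z_new))
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))   # 0.0
```

The KL term is zero when the encoder's distribution already matches the prior and grows as it drifts away. Adding it to the reconstruction error is what keeps the latent space well-behaved enough to sample from.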

Autoencoder Types at a Glance

Type	Key Difference	Main Use
Basic Autoencoder	Compress and reconstruct	Feature extraction, compression
Denoising Autoencoder	Trained on corrupted inputs	Noise removal, image restoration
Sparse Autoencoder	Sparsity penalty keeps most hidden activations near zero	Feature learning, representation learning
Variational Autoencoder (VAE)	Latent space is a distribution	Image generation, data synthesis
Convolutional Autoencoder	Uses convolutional layers	Image compression, denoising

Key Terms

Autoencoder — a network that compresses input into a bottleneck, then reconstructs it
Encoder — the compression half of the network
Decoder — the reconstruction half of the network
Latent Space — the compressed representation space defined by the bottleneck
Latent Vector — a single compressed representation of one input
Reconstruction Error — the loss measuring how different the output is from the input
VAE — Variational Autoencoder — a generative model that learns a distribution over latent space
