Computer Vision Transfer Learning

Transfer learning reuses a neural network trained on one large task as the starting point for a new, different task. Instead of training from scratch, which can require millions of labeled images and days to weeks of GPU time, you borrow the knowledge already captured in a pre-trained model and adapt it to your specific problem.

The Core Idea

When a CNN trains on ImageNet (about 1.2 million training images, 1000 classes), its early layers learn to detect edges and textures, and its deeper layers learn shapes and object parts. These general visual features are useful for almost any image task, not just ImageNet classification. Transfer learning keeps those layers as a foundation, discards the original 1000-class output layer, and adds new layers on top for your specific task.

Transfer Learning Analogy

LEARNING FROM SCRATCH:
  You want to become a radiologist.
  You start by learning biology, chemistry, anatomy, physics,
  pathology, and then finally radiology — years of study.

TRANSFER LEARNING:
  You already have a medical degree (you know biology, anatomy).
  You only need to study radiology-specific skills.
  Time saved: massive.

For CNNs:
  Pre-trained model = medical degree (general visual knowledge)
  Fine-tuning = radiology specialization (your specific task)

Pre-trained Models as Feature Extractors

The simplest form of transfer learning treats the pre-trained CNN as a fixed feature extractor. You freeze all the original weights and only train a new classifier head on top for your task.

Feature Extraction Pipeline

PRE-TRAINED ResNet-50 (trained on ImageNet):
  [Input Image]
       ↓
  [Conv Block 1]   ← Detects edges (FROZEN — weights not updated)
       ↓
  [Conv Block 2]   ← Detects textures (FROZEN)
       ↓
  [Conv Block 3]   ← Detects shapes (FROZEN)
       ↓
  [Conv Block 4]   ← Detects object parts (FROZEN)
       ↓
  [Global Average Pool] → 2048-dimensional feature vector

NEW HEAD (trained from scratch on YOUR data):
  [2048 features]
       ↓
  [Dense layer → 256] ← TRAINABLE
       ↓
  [Dense layer → N]   ← TRAINABLE (N = your number of classes)
       ↓
  [Softmax]
       ↓
  [Your custom class labels]
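A quick parameter count shows why training only the head is cheap. This is a sketch in plain Python arithmetic under stated assumptions: the backbone size is taken from torchvision's ResNet-50 (25,557,032 parameters) minus its 1000-class output layer, and N = 5 classes is chosen only for illustration.

```python
# Parameter count for the feature-extraction setup above (plain arithmetic).

def dense_params(n_in: int, n_out: int) -> int:
    """A dense layer has n_in * n_out weights plus one bias per output unit."""
    return n_in * n_out + n_out

# Assumption: ResNet-50 up to global average pooling, i.e. torchvision's
# 25,557,032-parameter model minus its 2048 -> 1000 output layer.
BACKBONE_PARAMS = 25_557_032 - dense_params(2048, 1000)

def head_params(num_classes: int) -> int:
    # New head: 2048 pooled features -> Dense(256) -> Dense(num_classes)
    return dense_params(2048, 256) + dense_params(256, num_classes)

num_classes = 5  # illustrative value for N
trainable = head_params(num_classes)
frozen = BACKBONE_PARAMS

print(f"frozen backbone params: {frozen:,}")
print(f"trainable head params:  {trainable:,}")
print(f"fraction trained:       {trainable / (frozen + trainable):.1%}")
```

Under these assumptions only about 2% of the weights (the 525,829 head parameters) receive gradient updates; the frozen backbone simply produces the 2048-dimensional feature vector.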

This approach works well when your dataset is small (hundreds to a few thousand images) and your images are similar in appearance to ImageNet photos.

Fine-Tuning

Fine-tuning unfreezes some or all of the pre-trained layers and trains them on your data — but with a very small learning rate to avoid destroying the existing knowledge. This lets the model adapt its features specifically to your domain while preserving what it already knows.

When to Freeze vs. Fine-Tune

Dataset Size	Similarity to Pre-training Data	Recommended Strategy
Small (<1000 images)	Similar (natural photos)	Freeze all → train head only
Small (<1000 images)	Different (medical scans, satellite)	Fine-tune a few top layers
Medium (10k–100k)	Similar	Fine-tune all layers with low learning rate
Large (>1M images)	Any	Consider training from scratch
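The table can also be read as a small decision function. The sketch below is one hypothetical encoding of it: the thresholds come straight from the table, and for size and similarity combinations the table does not list (for example 5,000 images, or a medium-sized dataset of medical scans) it returns `None` rather than guessing.

```python
from typing import Optional

def choose_strategy(num_images: int, similar_to_imagenet: bool) -> Optional[str]:
    """Look up the freeze-vs-fine-tune table (hypothetical helper).

    Returns None for combinations the table does not cover.
    """
    if num_images > 1_000_000:
        # Large dataset: the table's advice applies regardless of similarity.
        return "consider training from scratch"
    if num_images < 1_000:
        if similar_to_imagenet:
            return "freeze all, train head only"
        return "fine-tune a few top layers"
    if 10_000 <= num_images <= 100_000 and similar_to_imagenet:
        return "fine-tune all layers with a low learning rate"
    return None  # not covered by the table

print(choose_strategy(800, similar_to_imagenet=True))
print(choose_strategy(800, similar_to_imagenet=False))
print(choose_strategy(50_000, similar_to_imagenet=True))
print(choose_strategy(5_000, similar_to_imagenet=True))
```

Making the uncovered cases explicit is a deliberate choice: where to draw the boundaries between rows is a judgment call the table leaves open.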

Learning Rate in Fine-Tuning

Training from scratch: learning rate = 0.01
Fine-tuning:           learning rate = 0.0001 (100× smaller)

Why so small?
  Pre-trained weights already encode valuable features.
  Large updates would "forget" them (called catastrophic forgetting).
  Small updates gently nudge the features toward your domain.

Gradual unfreezing (popular technique):
  Epoch 1–5:  Train new head only.
  Epoch 6–10: Unfreeze top 1 CNN block + train.
  Epoch 11–15: Unfreeze top 2 blocks + train.
  (More gradual = less risk of forgetting.)
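The gradual unfreezing recipe above can be written as a schedule function. This is a minimal sketch that assumes the four conv blocks from the ResNet-50 diagram and five epochs per stage; `trainable_blocks` is a hypothetical helper. In a real framework you would switch gradient updates on for each returned block and give its parameters the small fine-tuning learning rate.

```python
# Gradual unfreezing: 5 epochs per stage, one more top block per stage.
BLOCKS = ["conv_block_1", "conv_block_2", "conv_block_3", "conv_block_4"]  # bottom -> top
SCRATCH_LR = 0.01
FINE_TUNE_LR = 0.0001  # 100x smaller, applied to unfrozen pre-trained blocks

def trainable_blocks(epoch: int, blocks: list = BLOCKS, epochs_per_stage: int = 5) -> list:
    """Return the pre-trained blocks that are unfrozen at a 1-indexed epoch.

    Epochs 1-5 train only the new head (no blocks), epochs 6-10 add the top
    block, epochs 11-15 add the next block down, and so on.
    """
    stage = (epoch - 1) // epochs_per_stage
    n_unfrozen = min(stage, len(blocks))
    # Slicing from the end keeps the top (last) blocks; an empty slice means head only.
    return blocks[len(blocks) - n_unfrozen:]

for epoch in (1, 6, 11):
    unfrozen = trainable_blocks(epoch)
    print(f"epoch {epoch:2d}: head + {unfrozen or 'no pre-trained blocks'} (backbone lr={FINE_TUNE_LR})")
```

Unfreezing from the top down follows the reasoning earlier in the lesson: the top blocks hold the most task-specific features, so they are adapted first while the general edge and texture detectors stay fixed longest.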

Popular Pre-trained Models

Model	Year	Parameters	Top-1 Accuracy (ImageNet)	Best For
VGG-16	2014	138M	71.3%	Simple baseline, widely documented
ResNet-50	2015	25M	76.1%	General purpose, balanced speed/accuracy
MobileNetV2	2018	3.4M	72.0%	Mobile apps, edge devices
EfficientNet-B4	2019	19M	83.0%	High accuracy, moderate size
Vision Transformer (ViT-Base)	2020	86M	81.8%	Large data, self-attention based

Domain Adaptation

Domain adaptation handles a specific challenge: your training images come from one source (domain) but your deployment images come from a different source with different visual characteristics. For example, training on sunny daytime photos but deploying in rainy, nighttime conditions.

Domain Gap Illustration

SOURCE DOMAIN (training):    TARGET DOMAIN (deployment):
  Clear daytime photos          Rainy, night-time photos
  Clean, colorful               Blurry, dark, wet

  Model trained on source → accuracy drops on target.
  Why? The model learns source-specific patterns (bright colors,
       sharp textures) that do not transfer to target.

Domain Adaptation techniques:
  1. Collect some target-domain data and fine-tune.
  2. Use adversarial training to make features domain-invariant.
  3. Use data augmentation (add rain, darkness) to bridge the gap.
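Technique 3 can be illustrated without any imaging library. In this sketch an image is a hypothetical list of grayscale rows (values 0 to 255), `darken` scales brightness to mimic night, and `add_rain` draws short bright vertical streaks as a crude stand-in for rain. The brightness range and streak settings are arbitrary assumptions; production pipelines typically use an augmentation library with more realistic weather effects.

```python
import random

Image = list  # toy representation: list of rows of grayscale ints in 0..255

def darken(img: Image, factor: float) -> Image:
    """Scale every pixel by factor in (0, 1] to mimic low light."""
    return [[int(p * factor) for p in row] for row in img]

def add_rain(img: Image, num_streaks: int, length: int, rng: random.Random) -> Image:
    """Brighten short vertical runs of pixels at random positions (crude rain)."""
    out = [row[:] for row in img]  # copy so the input is not modified
    h, w = len(out), len(out[0])
    for _ in range(num_streaks):
        x, y0 = rng.randrange(w), rng.randrange(h)
        for y in range(y0, min(y0 + length, h)):
            out[y][x] = min(255, out[y][x] + 80)  # clip to the valid range
    return out

def night_rain_augment(img: Image, rng: random.Random) -> Image:
    """Turn a clear daytime (source-domain) image into a darker, rainy variant."""
    factor = rng.uniform(0.3, 0.6)  # assumed brightness range for "night"
    return add_rain(darken(img, factor), num_streaks=3, length=4, rng=rng)

daytime = [[200] * 8 for _ in range(8)]
augmented = night_rain_augment(daytime, random.Random(0))
for row in augmented:
    print(row)
```

Applying a fresh random augmentation to each source-domain image during training exposes the model to darker, noisier inputs, which is the sense in which augmentation helps bridge the domain gap.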

Real-World Transfer Learning Example: Plant Disease Detection

Task: Detect disease in cassava plant leaves.
Available labeled data: 1,500 images (very small dataset).
Without transfer learning: 62% accuracy (model cannot generalize).
With transfer learning (EfficientNet fine-tuned): 91% accuracy.

Steps taken:
  1. Download EfficientNet-B4 pre-trained on ImageNet.
  2. Replace final layer: 1000 ImageNet classes → 5 cassava disease classes.
  3. Freeze all layers except last 3 blocks + new head.
  4. Train for 10 epochs on 1,500 images.
  5. Achieve 91% test accuracy.

Time to train: 20 minutes on a single GPU.
Training from scratch would need: ~50,000+ images.
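Steps 2 and 3 amount to swapping the output layer and setting per-block trainable flags. The sketch below models that bookkeeping in plain Python. The block list is an assumption loosely following EfficientNet's layout (a stem, seven MBConv stages, a final 1x1 conv), the names are hypothetical rather than any library's identifiers, and treating `top_conv` as one of the "last 3 blocks" is likewise an interpretation.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = True

# Assumed, illustrative block layout loosely following EfficientNet.
EFFICIENTNET_BLOCKS = ["stem"] + [f"stage{i}" for i in range(1, 8)] + ["top_conv"]

def prepare_for_fine_tuning(blocks: list, num_classes: int, num_trainable_blocks: int) -> list:
    """Hypothetical helper covering steps 2 and 3 of the cassava recipe."""
    # Step 2: the 1000-class ImageNet classifier is not carried over.
    layers = [Layer(name) for name in blocks]
    # Step 3: freeze everything except the last num_trainable_blocks blocks.
    cutoff = len(layers) - num_trainable_blocks
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    # New head with num_classes outputs, trained from scratch.
    layers.append(Layer(f"head_{num_classes}_classes", trainable=True))
    return layers

model = prepare_for_fine_tuning(EFFICIENTNET_BLOCKS, num_classes=5, num_trainable_blocks=3)
print("trainable:", [layer.name for layer in model if layer.trainable])
print("frozen:   ", [layer.name for layer in model if not layer.trainable])
```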

Transfer Learning for Non-Classification Tasks

Transfer learning also applies to detection and segmentation. Detection models (Faster R-CNN, YOLO) and segmentation models (DeepLab, many U-Net variants) commonly use an ImageNet-pre-trained CNN backbone, such as ResNet, VGG, or YOLO's Darknet, as their feature extractor. You replace only the task-specific head while reusing the backbone's learned features.

Backbone Transfer for Detection

OBJECT DETECTION MODEL (Faster R-CNN style):

  [Pre-trained ResNet-50 backbone] → feature maps
         ↑                               ↓
  (Frozen or fine-tuned)     [Detection head: RPN + classifier]
                                          ↑
                              (Trained from scratch on your detection dataset)

Key Takeaways

Transfer learning reuses a model trained on a large dataset as a starting point for a new task.
Feature extraction freezes all pre-trained weights and only trains a new classifier head — ideal for small datasets.
Fine-tuning unfreezes some layers and trains them with a small learning rate — adapts features to your domain.
Popular backbones (ResNet, EfficientNet, MobileNet) are pre-trained on ImageNet and widely available.
Domain adaptation addresses differences between training and deployment environments.
Transfer learning dramatically reduces the data and compute needed for real-world projects.
