CV Thresholding and Segmentation

Segmentation splits an image into meaningful parts. Thresholding is the simplest way to segment — it assigns each pixel to one of two groups based on its brightness. More advanced methods divide an image into many regions based on color, texture, or learned features.

What Is Thresholding?

Thresholding converts a grayscale image into a binary image (just black and white). You choose a threshold value T. Every pixel brighter than T becomes white (255), and every pixel equal to or darker than T becomes black (0).

Threshold Decision Diagram

Grayscale pixel values (0–255):
  [30] [80] [120] [200] [240] [50] [170] [90]

Threshold T = 128:

  Pixel ≤ 128 → BLACK (0)
  Pixel > 128 → WHITE (255)

Result:
  [0]  [0]  [0]  [255] [255] [0]  [255] [0]
   ↑    ↑    ↑     ↑     ↑    ↑     ↑    ↑
  dark dark dark  bright bright dark bright dark
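The rule in the diagram fits in a one-line function. This is a minimal sketch in plain Python, assuming the image is a flat list of 0–255 integers; `global_threshold` is a hypothetical helper name, not a library function.

```python
def global_threshold(pixels, t):
    """Map each grayscale value to 255 if it is brighter than t, else 0."""
    return [255 if p > t else 0 for p in pixels]

pixels = [30, 80, 120, 200, 240, 50, 170, 90]
binary = global_threshold(pixels, 128)
print(binary)  # [0, 0, 0, 255, 255, 0, 255, 0]
```

OpenCV's `cv2.threshold` with the `cv2.THRESH_BINARY` type follows the same convention: only values strictly greater than the threshold become the maximum value.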

Simple (Global) Thresholding

Global thresholding uses one threshold value for the entire image. It works well when the image has uniform lighting and a clear separation between the foreground (object) and background.

Example: Separating Text from Paper

Scanned document (grayscale values, simplified):
  Background paper: ~230–255 (bright)
  Printed text:     ~10–60   (dark)

Choose T = 128.

Text pixels (≤128) → BLACK → text visible
Background (>128)  → WHITE → clean paper

Result: Clean black text on a white background.
→ A good starting point for OCR (Optical Character Recognition).
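On a real scan the same rule is applied to every pixel of a 2D grid. This is a minimal sketch, assuming the scan is a list of rows of 0–255 values; the 3×6 patch is made-up data in the paper and text ranges listed above, and `threshold_image` is a hypothetical helper.

```python
def threshold_image(image, t):
    """Apply one global threshold to a 2D grayscale image (a list of rows)."""
    return [[255 if p > t else 0 for p in row] for row in image]

# Hypothetical patch: bright paper (~230-255) with a dark "T"-shaped text stroke (~10-60).
scan = [
    [240, 235, 250, 245, 238, 252],
    [242,  20,  35,  15, 240, 236],
    [249, 233,  55, 244, 231, 255],
]

for row in threshold_image(scan, 128):
    # Draw black (text) pixels as '#' and white (paper) pixels as '.'
    print("".join("#" if p == 0 else "." for p in row))
# ......
# .###..
# ..#...
```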

Otsu's Automatic Thresholding

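Choosing T manually requires guessing. Otsu's method finds a threshold automatically from the histogram, a chart showing how many pixels have each brightness value. It tries every possible T and picks the one that maximizes the between-class variance: the mean brightnesses of the two groups end up as far apart as possible (weighted by how many pixels fall in each group), which is equivalent to making each group as internally uniform as possible.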

Otsu's Method Concept

Histogram of pixel values (count vs. brightness):

Count
  │         ██                    ████
  │        ████                  ██████
  │       ██████                ████████
  │     ████████              ██████████
  └──────────────────────────────────────
  0     50    100    128    200    255  Brightness

Two peaks visible:
  Peak 1 around 40–60   → dark pixels (foreground object)
  Peak 2 around 200–220 → bright pixels (background)

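Otsu's T falls between the two peaks, here roughly 128, in the valley.
Otsu does not search for the valley directly; with two well-separated peaks, the variance criterion simply lands there. This cleanly separates the two groups.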
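A minimal from-scratch sketch of Otsu's method in plain Python, assuming 8-bit integer pixel values; `otsu_threshold` is a hypothetical helper name. It tries every candidate T and keeps the one with the largest between-class variance w_low · w_high · (mean_low − mean_high)², where pixels ≤ T form the "low" group as in this lesson.

```python
def otsu_threshold(pixels):
    """Return the T (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    count_low, sum_low = 0, 0  # running pixel count and intensity sum for values <= t
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one group is empty, so this split separates nothing
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        w_low, w_high = count_low / total, count_high / total
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:  # strict '>' keeps the first maximum on ties
            best_t, best_var = t, between_var
    return best_t

# Hypothetical bimodal data: dark object pixels near 40-60, bright background near 200-220.
pixels = [40, 45, 50, 55, 60] * 20 + [200, 205, 210, 215, 220] * 20
t = otsu_threshold(pixels)
print(t)  # 60
```

Every T from 60 to 199 splits this toy data identically, so the variance ties across the whole empty valley and the sketch keeps the first T it finds; real histograms rarely have a perfectly empty valley. For practical use, OpenCV exposes the method through the `cv2.THRESH_OTSU` flag of `cv2.threshold`, and scikit-image provides `skimage.filters.threshold_otsu`.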

Adaptive Thresholding

When lighting in an image is uneven, bright in one corner and dark in another, global thresholding fails. Adaptive thresholding solves this by computing a separate threshold for each pixel from its local neighborhood, for example the average brightness of a small surrounding window minus a small constant.

Global vs. Adaptive — Uneven Lighting

Image: handwritten note with shadow on left side.

GLOBAL THRESHOLD (T=128):
  Bright side → text visible
  Shadow side → paper is also darker than 128, so paper and text both turn black and the text is lost
  Result: Missing text!

ADAPTIVE THRESHOLD:
  Left region: T computed locally ≈ 60   (dark area)
  Right region: T computed locally ≈ 160  (bright area)
  Result: Text visible across the entire image.
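The comparison above can be reproduced on a toy image. This is a minimal sketch of mean-based adaptive thresholding, assuming a list-of-rows grayscale image; `adaptive_threshold_mean`, the window size, and c=15 are illustrative choices, and the window is simply clipped at the image border. OpenCV's `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C` uses the same mean-minus-C rule, with its own border handling.

```python
def adaptive_threshold_mean(image, block_size=3, c=15):
    """Binarize with a per-pixel threshold: mean of the local window minus c."""
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    r = block_size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Neighborhood around (y, x), clipped to the image bounds.
            window = [
                image[j][i]
                for j in range(max(0, y - r), min(rows, y + r + 1))
                for i in range(max(0, x - r), min(cols, x + r + 1))
            ]
            local_t = sum(window) / len(window) - c
            out_row.append(255 if image[y][x] > local_t else 0)
        out.append(out_row)
    return out

def render(binary):
    """Show black pixels as '#' and white pixels as '.'."""
    return ["".join("#" if p == 0 else "." for p in row) for row in binary]

# Hypothetical page: paper brightness ramps from 60 (shadow) to 240 (bright).
paper = [60, 80, 100, 120, 140, 160, 180, 200, 220, 240]
image = [list(paper) for _ in range(3)]
image[1][2] -= 100  # text stroke inside the shadow: 100 -> 0
image[1][7] -= 100  # text stroke in the bright area: 200 -> 100

global_result = [[255 if p > 128 else 0 for p in row] for row in image]
adaptive_result = adaptive_threshold_mean(image, block_size=3, c=15)
print(render(global_result))    # ['####......', '####...#..', '####......']
print(render(adaptive_result))  # ['..........', '..#....#..', '..........']
```

With the global threshold, the shadowed text stroke disappears into a solid black block, while the adaptive version keeps both strokes and turns all paper white.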

Beyond Binary: Multi-Class Segmentation

Thresholding creates only two classes (foreground and background). Real-world images have many regions. Segmentation methods that go beyond thresholding label each pixel as belonging to one of many possible classes.

Types of Segmentation

Type	Output	Example
Semantic segmentation	Every pixel gets a class label	Road / Car / Pedestrian / Sky
Instance segmentation	Separate label for each object instance	Car 1 / Car 2 / Car 3 (each a different color)
Panoptic segmentation	Semantic + Instance combined	Sky (one region) + each person labeled separately

Segmentation Type Diagram

Original image: [Two cats on a rug]

Semantic:    Cat | Cat | Rug   (all cat pixels = one "cat" label)
              ████ ████ ░░░░░

Instance:    Cat1 | Cat2 | (rug unlabeled)  (each cat gets its own label)
              ████  ▓▓▓▓

Panoptic:    Cat1(id=1) | Cat2(id=2) | Rug(stuff)  (every pixel labeled)
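The difference is easiest to see in the label maps each type produces. This is a toy 3×5 sketch in plain Python. The class ids, the choice to leave the rug as 0 in the instance map (instance segmentation targets countable objects), and the (class_id, instance_id) pair encoding for panoptic output are illustrative assumptions, not a specific dataset's format.

```python
# Hypothetical class ids: 0 = rug, 1 = cat. Two cats sit side by side on a rug.
semantic = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
# Instance ids: 0 = no object (the rug), 1 = first cat, 2 = second cat.
instance = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
]
# Panoptic: every pixel gets a (class_id, instance_id) pair; "stuff" like the rug keeps instance_id 0.
panoptic = [
    [(c, i) for c, i in zip(sem_row, inst_row)]
    for sem_row, inst_row in zip(semantic, instance)
]

cat_labels_semantic = {v for row in semantic for v in row if v == 1}
cat_ids_instance = {v for row in instance for v in row if v != 0}
print(len(cat_labels_semantic))        # 1: semantic labels cannot tell the cats apart
print(len(cat_ids_instance))           # 2: instance labels separate them
print(panoptic[0][3], panoptic[2][0])  # (1, 2) (0, 0)
```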

K-Means Clustering for Segmentation

K-Means groups pixels by color similarity. You choose K (the number of groups), and the algorithm assigns each pixel to the nearest color group. This works without any labeled training data.

K-Means Segmentation Steps

Step 1: Choose K=3 (three color groups).

Step 2: Pick 3 pixels at random as the starting "center" colors. Suppose they land on:
  Center A = Blue (sky)
  Center B = Green (grass)
  Center C = Brown (dirt path)

Step 3: Assign every pixel to the nearest center by color distance (for example, Euclidean distance between RGB values).

Step 4: Recalculate each center as the average color of its assigned pixels.

Step 5: Repeat Steps 3–4 until assignments stop changing.

Result:
  Region 1 (Blue)  = Sky area
  Region 2 (Green) = Grass area
  Region 3 (Brown) = Dirt path area
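The five steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: `kmeans_colors` is a hypothetical helper, pixels are assumed to be (R, G, B) tuples, color distance is squared Euclidean distance, and the example passes the starting centers in explicitly (one sky, one grass, one dirt pixel, as in Step 2) so the run is reproducible. Omitting `init_centers` picks them at random.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_colors(pixels, k, init_centers=None, max_iter=100, seed=0):
    """Cluster RGB pixels into k color groups; returns (labels, centers)."""
    if init_centers is None:
        # Step 2: pick k pixels at random as the starting centers.
        init_centers = random.Random(seed).sample(pixels, k)
    centers = [tuple(map(float, c)) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every pixel to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in pixels
        ]
        if new_labels == labels:
            break  # Step 5: assignments stopped changing
        labels = new_labels
        # Step 4: move each center to the average color of its pixels.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old center if a group ends up empty
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

# Hypothetical 3x3 photo, flattened row by row: sky, then grass, then dirt path.
pixels = [
    (100, 150, 230), (110, 160, 240), (90, 140, 220),  # sky blues
    (40, 160, 50), (50, 170, 60), (45, 150, 40),       # grass greens
    (140, 100, 60), (150, 110, 70), (130, 95, 55),     # dirt browns
]
labels, centers = kmeans_colors(pixels, 3, init_centers=[pixels[0], pixels[3], pixels[6]])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

K-Means only returns group numbers (0, 1, 2); naming them sky, grass, and dirt is up to you. With random starting centers it can also settle on a poor split, which is why libraries such as scikit-learn's `KMeans` run several initializations (the `n_init` parameter) and keep the best result.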

Watershed Algorithm

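The watershed algorithm treats pixel brightness like elevation on a terrain map. Water rises from the lowest points, so the valleys (dark regions) flood first and each valley's water forms its own basin. Boundaries (watershed lines) form where water from different basins would meet. This makes the method useful for separating touching objects, like cells pressed against each other in a microscope image.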

Watershed Analogy

Think of a side view of a landscape of mountains and valleys:

  ▲             ▲             ▲     ← High intensity = mountains (ridges)
  ▲▲           ▲▲▲           ▲▲
  ▲▲▲~basin 1~▲▲▲▲▲~basin 2~▲▲▲     ← Low intensity = valleys, which flood first
                ║
         watershed line             ← Boundary where the two basins meet

In cell imaging:
  Each cell = a dark valley
  Cell boundary = bright ridge between cells
  Watershed finds each cell separately.
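The flooding idea can be sketched with a priority queue that always floods the lowest pending pixel next. This is a simplified, marker-based version in plain Python, assuming one seed marker per cell is already known (here they are placed by hand) and that brightness is used directly as elevation, so dark cells are valleys. `watershed` is a hypothetical helper; like OpenCV's `cv2.watershed`, it marks boundary pixels with -1.

```python
import heapq

def watershed(elevation, markers):
    """Flood from markers (>0); return labels, with -1 on watershed lines."""
    rows, cols = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]
    queued = [[False] * cols for _ in range(rows)]
    heap, counter = [], 0  # counter breaks ties in first-come order

    def neighbors(y, x):
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols:
                yield ny, nx

    def push_unlabeled_neighbors(y, x):
        nonlocal counter
        for ny, nx in neighbors(y, x):
            if labels[ny][nx] == 0 and not queued[ny][nx]:
                queued[ny][nx] = True
                heapq.heappush(heap, (elevation[ny][nx], counter, ny, nx))
                counter += 1

    # Start flooding around every seed marker.
    for y in range(rows):
        for x in range(cols):
            if markers[y][x] > 0:
                push_unlabeled_neighbors(y, x)

    # Lowest pixels first, like a rising water level.
    while heap:
        _, _, y, x = heapq.heappop(heap)
        basins = {labels[ny][nx] for ny, nx in neighbors(y, x) if labels[ny][nx] > 0}
        if len(basins) == 1:
            labels[y][x] = basins.pop()  # only one basin reaches this pixel
            push_unlabeled_neighbors(y, x)
        elif len(basins) > 1:
            labels[y][x] = -1  # two basins meet here: watershed line
    return labels

# Hypothetical 3x7 patch: two dark cells (valleys) touching along a bright ridge.
elevation = [
    [50, 20, 50, 200, 50, 20, 50],
    [40, 10, 40, 200, 40, 10, 40],
    [50, 20, 50, 200, 50, 20, 50],
]
markers = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 2, 0],  # one hand-placed seed per cell
    [0, 0, 0, 0, 0, 0, 0],
]
for row in watershed(elevation, markers):
    print(row)  # [1, 1, 1, -1, 2, 2, 2] for each of the three rows
```

Because flooding stops at watershed lines, a pixel reachable only through a line would stay 0 in this sketch; library implementations such as `skimage.segmentation.watershed` handle such details more thoroughly.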

Real-World Applications

Medical imaging – Segment tumors, organs, or blood vessels from scans.
Satellite imagery – Label each land-use type: forest, water, urban, farmland.
Autonomous driving – Identify road, pedestrian, vehicle, and sign regions in each camera frame.
Document processing – Separate text from images in scanned pages.
Background removal – Separate a product from its background for e-commerce photos.

Key Takeaways

Thresholding assigns pixels to black or white based on a brightness cutoff value.
Otsu's method automatically picks the threshold that best separates the histogram into two groups (maximum between-class variance).
Adaptive thresholding computes a local threshold for each part of the image, so it handles uneven lighting.
Semantic segmentation labels each pixel. Instance segmentation labels each object separately.
K-Means groups pixels by color without any training data.
Watershed separates touching objects by treating brightness as terrain elevation.
