CV Thresholding and Segmentation
Segmentation splits an image into meaningful parts. Thresholding is the simplest way to segment — it assigns each pixel to one of two groups based on its brightness. More advanced methods divide an image into many regions based on color, texture, or learned features.
What Is Thresholding?
Thresholding converts a grayscale image into a binary image (only black and white). You choose a threshold value T. Every pixel brighter than T becomes white (255), and every pixel at or below T becomes black (0).
Threshold Decision Diagram
Grayscale pixel values (0–255):
[30] [80] [120] [200] [240] [50] [170] [90]

Threshold T = 128:
  Pixel ≤ 128 → BLACK (0)
  Pixel > 128 → WHITE (255)

Result:
[0] [0] [0] [255] [255] [0] [255] [0]
(dark, dark, dark, bright, bright, dark, bright, dark)
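The decision rule above fits in a few lines. The sketch below is plain Python working on a list of rows, so it runs without extra libraries. `global_threshold` is a hypothetical helper written for illustration, not part of any library. It follows the same convention as the diagram: values at or below T become 0, and values above T become 255. OpenCV's `cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)` applies the same strict "greater than" rule to NumPy arrays.

```python
def global_threshold(image, t):
    """Hypothetical helper: binarize `image` (a list of rows of 0-255 ints).

    Pixels <= t become 0 (black); pixels > t become 255 (white).
    """
    return [[255 if p > t else 0 for p in row] for row in image]


# The eight pixels from the diagram, treated as a one-row image.
row_image = [[30, 80, 120, 200, 240, 50, 170, 90]]
print(global_threshold(row_image, 128))  # [[0, 0, 0, 255, 255, 0, 255, 0]]

# Boundary case: a pixel exactly equal to T is treated as dark.
print(global_threshold([[128, 129]], 128))  # [[0, 255]]
```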
Simple (Global) Thresholding
Global thresholding uses one threshold value for the entire image. It works well when the image has uniform lighting and a clear separation between the foreground (object) and background.
Example: Separating Text from Paper
Scanned document (grayscale values, simplified):
  Background paper: ~230–255 (bright)
  Printed text: ~10–60 (dark)

Choose T = 128:
  Text pixels (≤ 128) → BLACK → text visible
  Background (> 128) → WHITE → clean paper

Result: clean black text on a white background.
→ Well suited to OCR (Optical Character Recognition).
Otsu's Automatic Thresholding
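Choosing T by hand means guessing. Otsu's method finds a threshold automatically from the histogram, which is a chart showing how many pixels have each brightness value. It tries every candidate T and keeps the one that maximizes the between-class variance. That criterion pushes the two groups' average brightnesses as far apart as possible, weighted by how many pixels fall in each group. Equivalently, it minimizes the spread of brightness within each group.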
Otsu's Method Concept
Histogram of pixel values (count vs. brightness):
Count
│      ██                      ████
│     ████                    ██████
│    ██████                  ████████
│   ████████                ██████████
└────────────────────────────────────────
 0      50     100  128        200    255  Brightness

Two peaks are visible:
  Peak 1 around 40–60 → dark pixels (foreground object)
  Peak 2 around 200–220 → bright pixels (background)

Otsu's T lands in the valley between the peaks (around 128 in this sketch), so it cleanly separates the two groups.
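Here is a minimal sketch of Otsu's search. It assumes 8-bit grayscale pixels in a flat Python list; the helper name `otsu_threshold` and the sample data are hypothetical and only illustrative. The function first builds the histogram. For each candidate T, it then computes the two groups' sizes and means and keeps the T with the largest score `w0 * w1 * (mu0 - mu1) ** 2`. Using raw pixel counts instead of probabilities scales every score by the same constant, so the winning T does not change. In OpenCV, the equivalent is passing `cv2.THRESH_BINARY + cv2.THRESH_OTSU` as the type flag to `cv2.threshold`.

```python
def otsu_threshold(pixels):
    """Hypothetical helper: return the T (0-255) maximizing between-class variance.

    Class 0 is every pixel <= T, class 1 is every pixel > T.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(v * count for v, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels <= t so far
    sum0 = 0  # sum of their brightness values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one group is empty, so this T separates nothing
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance times total**2
        if score > best_score:  # strict ">" keeps the lowest T on ties
            best_t, best_score = t, score
    return best_t


# Toy bimodal data: dark object pixels at 40-60, bright background at 200-220.
dark = [40, 45, 50, 55, 60] * 4
bright = [200, 205, 210, 215, 220] * 4
print(otsu_threshold(dark + bright))  # 60
```

In this toy data the two peaks do not overlap at all. As a result, every T in the empty gap between 60 and 200 gives exactly the same split, and this sketch keeps the first (lowest) one. Real images have overlapping peaks, which usually narrows the maximum to one point between them.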
Adaptive Thresholding
When lighting in an image is uneven (bright in one corner and dark in another), a single global threshold fails. Adaptive thresholding solves this by computing a separate threshold for each pixel from its local neighborhood, for example the mean brightness of a small surrounding window minus a small constant.
Global vs. Adaptive — Uneven Lighting
Image: handwritten note with a shadow on the left side.

GLOBAL THRESHOLD (T = 128):
  Bright side → text visible
  Shadow side → paper and ink both fall below T and turn black, so the text disappears
  Result: missing text!

ADAPTIVE THRESHOLD:
  Left region: T computed locally ≈ 60 (dark area)
  Right region: T computed locally ≈ 160 (bright area)
  Result: text visible across the entire image.
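A common way to compute the local threshold is the mean of a small window around each pixel minus a constant. The sketch below does exactly that in plain Python. The helper `adaptive_mean_threshold` is hypothetical, and its `block` (window size) and `c` (offset) values are assumptions you would tune per image. It mirrors the idea behind OpenCV's `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C`. However, OpenCV handles the image border differently from this sketch, which simply shrinks the window, so edge pixels may not match.

```python
def adaptive_mean_threshold(image, block=3, c=15):
    """Hypothetical helper: binarize `image` (rows of 0-255) with local thresholds.

    Local threshold = mean of the block x block neighborhood minus c.
    Pixels above it become 255, the rest 0. Near the border the window
    is clipped to the image instead of padded.
    """
    h, w = len(image), len(image[0])
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            local_t = sum(window) / len(window) - c
            out[y][x] = 255 if image[y][x] > local_t else 0
    return out


# Paper brightness ramps from 40 (shadow) to 220 (lit); the "text" pixels at
# positions 2 and 7 sit 50 levels below the paper around them.
row = [40, 60, 30, 100, 120, 140, 160, 130, 200, 220]
image = [list(row) for _ in range(3)]

print([255 if p > 128 else 0 for p in row])  # global T = 128
# [0, 0, 0, 0, 0, 255, 255, 255, 255, 255]: shadowed paper is black, text at 7 is lost
print(adaptive_mean_threshold(image)[1])
# [255, 255, 0, 255, 255, 255, 255, 0, 255, 255]: both text pixels are black
```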
Beyond Binary: Multi-Class Segmentation
Thresholding creates only two classes (foreground and background). Real-world images have many regions. Segmentation methods that go beyond thresholding label each pixel as belonging to one of many possible classes.
Types of Segmentation
| Type | Output | Example |
|---|---|---|
| Semantic segmentation | Every pixel gets a class label | Road / Car / Pedestrian / Sky |
| Instance segmentation | Separate label for each object instance | Car 1 / Car 2 / Car 3 (each a different color) |
| Panoptic segmentation | Semantic + Instance combined | Sky (one region) + each person labeled separately |
Segmentation Type Diagram
Original image: [Two cats on a rug]
Semantic: Cat | Cat | Rug (all cat pixels = one "cat" label)
████ ████ ░░░░░
Instance: Cat1 | Cat2 | Rug (each cat gets own label)
████ ▓▓▓▓ ░░░░░
Panoptic: Cat1(id=1) | Cat2(id=2) | Rug(stuff)
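To make the semantic vs. instance difference concrete, the sketch below stores a semantic map as a grid of class names. It then splits the "cat" pixels into instances with connected-component labeling, which is a breadth-first flood fill over 4-connected neighbors. This classical technique is swapped in purely for illustration, and `label_instances` is a hypothetical helper. Libraries offer the same operation, for example OpenCV's `cv2.connectedComponents` or SciPy's `scipy.ndimage.label`. It only works when objects do not touch: two cats lying against each other would merge into one blob. That touching case is what the watershed algorithm below and learned instance-segmentation models are meant to handle.

```python
from collections import deque


def label_instances(mask):
    """Hypothetical helper: give each 4-connected blob of True pixels an id 1, 2, ...

    Background (False) pixels keep id 0.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            next_id += 1  # unlabeled object pixel found: start a new instance
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of this blob
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels


# Semantic map: one class name per pixel (two cats separated by rug).
semantic = [
    ["cat", "cat", "rug", "cat", "cat"],
    ["cat", "cat", "rug", "cat", "cat"],
    ["rug", "rug", "rug", "rug", "rug"],
]
cat_mask = [[name == "cat" for name in row] for row in semantic]
for row in label_instances(cat_mask):
    print(row)
# [1, 1, 0, 2, 2]
# [1, 1, 0, 2, 2]
# [0, 0, 0, 0, 0]
```

Pairing each pixel's class name with its instance id, such as `("cat", 2)`, gives roughly the kind of (class, instance) pair a panoptic output stores. A "stuff" class such as rug keeps a single region.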
K-Means Clustering for Segmentation
K-Means groups pixels by color similarity. You choose K (the number of groups), and the algorithm assigns each pixel to the nearest color group. This works without any labeled training data.
K-Means Segmentation Steps
Step 1: Choose K = 3 (three color groups).
Step 2: Pick 3 random pixels as the starting "center" colors. Suppose they land on:
  Center A = Blue (sky)
  Center B = Green (grass)
  Center C = Brown (dirt path)
Step 3: Assign every pixel to the nearest center by color distance.
Step 4: Recalculate each center as the average color of its assigned pixels.
Step 5: Repeat Steps 3–4 until the assignments stop changing.

Result:
  Region 1 (Blue) = sky area
  Region 2 (Green) = grass area
  Region 3 (Brown) = dirt path area
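The five steps above map directly onto a short loop. This is a minimal plain-Python sketch; the helper `kmeans_colors` and the sample colors are hypothetical. Pixels are (R, G, B) tuples, and distance is squared Euclidean distance in RGB space. The loop stops when the assignments stop changing or after `iters` rounds. Because the starting centers are random, a single run can settle on a poor grouping, for example if two starting pixels both come from the sky. For that reason, OpenCV's `cv2.kmeans` accepts an `attempts` count and a k-means++ initialization flag (`cv2.KMEANS_PP_CENTERS`).

```python
import random


def kmeans_colors(pixels, k, iters=20, seed=0):
    """Hypothetical helper: cluster (R, G, B) tuples into k groups.

    Returns (assignment, centers): a group index per pixel and each group's mean color.
    """

    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(pixels, k)]  # Step 2: random starting centers
    assignment = None
    for _ in range(iters):
        # Step 3: assign every pixel to its nearest center.
        new_assignment = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in pixels]
        if new_assignment == assignment:  # Step 5: stop once nothing changes
            break
        assignment = new_assignment
        # Step 4: move each center to the mean color of its pixels.
        for j in range(k):
            members = [p for p, a in zip(pixels, assignment) if a == j]
            if members:  # an empty group keeps its old center
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return assignment, centers


sky = [(70, 130, 220), (80, 140, 230), (75, 135, 225)]
grass = [(40, 160, 50), (50, 170, 60), (45, 165, 55)]
dirt = [(140, 100, 60), (150, 110, 70), (145, 105, 65)]
pixels = sky + grass + dirt
assignment, centers = kmeans_colors(pixels, k=3)
print(assignment)  # group index (0, 1 or 2) for each pixel
print(centers)     # mean color of each group
```

For a real image, you would flatten its rows into one list of pixels, cluster them, and then paint each pixel with its group's center color to get the segmented picture.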
Watershed Algorithm
The watershed algorithm treats pixel brightness like elevation on a terrain map. Imagine water rising from the lowest points: valleys fill first, and a boundary (the "watershed line") forms wherever water from two different valleys would meet. This makes the method especially good at separating touching objects, such as cells pressed against each other in a microscope image.
Watershed Analogy
Think of the image as a landscape seen from the side:
 ▲                   ▲                   ▲
▲▲▲                 ▲▲▲                 ▲▲▲     ← High intensity = mountains (ridges)
▲▲▲▲ ≈≈≈basin 1≈≈≈ ▲▲▲▲▲ ≈≈≈basin 2≈≈≈ ▲▲▲▲     ← Low intensity = valleys, filling with water
                     ║
              watershed line                    ← Boundary where the two basins meet
In cell imaging:
Each cell = a dark valley
Cell boundary = bright ridge between cells
Watershed finds each cell separately.
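Here is a simplified flooding sketch. It assumes the basins are seeded with markers, a common practical setup because flooding from every local minimum tends to over-segment noisy images. A priority queue ensures that the lowest unflooded pixel bordering any basin is always flooded next. Each pixel takes the label of the basin whose water reaches it first. No explicit watershed line is drawn: the boundary is simply where the labels change. The helper `watershed` and its grid format are hypothetical. For real images, `skimage.segmentation.watershed(image, markers)` and OpenCV's `cv2.watershed` implement marker-based watershed.

```python
import heapq


def watershed(elevation, markers):
    """Hypothetical helper: flood `elevation` (rows of numbers) from labeled `markers`.

    markers uses 0 for "unknown" and 1, 2, ... for seed basins. Returns a grid
    where every pixel carries the label of the basin that flooded it first.
    """
    h, w = len(elevation), len(elevation[0])
    labels = [[0] * w for _ in range(h)]
    heap, order = [], 0  # `order` breaks ties between equal elevations (FIFO)
    for y in range(h):
        for x in range(w):
            if markers[y][x]:
                heapq.heappush(heap, (elevation[y][x], order, y, x, markers[y][x]))
                order += 1
    while heap:
        _, _, y, x, basin = heapq.heappop(heap)  # lowest pending pixel floods next
        if labels[y][x]:
            continue  # another basin already flooded this pixel
        labels[y][x] = basin
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not labels[ny][nx]:
                heapq.heappush(heap, (elevation[ny][nx], order, ny, nx, basin))
                order += 1
    return labels


# Two dark "cells" (valleys) separated by a bright ridge of 9s, one marker each.
elevation = [
    [9, 9, 9, 9, 9, 9, 9],
    [9, 1, 2, 9, 2, 1, 9],
    [9, 2, 3, 9, 3, 2, 9],
    [9, 9, 9, 9, 9, 9, 9],
]
markers = [[0] * 7 for _ in range(4)]
markers[1][1], markers[1][5] = 1, 2
for row in watershed(elevation, markers):
    print(row)
```

The interior of the left cell comes out as 1 and the right cell as 2. The bright ridge pixels are split between the two basins in the order the water reaches them.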
Real-World Applications
- Medical imaging – Segment tumors, organs, or blood vessels from scans.
- Satellite imagery – Label each land-use type: forest, water, urban, farmland.
- Autonomous driving – Identify road, pedestrian, vehicle, and sign regions in each camera frame.
- Document processing – Separate text from images in scanned pages.
- Background removal – Separate a product from its background for e-commerce photos.
Key Takeaways
- Thresholding assigns pixels to black or white based on a brightness cutoff value.
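- Otsu's method picks the threshold automatically from the histogram, choosing the T that best separates the two pixel groups (maximum between-class variance).
- Adaptive thresholding computes a separate threshold for each local region, so it handles uneven lighting.
- Semantic segmentation gives every pixel a class label. Instance segmentation also separates individual objects of the same class. Panoptic segmentation combines the two.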
- K-Means groups pixels by color without any training data.
- Watershed separates touching objects by treating brightness as terrain elevation.
