What Is Computer Vision

Computer vision is the field of study that teaches computers to understand and interpret images and videos. Just as your eyes send visual signals to your brain, a camera sends data to a computer, and computer vision algorithms make sense of it.

The Core Idea

When you look at a photo of a dog, your brain instantly recognizes the animal, its posture, and even its breed. A computer sees the same photo as millions of tiny colored dots called pixels. Computer vision turns those dots into meaningful information, such as "this is a golden retriever sitting on grass."

A Simple Diagram: Human Vision vs. Computer Vision

HUMAN VISION
  [Eye] → [Optic Nerve] → [Brain] → "That's a cat!"

COMPUTER VISION
  [Camera] → [Pixel Grid] → [Algorithm] → "That's a cat!"

Both processes start with light hitting a sensor and end with understanding. The difference is that the human visual system is the product of hundreds of millions of years of evolution, while computer vision relies on math and data.

Why Computer Vision Matters

Computer vision powers many everyday tools you already use. Unlocking your phone with your face uses face recognition. Self-driving cars use cameras to detect lanes, pedestrians, and traffic signs. Doctors use computer vision to spot tumors in X-rays. Online stores use it to let you search for products by taking a photo.

Where Computer Vision Is Used

Healthcare – Detecting diseases from scans and slides
Security – Surveillance cameras that alert on unusual behavior
Retail – Cashier-less checkout systems (like Amazon Go)
Agriculture – Drones that identify diseased crops
Manufacturing – Quality control on assembly lines

How a Computer Sees an Image

Every digital image is a grid of pixels. Each pixel holds one or more numbers representing its brightness or color. A grayscale ("black-and-white") photo uses one number per pixel; in the common 8-bit format, 0 = black, 255 = white, and the values in between are shades of gray. A color photo uses three numbers per pixel: one each for Red, Green, and Blue.
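As a minimal sketch of the two representations, the Python below stores one grayscale pixel as a single integer and one color pixel as an (R, G, B) tuple. The helper name to_gray and the sample values are illustrative assumptions. The weights are the widely used ITU-R BT.601 luma coefficients, which are only one of several ways to turn a color into a brightness value.

```python
# A grayscale pixel is one number; a color pixel is three (R, G, B).
gray_pixel = 200                 # a fairly bright gray
color_pixel = (255, 128, 0)      # full red, some green, no blue: orange


def to_gray(rgb):
    """Collapse an (R, G, B) pixel into one brightness value in 0-255.

    Uses the ITU-R BT.601 luma weights, one common choice among several.
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(to_gray((0, 0, 0)))        # black
print(to_gray((255, 255, 255)))  # white
print(to_gray(color_pixel))      # orange as a shade of gray
```

Green gets the largest weight because human eyes are most sensitive to it, so a pure green pixel looks brighter than a pure blue one of the same intensity.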

Pixel Grid Example

Tiny 4×4 grayscale image (numbers = brightness):
+-----+-----+-----+-----+
|  10 |  80 | 200 | 255 |
+-----+-----+-----+-----+
|  30 | 100 | 180 | 240 |
+-----+-----+-----+-----+
|  50 | 120 | 160 | 220 |
+-----+-----+-----+-----+
|  70 | 140 | 150 | 200 |
+-----+-----+-----+-----+
Left side = darker, Right side = brighter
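
The same grid can be written in code. The sketch below uses plain Python lists so it runs without extra libraries; real projects typically use an array library instead, and the variable names here are illustrative assumptions.

```python
# The 4x4 grayscale grid from the diagram, stored as rows of brightness values.
image = [
    [10,  80, 200, 255],
    [30, 100, 180, 240],
    [50, 120, 160, 220],
    [70, 140, 150, 200],
]

height = len(image)
width = len(image[0])

# Indexing is image[row][col]: row 0 is the top, col 0 is the left edge.
top_right = image[0][3]

# Average brightness of each column, from left to right.
column_means = [sum(row[c] for row in image) / height for c in range(width)]

print(height, width)
print(top_right)
print(column_means)
```

The column averages rise from left to right, which is the "darker on the left, brighter on the right" pattern expressed as numbers.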

A real photo can have millions of such pixels. Computer vision reads all these numbers and finds patterns (edges, shapes, textures) to identify what is in the image.
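
To make "finding patterns" concrete, here is a minimal sketch of the simplest possible edge cue: how much the brightness changes between horizontally neighboring pixels. It reuses the 4×4 grid from the example above. The function name horizontal_differences is an illustrative assumption, and this is a toy stand-in, not a production edge detector.

```python
# The 4x4 grayscale grid from the pixel grid example.
image = [
    [10,  80, 200, 255],
    [30, 100, 180, 240],
    [50, 120, 160, 220],
    [70, 140, 150, 200],
]


def horizontal_differences(img):
    """For each row, return the absolute brightness change between neighbors."""
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in img
    ]


diffs = horizontal_differences(image)
for row in diffs:
    print(row)

# The largest jump marks the strongest candidate for an edge.
strongest = max(max(row) for row in diffs)
print(strongest)
```

A large difference means the brightness changes sharply at that spot, which is what an edge looks like to a computer. Practical edge detectors build on the same idea with smoothing and two-dimensional filters.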

The Three Main Tasks in Computer Vision

Most computer vision problems fall into three categories.

Classification – "What is in this image?" (Example: cat or dog?)
Detection – "Where is the object?" (Example: draw a box around every car)
Segmentation – "Which pixels belong to the object?" (Example: color every pixel that is part of a road)

Task Comparison Diagram

Original Image: [A dog on grass]

Classification → Label: "Dog"

Detection      → Label + Box:
                 ┌───────────┐
                 │   Dog     │
                 └───────────┘

Segmentation   → Pixel mask:
                 ░░░░░░░░░░░░   ← grass pixels
                 ████████████   ← dog pixels
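
The three kinds of output can also be compared in code. The sketch below is a toy under stated assumptions: the image, the THRESHOLD value, and the labels are invented for illustration, and a simple brightness cutoff stands in for the trained models a real system would use. What it does show accurately is the shape of each answer: one label, one box, and one decision per pixel.

```python
# Toy 4x6 grayscale image: a bright object on a dark background.
image = [
    [20,  20,  20, 20, 20, 20],
    [20, 220, 230, 20, 20, 20],
    [20, 210, 240, 20, 20, 20],
    [20,  20,  20, 20, 20, 20],
]
THRESHOLD = 128  # hypothetical cutoff between background and object

# Segmentation: one True/False decision per pixel.
mask = [[value >= THRESHOLD for value in row] for row in image]

# Detection: the smallest box that contains every object pixel.
rows = [r for r, row in enumerate(mask) if any(row)]
cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
box = (min(rows), min(cols), max(rows), max(cols)) if rows else None

# Classification: a single label for the whole image.
label = "object present" if rows else "empty"

print(label)   # classification output
print(box)     # detection output as (top, left, bottom, right)
for row in mask:
    print("".join("█" if inside else "░" for inside in row))  # segmentation
```

Each task gives a more detailed answer than the one before it: the label says what, the box says roughly where, and the mask says exactly which pixels.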

Classical vs. Modern Computer Vision

Early computer vision (before 2010) relied on hand-crafted rules. Engineers manually wrote code to detect edges, find corners, and match shapes. Modern computer vision uses deep learning, in which a neural network learns these rules automatically by studying thousands or even millions of example images.
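
The contrast between writing a rule and learning one can be sketched in a few lines. This example is not deep learning: it learns a single number (a brightness cutoff) rather than the millions of values inside a neural network, and the data, labels, and function names are all invented for illustration. The point is only who chooses the rule.

```python
# Labeled examples: (average image brightness, label). Values are invented.
examples = [
    (30, "night"), (45, "night"), (60, "night"),
    (170, "day"), (190, "day"), (210, "day"),
]


def handcrafted_rule(brightness):
    """Classical style: an engineer picks the cutoff by hand."""
    return "day" if brightness > 100 else "night"


def learn_threshold(data):
    """Learned style: derive the cutoff from the labeled examples.

    Places the cutoff midway between the two class averages.
    """
    night = [b for b, label in data if label == "night"]
    day = [b for b, label in data if label == "day"]
    return (sum(night) / len(night) + sum(day) / len(day)) / 2


threshold = learn_threshold(examples)


def learned_rule(brightness):
    """Apply the cutoff that was derived from the data."""
    return "day" if brightness > threshold else "night"


print(threshold)
print(handcrafted_rule(110), learned_rule(110))
```

If the examples change, the learned cutoff changes with them and no one has to rewrite the code. The hand-written cutoff stays at 100 until an engineer edits it.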

Before vs. After Deep Learning

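+-----------------------+-------------------------+---------------------------+
| Aspect                | Classical (Before 2010) | Modern (Deep Learning)    |
+-----------------------+-------------------------+---------------------------+
| Who writes the rules? | Human engineers         | The algorithm learns them |
| Accuracy              | Moderate                | Very high                 |
| Data needed           | Small                   | Large                     |
| Flexibility           | Limited                 | High                      |
+-----------------------+-------------------------+---------------------------+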

Key Takeaways

Computer vision teaches machines to understand images and videos.
Images are grids of numbers called pixels.
The three main tasks are classification, detection, and segmentation.
Modern computer vision uses deep learning to learn from data automatically.
