CV How Images Are Stored

Before a computer can understand an image, it needs to store the image in a way that math can work on it. Understanding how images are stored is the first hands-on step in computer vision.

The Building Block: A Pixel

A pixel (short for "picture element") is the smallest unit of a digital image. Think of a pixel like a single tile in a mosaic. Each tile has one color. Thousands of tiles together form the full picture.

Zooming Into an Image

Normal view:         Zoomed in 8×:
[Photo of a bird]   [Colored squares visible]

  ██ ██ ██
  ██ ██ ██     ← Each square = 1 pixel
  ██ ██ ██

A typical smartphone photo has 12 million pixels (12 megapixels). Each pixel stores color information as one or more numbers.

Grayscale Images

A grayscale image stores one number per pixel. That number ranges from 0 (pure black) to 255 (pure white). Every shade of gray falls between those two extremes.

Grayscale Value Scale

0        64       128      192      255
█████   ▓▓▓▓▓   ▒▒▒▒▒   ░░░░░   · · · ·
Black   Dark    Medium   Light    White

Example: A 3×3 Grayscale Image

Pixel values:          What you see:
+-----+-----+-----+   +-------+-------+-------+
|  0  | 128 | 255 |   | Black |  Gray | White |
+-----+-----+-----+   +-------+-------+-------+
|  64 | 128 | 192 |   | Dark  |  Gray | Light |
+-----+-----+-----+   +-------+-------+-------+
| 128 | 192 | 255 |   | Gray  | Light | White |
+-----+-----+-----+   +-------+-------+-------+

Grayscale images take less memory than color images because they store only one number per pixel instead of three.

Color Images: The RGB Model

Color images store three numbers per pixel — one for Red, one for Green, and one for Blue. Mixing these three values creates any visible color. This is the RGB color model.

How RGB Colors Mix

Red   (255, 0,   0)   → Pure red
Green (0,   255, 0)   → Pure green
Blue  (0,   0,   255) → Pure blue

Red + Green = (255, 255, 0)  → Yellow
Red + Blue  = (255, 0, 255)  → Magenta
Green + Blue= (0, 255, 255)  → Cyan
All three   = (255, 255, 255)→ White
None        = (0,   0,   0)  → Black

RGB Image as Three Layers

COLOR IMAGE
    ↓
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Red      │  │ Green    │  │ Blue     │
│ Channel  │  │ Channel  │  │ Channel  │
│ (values  │  │ (values  │  │ (values  │
│ 0–255)   │  │ 0–255)   │  │ 0–255)   │
└──────────┘  └──────────┘  └──────────┘
        Combine all 3 → Full color image

In code, a color image is stored as a 3D array: Height × Width × 3. The "3" represents the three color channels.

Image Dimensions and Shape

Every image has a specific size described by its height and width in pixels, plus the number of channels.

Image Shape Examples

Image Type	Size	Shape in Code	Total Values Stored
Grayscale small	100 × 100	(100, 100)	10,000
Color small	100 × 100	(100, 100, 3)	30,000
Color HD	1920 × 1080	(1080, 1920, 3)	6,220,800
Color 4K	3840 × 2160	(2160, 3840, 3)	24,883,200

Common Image File Formats

Different file formats store image data differently. Each format trades off quality, file size, and compatibility.

Format Comparison

Format	Compression	Quality Loss?	Best For
PNG	Lossless	No	Screenshots, diagrams
JPEG	Lossy	Yes	Photos, web images
BMP	None	No	Raw storage, Windows
TIFF	Both options	Optional	Medical, professional imaging
WebP	Both options	Optional	Modern web images

Lossless vs. Lossy — A Layman Analogy

ORIGINAL SENTENCE:
"The quick brown fox jumps over the lazy dog"

LOSSLESS (PNG):
Stores every word exactly.
"The quick brown fox jumps over the lazy dog"

LOSSY (JPEG):
Removes details the eye barely notices.
"The quick fox jumps over the lazy dog"
(Shorter, slightly different, but readable)

JPEG files are smaller but lose some fine detail each time you save them. PNG files stay identical to the original no matter how many times you save.

What Happens When You Read an Image in Code

When your computer vision program reads an image, it converts the file into a numerical array. Every algorithm then reads from and writes to this array of numbers — not the image file directly.

Image Reading Pipeline

cat.jpg (file on disk)
    ↓  [read by library like OpenCV]
NumPy Array:
  Shape: (480, 640, 3)
  Values: [[[255, 128, 64], [250, 130, 60], ...],
            [[240, 125, 62], ...],
            ...]
    ↓  [your algorithm processes numbers]
Result: "Cat detected at position (120, 200)"

Key Takeaways

Images are grids of pixels — each pixel is one or more numbers.
Grayscale uses 1 value per pixel (0–255). Color uses 3 values (R, G, B).
Image shape in code is (Height, Width, Channels).
PNG is lossless (no quality loss). JPEG is lossy (smaller file, some detail lost).
Computer vision algorithms work on number arrays, not image files.

Previous lessons

Back to courses

Next lessons