CV How Images Are Stored
Before a computer can understand an image, it needs to store the image in a way that math can work on it. Understanding how images are stored is the first hands-on step in computer vision.
The Building Block: A Pixel
A pixel (short for "picture element") is the smallest unit of a digital image. Think of a pixel like a single tile in a mosaic. Each tile has one color. Thousands of tiles together form the full picture.
Zooming Into an Image
Normal view: Zoomed in 8×: [Photo of a bird] [Colored squares visible] ██ ██ ██ ██ ██ ██ ← Each square = 1 pixel ██ ██ ██
A typical smartphone photo has 12 million pixels (12 megapixels). Each pixel stores color information as one or more numbers.
Grayscale Images
A grayscale image stores one number per pixel. That number ranges from 0 (pure black) to 255 (pure white). Every shade of gray falls between those two extremes.
Grayscale Value Scale
0 64 128 192 255 █████ ▓▓▓▓▓ ▒▒▒▒▒ ░░░░░ · · · · Black Dark Medium Light White
Example: A 3×3 Grayscale Image
Pixel values: What you see: +-----+-----+-----+ +-------+-------+-------+ | 0 | 128 | 255 | | Black | Gray | White | +-----+-----+-----+ +-------+-------+-------+ | 64 | 128 | 192 | | Dark | Gray | Light | +-----+-----+-----+ +-------+-------+-------+ | 128 | 192 | 255 | | Gray | Light | White | +-----+-----+-----+ +-------+-------+-------+
Grayscale images take less memory than color images because they store only one number per pixel instead of three.
Color Images: The RGB Model
Color images store three numbers per pixel — one for Red, one for Green, and one for Blue. Mixing these three values creates any visible color. This is the RGB color model.
How RGB Colors Mix
Red (255, 0, 0) → Pure red Green (0, 255, 0) → Pure green Blue (0, 0, 255) → Pure blue Red + Green = (255, 255, 0) → Yellow Red + Blue = (255, 0, 255) → Magenta Green + Blue= (0, 255, 255) → Cyan All three = (255, 255, 255)→ White None = (0, 0, 0) → Black
RGB Image as Three Layers
COLOR IMAGE
↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Red │ │ Green │ │ Blue │
│ Channel │ │ Channel │ │ Channel │
│ (values │ │ (values │ │ (values │
│ 0–255) │ │ 0–255) │ │ 0–255) │
└──────────┘ └──────────┘ └──────────┘
Combine all 3 → Full color image
In code, a color image is stored as a 3D array: Height × Width × 3. The "3" represents the three color channels.
Image Dimensions and Shape
Every image has a specific size described by its height and width in pixels, plus the number of channels.
Image Shape Examples
| Image Type | Size | Shape in Code | Total Values Stored |
|---|---|---|---|
| Grayscale small | 100 × 100 | (100, 100) | 10,000 |
| Color small | 100 × 100 | (100, 100, 3) | 30,000 |
| Color HD | 1920 × 1080 | (1080, 1920, 3) | 6,220,800 |
| Color 4K | 3840 × 2160 | (2160, 3840, 3) | 24,883,200 |
Common Image File Formats
Different file formats store image data differently. Each format trades off quality, file size, and compatibility.
Format Comparison
| Format | Compression | Quality Loss? | Best For |
|---|---|---|---|
| PNG | Lossless | No | Screenshots, diagrams |
| JPEG | Lossy | Yes | Photos, web images |
| BMP | None | No | Raw storage, Windows |
| TIFF | Both options | Optional | Medical, professional imaging |
| WebP | Both options | Optional | Modern web images |
Lossless vs. Lossy — A Layman Analogy
ORIGINAL SENTENCE: "The quick brown fox jumps over the lazy dog" LOSSLESS (PNG): Stores every word exactly. "The quick brown fox jumps over the lazy dog" LOSSY (JPEG): Removes details the eye barely notices. "The quick fox jumps over the lazy dog" (Shorter, slightly different, but readable)
JPEG files are smaller but lose some fine detail each time you save them. PNG files stay identical to the original no matter how many times you save.
What Happens When You Read an Image in Code
When your computer vision program reads an image, it converts the file into a numerical array. Every algorithm then reads from and writes to this array of numbers — not the image file directly.
Image Reading Pipeline
cat.jpg (file on disk)
↓ [read by library like OpenCV]
NumPy Array:
Shape: (480, 640, 3)
Values: [[[255, 128, 64], [250, 130, 60], ...],
[[240, 125, 62], ...],
...]
↓ [your algorithm processes numbers]
Result: "Cat detected at position (120, 200)"
Key Takeaways
- Images are grids of pixels — each pixel is one or more numbers.
- Grayscale uses 1 value per pixel (0–255). Color uses 3 values (R, G, B).
- Image shape in code is (Height, Width, Channels).
- PNG is lossless (no quality loss). JPEG is lossy (smaller file, some detail lost).
- Computer vision algorithms work on number arrays, not image files.
