Computer Vision Optical Flow and Motion

Optical flow describes the apparent motion of pixels between consecutive video frames. It tells you how each pixel moved from one frame to the next — enabling video understanding, object tracking, action recognition, and video stabilization.

What Is Optical Flow?

When a camera records video, each frame is a still image. Between two consecutive frames, objects move. Optical flow computes a motion vector at each pixel — an arrow pointing in the direction the pixel moved and showing how far it traveled.

Optical Flow Visualization

FRAME 1:                    FRAME 2 (33ms later):
  ┌────────────────────┐      ┌────────────────────┐
  │                    │      │                    │
  │    ○               │      │        ○           │ ← Ball moved right
  │                    │      │                    │
  │  ████              │      │    ████            │ ← Person moved right
  └────────────────────┘      └────────────────────┘

OPTICAL FLOW (motion vectors):
  ┌────────────────────┐
  │                    │
  │    ○ →→→→→→→       │ ← Ball: large rightward vector
  │                    │
  │  ████ →→→          │ ← Person: smaller rightward vector
  └────────────────────┘

Each arrow = how much that pixel moved (direction + magnitude).
Background pixels = zero vector (they did not move).
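
As a rough illustration of how such a flow field is usually stored, the sketch below keeps one (u, v) pair per pixel in a NumPy array and derives magnitude and direction from it. The array size and the single moving pixel are made-up values, not data from a real video.

```python
import numpy as np

# A dense flow field stored as an H x W x 2 array: flow[y, x] = (u, v).
height, width = 4, 6
flow = np.zeros((height, width, 2), dtype=np.float32)

# Hypothetical scene: one "ball" pixel moves 3 px right and 4 px down.
flow[1, 2] = (3.0, 4.0)

u, v = flow[..., 0], flow[..., 1]
magnitude = np.hypot(u, v)                 # distance traveled, in pixels
angle_deg = np.degrees(np.arctan2(v, u))   # direction; image y axis points down

moving = magnitude > 0                     # background pixels keep the zero vector
print("moving pixels:", int(moving.sum()))
print("magnitude at the ball:", float(magnitude[1, 2]))
```

Flow visualizations commonly map the angle to hue and the magnitude to brightness, which is why the two quantities are computed separately here.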

The Optical Flow Assumption

Optical flow relies on the brightness constancy assumption: the brightness of a pixel does not change as it moves from frame to frame. This is reasonable for small movements between close frames (30+ fps video) but breaks down under sudden lighting changes or occlusion. Large motion causes a separate problem: it invalidates the first-order Taylor approximation that the flow equation is built on.

Brightness Constancy Equation

Let I(x, y, t) = brightness at pixel (x, y) at time t.

Brightness constancy:
  I(x, y, t) = I(x + u, y + v, t + 1)

Where (u, v) = the motion vector (how far the pixel moved).

Expand with Taylor series and simplify:
  Ix·u + Iy·v + It = 0

Where:
  Ix = image gradient in x direction (strongest across vertical edges)
  Iy = image gradient in y direction (strongest across horizontal edges)
  It = change in brightness over time

Two unknowns (u, v) but only one equation → need more constraints.
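
The Taylor expansion step can be written out as follows. This is a first-order expansion, so it assumes that u and v are small and that the time step between frames is 1; the subscript notation for partial derivatives is the usual convention rather than something defined earlier in this lesson.

```latex
\begin{aligned}
I(x+u,\; y+v,\; t+1) &\approx I(x,y,t) + I_x\,u + I_y\,v + I_t
  && \text{(first-order Taylor expansion)} \\
I(x+u,\; y+v,\; t+1) &= I(x,y,t)
  && \text{(brightness constancy)} \\
\Rightarrow\quad I_x\,u + I_y\,v + I_t &= 0,
  && I_x = \tfrac{\partial I}{\partial x},\;
     I_y = \tfrac{\partial I}{\partial y},\;
     I_t = \tfrac{\partial I}{\partial t}
\end{aligned}
```

Subtracting the second line from the first leaves only the gradient terms, which is the single linear equation in the two unknowns u and v.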

Lucas-Kanade: Sparse Optical Flow

Lucas-Kanade (1981) solves the underdetermined system by assuming all pixels in a small neighborhood have the same motion. This produces a motion vector per selected point — sparse optical flow. It works well for tracking a limited set of feature points (corners) through a video.

Lucas-Kanade Window Assumption

For a 3×3 neighborhood around pixel P, assume all 9 pixels share the same motion (u, v):

  Each pixel i gives one equation: Ix_i·u + Iy_i·v + It_i = 0
  9 pixels → 9 equations, 2 unknowns (u, v) → over-determined.

Solve using least squares:
  [Ix₁  Iy₁]         [−It₁]
  [Ix₂  Iy₂]  [u]  = [−It₂]
  [...]       [v]    [...]
  [Ix₉  Iy₉]         [−It₉]

Solution = best (u, v) fitting all 9 equations simultaneously.

Limitation: Only works for small motion (< a few pixels).
Fix: Image pyramid (compute flow at low resolution first, then refine).
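
A minimal NumPy sketch of the least-squares step above, not a production tracker. The function name lucas_kanade_window, the half parameter, and the synthetic Gaussian blob are assumptions made for illustration. Gradients come from np.gradient (central differences) and no image pyramid is used, so the sketch only suits small motion.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2, cx, cy, half=1):
    """Estimate one (u, v) for the (2*half+1)^2 window centered at (cx, cy)."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(f1)
    It = f2 - f1
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    # One row [Ix_i, Iy_i] per pixel in the window; right-hand side is -It_i.
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic check: a Gaussian blob moved by a known sub-pixel amount.
ys, xs = np.mgrid[0:33, 0:33].astype(np.float64)

def blob(x0, y0, sigma=4.0):
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

true_u, true_v = 0.4, -0.3
frame1 = blob(16.0, 16.0)
frame2 = blob(16.0 + true_u, 16.0 + true_v)   # same blob, shifted by (true_u, true_v)

u, v = lucas_kanade_window(frame1, frame2, cx=16, cy=16, half=8)
print(f"estimated flow: u={u:.2f}, v={v:.2f}")
```

The default half=1 gives the 3×3 window described above. The example uses a larger window because more equations make the fit less sensitive to noise and to windows whose gradients all point the same way.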

Lucas-Kanade in Action: Video Tracking

FRAME 1: Detect 100 Shi-Tomasi corners as tracking points.
  ● ●   ● ●     ●  ●    ← Corner positions

FRAME 2: For each corner, run Lucas-Kanade to find (u, v).
  ● →→ ●   ● →→ ●       ← Each point tracked to new position

FRAME 3: Update positions. Track long-term trajectories.
  …continuations…

Application: Sports player tracking, AR marker tracking, video stabilization.

Horn-Schunck: Dense Optical Flow

Horn-Schunck (1981) computes a motion vector for every pixel in the image, which is called dense optical flow. It adds a global smoothness constraint: neighboring pixels should have similar motion vectors. This produces a smooth flow field, but because the constraint applies everywhere, it blurs the flow across the boundaries of moving objects, where the true motion changes abruptly.
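
The sketch below shows one common way to write the Horn-Schunck iteration in NumPy, under several assumptions: a plain 4-neighbor average stands in for the weighted averaging kernel of the original paper, image intensities lie in [0, 1], and the values alpha=0.1 and n_iter=200 are illustrative rather than tuned. The test frames are a synthetic Gaussian blob, not real video.

```python
import numpy as np

def neighbor_mean(f):
    """Mean of the 4 direct neighbors, replicating values at the image border."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck(frame1, frame2, alpha=0.1, n_iter=200):
    """Iteratively estimate a dense flow field (u, v), one vector per pixel."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    Iy, Ix = np.gradient(f1)     # row (y) derivative first, then column (x)
    It = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_avg, v_avg = neighbor_mean(u), neighbor_mean(v)
        # How strongly the smoothed flow violates Ix*u + Iy*v + It = 0.
        residual = (Ix * u_avg + Iy * v_avg + It) / denom
        u = u_avg - Ix * residual
        v = v_avg - Iy * residual
    return u, v

# Synthetic frames: a Gaussian blob shifted by (0.4, -0.3) pixels.
ys, xs = np.mgrid[0:33, 0:33].astype(np.float64)

def blob(x0, y0, sigma=4.0):
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

frame1 = blob(16.0, 16.0)
frame2 = blob(16.4, 15.7)

u, v = horn_schunck(frame1, frame2)
print("flow on the blob's left flank:  u =", round(float(u[16, 12]), 2))
print("flow on the blob's upper flank: v =", round(float(v[12, 16]), 2))
```

A larger alpha weights smoothness more heavily and spreads flow further into textureless regions; a smaller alpha trusts the brightness constraint more. Flat areas far from the blob converge slowly because they receive flow only by diffusion from their neighbors.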

Dense vs. Sparse Flow

Type	Vectors Computed	Speed	Best Use
Sparse (Lucas-Kanade)	At selected feature points only	Fast	Object tracking, AR
Dense (Horn-Schunck / Farnebäck)	At every pixel	Slow	Video analysis, action recognition

FlowNet and Deep Learning Optical Flow

Traditional optical flow methods are slow for full-resolution dense flow. FlowNet and its successors (PWC-Net, RAFT) use deep learning to predict dense flow in a fraction of the time. RAFT (2020) set state-of-the-art accuracy on standard benchmarks when it was published, and its authors report roughly 10 frames per second on 1088×436 video with a GTX 1080 Ti GPU.

Deep Optical Flow Architecture (RAFT)

INPUTS: Frame 1 and Frame 2

Step 1: Feature extraction CNN
  [Frame 1] → Feature map F1
  [Frame 2] → Feature map F2

Step 2: Build 4D correlation volume
  For each pixel in F1, compute similarity to all pixels in F2.
  → Tells us which pixel in Frame 2 is most similar to each Frame 1 pixel.

Step 3: Recurrent flow estimator (GRU)
  Start with zero flow estimate.
  Look up correlation volume → estimate flow update.
  Repeat for a fixed number of iterations (e.g., 12) → each pass refines the flow estimate.

Output: Dense flow field (u, v) for every pixel.
Accuracy: State-of-the-art on MPI-Sintel and KITTI benchmarks at publication (2020).
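
Step 2 can be sketched in a few lines of NumPy. This is only an illustration of the all-pairs correlation idea, not RAFT's actual implementation: the feature maps are random stand-ins for CNN outputs, the function name correlation_volume is hypothetical, and the argmax lookup at the end is a simplification (RAFT feeds local correlation lookups to its recurrent unit instead of taking a hard maximum).

```python
import numpy as np

def correlation_volume(f1, f2):
    """All-pairs dot products between two (C, H, W) feature maps.

    Returns an (H, W, H, W) array where corr[i, j, k, l] is the similarity
    between pixel (i, j) of f1 and pixel (k, l) of f2, scaled by 1/sqrt(C).
    """
    channels = f1.shape[0]
    return np.einsum("cij,ckl->ijkl", f1, f2) / np.sqrt(channels)

rng = np.random.default_rng(0)
f1 = rng.standard_normal((64, 8, 10))    # hypothetical features for frame 1
f2 = np.roll(f1, shift=2, axis=2)        # frame 2: same features moved 2 px right

corr = correlation_volume(f1, f2)

# For one frame-1 pixel, find its most similar frame-2 pixel.
i, j = 3, 4
k, l = np.unravel_index(np.argmax(corr[i, j]), corr[i, j].shape)
flow_uv = (int(l - j), int(k - i))       # (u, v) displacement for pixel (i, j)
print("volume shape:", corr.shape, "flow at (3, 4):", flow_uv)
```

The volume has H×W×H×W entries, which is why it is typically built on feature maps at reduced resolution rather than on full-size frames.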

Action Recognition Using Optical Flow

Optical flow is a powerful input for video action recognition. The flow field captures motion patterns — running looks different from walking, which looks different from jumping — regardless of the person's appearance or clothing color.

Two-Stream Architecture

VIDEO INPUT (e.g., 10 frames)

STREAM 1 — Spatial (RGB frames):
  [Frame 5 (single RGB frame)]
        ↓
  [CNN]
        ↓
  Spatial features (what is in the scene)

STREAM 2 — Temporal (Optical Flow):
  [Optical flow for frames 1→2, 2→3, ..., 9→10]  (9 flow maps)
        ↓
  [CNN]
        ↓
  Temporal features (how things are moving)

FUSION:
  Spatial features + Temporal features
        ↓
  [Classifier]
        ↓
  Action: "Running" / "Jumping" / "Handshaking" / etc.
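
The data flow of the two streams can be sketched as below, assuming the temporal stream receives the u and v components of each flow map stacked as channels. The fusion shown here averages the class probabilities of the two streams, which is one common choice and a swap for the feature-level fusion in the diagram. The logits are made-up numbers, and the CNNs themselves are omitted.

```python
import numpy as np

def stack_flow(flows):
    """Stack L flow maps of shape (H, W, 2) into one (2L, H, W) input tensor."""
    channels = []
    for flow in flows:
        channels.append(flow[..., 0])   # horizontal component u
        channels.append(flow[..., 1])   # vertical component v
    return np.stack(channels, axis=0)

def softmax(z):
    e = np.exp(z - z.max())             # subtract the max for numerical stability
    return e / e.sum()

def late_fusion(spatial_logits, temporal_logits):
    """Average the class probabilities predicted by the two streams."""
    return 0.5 * (softmax(spatial_logits) + softmax(temporal_logits))

n_frames, h, w = 10, 4, 4
flows = [np.zeros((h, w, 2)) for _ in range(n_frames - 1)]   # 10 frames -> 9 flow maps
temporal_input = stack_flow(flows)
print("temporal stream input shape:", temporal_input.shape)

# Hypothetical logits for three classes: running, jumping, handshaking.
probs = late_fusion(np.array([2.0, 1.0, 0.1]), np.array([3.0, 0.2, 0.1]))
print("predicted class index:", int(probs.argmax()))
```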

Video Stabilization Using Optical Flow

A shaky handheld video has unintended camera motion in every frame. Optical flow estimates that camera motion by analyzing background pixel movement. Stabilization applies the opposite transformation to cancel the shake, producing a smooth video.

Stabilization Pipeline

[Shaky video]
     ↓
Compute optical flow for background pixels (stationary objects).
     ↓
Estimate camera motion path: [tx1, ty1, rz1, tx2, ty2, rz2, ...]
     ↓
Smooth the motion path (e.g., moving average over 30 frames).
     ↓
Apply inverse transform per frame: cancel detected shake.
     ↓
[Smooth, stabilized video]
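
The path-smoothing step of this pipeline can be sketched as follows. The per-frame camera motion is simulated here (a steady pan plus random jitter) rather than estimated from real optical flow, and the function name moving_average and the edge padding are assumptions made for the example.

```python
import numpy as np

def moving_average(path, window):
    """Smooth each column of an (N, 3) path [tx, ty, rz] with a moving average."""
    left = window // 2
    right = window - 1 - left
    # Repeat the first and last samples so the output keeps N rows.
    padded = np.pad(path, ((left, right), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    columns = [np.convolve(padded[:, k], kernel, mode="valid")
               for k in range(path.shape[1])]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(0)
n_frames = 120
# Hypothetical frame-to-frame motion (dx, dy, d_angle): a pan of 2 px per frame plus shake.
steps = np.array([2.0, 0.0, 0.0]) + rng.normal(0.0, 1.0, size=(n_frames, 3))
path = np.cumsum(steps, axis=0)           # accumulated camera path
smooth = moving_average(path, window=30)  # the path the camera "should" have followed
correction = smooth - path                # per-frame offset that cancels the shake

print("jitter before:", np.diff(path, axis=0).std(axis=0).round(2))
print("jitter after: ", np.diff(smooth, axis=0).std(axis=0).round(2))
```

Each row of correction would then be turned into a translation and rotation and applied to the corresponding frame. Smoothing the accumulated path, rather than removing all motion, preserves intentional camera moves such as the pan.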

Real-World Applications

Video surveillance – Detect moving objects against a stationary background.
Sports analytics – Track player and ball movements, compute speed and distance.
Video compression – Motion estimation, a block-based relative of optical flow, lets codecs encode only the changes between frames instead of full frames.
Autonomous driving – Estimate how quickly surrounding objects are approaching.
Medical ultrasound – Track heart wall motion to assess cardiac function.

Key Takeaways

Optical flow describes how pixels move between video frames — direction and distance.
Brightness constancy assumption: a pixel keeps the same brightness as it moves.
Lucas-Kanade computes sparse flow at selected feature points — fast and used for tracking.
Dense flow methods (Horn-Schunck, RAFT) compute a motion vector at every pixel.
Deep learning models like RAFT set state-of-the-art accuracy for dense flow and run much faster on a GPU than traditional dense methods.
Optical flow enables action recognition, video stabilization, and autonomous driving perception.
