Deep Learning Recurrent Neural Networks

Standard neural networks process each input independently — they have no memory of what came before. Recurrent Neural Networks (RNNs) break this limitation. They process sequences — data where order and context matter — by carrying information forward from one step to the next.

Why Standard Networks Fail at Sequences

Consider the sentence: "The bank was steep, so I slid down it."

The word "bank" has multiple meanings. You understand it refers to a riverbank — not a financial institution — because of the words around it: "steep," "slid," and "down." A standard network processes each word in isolation and misses this relationship entirely.

What RNNs Process

Text — sentences, documents, chat messages
Time Series — stock prices, weather data, sensor readings
Audio — speech, music notes
Video — frames in order

The Core Idea: A Hidden State

An RNN maintains a hidden state — a memory vector that updates at every time step. The hidden state captures what the network has seen so far and passes it forward into the next step.

Unrolled RNN Diagram

 Time →         Step 1    Step 2    Step 3    Step 4
                "The"     "cat"     "sat"     "down"

Input:            x₁        x₂        x₃        x₄
                  ↓         ↓         ↓         ↓
Hidden:  [h₀] ─→ [h₁] ───→ [h₂] ───→ [h₃] ───→ [h₄]
                  ↓         ↓         ↓         ↓
Output:           y₁        y₂        y₃        y₄

h = hidden state (memory passed forward at each step)

Each hidden state h combines two things: the current word and the previous hidden state. The result is a running summary of the entire sequence seen so far.

How the Hidden State Updates

At each time step t:

h_t = tanh(W_h × h_{t-1}  +  W_x × x_t)
        ↑ previous memory      ↑ current input

Where:
  h_{t-1} = previous hidden state
  x_t     = current input
  W_h     = weight matrix for the hidden connection
  W_x     = weight matrix for the input connection
  tanh    = activation function (keeps values between -1 and 1)
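
The update rule above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the sizes (hidden state of 2, inputs of 4), the weight values, the one-hot word encoding, and the helper names matvec and rnn_step are all assumptions chosen for the example. Like the formula above, it has no bias term.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x_t, W_h, W_x):
    """One update: h_t = tanh(W_h × h_{t-1} + W_x × x_t)."""
    from_memory = matvec(W_h, h_prev)   # contribution of the previous hidden state
    from_input = matvec(W_x, x_t)       # contribution of the current input
    return [math.tanh(m + i) for m, i in zip(from_memory, from_input)]

# Hypothetical toy weights: hidden size 2, input size 4.
W_h = [[0.5, -0.3],
       [0.1,  0.8]]
W_x = [[0.2, 0.0, -0.4,  0.6],
       [0.7, 0.1,  0.3, -0.5]]

# One-hot vectors standing in for "The", "cat", "sat", "down" (illustrative encoding).
inputs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

h = [0.0, 0.0]      # h_0: empty memory before the first step
states = []
for x_t in inputs:
    h = rnn_step(h, x_t, W_h, W_x)   # the same weights are reused at every step
    states.append(h)

for t, h_t in enumerate(states, start=1):
    print(f"h_{t} = {h_t}")
```

Every value in every hidden state stays strictly between -1 and 1 because of tanh, and each h_t depends on all earlier inputs only through h_{t-1}.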

Types of RNN Tasks

Input and Output Configurations

One-to-One:
  Input: single → Output: single
  Example: Standard classification (not really RNN territory)

One-to-Many:
  Input: single → Output: sequence
  Example: Image captioning (one photo → sentence of words)

Many-to-One:
  Input: sequence → Output: single
  Example: Sentiment analysis ("I loved this movie" → Positive)

Many-to-Many (same length):
  Input: sequence → Output: sequence (step-by-step)
  Example: Video labeling (each frame labeled)

Many-to-Many (different length):
  Input: sequence → Output: different-length sequence
  Example: Language translation (English sentence → French sentence)
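
The configurations above differ mainly in which inputs are fed in and which hidden states are read out. The sketch below shows this with a scalar RNN. The weights, the linear readout function, and the choice to feed each output back in as the next input during generation are assumptions made for illustration; real models use learned weights and often different decoding schemes.

```python
import math

def run_rnn(xs, w_h=0.5, w_x=1.0, h0=0.0):
    """Scalar RNN: return the hidden state after every time step."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def readout(h, w_y=2.0):
    """Hypothetical linear output layer: maps a hidden state to an output."""
    return w_y * h

def generate(x0, steps, w_h=0.5, w_x=1.0, h0=0.0):
    """Produce a sequence from one input by feeding each output back in."""
    h, x, outputs = h0, x0, []
    for _ in range(steps):
        h = math.tanh(w_h * h + w_x * x)
        y = readout(h)
        outputs.append(y)
        x = y                     # assumption: the output becomes the next input
    return outputs

xs = [0.1, -0.4, 0.9]
states = run_rnn(xs)

# Many-to-one: read only the final hidden state.
many_to_one = readout(states[-1])

# Many-to-many (same length): read every hidden state.
many_to_many = [readout(h) for h in states]

# One-to-many: a single input, then four generated outputs.
one_to_many = generate(x0=1.0, steps=4)

# Many-to-many (different length): encode the input into one state,
# then decode five outputs starting from that state.
translated = generate(x0=0.0, steps=5, h0=states[-1])
```

In the same-length case the last per-step output equals the many-to-one output, since both read the same final hidden state.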

Training an RNN: Backpropagation Through Time

Training an RNN uses a variant of backpropagation called Backpropagation Through Time (BPTT). Gradients flow not just backward through layers but also backward through time steps.

Forward:
  (h₀, x₁) → h₁,  (h₁, x₂) → h₂,  (h₂, x₃) → h₃ → Loss

Backward:
  Gradient flows from Loss → h₃ → h₂ → h₁
  (through all time steps, hence "through time"; every step adds to the gradient of the shared weights W_h and W_x)
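
The backward pass can be sketched for a scalar RNN and checked against a numerical gradient. Several things here are assumptions made for the example: the weights are single numbers w_h and w_x, the loss is a squared error on the final hidden state only, and the function names forward, loss, and bptt are hypothetical.

```python
import math

def forward(xs, w_h, w_x):
    """Return [h_0, h_1, ..., h_T] for a scalar RNN with h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x))
    return hs

def loss(xs, target, w_h, w_x):
    """Squared error between the final hidden state and a target value."""
    return 0.5 * (forward(xs, w_h, w_x)[-1] - target) ** 2

def bptt(xs, target, w_h, w_x):
    """Walk backward through the time steps, accumulating weight gradients."""
    hs = forward(xs, w_h, w_x)
    grad_w_h = grad_w_x = 0.0
    dh = hs[-1] - target                  # dLoss/dh_T
    for t in range(len(xs), 0, -1):       # t = T, T-1, ..., 1
        da = dh * (1.0 - hs[t] ** 2)      # back through tanh at step t
        grad_w_h += da * hs[t - 1]        # shared weight: add this step's share
        grad_w_x += da * xs[t - 1]
        dh = da * w_h                     # pass the gradient on to h_{t-1}
    return grad_w_h, grad_w_x

xs, target, w_h, w_x = [0.5, -0.2, 0.8], 0.3, 0.7, 0.4
g_h, g_x = bptt(xs, target, w_h, w_x)

# Central finite differences as an independent check.
eps = 1e-6
numeric_g_h = (loss(xs, target, w_h + eps, w_x) - loss(xs, target, w_h - eps, w_x)) / (2 * eps)
numeric_g_x = (loss(xs, target, w_h, w_x + eps) - loss(xs, target, w_h, w_x - eps)) / (2 * eps)

print("BPTT:   ", g_h, g_x)
print("Numeric:", numeric_g_h, numeric_g_x)
```

The two pairs of numbers should agree to several decimal places. The key point is the accumulation inside the loop: because W_h and W_x are reused at every step, their gradients are sums over all time steps.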

The vanishing gradient problem hits RNNs hard. Each step backward multiplies the gradient by W_h and by the tanh derivative, which is never larger than 1, so over long sequences the gradient can shrink nearly to zero before it reaches the early time steps. The model then stops learning from distant context. This is why LSTMs (covered in the next topic) were invented.
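
The shrinking can be seen numerically. The sketch below tracks how strongly the final hidden state depends on the initial one in a scalar RNN. The weights (w_h = 0.5, w_x = 1.0), the constant input, and the function name gradient_through_time are assumptions picked for illustration.

```python
import math

def gradient_through_time(num_steps, w_h=0.5, w_x=1.0, x=0.3):
    """Return d h_T / d h_0 for a scalar RNN fed a constant input x."""
    h, grad = 0.0, 1.0
    for _ in range(num_steps):
        h = math.tanh(w_h * h + w_x * x)
        grad *= w_h * (1.0 - h * h)   # chain rule: one factor per time step
    return grad

for T in (1, 5, 10, 20, 50):
    print(f"T = {T:2d}   d h_T / d h_0 = {gradient_through_time(T):.3e}")
```

With these values every per-step factor is at most 0.5, so the product falls off at least as fast as 0.5 to the power T. After 50 steps almost nothing of the signal from the first step remains.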

A Practical Example: Sentiment Analysis

Input sentence:  "The food was terrible but the service was amazing"

Step-by-step hidden state evolution (the labels are illustrative; a real hidden state is a vector of numbers, not a named concept):

Step 1: "The"      → h₁ = neutral
Step 2: "food"     → h₂ = food context
Step 3: "was"      → h₃ = past tense signal
Step 4: "terrible" → h₄ = strong negative
Step 5: "but"      → h₅ = contrast incoming
Step 6: "the"      → h₆ = continuing
Step 7: "service"  → h₇ = new topic
Step 8: "was"      → h₈ = past tense again
Step 9: "amazing"  → h₉ = strong positive

Final output from h₉ → Classification: "Mixed Sentiment"

RNN Limitations

Limitation	Explanation	Solution
Short-term memory	Struggles to remember events from many steps back	LSTM / GRU networks
Vanishing gradients	Early steps receive little or no learning signal	LSTM / GRU networks
Sequential processing	Cannot process steps in parallel — slow on long sequences	Transformers
Slow training	Each step depends on the previous — hard to parallelize	Transformers

Real-World Applications of RNNs

Autocomplete — predicts the next word as you type
Speech Recognition — converts audio frames to text one step at a time
Music Generation — generates the next note based on previous notes
Stock Price Prediction — processes price sequences to predict future movement
Anomaly Detection — monitors IoT sensors to flag unusual patterns in time-series data

Key Terms

Recurrent Neural Network (RNN) — a network that processes sequences by passing memory forward at each step
Hidden State — the memory vector that carries information from previous steps
Time Step — one element in a sequence (one word, one frame, one data point)
BPTT — Backpropagation Through Time — the training algorithm for RNNs
Vanishing Gradient — gradients shrinking to near-zero over long sequences
