Deep Learning: Recurrent Neural Networks

Standard neural networks process each input independently — they have no memory of what came before. Recurrent Neural Networks (RNNs) break this limitation. They process sequences — data where order and context matter — by carrying information forward from one step to the next.

Why Standard Networks Fail at Sequences

Consider the sentence: "The bank was steep, so I slid down it."

The word "bank" has multiple meanings. You understand that it refers to a riverbank, not a financial institution, because of the words around it: "steep," "slid," and "down." A network that processes each word in isolation misses this relationship entirely.

What RNNs Process

  • Text — sentences, documents, chat messages
  • Time Series — stock prices, weather data, sensor readings
  • Audio — speech, music notes
  • Video — frames in order

The Core Idea: A Hidden State

An RNN maintains a hidden state — a memory vector that updates at every time step. The hidden state captures what the network has seen so far and passes it forward into the next step.

Unrolled RNN Diagram

Time →         Step 1     Step 2     Step 3     Step 4
               "The"      "cat"      "sat"      "down"

Input:           x₁         x₂         x₃         x₄
                 ↓          ↓          ↓          ↓
Hidden:   h₀ ──→ h₁ ──────→ h₂ ──────→ h₃ ──────→ h₄
                 ↓          ↓          ↓          ↓
Output:          y₁         y₂         y₃         y₄

h = hidden state (memory passed forward at each step), x = input, y = output

Each hidden state h combines two things: the current word and the previous hidden state. The result is a running summary of the entire sequence seen so far.
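To make the unrolled picture concrete, here is a minimal Python sketch of a one-unit RNN (a single number as the hidden state) applied to a whole sequence. The function name `run_rnn` and the weights `w_h = 0.8` and `w_x = 1.0` are illustrative assumptions, not values from any trained model. Two sequences that differ only in their first element end with different final states, which shows that the first input is still carried in h₄.

```python
import math

def run_rnn(inputs, w_h=0.8, w_x=1.0, h0=0.0):
    """Unroll a one-unit RNN over `inputs` and return every hidden state h_1..h_T.

    The same two weights (w_h, w_x) are reused at every time step;
    0.8 and 1.0 are arbitrary illustrative values.
    """
    h = h0
    states = []
    for x in inputs:
        # New memory = squash(weighted previous memory + weighted current input)
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

# Two sequences that differ only in their FIRST element.
a = run_rnn([1.0, 0.0, 0.0, 0.0])
b = run_rnn([-1.0, 0.0, 0.0, 0.0])

print(len(a))              # → 4 (one hidden state per step)
print(a[-1] > 0 > b[-1])   # → True (the first input still shows up in h_4)
```

The memory fades a little at each step (h₁ ≈ 0.76, h₄ ≈ 0.32 for the first sequence) but does not disappear, which previews the short-term memory limitation discussed later.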

How the Hidden State Updates

At each time step t:

h_t = tanh(W_h × h_{t-1}  +  W_x × x_t  +  b)
           ↑ previous memory  ↑ current input

Where:
  h_{t-1} = previous hidden state (h₀ is typically all zeros)
  x_t     = current input
  W_h     = weight matrix for the hidden connection
  W_x     = weight matrix for the input connection
  b       = bias vector
  tanh    = activation function (keeps values between -1 and 1)

The same W_h, W_x, and b are reused at every time step.
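The update rule can be written as a small function. This is a plain-Python sketch (no NumPy) with hypothetical sizes and weights; it includes a bias vector `b`, as most implementations do, which you can set to zeros for a bias-free version of the formula. `matvec` and `rnn_step` are illustrative names, not a library API.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One vanilla RNN update: h_t = tanh(W_h·h_{t-1} + W_x·x_t + b)."""
    from_memory = matvec(W_h, h_prev)   # previous-memory term
    from_input = matvec(W_x, x_t)       # current-input term
    return [math.tanh(m + i + b_k) for m, i, b_k in zip(from_memory, from_input, b)]

# Hypothetical sizes: 2 hidden units, 3 input features.
W_h = [[0.5, -0.1],
       [0.2,  0.3]]          # 2 x 2 (hidden -> hidden)
W_x = [[1.0, 0.0, -1.0],
       [0.0, 0.5,  0.5]]     # 2 x 3 (input -> hidden)
b = [0.0, 0.0]

h0 = [0.0, 0.0]              # empty memory before the first step
x1 = [1.0, 2.0, 0.0]
h1 = rnn_step(h0, x1, W_h, W_x, b)
print(h1)                    # both entries equal tanh(1) ≈ 0.7616
```

Here h₀ is zero, so only the input term contributes, and both pre-activations happen to equal 1. From step 2 onward, the W_h term mixes in what h1 remembers.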

Types of RNN Tasks

Input and Output Configurations

One-to-One:
  Input: single → Output: single
  Example: Standard image classification (no sequence involved, so no recurrence is needed)

One-to-Many:
  Input: single → Output: sequence
  Example: Image captioning (one photo → sentence of words)

Many-to-One:
  Input: sequence → Output: single
  Example: Sentiment analysis ("I loved this movie" → Positive)

Many-to-Many (same length):
  Input: sequence → Output: sequence (step-by-step)
  Example: Video labeling (each frame labeled)

Many-to-Many (different length):
  Input: sequence → Output: different-length sequence
  Example: Language translation (English sentence → French sentence)
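The two most common configurations differ only in which outputs are kept. The sketch below, assuming a one-unit RNN with arbitrary weights and a hypothetical output weight `w_y`, shows many-to-many (same length) returning one output per step and many-to-one keeping only the last. One-to-many and different-length many-to-many (for example, translation) need extra machinery such as feeding outputs back in or an encoder-decoder pair, so they are not shown.

```python
import math

def rnn_outputs(inputs, w_h=0.5, w_x=1.0, w_y=2.0):
    """Run a one-unit RNN and emit an output y_t = w_y * h_t at every step."""
    h, ys = 0.0, []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
        ys.append(w_y * h)
    return ys

def many_to_many(inputs):
    # Same length: one output per input step (e.g. one label per video frame).
    return rnn_outputs(inputs)

def many_to_one(inputs):
    # Single output: keep only the last step, which has seen the whole sequence
    # (e.g. one sentiment score per sentence).
    return rnn_outputs(inputs)[-1]

seq = [0.2, -0.4, 0.9]
print(len(many_to_many(seq)))                      # → 3
print(many_to_one(seq) == many_to_many(seq)[-1])   # → True
```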

Training an RNN: Backpropagation Through Time

Training an RNN uses a variant of backpropagation called Backpropagation Through Time (BPTT). Gradients flow not just backward through layers but also backward through time steps.

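Forward:  h₀ → h₁ → h₂ → h₃ → Loss
          (x₁, x₂, x₃ enter at h₁, h₂, h₃)

Backward:
  The gradient flows from Loss → h₃ → h₂ → h₁
  (through all time steps, hence "through time")
  Because W_h and W_x are shared across steps, their gradients are summed over every step.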
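A minimal sketch of BPTT for a one-unit RNN, assuming a squared-error loss on the final hidden state (an illustrative choice; the function names and numbers are hypothetical). The backward loop visits the steps in reverse, pushes the gradient through tanh, and adds each step's contribution to the shared weights `w_h` and `w_x`. A finite-difference check confirms the hand-derived gradient.

```python
import math

def forward(xs, w_h, w_x, h0=0.0):
    """Forward pass: return [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x))
    return hs

def loss(xs, target, w_h, w_x):
    """Squared error on the final hidden state (an illustrative many-to-one loss)."""
    return 0.5 * (forward(xs, w_h, w_x)[-1] - target) ** 2

def bptt(xs, target, w_h, w_x):
    """Backpropagation through time for the one-unit RNN above.

    Returns (dL/dw_h, dL/dw_x). Because the SAME weights are used at every
    step, their gradients are summed over all time steps.
    """
    hs = forward(xs, w_h, w_x)
    grad_w_h = grad_w_x = 0.0
    dh = hs[-1] - target                      # dL/dh_T
    for t in range(len(xs), 0, -1):           # walk backward: T, T-1, ..., 1
        da = dh * (1.0 - hs[t] ** 2)          # through tanh: d tanh(a)/da = 1 - tanh(a)^2
        grad_w_h += da * hs[t - 1]            # step t's contribution to dL/dw_h
        grad_w_x += da * xs[t - 1]            # step t's contribution to dL/dw_x
        dh = da * w_h                         # pass the gradient back to h_{t-1}
    return grad_w_h, grad_w_x

xs, target, w_h, w_x = [0.5, -1.0, 2.0], 0.3, 0.7, 0.4
g_h, g_x = bptt(xs, target, w_h, w_x)

# Sanity check against a numerical (central finite-difference) gradient.
eps = 1e-6
num_h = (loss(xs, target, w_h + eps, w_x) - loss(xs, target, w_h - eps, w_x)) / (2 * eps)
print(abs(g_h - num_h) < 1e-6)  # → True
```

The line `dh = da * w_h` is where the gradient travels "through time": every step back multiplies it by w_h and a tanh derivative.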

The vanishing gradient problem hits RNNs hard. On the way back, the gradient is multiplied at every step by W_h and by the derivative of tanh (which is at most 1), so over long sequences it can shrink nearly to zero before reaching the early time steps. The model then stops learning from distant context. This is why LSTMs, covered in the next topic, were invented.
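The shrinking effect can be measured directly. This sketch, assuming a one-unit RNN fed a constant input with arbitrary weights (`w_h = 0.9`, `w_x = 1.0`, `x = 0.5`), computes d h_T / d h₀ as the chain-rule product of one factor w_h · (1 − h_t²) per step. The opposite effect, exploding gradients, can appear when these factors are larger than 1.

```python
import math

def grad_first_state(T, w_h=0.9, w_x=1.0, x=0.5):
    """d h_T / d h_0 for a one-unit RNN fed a constant input x for T steps.

    By the chain rule this is a product of T factors w_h * (1 - h_t^2),
    one per time step.
    """
    h, grad = 0.0, 1.0
    for _ in range(T):
        h = math.tanh(w_h * h + w_x * x)
        grad *= w_h * (1.0 - h ** 2)   # each factor here is below 1 in magnitude
    return grad

for T in (1, 10, 50):
    print(T, grad_first_state(T))
```

With these settings the per-step factor settles around 0.25, so the influence of h₀ on h₅₀ is vanishingly small, and so is the learning signal it receives.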

A Practical Example: Sentiment Analysis

Input sentence:  "The food was terrible but the service was amazing"

Step-by-step hidden state evolution (an intuitive reading; each real hidden state is a vector of numbers, not a label):

Step 1: "The"      → h₁ = neutral
Step 2: "food"     → h₂ = food context
Step 3: "was"      → h₃ = past tense signal
Step 4: "terrible" → h₄ = strong negative
Step 5: "but"      → h₅ = contrast incoming
Step 6: "the"      → h₆ = continuing
Step 7: "service"  → h₇ = new topic
Step 8: "was"      → h₈ = past tense again
Step 9: "amazing"  → h₉ = strong positive

Final output from h₉ → Classification: "Mixed Sentiment"
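The many-to-one pipeline behind this example can be sketched end to end. Everything here is an assumption for illustration: words become vectors through a hypothetical embedding table, the label set and hidden size are made up, and all weights are random and untrained, so the printed probabilities are meaningless until the model is trained with BPTT on labeled sentences. What the sketch does show is the shape of the computation: nine steps, one hidden state per word, and a single classification read from h₉.

```python
import math
import random

random.seed(0)  # reproducible, but the weights are still random (untrained)

LABELS = ["negative", "positive", "mixed"]   # hypothetical label set
HIDDEN = 4                                   # hypothetical hidden size

sentence = "The food was terrible but the service was amazing"
tokens = sentence.lower().split()
vocab = sorted(set(tokens))

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical parameters: one embedding vector per word, plus RNN and output weights.
embed = {w: [random.uniform(-1, 1) for _ in range(HIDDEN)] for w in vocab}
W_h, W_x = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
W_y = rand_matrix(len(LABELS), HIDDEN)

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def softmax(z):
    m = max(z)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

h = [0.0] * HIDDEN
for word in tokens:                           # steps 1..9: read one word at a time
    x = embed[word]
    h = [math.tanh(a + b) for a, b in zip(matvec(W_h, h), matvec(W_x, x))]

probs = softmax(matvec(W_y, h))               # classify from the final state h_9 only
print(dict(zip(LABELS, (round(p, 3) for p in probs))))
```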

RNN Limitations

Limitation            | Explanation                                                   | Solution
Short-term memory     | Struggles to remember events from many steps back             | LSTM / GRU networks
Vanishing gradients   | Early steps receive almost no learning signal                 | LSTM / GRU networks
Sequential processing | Steps cannot be processed in parallel; slow on long sequences | Transformers
Slow training         | Each step depends on the previous one; hard to parallelize    | Transformers

Real-World Applications of RNNs

  • Autocomplete — predicts the next word as you type
  • Speech Recognition — converts audio frames to text one step at a time
  • Music Generation — generates the next note based on previous notes
  • Stock Price Prediction — processes price sequences to predict future movement
  • Anomaly Detection — monitors IoT sensors to flag unusual patterns in time-series data

Key Terms

  • Recurrent Neural Network (RNN) — a network that processes sequences by passing memory forward at each step
  • Hidden State — the memory vector that carries information from previous steps
  • Time Step — one element in a sequence (one word, one frame, one data point)
  • BPTT — Backpropagation Through Time — the training algorithm for RNNs
  • Vanishing Gradient — gradients shrinking to near-zero over long sequences
