Deep Learning: Recurrent Neural Networks
Standard neural networks process each input independently — they have no memory of what came before. Recurrent Neural Networks (RNNs) break this limitation. They process sequences — data where order and context matter — by carrying information forward from one step to the next.
Why Standard Networks Fail at Sequences
Consider the sentence: "The bank was steep, so I slid down it."
The word "bank" has multiple meanings. You understand it refers to a riverbank — not a financial institution — because of the words around it: "steep," "slid," and "down." A standard network processes each word in isolation and misses this relationship entirely.
What RNNs Process
- Text — sentences, documents, chat messages
- Time Series — stock prices, weather data, sensor readings
- Audio — speech, music notes
- Video — frames in order
The Core Idea: A Hidden State
An RNN maintains a hidden state — a memory vector that updates at every time step. The hidden state captures what the network has seen so far and passes it forward into the next step.
Unrolled RNN Diagram
Time →          Step 1   Step 2   Step 3   Step 4
                "The"    "cat"    "sat"    "down"
Input:            x₁       x₂       x₃       x₄
                  ↓        ↓        ↓        ↓
Hidden: [h₀] ──→ [h₁] ──→ [h₂] ──→ [h₃] ──→ [h₄]
                  ↓        ↓        ↓        ↓
Output:           y₁       y₂       y₃       y₄
h = hidden state (memory passed forward at each step)
Each hidden state h combines two things: the current word and the previous hidden state. The result is a running summary of the entire sequence seen so far.
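To make the running-summary idea concrete, here is a minimal sketch in plain Python. It assumes a scalar input and a scalar hidden state, made-up weights (`w_h = 0.5`, `w_x = 1.0`), no bias term, and hypothetical numbers standing in for the four words. The update is a one-number version of the tanh rule h_t = tanh(W_h × h_{t-1} + W_x × x_t).

```python
import math

def unroll(inputs, w_h=0.5, w_x=1.0, h0=0.0):
    """Run a scalar RNN over a sequence and return every hidden state.

    Assumptions: scalar input and hidden state, hypothetical weights, no bias.
    """
    h = h0
    states = [h0]  # h0: the initial memory, before any input is seen
    for x in inputs:
        # New memory mixes the previous memory with the current input.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

# Hypothetical numeric stand-ins for "The", "cat", "sat", "down".
sentence = [0.1, 0.9, -0.4, 0.3]
states_fwd = unroll(sentence)
states_rev = unroll(list(reversed(sentence)))

print([round(h, 3) for h in states_fwd])
# Same words, different order: the final memory differs.
print(states_fwd[-1] != states_rev[-1])
```

Feeding the same four numbers in reverse order leaves a different final state, which is the order sensitivity that a network processing each input in isolation cannot have.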
How the Hidden State Updates
At each time step t:
h_t = tanh(W_h × h_{t-1} + W_x × x_t)
The first term (W_h × h_{t-1}) carries the previous memory forward; the second (W_x × x_t) brings in the current input.
Where:
h_{t-1} = previous hidden state
x_t = current input
W_h = weight matrix for the hidden connection
W_x = weight matrix for the input connection
tanh = activation function (keeps values between -1 and 1)
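The update rule translates almost line for line into code. This sketch uses plain Python lists instead of a tensor library so it stays self-contained; the sizes (a 2-dimensional hidden state, a 3-dimensional input) and all weight values are arbitrary assumptions. Practical implementations usually also add a bias vector inside the tanh; it is omitted here to match the formula above.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x_t, W_h, W_x):
    """One update: h_t = tanh(W_h × h_{t-1} + W_x × x_t), no bias."""
    recurrent = matvec(W_h, h_prev)  # previous-memory term
    incoming = matvec(W_x, x_t)      # current-input term
    return [math.tanh(r + i) for r, i in zip(recurrent, incoming)]

# Hypothetical sizes and weights: 2-dim hidden state, 3-dim input.
W_h = [[0.5, -0.2],
       [0.1, 0.4]]                   # 2 x 2: hidden -> hidden
W_x = [[0.3, 0.0, -0.6],
       [0.2, 0.8, 0.1]]              # 2 x 3: input -> hidden

h = [0.0, 0.0]                       # h_0
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h = rnn_step(h, x, W_h, W_x)
    print([round(v, 4) for v in h])
```

The same `W_h` and `W_x` are reused at every step; that weight sharing is what lets one small set of parameters handle sequences of any length.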
Types of RNN Tasks
Input and Output Configurations
One-to-One:
Input: single → Output: single
Example: Standard classification (not really RNN territory)
One-to-Many:
Input: single → Output: sequence
Example: Image captioning (one photo → sentence of words)
Many-to-One:
Input: sequence → Output: single
Example: Sentiment analysis ("I loved this movie" → Positive)
Many-to-Many (same length):
Input: sequence → Output: sequence (step-by-step)
Example: Video labeling (each frame labeled)
Many-to-Many (different length):
Input: sequence → Output: different-length sequence
Example: Language translation (English sentence → French sentence)
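The configurations differ mainly in when inputs are fed in and when outputs are read from the hidden state. The sketch below assumes a scalar RNN with hypothetical weights and a linear readout y_t = w_y × h_t (the readout layer is an assumption; the text does not specify one). The different-length case is shown as an encoder that reads the input followed by a decoder that generates outputs; real translation models typically give the encoder and decoder separate weights, whereas this sketch shares them for brevity. One-to-one needs no recurrence and is left out.

```python
import math

# Hypothetical scalar weights shared by every configuration below.
W_H, W_X, W_Y = 0.5, 1.0, 2.0

def step(h, x):
    """One recurrent update: h_t = tanh(W_H * h_{t-1} + W_X * x_t)."""
    return math.tanh(W_H * h + W_X * x)

def readout(h):
    """Assumed linear output layer: y_t = W_Y * h_t."""
    return W_Y * h

def many_to_one(xs):
    h = 0.0
    for x in xs:
        h = step(h, x)
    return readout(h)                # one output, after the whole sequence

def many_to_many(xs):
    h, ys = 0.0, []
    for x in xs:
        h = step(h, x)
        ys.append(readout(h))        # one output per input step
    return ys

def one_to_many(x, length):
    h, ys = 0.0, []
    for _ in range(length):
        h = step(h, x)
        y = readout(h)
        ys.append(y)
        x = y                        # feed each output back in as the next input
    return ys

def encoder_decoder(xs, out_len):
    h = 0.0
    for x in xs:                     # encoder: read the whole input first
        h = step(h, x)
    ys, y = [], 0.0
    for _ in range(out_len):         # decoder: generate from the encoder's final memory
        h = step(h, y)
        y = readout(h)
        ys.append(y)
    return ys

seq = [0.2, -0.1, 0.4]
print(many_to_one(seq))
print(many_to_many(seq))
print(one_to_many(0.5, 4))
print(encoder_decoder(seq, 5))
```

The recurrence itself is identical in every case; only the input/output wiring around it changes.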
Training an RNN: Backpropagation Through Time
Training an RNN uses a variant of backpropagation called Backpropagation Through Time (BPTT). Gradients flow not just backward through layers but also backward through time steps.
Forward: (h₀, x₁) → h₁, then (h₁, x₂) → h₂, then (h₂, x₃) → h₃ → Loss
Backward: the gradient flows from Loss → h₃ → h₂ → h₁ and, at every step, into the shared weights W_h and W_x (it passes through all time steps, hence "through time")
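For a tiny scalar RNN, BPTT can be written out by hand. The sketch below assumes a single loss on the final hidden state, L = ½(h_T − target)², hypothetical weights and inputs, and no bias. The backward loop walks from the last step to the first; because `w_h` and `w_x` are shared across time, every step's contribution is added to the same gradient. A finite-difference check is included to confirm the hand-derived gradients.

```python
import math

def forward_pass(xs, w_h, w_x):
    """Return [h_0, h_1, ..., h_T] for a scalar RNN with h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x))
    return hs

def loss(xs, target, w_h, w_x):
    h_T = forward_pass(xs, w_h, w_x)[-1]
    return 0.5 * (h_T - target) ** 2

def bptt(xs, target, w_h, w_x):
    """Gradients of the loss w.r.t. w_h and w_x via backpropagation through time."""
    hs = forward_pass(xs, w_h, w_x)
    grad_w_h = grad_w_x = 0.0
    dh = hs[-1] - target                # dL/dh_T
    for t in range(len(xs), 0, -1):     # t = T, T-1, ..., 1
        da = dh * (1.0 - hs[t] ** 2)    # through tanh: d tanh(a)/da = 1 - tanh(a)^2
        grad_w_h += da * hs[t - 1]      # shared weight: accumulate this step's share
        grad_w_x += da * xs[t - 1]      # xs[t - 1] is x_t (xs is 0-indexed)
        dh = da * w_h                   # pass the gradient back to h_{t-1}
    return grad_w_h, grad_w_x

xs, target, w_h, w_x = [0.5, -0.3, 0.8], 0.2, 0.7, 1.1
g_h, g_x = bptt(xs, target, w_h, w_x)

# Sanity check against central finite differences.
eps = 1e-6
num_h = (loss(xs, target, w_h + eps, w_x) - loss(xs, target, w_h - eps, w_x)) / (2 * eps)
num_x = (loss(xs, target, w_h, w_x + eps) - loss(xs, target, w_h, w_x - eps)) / (2 * eps)
print(g_h, num_h)
print(g_x, num_x)
```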
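The vanishing gradient problem hits RNNs hard. During BPTT, the gradient that reaches an early time step is a product of one factor per intervening step (the recurrent weight times the tanh derivative). When those factors are smaller than 1, the product shrinks toward zero exponentially as the sequence gets longer, so the early time steps receive almost no learning signal and the model stops learning from distant context. This is why LSTMs (covered in the next topic) were invented.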
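The shrinkage can be measured directly. In a scalar RNN, the chain rule gives ∂h_T/∂h_1 as a product of T − 1 local factors, each w_h × (1 − h_t²). This sketch assumes |w_h| < 1 (a hypothetical 0.9) and a made-up constant input, so every factor is below 1 in magnitude and the product decays geometrically as the sequence grows.

```python
import math

def memory_gradient(xs, w_h=0.9, w_x=1.0):
    """d h_T / d h_1 for a scalar RNN: product over steps 2..T of w_h * (1 - h_t^2)."""
    h = math.tanh(w_x * xs[0])           # h_1, starting from h_0 = 0
    grad = 1.0
    for x in xs[1:]:
        h = math.tanh(w_h * h + w_x * x)
        grad *= w_h * (1.0 - h ** 2)     # local derivative dh_t / dh_{t-1}
    return grad

for T in (5, 10, 20, 50):
    xs = [0.5] * T                       # hypothetical constant input sequence
    print(T, memory_gradient(xs))
```

Each extra step multiplies in another factor below 1, so the influence of h_1 on h_T fades quickly; LSTMs and GRUs add gating so that this product does not have to shrink at every step.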
A Practical Example: Sentiment Analysis
Input sentence: "The food was terrible but the service was amazing"
Step-by-step hidden state evolution (the labels are human-readable interpretations; each real hidden state is a vector of numbers):
Step 1: "The" → h₁ = neutral
Step 2: "food" → h₂ = food context
Step 3: "was" → h₃ = past tense signal
Step 4: "terrible" → h₄ = strong negative
Step 5: "but" → h₅ = contrast incoming
Step 6: "the" → h₆ = continuing
Step 7: "service" → h₇ = new topic
Step 8: "was" → h₈ = past tense again
Step 9: "amazing" → h₉ = strong positive
Final output from h₉ → Classification: "Mixed Sentiment"
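A many-to-one classifier like this one can be sketched end to end: look up a vector for each word, run the recurrence, and read a prediction from the final hidden state only. Everything here is an assumption for illustration: the three labels, the sizes, and the weights and word vectors, which are random and untrained, so the printed label is arbitrary. Training with BPTT is what would make the prediction meaningful.

```python
import math
import random

LABELS = ["Negative", "Positive", "Mixed Sentiment"]  # hypothetical label set
random.seed(0)
EMB, HID = 4, 6   # hypothetical embedding and hidden sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_x = rand_matrix(HID, EMB)           # untrained input weights
W_h = rand_matrix(HID, HID)           # untrained recurrent weights
W_y = rand_matrix(len(LABELS), HID)   # untrained readout from the final hidden state
embeddings = {}                       # hypothetical word vectors, created on first use

def embed(word):
    if word not in embeddings:
        embeddings[word] = [random.uniform(-1, 1) for _ in range(EMB)]
    return embeddings[word]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def classify(sentence):
    h = [0.0] * HID                   # h_0
    for word in sentence.lower().split():
        x = embed(word)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_h, h), matvec(W_x, x))]
    scores = matvec(W_y, h)           # many-to-one: only the final h is used
    m = max(scores)                   # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs

label, probs = classify("The food was terrible but the service was amazing")
print(label, [round(p, 3) for p in probs])
```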
RNN Limitations
| Limitation | Explanation | Solution |
|---|---|---|
| Short-term memory | Struggles to remember events from many steps back | LSTM / GRU networks |
| Vanishing gradients | Early steps receive little or no learning signal | LSTM / GRU networks |
| Sequential processing | Cannot process steps in parallel — slow on long sequences | Transformers |
| Slow training | Each step depends on the previous — hard to parallelize | Transformers |
Real-World Applications of RNNs
- Autocomplete — predicts the next word as you type
- Speech Recognition — converts audio frames to text one step at a time
- Music Generation — generates the next note based on previous notes
- Stock Price Prediction — processes price sequences to predict future movement
- Anomaly Detection — monitors IoT sensors to flag unusual patterns in time-series data
Key Terms
- Recurrent Neural Network (RNN) — a network that processes sequences by passing memory forward at each step
- Hidden State — the memory vector that carries information from previous steps
- Time Step — one element in a sequence (one word, one frame, one data point)
- BPTT (Backpropagation Through Time) — the training algorithm for RNNs
- Vanishing Gradient — gradients shrinking to near-zero over long sequences
