Machine Learning Overfitting and Underfitting
Overfitting and underfitting are the two main failure modes of machine learning models. Understanding them is essential for building models that perform reliably on data they have never seen before.
Underfitting
Underfitting happens when a model is too simple for the complexity of the data.
It misses real patterns and performs poorly on both training and test data.
Example: Predicting house price using only ONE feature (number of rooms)
when the actual price depends on location, size, age, amenities, etc.
Training accuracy: 58%
Test accuracy: 55%
→ Both are low. The model learned nothing useful.
Visual (fit to wavy data):
Data: ● ● ● ● ● ●
● ●
● ●
Underfit line: ─────────────────────── (flat, misses the wave)
Overfitting
Overfitting happens when a model is too complex and memorizes the training data, including its noise and random quirks. It performs very well on training data but fails on new data.
Example: A Decision Tree with no depth limit trained on 50 records.
It creates 50 leaf nodes, one for each training record.
Perfect on training. Useless on test data.
Training accuracy: 100%
Test accuracy: 63%
→ Large gap. The model memorized instead of learning.
Visual:
Actual data points: ● (each is a real measurement)
Overfit curve: ∿∿∿∿∿ (follows every tiny wiggle, including noise)
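The decision-tree example above can be reproduced in a few lines (with assumed synthetic data): labels that are pure noise cannot be learned, yet an unlimited-depth tree still scores 100% on training by memorizing each record.

```python
# Overfitting sketch: an unlimited-depth decision tree on 50 noisy records.
# The data here is synthetic and for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
y_train = rng.integers(0, 2, size=50)   # labels are pure noise
X_test = rng.normal(size=(50, 5))
y_test = rng.integers(0, 2, size=50)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"train accuracy: {tree.score(X_train, y_train):.2f}")  # 1.00
print(f"test  accuracy: {tree.score(X_test, y_test):.2f}")    # near chance (~0.5)
```

With no depth limit, the tree keeps splitting until every leaf is pure, so it fits even random labels perfectly. The large train/test gap is the signature of memorization.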
The Ideal Fit
A well-fitting model captures the real underlying pattern
without chasing noise.
Training accuracy: 88%
Test accuracy: 85%
→ Small gap. Model generalizes well.
Visual:
Data: ● ● ● ● ●
● ● ●
Good fit: ────────smooth curve────────
The curve follows the overall trend, not every bump.
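A sketch of a well-fitting model (with assumed synthetic data): the labels follow a real pattern plus some label noise, and a depth-limited tree captures the pattern without memorizing the noise, giving solid scores and a small gap.

```python
# Good-fit sketch: a depth-limited tree on data with a real pattern + noise.
# The data here is synthetic and for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # real underlying pattern
flip = rng.random(400) < 0.1                  # 10% label noise
y = np.where(flip, 1 - y, y)

X_tr, y_tr = X[:300], y[:300]
X_te, y_te = X[300:], y[300:]
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(f"train accuracy: {tree.score(X_tr, y_tr):.2f}")
print(f"test  accuracy: {tree.score(X_te, y_te):.2f}")
# Both scores are decent and close together: the model generalizes.
```

The limited depth forces the tree to follow the overall trend rather than every bump, matching the "smooth curve" picture above.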
Causes of Overfitting
┌─────────────────────────────┬─────────────────────────────────────┐
│ Cause                       │ Example                             │
├─────────────────────────────┼─────────────────────────────────────┤
│ Too many features           │ 500 features, only 200 records      │
│ Model too complex           │ Decision Tree with no depth limit   │
│ Too little training data    │ 50 records for a complex problem    │
│ Training too long           │ Neural network trained for 1000     │
│                             │ epochs on a small dataset           │
│ No regularization applied   │ Coefficients grow unrestricted      │
└─────────────────────────────┴─────────────────────────────────────┘
How to Fix Overfitting
Fix 1: Get more training data
       More diverse examples → less memorization
Fix 2: Reduce model complexity
       Decision Tree: set max_depth
       Neural Network: use fewer layers/neurons
Fix 3: Apply regularization (L1 / L2, next topic)
Fix 4: Feature selection
       Remove irrelevant features that only add noise
Fix 5: Cross-validation
       A reliable estimate of true performance prevents overconfidence
Fix 6: Dropout (Neural Networks)
       Randomly disable neurons during training to prevent co-dependence
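Fix 2 can be demonstrated directly (with assumed synthetic data): on the same noisy dataset, capping max_depth shrinks the train/test gap compared with an unlimited tree.

```python
# Fix 2 sketch: limiting tree depth shrinks the train/test gap.
# The data here is synthetic and for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # real underlying pattern
y = np.where(rng.random(400) < 0.15, 1 - y, y)  # 15% label noise
X_tr, y_tr = X[:300], y[:300]
X_te, y_te = X[300:], y[300:]

gaps = {}
for depth in (None, 3):                         # unlimited vs. limited
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    gaps[depth] = tree.score(X_tr, y_tr) - tree.score(X_te, y_te)
    print(f"max_depth={depth}: train/test gap = {gaps[depth]:.2f}")
```

The unlimited tree memorizes the 15% noisy labels and shows a large gap; the depth-3 tree cannot, so its gap stays small. The same idea carries over to the other fixes: each one removes some of the model's capacity to memorize noise.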
