Linear Regression in Machine Learning
Linear Regression is one of the simplest and most foundational algorithms in Machine Learning. It predicts a continuous numerical value by finding the best straight line through the data. Despite its simplicity, it is widely used in business forecasting, science, and economics.
What Problem Does Linear Regression Solve?
Linear Regression answers questions like: "Given what we know, what number can we expect?" It belongs to supervised learning and falls under the regression category — the output is a number, not a category.
Examples of Linear Regression Problems:
- How much will this house sell for?
- How many products will sell next month?
- What temperature will it be tomorrow?
- How many calories will a person burn based on exercise time?
The Core Idea: Fitting a Line
Linear Regression finds the best straight line that describes the relationship between one or more input features (X) and the target output (Y). Once the line is found, any new input can be plugged in to get a predicted output.
Study Hours vs Exam Score:
Score
100 |                                  ●
 90 |                             ●  /
 80 |                       ●   /
 70 |                  ●   /
 60 |            ●   /
 50 |       ●  /
 40 |  ●  /
    └─────────────────────────────────────
       1    2    3    4    5    6    7    8   Study Hours
The line drawn through these points is the Linear Regression model.
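Fitting such a line can be sketched with NumPy's `polyfit`. The study-hours data below is made up for illustration (it roughly matches the chart), not taken from a real dataset:

```python
import numpy as np

# Hypothetical study-hours vs exam-score data (illustrative only)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([38, 45, 55, 62, 70, 78, 85, 95])

# Fit a degree-1 polynomial: returns the slope (m) and intercept (b)
m, b = np.polyfit(hours, scores, 1)

# Predict by plugging a new input into the line
predicted = m * 5 + b
print(f"slope={m:.2f}, intercept={b:.2f}, prediction for 5 hours={predicted:.1f}")
```

On this toy data the fit comes out close to m ≈ 8 and b ≈ 30, which is where the worked example below gets its numbers.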
The Linear Regression Formula
Simple Linear Regression (one input feature):
Y = m × X + b
Where:
Y = Predicted output (exam score)
X = Input feature (study hours)
m = Slope (how much Y changes when X increases by 1)
b = Intercept (value of Y when X = 0)
Example:
m = 8, b = 30
Prediction for 5 study hours:
Y = 8 × 5 + 30 = 40 + 30 = 70 marks
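The same prediction as a one-line Python function, using the example's m = 8 and b = 30:

```python
def predict_score(hours, m=8, b=30):
    """Simple linear regression prediction: Y = m * X + b."""
    return m * hours + b

print(predict_score(5))  # 8*5 + 30 = 70
```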
What Does the Slope Mean?
m = 8 means: For every 1 extra hour of study, the predicted score increases by 8 marks. m = -5 would mean: For every 1 extra hour of screen time, the predicted score drops by 5 marks.
Multiple Linear Regression
When more than one feature influences the output, the formula extends to include all features.
Multiple Linear Regression (many input features):
Y = m1×X1 + m2×X2 + m3×X3 + ... + b
Example — House Price Prediction:
Y = Price
X1 = House Size (sqft)
X2 = Number of Bedrooms
X3 = Distance from City Center (km)
Learned Formula:
Price = 150 × Size + 5000 × Bedrooms - 3000 × Distance + 20000
New House: Size=1500, Bedrooms=3, Distance=10 km
Price = 150×1500 + 5000×3 - 3000×10 + 20000
= 225000 + 15000 - 30000 + 20000
= ₹2,30,000
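The same calculation as a small Python function, with the coefficients taken from the learned formula above:

```python
def predict_price(size_sqft, bedrooms, distance_km):
    """Multiple linear regression using the example's learned coefficients."""
    return 150 * size_sqft + 5000 * bedrooms - 3000 * distance_km + 20000

print(predict_price(1500, 3, 10))  # 225000 + 15000 - 30000 + 20000 = 230000
```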
How the Algorithm Learns: Cost Function and Gradient Descent
The algorithm starts with random values for m and b. It then measures how wrong its predictions are using a cost function, and adjusts m and b repeatedly until the predictions become as accurate as possible.
Cost Function: Mean Squared Error (MSE)
MSE = Average of (Actual - Predicted)²

Example:
Record 1: Actual=70, Predicted=65 → Error=5  → Squared=25
Record 2: Actual=80, Predicted=84 → Error=-4 → Squared=16
Record 3: Actual=60, Predicted=58 → Error=2  → Squared=4

MSE = (25 + 16 + 4) / 3 = 15

Goal: Minimize MSE, i.e. find the m and b that make this number as small as possible.

Why squared? Squaring removes negative signs and penalizes large errors much more than small ones.
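A minimal MSE sketch in Python, reusing the three records from the example:

```python
def mse(actual, predicted):
    """Mean Squared Error: average of squared (actual - predicted) differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [70, 80, 60]
predicted = [65, 84, 58]
print(mse(actual, predicted))  # (25 + 16 + 4) / 3 = 15.0
```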
Gradient Descent
Imagine standing on a hilly surface blindfolded. Goal: reach the lowest valley (minimum error).

Strategy:
1. Feel which direction goes downhill
2. Take a small step in that direction
3. Repeat until no downhill direction exists

This is exactly what Gradient Descent does with m and b. Each parameter gets its own gradient: the slope of the MSE with respect to that parameter.

Gradient Descent Loop:

Start with random m, b
┌────────────────────────────────────────────────┐
│ Calculate MSE with current m, b                │
│ Calculate the gradient of MSE for m and for b  │
│ Update: m = m - learning_rate × gradient_m     │
│ Update: b = b - learning_rate × gradient_b     │
└─────────── repeat until MSE stops dropping ────┘

Final m and b = best-fit line parameters
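The loop above can be sketched in plain Python for the one-feature case. The learning rate, the number of iterations, and the study-hours data are illustrative choices, not values from the text:

```python
def gradient_descent(xs, ys, lr=0.01, epochs=5000):
    """Fit Y = m*X + b by minimizing MSE with gradient descent."""
    m, b = 0.0, 0.0  # start from arbitrary values
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE with respect to m and b
        grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        # Step downhill: move each parameter against its gradient
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

hours = [1, 2, 3, 4, 5, 6, 7, 8]        # illustrative data
scores = [38, 45, 55, 62, 70, 78, 85, 95]
m, b = gradient_descent(hours, scores)
print(f"m={m:.2f}, b={b:.2f}")
```

If the learning rate is too large the loop overshoots and diverges; too small and it converges very slowly, which is why the rate is usually tuned per problem.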
Assumptions of Linear Regression
Linear Regression works well only when certain conditions are met. Violating these assumptions weakens the model significantly.
┌─────────────────────────────┬────────────────────────────────────┐
│ Assumption                  │ Meaning                            │
├─────────────────────────────┼────────────────────────────────────┤
│ Linearity                   │ X and Y have a straight-line       │
│                             │ relationship (not curved)          │
│ No multicollinearity        │ Input features should not be       │
│                             │ strongly correlated with each other│
│ Homoscedasticity            │ Errors spread evenly at all values │
│                             │ of X (no funnel shape in errors)   │
│ Independence of errors      │ One prediction's error should not  │
│                             │ influence another's                │
│ Normally distributed errors │ Prediction errors should follow    │
│                             │ a bell-curve distribution          │
└─────────────────────────────┴────────────────────────────────────┘
Evaluating Linear Regression: Key Metrics
┌──────────────────────────┬──────────────────────────────────────────┐
│ Metric                   │ What it Measures                         │
├──────────────────────────┼──────────────────────────────────────────┤
│ MAE (Mean Absolute Error)│ Average absolute difference between      │
│                          │ actual and predicted values.             │
│                          │ Lower is better. Easy to interpret.      │
├──────────────────────────┼──────────────────────────────────────────┤
│ MSE (Mean Squared Error) │ Average squared difference.              │
│                          │ Penalizes large errors more than small.  │
├──────────────────────────┼──────────────────────────────────────────┤
│ RMSE (Root MSE)          │ Square root of MSE. Same unit as the     │
│                          │ target, so it is easier to read.         │
├──────────────────────────┼──────────────────────────────────────────┤
│ R² Score                 │ How much variance in Y the model explains│
│                          │ R² = 1.0 → Perfect fit                   │
│                          │ R² = 0.0 → No better than guessing mean  │
│                          │ R² < 0.0 → Worse than guessing mean      │
└──────────────────────────┴──────────────────────────────────────────┘

Example: House prices (actual range ₹1L – ₹10L)
RMSE = ₹25,000 → predictions are off by about ₹25,000 on average
R² = 0.87 → the model explains 87% of the price variation
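All four metrics can be computed with a short helper. The three records are the same illustrative ones used in the MSE example:

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R² for a list of predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # R² = 1 - (residual sum of squares / total sum of squares)
    mean_y = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics([70, 80, 60], [65, 84, 58])
print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R²={r2:.3f}")
```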
When to Use and When Not to Use Linear Regression
Use Linear Regression When:
✓ The target is a continuous number
✓ The relationship between features and target is approximately linear
✓ Interpretability matters (the formula is easy to explain)
✓ The dataset is clean with few outliers

Avoid Linear Regression When:
✗ The target is a category (use Logistic Regression instead)
✗ The relationship is clearly curved or non-linear
✗ Many strong outliers exist in the data
✗ Features are highly correlated with each other
Linear Regression Diagram Summary
Input Features (X)
│
▼
Linear Formula: Y = m×X + b
│
▼
Predicted Value (Ŷ)
│
▼
Compare with Actual Value (Y) → Calculate Error (MSE)
│
▼
Gradient Descent → Adjust m and b
│
▼
Repeat until Error is Minimized
│
▼
Final Model → Ready to Predict on New Data
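Besides the iterative loop above, linear regression can also be solved in closed form with least squares. This sketch uses NumPy's `lstsq` on hypothetical house data; the five rows are invented and their prices are generated from the example formula, so the solver should recover those coefficients:

```python
import numpy as np

# Hypothetical houses: [size_sqft, bedrooms, distance_km] (illustrative only)
X = np.array([
    [1000, 2,  5],
    [1500, 3, 10],
    [2000, 4,  2],
    [1200, 2,  8],
    [1800, 3,  4],
], dtype=float)

# Prices generated from the example formula, so there is no noise
y = 150 * X[:, 0] + 5000 * X[:, 1] - 3000 * X[:, 2] + 20000

# Append a column of ones so the intercept b is learned alongside the slopes
X_aug = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(coeffs)  # ≈ [150, 5000, -3000, 20000]
```

On noiseless data like this, least squares recovers the exact coefficients; with real, noisy data it finds the coefficients that minimize the MSE, the same goal gradient descent pursues iteratively.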
