Linear Regression in Machine Learning
Linear Regression is one of the simplest and most foundational algorithms in Machine Learning. It predicts a continuous numerical value by finding the best straight line through the data. Despite its simplicity, it is widely used in business forecasting, science, and economics.
What Problem Does Linear Regression Solve?
Linear Regression answers questions like: "Given what we know, what number can we expect?" It belongs to supervised learning and falls under the regression category — the output is a number, not a category.
Examples of Linear Regression Problems:
- How much will this house sell for?
- How many products will sell next month?
- What temperature will it be tomorrow?
- How many calories will a person burn based on exercise time?
The Core Idea: Fitting a Line
Linear Regression finds the best straight line that describes the relationship between one or more input features (X) and the target output (Y). Once the line is found, any new input can be plugged in to get a predicted output.
Study Hours vs Exam Score:
Score
100 |                                  ●
 90 |                             ●  /
 80 |                       ●   /
 70 |                  ●   /
 60 |            ●   /
 50 |       ●  /
 40 |  ●  /
    └─────────────────────────────────────
       1    2    3    4    5    6    7    8   Study Hours
The line drawn through these points is the Linear Regression model.
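Fitting such a line can be sketched with NumPy's `polyfit`. The study-hours data below is made up for illustration (it roughly matches the chart), not taken from a real dataset:

```python
import numpy as np

# Hypothetical study-hours vs exam-score data (illustrative only)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([38, 45, 55, 62, 70, 78, 85, 95])

# Fit a degree-1 polynomial: returns the slope (m) and intercept (b)
m, b = np.polyfit(hours, scores, 1)

# Predict by plugging a new input into the line
predicted = m * 5 + b
print(f"slope={m:.2f}, intercept={b:.2f}, prediction for 5 hours={predicted:.1f}")
```

On this toy data the fit comes out close to m ≈ 8 and b ≈ 30, which is where the worked example below gets its numbers.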
The Linear Regression Formula
Simple Linear Regression (one input feature):
Y = m × X + b
Where:
Y = Predicted output (exam score)
X = Input feature (study hours)
m = Slope (how much Y changes when X increases by 1)
b = Intercept (value of Y when X = 0)
Example:
m = 8, b = 30
Prediction for 5 study hours:
Y = 8 × 5 + 30 = 40 + 30 = 70 marks
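The same prediction as a one-line Python function, using the example's m = 8 and b = 30:

```python
def predict_score(hours, m=8, b=30):
    """Simple linear regression prediction: Y = m * X + b."""
    return m * hours + b

print(predict_score(5))  # 8*5 + 30 = 70
```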
What Does the Slope Mean?
m = 8 means: For every 1 extra hour of study, the predicted score increases by 8 marks. m = -5 would mean: For every 1 extra hour of screen time, the predicted score drops by 5 marks.
Multiple Linear Regression
When more than one feature influences the output, the formula extends to include all features.
Multiple Linear Regression (many input features):
Y = m1×X1 + m2×X2 + m3×X3 + ... + b
Example — House Price Prediction:
Y = Price
X1 = House Size (sqft)
X2 = Number of Bedrooms
X3 = Distance from City Center (km)
Learned Formula:
Price = 150 × Size + 5000 × Bedrooms - 3000 × Distance + 20000
New House: Size=1500, Bedrooms=3, Distance=10 km
Price = 150×1500 + 5000×3 - 3000×10 + 20000
= 225000 + 15000 - 30000 + 20000
= ₹2,30,000
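The same calculation as a small Python function, with the coefficients taken from the learned formula above:

```python
def predict_price(size_sqft, bedrooms, distance_km):
    """Multiple linear regression using the example's learned coefficients."""
    return 150 * size_sqft + 5000 * bedrooms - 3000 * distance_km + 20000

print(predict_price(1500, 3, 10))  # 225000 + 15000 - 30000 + 20000 = 230000
```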
How the Algorithm Learns: Cost Function and Gradient Descent
The algorithm starts with random values for m and b. It then measures how wrong its predictions are using a cost function, and adjusts m and b repeatedly until the predictions become as accurate as possible.
Cost Function: Mean Squared Error (MSE)
MSE = Average of (Actual - Predicted)²

Example:
Record 1: Actual=70, Predicted=65 → Error=5  → Squared=25
Record 2: Actual=80, Predicted=84 → Error=-4 → Squared=16
Record 3: Actual=60, Predicted=58 → Error=2  → Squared=4

MSE = (25 + 16 + 4) / 3 = 15

Goal: Minimize MSE, i.e. find the m and b that make this number as small as possible.

Why squared? Squaring removes negative signs and penalizes large errors much more than small ones.
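A minimal MSE sketch in Python, reusing the three records from the example:

```python
def mse(actual, predicted):
    """Mean Squared Error: average of squared (actual - predicted) differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [70, 80, 60]
predicted = [65, 84, 58]
print(mse(actual, predicted))  # (25 + 16 + 4) / 3 = 15.0
```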
Gradient Descent
Imagine standing on a hilly surface blindfolded. Goal: reach the lowest valley (minimum error).

Strategy:
1. Feel which direction goes downhill
2. Take a small step in that direction
3. Repeat until no downhill direction exists

This is exactly what Gradient Descent does with m and b. Each parameter gets its own gradient: the slope of the MSE with respect to that parameter.

Gradient Descent Loop:

Start with random m, b
┌────────────────────────────────────────────────┐
│ Calculate MSE with current m, b                │
│ Calculate the gradient of MSE for m and for b  │
│ Update: m = m - learning_rate × gradient_m     │
│ Update: b = b - learning_rate × gradient_b     │
└─────────── repeat until MSE stops dropping ────┘

Final m and b = best-fit line parameters
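The loop above can be sketched in plain Python for the one-feature case. The learning rate, the number of iterations, and the study-hours data are illustrative choices, not values from the text:

```python
def gradient_descent(xs, ys, lr=0.01, epochs=5000):
    """Fit Y = m*X + b by minimizing MSE with gradient descent."""
    m, b = 0.0, 0.0  # start from arbitrary values
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE with respect to m and b
        grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        # Step downhill: move each parameter against its gradient
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

hours = [1, 2, 3, 4, 5, 6, 7, 8]        # illustrative data
scores = [38, 45, 55, 62, 70, 78, 85, 95]
m, b = gradient_descent(hours, scores)
print(f"m={m:.2f}, b={b:.2f}")
```

If the learning rate is too large the loop overshoots and diverges; too small and it converges very slowly, which is why the rate is usually tuned per problem.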
Assumptions of Linear Regression
Linear Regression works well only when certain conditions are met. Violating these assumptions weakens the model significantly.
┌─────────────────────────────┬────────────────────────────────────┐
│ Assumption                  │ Meaning                            │
├─────────────────────────────┼────────────────────────────────────┤
│ Linearity                   │ X and Y have a straight-line       │
│                             │ relationship (not curved)          │
│ No multicollinearity        │ Input features should not be       │
│                             │ strongly correlated with each other│
│ Homoscedasticity            │ Errors spread evenly at all values │
│                             │ of X (no funnel shape in errors)   │
│ Independence of errors      │ One prediction's error should not  │
│                             │ influence another's                │
│ Normally distributed errors │ Prediction errors should follow    │
│                             │ a bell-curve distribution          │
└─────────────────────────────┴────────────────────────────────────┘
Evaluating Linear Regression: Key Metrics
┌──────────────────────────┬──────────────────────────────────────────┐
│ Metric                   │ What it Measures                         │
├──────────────────────────┼──────────────────────────────────────────┤
│ MAE (Mean Absolute Error)│ Average absolute difference between      │
│                          │ actual and predicted values.             │
│                          │ Lower is better. Easy to interpret.      │
├──────────────────────────┼──────────────────────────────────────────┤
│ MSE (Mean Squared Error) │ Average squared difference.              │
│                          │ Penalizes large errors more than small.  │
├──────────────────────────┼──────────────────────────────────────────┤
│ RMSE (Root MSE)          │ Square root of MSE. Same unit as the     │
│                          │ target, so it is easier to read.         │
├──────────────────────────┼──────────────────────────────────────────┤
│ R² Score                 │ How much variance in Y the model explains│
│                          │ R² = 1.0 → Perfect fit                   │
│                          │ R² = 0.0 → No better than guessing mean  │
│                          │ R² < 0.0 → Worse than guessing mean      │
└──────────────────────────┴──────────────────────────────────────────┘

Example: House prices (actual range ₹1L – ₹10L)
RMSE = ₹25,000 → predictions are off by about ₹25,000 on average
R² = 0.87 → the model explains 87% of the price variation
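All four metrics can be computed with a short helper. The three records are the same illustrative ones used in the MSE example:

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R² for a list of predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # R² = 1 - (residual sum of squares / total sum of squares)
    mean_y = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics([70, 80, 60], [65, 84, 58])
print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R²={r2:.3f}")
```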
When to Use and When Not to Use Linear Regression
Use Linear Regression When:
✓ The target is a continuous number
✓ The relationship between features and target is approximately linear
✓ Interpretability matters (the formula is easy to explain)
✓ The dataset is clean with few outliers

Avoid Linear Regression When:
✗ The target is a category (use Logistic Regression instead)
✗ The relationship is clearly curved or non-linear
✗ Many strong outliers exist in the data
✗ Features are highly correlated with each other
Linear Regression Diagram Summary
Input Features (X)
│
▼
Linear Formula: Y = m×X + b
│
▼
Predicted Value (Ŷ)
│
▼
Compare with Actual Value (Y) → Calculate Error (MSE)
│
▼
Gradient Descent → Adjust m and b
│
▼
Repeat until Error is Minimized
│
▼
Final Model → Ready to Predict on New Data
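Besides the iterative loop above, linear regression can also be solved in closed form with least squares. This sketch uses NumPy's `lstsq` on hypothetical house data; the five rows are invented and their prices are generated from the example formula, so the solver should recover those coefficients:

```python
import numpy as np

# Hypothetical houses: [size_sqft, bedrooms, distance_km] (illustrative only)
X = np.array([
    [1000, 2,  5],
    [1500, 3, 10],
    [2000, 4,  2],
    [1200, 2,  8],
    [1800, 3,  4],
], dtype=float)

# Prices generated from the example formula, so there is no noise
y = 150 * X[:, 0] + 5000 * X[:, 1] - 3000 * X[:, 2] + 20000

# Append a column of ones so the intercept b is learned alongside the slopes
X_aug = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(coeffs)  # ≈ [150, 5000, -3000, 20000]
```

On noiseless data like this, least squares recovers the exact coefficients; with real, noisy data it finds the coefficients that minimize the MSE, the same goal gradient descent pursues iteratively.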
