Machine Learning: Logistic Regression
Logistic Regression is a classification algorithm that predicts which category an input belongs to. Despite having "regression" in its name, it does not predict numbers — it predicts probabilities and converts them into class labels. It is one of the most widely used algorithms for binary (two-class) classification problems.
The Difference from Linear Regression
Linear Regression:
  Input: Study Hours = 6 → Output: Exam Score = 78 (a number)

Logistic Regression:
  Input: Study Hours = 6 → Output: Pass or Fail (a category)

Linear Regression predicts a continuous value. Logistic Regression predicts a class label.
Why Not Use Linear Regression for Classification?
Linear Regression can predict values below 0 or above 1, which makes no sense for probabilities. A probability must always stay between 0 and 1. Logistic Regression uses a special function to guarantee this constraint.
Linear Regression output for classification:
  Probability of spam = 1.8  ← impossible
  Probability of spam = -0.3 ← impossible

Logistic Regression fixes this:
  Probability always between 0.0 and 1.0 ✓
The Sigmoid Function
Logistic Regression uses the Sigmoid function to convert any number — positive, negative, or very large — into a value between 0 and 1. This output is interpreted as a probability.
Sigmoid Function:
σ(z) = 1 / (1 + e^(-z))
Where z = m1×X1 + m2×X2 + ... + b (the same linear combination used in Linear Regression)
Visual shape of Sigmoid:

Probability
1.0 |                      ________
0.9 |                   __/
0.7 |                 _/
0.5 | - - - - - - - -/          ← Decision boundary (z = 0)
0.3 |              _/
0.1 |           __/
0.0 |_________/
    ─────────────────────────────►
      Negative z    0    Positive z
Key observations:
When z is very positive → output approaches 1.0
When z is very negative → output approaches 0.0
When z = 0 → output = 0.5 (decision boundary)
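The observations above can be checked directly with a minimal sketch of the Sigmoid function:

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(10))   # very positive z → close to 1.0
print(sigmoid(-10))  # very negative z → close to 0.0
print(sigmoid(0))    # z = 0 → exactly 0.5 (decision boundary)
```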
Making a Prediction: Threshold
The Sigmoid function gives a probability. A threshold (usually 0.5) converts this probability into a class label.
Output Probability = 0.82 → 0.82 > 0.5 → Predict Class 1 (Spam / Yes / Pass)
Output Probability = 0.31 → 0.31 < 0.5 → Predict Class 0 (Not Spam / No / Fail)

The threshold of 0.5 can be changed depending on the problem:
  Medical diagnosis: lower threshold (catch more true positives)
  Marketing email: higher threshold (target only confident leads)
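Thresholding is a one-line rule. A small sketch (the function name `predict_label` is illustrative, not a library API):

```python
def predict_label(probability, threshold=0.5):
    """Convert a probability into a class label using a cutoff."""
    return 1 if probability >= threshold else 0

print(predict_label(0.82))                  # 1 — Class 1 (Spam / Yes / Pass)
print(predict_label(0.31))                  # 0 — Class 0 (Not Spam / No / Fail)
# A medical screen might lower the threshold to catch more positives:
print(predict_label(0.31, threshold=0.2))   # 1
```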
Complete Example: Loan Approval Prediction
Features:
X1 = Credit Score
X2 = Annual Income (in Lakhs)
X3 = Existing Debt (in Lakhs)
Learned Formula:
z = 0.005×CreditScore + 0.3×Income - 0.4×Debt - 2.0
Applicant:
Credit Score = 750
Income = 8L
Debt = 1L
z = 0.005×750 + 0.3×8 - 0.4×1 - 2.0
= 3.75 + 2.4 - 0.4 - 2.0
= 3.75
Sigmoid(3.75) = 1 / (1 + e^(-3.75)) = 0.977
Probability of Approval = 97.7% → Predict: APPROVED ✓
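The same calculation in code, using the coefficients from the worked example above:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Applicant from the example: coefficients as learned in the formula above
credit_score, income, debt = 750, 8, 1
z = 0.005 * credit_score + 0.3 * income - 0.4 * debt - 2.0
prob = sigmoid(z)

print(round(z, 2))       # 3.75
print(round(prob, 3))    # 0.977
print("APPROVED" if prob >= 0.5 else "REJECTED")  # APPROVED
```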
Binary vs Multi-Class Logistic Regression
Binary Classification (2 classes):
Output: Spam or Not Spam
Output: Fraud or Not Fraud
Output: Pass or Fail
Multi-Class Classification (3+ classes):
Output: Dog, Cat, or Rabbit
Output: Grade A, B, C, D, or F
For multi-class problems, Logistic Regression uses two strategies:
1. One-vs-Rest (OvR):
Train one classifier per class.
Each classifier answers: "Is this class X or not?"
Final prediction = class with highest probability.
2. Softmax (Multinomial Logistic Regression):
Outputs a probability for EVERY class simultaneously.
All probabilities sum to exactly 1.0.
Example: Image classification
Cat: 42%
Dog: 51%
Rabbit: 7%
Prediction → Dog ✓
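Softmax itself is short to write down. A minimal sketch, with hypothetical raw scores chosen so the output roughly matches the Cat/Dog/Rabbit example:

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for Cat, Dog, Rabbit
probs = softmax([2.0, 2.2, 0.3])
print([round(p, 2) for p in probs])  # roughly [0.42, 0.51, 0.08]
# The probabilities sum to 1 (up to floating-point rounding),
# and the predicted class is the one with the highest probability: Dog.
```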
How Logistic Regression Learns
Logistic Regression minimizes a cost function called Log Loss (also called Binary Cross Entropy). Unlike MSE, Log Loss applies heavy penalties when the model is confidently wrong.
Log Loss Behavior:
If Actual = 1 (Spam):
Predicted probability = 0.95 → Small loss (correct and confident)
Predicted probability = 0.50 → Medium loss (uncertain)
Predicted probability = 0.05 → Very large loss (confidently wrong)
If Actual = 0 (Not Spam):
Predicted probability = 0.05 → Small loss (correct and confident)
Predicted probability = 0.95 → Very large loss (confidently wrong)
The model learns by minimizing total Log Loss across all records.
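The penalty pattern described above falls straight out of the Log Loss formula. A minimal sketch for a single prediction:

```python
import math

def log_loss(actual, predicted):
    """Binary cross entropy for one record (actual is 0 or 1)."""
    return -(actual * math.log(predicted)
             + (1 - actual) * math.log(1 - predicted))

# Actual = 1 (Spam):
print(round(log_loss(1, 0.95), 3))  # small loss (correct and confident)
print(round(log_loss(1, 0.50), 3))  # medium loss (uncertain)
print(round(log_loss(1, 0.05), 3))  # very large loss (confidently wrong)
```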
Evaluation Metrics for Logistic Regression
Confusion Matrix (for binary classification):
                Predicted: Yes        Predicted: No
Actual: Yes │ True Positive (TP)  │ False Negative (FN) │
Actual: No  │ False Positive (FP) │ True Negative (TN)  │
Metrics derived from Confusion Matrix:
Accuracy = (TP + TN) / Total
→ Overall correctness
Precision = TP / (TP + FP)
→ Of all "Yes" predictions, how many were right?
Recall = TP / (TP + FN)
→ Of all actual "Yes" cases, how many did we catch?
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
→ Balance between Precision and Recall
Example (Spam Filter, 100 emails):
TP=45 (spam caught) FN=5 (spam missed)
FP=3 (ham flagged) TN=47 (ham passed)
Accuracy = (45+47)/100 = 92%
Precision = 45/(45+3) = 93.75%
Recall = 45/(45+5) = 90%
F1 = 2×(0.9375×0.90)/(0.9375+0.90) = 91.8%
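The four metrics can be computed directly from the confusion-matrix counts in the spam-filter example:

```python
# Spam-filter counts from the example above
tp, fn, fp, tn = 45, 5, 3, 47

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2%}")   # 92.00%
print(f"Precision: {precision:.2%}")  # 93.75%
print(f"Recall:    {recall:.2%}")     # 90.00%
print(f"F1 Score:  {f1:.2%}")         # 91.84%
```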
Advantages and Limitations
Advantages:
✓ Simple and fast to train
✓ Outputs probabilities (not just class labels)
✓ Easy to interpret — each feature has a clear coefficient
✓ Works well when classes are linearly separable
✓ Good baseline model before trying complex algorithms

Limitations:
✗ Assumes a linear decision boundary
✗ Struggles with complex, non-linear relationships
✗ Sensitive to outliers in the feature space
✗ Requires feature scaling for best performance
✗ Needs independent features (multicollinearity hurts it)
Logistic Regression Flow Diagram
Input Features (X1, X2, X3...)
│
▼
Linear Combination: z = m1×X1 + m2×X2 + b
│
▼
Sigmoid Function → Probability between 0 and 1
│
▼
Apply Threshold (default 0.5)
│
├── Probability ≥ 0.5 → Class 1 (Positive)
│
└── Probability < 0.5 → Class 0 (Negative)
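The whole flow — linear combination, Sigmoid, Log Loss, threshold — can be put together in a minimal end-to-end sketch. This trains one weight and one bias by gradient descent on a tiny made-up study-hours dataset (the data and learning rate are illustrative assumptions, not from the text):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy dataset: study hours → pass (1) / fail (0)
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

m, b = 0.0, 0.0   # weight and bias, start at zero
lr = 0.1          # learning rate (assumed)

for _ in range(5000):
    grad_m = grad_b = 0.0
    for xi, yi in zip(X, y):
        # (prediction - actual) is the gradient of Log Loss w.r.t. z
        error = sigmoid(m * xi + b) - yi
        grad_m += error * xi
        grad_b += error
    m -= lr * grad_m / len(X)  # step downhill on average Log Loss
    b -= lr * grad_b / len(X)

# Probability of passing with 6 study hours, then apply the 0.5 threshold
prob = sigmoid(m * 6 + b)
print("Pass" if prob >= 0.5 else "Fail")  # Pass
```

In practice a library such as scikit-learn's `LogisticRegression` would handle the training loop, but the mechanics are the same as this sketch.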
