Machine Learning Feature Engineering

Feature engineering is the process of using domain knowledge and creativity to create, transform, or select input columns (features) that help a machine learning model learn better. A well-engineered dataset often produces better results than a complex algorithm applied to raw data.

What is a Feature?

A feature is any input variable the model uses to make predictions. Raw data often contains features in a form that does not expose useful patterns directly. Feature engineering reshapes these inputs to make patterns more visible to the algorithm.

Raw Feature              → Engineered Feature

Date of Birth            → Age (calculated from today's date)
Order Date + Ship Date   → Delivery Days (difference between dates)
First Name + Last Name   → Full Name (combined)
Raw Text Review          → Sentiment Score (positive / negative)
Latitude + Longitude     → Distance from City Center (calculated)

Why Feature Engineering Matters

Algorithms find patterns in numbers. A feature like "date joined: 2019-03-15" means nothing to a model. But "days since joining: 1800" carries real information. The model can now use that number to find patterns about long-term vs short-term customers.

Without Feature Engineering:
  Column: signup_date = "2019-03-15"
  → Model sees a date string. Cannot compute patterns.

With Feature Engineering:
  New Column: days_as_customer = 1800
  → Model can now learn: longer customers buy more often.
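The conversion above can be sketched in pandas. The column names and dates are just the illustrative ones from the example, and a fixed reference date stands in for "today" so the result is reproducible:

```python
import pandas as pd

# Hypothetical customer table with a raw signup date string
df = pd.DataFrame({"signup_date": ["2019-03-15", "2023-11-02"]})

# Parse the string into a real datetime, then compute days elapsed
df["signup_date"] = pd.to_datetime(df["signup_date"])
reference_date = pd.Timestamp("2024-02-17")  # fixed "today" for reproducibility
df["days_as_customer"] = (reference_date - df["signup_date"]).dt.days

print(df["days_as_customer"].tolist())  # [1800, 107]
```

In production you would use the current date instead of a fixed timestamp, but a fixed one keeps training runs reproducible.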

Types of Feature Engineering Techniques

1. Feature Creation

Building entirely new columns from existing ones.

Example — E-commerce Dataset:

Existing Columns: total_spent, number_of_orders
New Feature: average_order_value = total_spent / number_of_orders

┌─────────────┬──────────────────┬──────────────────────┐
│ Total Spent │ Number of Orders │ Avg Order Value (NEW)│
├─────────────┼──────────────────┼──────────────────────┤
│ ₹10,000     │ 5                │ ₹2,000               │
│ ₹30,000     │ 3                │ ₹10,000              │
│ ₹6,000      │ 12               │ ₹500                 │
└─────────────┴──────────────────┴──────────────────────┘

Insight: Customer 3 orders frequently but spends very little per order.
Customer 2 makes fewer but high-value purchases.
These are very different customer types — the new column reveals that.
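A minimal pandas sketch of this derived column, using the same numbers as the table (currency symbols dropped, since the model only needs the magnitudes):

```python
import pandas as pd

df = pd.DataFrame({
    "total_spent": [10000, 30000, 6000],
    "number_of_orders": [5, 3, 12],
})

# New feature: ratio of two existing columns
df["average_order_value"] = df["total_spent"] / df["number_of_orders"]

print(df["average_order_value"].tolist())  # [2000.0, 10000.0, 500.0]
```

In real data you would guard against `number_of_orders` being zero before dividing.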

2. Feature Transformation

Changing the shape of an existing column's values to improve its distribution or relationship with the target.

Log Transformation (for right-skewed data):

Income column (very skewed):
  Values: 20000, 22000, 25000, 30000, 2500000

After log transformation:
  log(20000) = 9.90
  log(22000) = 10.00
  log(25000) = 10.13
  log(30000) = 10.31
  log(2500000) = 14.73

The extreme value 2500000 no longer dominates.
The scale is more balanced for the algorithm to learn from.
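The transformation above is a single NumPy call (natural log, matching the values shown):

```python
import numpy as np

# Right-skewed income values from the example, including one extreme outlier
income = np.array([20000, 22000, 25000, 30000, 2500000])

# Natural log compresses the long right tail
log_income = np.log(income)

print(np.round(log_income, 2))  # [ 9.9  10.   10.13 10.31 14.73]
```

When a column can contain zeros, `np.log1p` (which computes log(1 + x)) is a common variant that avoids log(0).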

3. Binning (Discretization)

Converting a continuous numerical column into categories (bins). Useful when the actual number matters less than the range it falls into.

Age column → Age Group:

┌─────┬───────────────┐
│ Age │ Age Group     │
├─────┼───────────────┤
│ 17  │ Teen (0–17)   │
│ 23  │ Young Adult   │
│ 35  │ Adult (26–45) │
│ 55  │ Senior (46+)  │
│ 72  │ Senior (46+)  │
└─────┴───────────────┘

Use case: A bank's risk model might treat age groups
differently rather than every individual age number.
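This binning can be done with `pd.cut`. The bin edges below are the ones implied by the table (right-inclusive by default, so age 17 falls in the Teen bin):

```python
import pandas as pd

ages = pd.Series([17, 23, 35, 55, 72])

# Bin edges implied by the table; intervals are right-inclusive
age_group = pd.cut(
    ages,
    bins=[0, 17, 25, 45, 120],
    labels=["Teen", "Young Adult", "Adult", "Senior"],
)

print(age_group.tolist())
# ['Teen', 'Young Adult', 'Adult', 'Senior', 'Senior']
```

`pd.qcut` is the equal-frequency alternative when you want each bin to hold roughly the same number of rows.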

4. Interaction Features

Combining two features to capture their combined effect — something neither feature captures alone.

Example — House Pricing:

Feature 1: total_rooms = 6
Feature 2: total_people = 6

Alone, these say nothing specific.
Together:
  rooms_per_person = total_rooms / total_people = 1.0
  → 1 room per person (crowded)

Another house:
  total_rooms = 8, total_people = 2
  rooms_per_person = 4.0
  → 4 rooms per person (spacious)

The interaction feature reveals comfort level, which
neither original feature expressed on its own.
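A sketch of this ratio-style interaction feature, using the two example houses:

```python
import pandas as pd

houses = pd.DataFrame({
    "total_rooms": [6, 8],
    "total_people": [6, 2],
})

# Interaction feature: ratio of two columns captures crowding,
# which neither column expresses on its own
houses["rooms_per_person"] = houses["total_rooms"] / houses["total_people"]

print(houses["rooms_per_person"].tolist())  # [1.0, 4.0]
```

Products (e.g. `area * quality_score`) are the other common interaction form; which one helps depends on the domain.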

5. Datetime Feature Extraction

Breaking a date or timestamp into multiple useful components.

Raw Column: transaction_date = "2024-03-15 14:35:00"

Extracted Features:
  year          = 2024
  month         = 3 (March)
  day           = 15
  hour          = 14
  day_of_week   = 4 (Friday)
  is_weekend    = 0 (No)
  quarter       = 1

Use case: A retail model learns that Fridays and
December generate more sales — patterns the raw
datetime string could never reveal directly.
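All of the extracted components above map to pandas `.dt` accessors. One caveat: pandas numbers weekdays Monday=0 through Sunday=6, so Friday is 4, matching the example:

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2024-03-15 14:35:00"]))

features = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "day": ts.dt.day,
    "hour": ts.dt.hour,
    "day_of_week": ts.dt.dayofweek,              # Monday=0 ... Sunday=6
    "is_weekend": (ts.dt.dayofweek >= 5).astype(int),
    "quarter": ts.dt.quarter,
})

print(features.iloc[0].tolist())  # [2024, 3, 15, 14, 4, 0, 1]
```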

6. Encoding Text Features

Converting text into numbers. The simplest form counts how often specific words appear.

Product Reviews:
  Review 1: "very good product quality"
  Review 2: "terrible quality, very bad"

Word Count Matrix (Bag of Words):
┌──────────┬──────┬──────┬──────────┬──────────┐
│ Review   │ good │ very │ terrible │ quality  │
├──────────┼──────┼──────┼──────────┼──────────┤
│ Review 1 │ 1    │ 1    │ 0        │ 1        │
│ Review 2 │ 0    │ 1    │ 1        │ 1        │
└──────────┴──────┴──────┴──────────┴──────────┘

Now the text is a set of numbers the model can process.
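A hand-rolled bag-of-words sketch using only the standard library (libraries like scikit-learn provide this as `CountVectorizer`, but the idea fits in a few lines; the four-word vocabulary matches the table):

```python
from collections import Counter

reviews = ["very good product quality", "terrible quality, very bad"]
vocabulary = ["good", "very", "terrible", "quality"]  # words we track

def bag_of_words(text, vocab):
    # Strip punctuation, lowercase, and count each vocabulary word
    tokens = text.lower().replace(",", "").split()
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

matrix = [bag_of_words(r, vocabulary) for r in reviews]
print(matrix)  # [[1, 1, 0, 1], [0, 1, 1, 1]]
```

Real pipelines usually add TF-IDF weighting on top so common words do not dominate the counts.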

Feature Selection: Keeping Only What Matters

After creating and transforming features, the next task is removing features that do not help the model. Irrelevant or redundant features add noise and slow training without improving accuracy.

Common Feature Selection Methods

┌─────────────────────────┬─────────────────────────────────────────┐
│ Method                  │ How it Works                            │
├─────────────────────────┼─────────────────────────────────────────┤
│ Correlation Filter      │ Remove features with very low           │
│                         │ correlation to the target label         │
│ Variance Threshold      │ Remove features with nearly zero        │
│                         │ variance (almost all same value)        │
│ Feature Importance      │ Tree models rank features by how much   │
│ (from Tree Models)      │ each one helps split data correctly     │
│ Recursive Feature       │ Repeatedly removes weakest features     │
│ Elimination (RFE)       │ until best set remains                  │
└─────────────────────────┴─────────────────────────────────────────┘
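The first two methods in the table can be sketched with plain pandas. The toy columns here are hypothetical: one informative feature, one constant feature, and a target:

```python
import pandas as pd

df = pd.DataFrame({
    "useful":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],   # near-zero variance
    "target":   [2.1, 4.0, 6.2, 7.9, 10.1],
})

# Variance threshold: flag features whose values barely change
variances = df.drop(columns="target").var()
low_variance = variances[variances < 1e-8].index.tolist()
print(low_variance)  # ['constant']

# Correlation filter: keep features with meaningful correlation to the target
corr = df.corr()["target"].drop("target")
strong = corr[corr.abs() > 0.3].index.tolist()
print(strong)  # ['useful']
```

scikit-learn offers these as `VarianceThreshold` and `RFE` when you want them inside a pipeline; the 0.3 cutoff here is an arbitrary illustration, not a standard value.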

Feature Engineering Workflow

Raw Dataset
     │
     ▼
Understand each column (EDA findings)
     │
     ▼
Create new features from existing ones
     │
     ▼
Transform skewed or poorly scaled features
     │
     ▼
Bin, encode, or extract datetime features
     │
     ▼
Evaluate all features (importance / correlation)
     │
     ▼
Drop weak or redundant features
     │
     ▼
Final Feature Set → Ready for Model Training

Common Mistakes in Feature Engineering

┌───────────────────────────────┬────────────────────────────────────┐
│ Mistake                       │ Effect                             │
├───────────────────────────────┼────────────────────────────────────┤
│ Including the target in       │ Model "cheats" — perfect training  │
│ features (data leakage)       │ score, fails on new data           │
│ Creating too many features    │ Curse of dimensionality — model    │
│                               │ gets confused by noise             │
│ Transforming test data        │ Inconsistent scales; the fix is to │
│ separately from training data │ fit on train, then apply to test   │
│ Ignoring domain knowledge     │ Missing meaningful feature ideas   │
└───────────────────────────────┴────────────────────────────────────┘
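The third mistake in the table deserves a concrete sketch. Below, standardization parameters (mean and standard deviation) are fitted on training data only and then reused on the test split, so the test set never influences the transformation:

```python
import pandas as pd

train = pd.Series([10.0, 20.0, 30.0, 40.0])
test = pd.Series([25.0, 50.0])

# Fit the transformation on TRAINING data only ...
mean, std = train.mean(), train.std()

# ... then apply the same parameters to both splits.
# Recomputing mean/std on the test set would leak information
# and put the two splits on different scales.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(test_scaled.round(2).tolist())
```

scikit-learn encodes this pattern as `fit` on the training set followed by `transform` on both sets.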

Feature engineering is part science and part craft. Strong feature engineering often matters more than which algorithm gets chosen. A simple algorithm with great features outperforms a complex algorithm with poor ones.
