Machine Learning: K-Means Clustering
K-Means is an unsupervised learning algorithm that groups unlabeled data into K distinct clusters based on similarity. Data points that are close together end up in the same cluster. The algorithm automatically discovers these natural groupings without any labels or guidance.
When is K-Means Used?
K-Means answers questions like:
- Which customers have similar buying behavior?
- Can these news articles be grouped by topic?
- Which cities have similar crime patterns?
- Are there natural groups in this medical dataset?
No labels needed. The algorithm finds structure on its own.
How K-Means Works
Step 1: Choose the number of clusters K (e.g., K=3)
Step 2: Randomly place K centroids in the data space (a centroid is the center point of a cluster)
Step 3: Assign every data point to its nearest centroid (using Euclidean distance)
Step 4: Move each centroid to the average position of all points currently assigned to it
Step 5: Repeat Steps 3–4 until the centroids stop moving (convergence = assignments no longer change)
Step 6: The final clusters are the result
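The steps above can be sketched in a few lines of NumPy. This is a minimal sketch with plain random initialization and no restarts; real libraries (e.g., scikit-learn's KMeans) add smarter seeding and multiple runs:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means following Steps 1-6 above (a sketch, not production code)."""
    rng = np.random.default_rng(seed)
    # Step 2: use K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (keep a centroid in place if it ended up with no points)
        new_centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        # Step 5: stop once centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# toy example: two obvious groups in 2-D
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                [9.9, 9.8], [10.1, 10.0], [10.0, 10.2]])
labels, centroids = kmeans(pts, k=2)
print(labels)
```

With well-separated groups like this, the loop typically converges in two or three iterations.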
Visual Walkthrough
Initial random centroids (★):
○ ○ ★ ● ★ ●
○ ○ ● ●
★
△ △ △
△ △
Iteration 1 — Assign each point to nearest centroid:
○ = cluster 1 (near top-left centroid)
● = cluster 2 (near top-right centroid)
△ = cluster 3 (near bottom centroid)
Move centroids to averages of their clusters:
★₁ moves to center of ○ group
★₂ moves to center of ● group
★₃ moves to center of △ group
Repeat until stable → Final clusters:
Cluster 1: ○ ○ ○ ○ ○ (all circles)
Cluster 2: ● ● ● ● ● (all filled circles)
Cluster 3: △ △ △ △ △ (all triangles)
Choosing the Right K: Elbow Method
Problem: K-Means requires K to be specified upfront. How do we know the right number of clusters?

Elbow Method:
- Run K-Means for K = 1, 2, 3, 4, 5, 6...
- For each K, calculate the Within-Cluster Sum of Squares (WCSS):
  WCSS = total squared distance of all points from their assigned centroid
- Plot K vs WCSS:

  WCSS
   │
   │ ●
   │
   │   ●
   │     ●
   │       ● ← Elbow point (K=4)
   │         ●    ●    ●
   └──────────────────────────►
     1   2   3   4   5   6   K

The "elbow" is where adding more clusters gives diminishing returns; that K is the best choice. In this example, K=4 is optimal.
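The elbow computation can be sketched as follows (synthetic blob data, not from the article; a minimal K-Means loop stands in for a library call). With three true groups, the WCSS drop should flatten after the true K:

```python
import numpy as np

rng = np.random.default_rng(42)
# three well-separated synthetic blobs (illustrative data only)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (5, 5), (10, 0)]])

def kmeans_wcss(X, k, n_iters=50):
    """Run a minimal K-Means and return the Within-Cluster Sum of Squares."""
    cents = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        cents = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else cents[j] for j in range(k)])
    # WCSS = total squared distance of every point to its assigned centroid
    return sum(((X[labels == j] - cents[j]) ** 2).sum() for j in range(k))

for k in range(1, 7):
    print(f"K={k}  WCSS={kmeans_wcss(X, k):.1f}")
```

Plotting these values and looking for the bend gives the elbow; scikit-learn users would read the same quantity off `KMeans(...).fit(X).inertia_`.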
Real-World Example: Customer Segmentation
Dataset: 500 customers
Features: Annual Spend, Purchase Frequency

After K-Means (K=4):

Cluster 1 — "Big Spenders" (87 customers):
  Avg Spend: ₹50,000/year | Avg Purchases: 25/year
  → High value, loyal customers
Cluster 2 — "Occasional Buyers" (163 customers):
  Avg Spend: ₹8,000/year | Avg Purchases: 5/year
  → Infrequent but decent spend
Cluster 3 — "Bargain Hunters" (189 customers):
  Avg Spend: ₹15,000/year | Avg Purchases: 30/year
  → Frequent buyers, low per-purchase value
Cluster 4 — "Inactive" (61 customers):
  Avg Spend: ₹1,200/year | Avg Purchases: 1/year
  → Nearly dormant — need re-engagement

Business Action:
Cluster 1 → VIP loyalty program
Cluster 4 → Win-back email campaign
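A segmentation like this can be sketched on synthetic data (the numbers below are modeled loosely on the cluster averages above, not the article's real dataset). Note that the features are standardized first, since annual spend (tens of thousands) would otherwise completely dominate the Euclidean distance over purchase frequency (tens):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic customers: (annual spend, purchases/year) -- illustrative only
spend = np.concatenate([rng.normal(50000, 3000, 87),    # "Big Spenders"
                        rng.normal(8000, 1500, 163),    # "Occasional Buyers"
                        rng.normal(15000, 2000, 189),   # "Bargain Hunters"
                        rng.normal(1200, 300, 61)])     # "Inactive"
freq = np.concatenate([rng.normal(25, 3, 87), rng.normal(5, 1, 163),
                       rng.normal(30, 4, 189), rng.normal(1, 0.5, 61)])
X = np.column_stack([spend, freq])

# standardize: without this, spend would swamp frequency in the distance
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# minimal K-Means loop with K=4
k = 4
cents = Xs[rng.choice(len(Xs), k, replace=False)]
for _ in range(100):
    labels = np.linalg.norm(Xs[:, None] - cents[None], axis=2).argmin(axis=1)
    new = np.array([Xs[labels == j].mean(axis=0) if (labels == j).any()
                    else cents[j] for j in range(k)])
    if np.allclose(new, cents):
        break
    cents = new

for j in range(k):
    members = labels == j
    print(f"cluster {j}: {members.sum():3d} customers, "
          f"avg spend ₹{X[members, 0].mean():,.0f}/year")
```

Averaging the original (unscaled) features per cluster, as in the last loop, is how labels like "Big Spenders" get attached to the otherwise anonymous cluster IDs.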
Limitations of K-Means
┌──────────────────────────┬───────────────────────────────────────┐
│ Limitation               │ Explanation                           │
├──────────────────────────┼───────────────────────────────────────┤
│ K must be chosen upfront │ Requires domain knowledge or Elbow    │
│                          │ Method to determine K                 │
│ Sensitive to initial     │ Different random starts → different   │
│ centroid placement       │ results (use K-Means++ to improve)    │
│ Assumes spherical        │ Cannot find elongated or odd-shaped   │
│ clusters                 │ clusters                              │
│ Sensitive to outliers    │ One extreme outlier distorts centroids│
│ Needs scaled features    │ Large-range features dominate distance│
└──────────────────────────┴───────────────────────────────────────┘
K-Means++ Initialization
Standard K-Means: places initial centroids randomly
→ Risk: two centroids start near each other → poor clusters
K-Means++:
Step 1: Place first centroid randomly
Step 2: Each subsequent centroid is chosen far from the existing ones
        (sampled with probability proportional to the squared distance
        from the nearest already-chosen centroid)
Result: Better spread initial positions → faster convergence
and more consistent final clusters
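The seeding procedure above can be sketched in NumPy (a sketch of the idea only; scikit-learn's `init='k-means++'` is the production version):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-Means++ seeding: spread the initial centroids apart."""
    rng = np.random.default_rng(seed)
    # Step 1: place the first centroid uniformly at random
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen centroid so far
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        # Step 2: sample the next centroid with probability proportional to d2,
        # so far-away points are far more likely to be picked
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

# three well-separated groups: the seeds should land in different groups
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in [(0, 0), (6, 6), (12, 0)]])
seeds = kmeanspp_init(X, 3)
print(seeds.round(1))
```

These seeds then replace the purely random centroids in Step 2 of the standard algorithm; everything after that is unchanged.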
