Machine Learning: K-Means Clustering
K-Means is an unsupervised learning algorithm that groups unlabeled data into K distinct clusters based on similarity. Data points that are close together end up in the same cluster. The algorithm automatically discovers these natural groupings without any labels or guidance.
When is K-Means Used?
K-Means answers questions like:
- Which customers have similar buying behavior?
- Can these news articles be grouped by topic?
- Which cities have similar crime patterns?
- Are there natural groups in this medical dataset?
No labels needed. The algorithm finds structure on its own.
How K-Means Works
Step 1: Choose the number of clusters K (e.g., K=3)
Step 2: Randomly place K centroids in the data space (a centroid is the center point of a cluster)
Step 3: Assign every data point to its nearest centroid (using Euclidean distance)
Step 4: Move each centroid to the average position of all points currently assigned to it
Step 5: Repeat Steps 3–4 until the centroids stop moving (convergence = assignments no longer change)
Step 6: The final clusters are the result
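The steps above can be sketched in a few lines of NumPy. This is a minimal sketch with plain random initialization and no restarts; real libraries (e.g., scikit-learn's KMeans) add smarter seeding and multiple runs:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means following Steps 1-6 above (a sketch, not production code)."""
    rng = np.random.default_rng(seed)
    # Step 2: use K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (keep a centroid in place if it ended up with no points)
        new_centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        # Step 5: stop once centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# toy example: two obvious groups in 2-D
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                [9.9, 9.8], [10.1, 10.0], [10.0, 10.2]])
labels, centroids = kmeans(pts, k=2)
print(labels)
```

With well-separated groups like this, the loop typically converges in two or three iterations.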
Visual Walkthrough
Initial random centroids (★):
○ ○ ★ ● ★ ●
○ ○ ● ●
★
△ △ △
△ △
Iteration 1 — Assign each point to nearest centroid:
○ = cluster 1 (near top-left centroid)
● = cluster 2 (near top-right centroid)
△ = cluster 3 (near bottom centroid)
Move centroids to averages of their clusters:
★₁ moves to center of ○ group
★₂ moves to center of ● group
★₃ moves to center of △ group
Repeat until stable → Final clusters:
Cluster 1: ○ ○ ○ ○ ○ (all circles)
Cluster 2: ● ● ● ● ● (all filled circles)
Cluster 3: △ △ △ △ △ (all triangles)
Choosing the Right K: Elbow Method
Problem: K-Means requires K to be specified upfront. How do we know the right number of clusters?

Elbow Method:
- Run K-Means for K = 1, 2, 3, 4, 5, 6...
- For each K, calculate the Within-Cluster Sum of Squares (WCSS):
  WCSS = total squared distance of all points from their assigned centroid
- Plot K vs WCSS:

  WCSS
   │
   │ ●
   │
   │   ●
   │     ●
   │       ● ← Elbow point (K=4)
   │         ●    ●    ●
   └──────────────────────────►
     1   2   3   4   5   6   K

The "elbow" is where adding more clusters gives diminishing returns; that K is the best choice. In this example, K=4 is optimal.
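The elbow computation can be sketched as follows (synthetic blob data, not from the article; a minimal K-Means loop stands in for a library call). With three true groups, the WCSS drop should flatten after the true K:

```python
import numpy as np

rng = np.random.default_rng(42)
# three well-separated synthetic blobs (illustrative data only)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (5, 5), (10, 0)]])

def kmeans_wcss(X, k, n_iters=50):
    """Run a minimal K-Means and return the Within-Cluster Sum of Squares."""
    cents = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        cents = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else cents[j] for j in range(k)])
    # WCSS = total squared distance of every point to its assigned centroid
    return sum(((X[labels == j] - cents[j]) ** 2).sum() for j in range(k))

for k in range(1, 7):
    print(f"K={k}  WCSS={kmeans_wcss(X, k):.1f}")
```

Plotting these values and looking for the bend gives the elbow; scikit-learn users would read the same quantity off `KMeans(...).fit(X).inertia_`.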
Real-World Example: Customer Segmentation
Dataset: 500 customers
Features: Annual Spend, Purchase Frequency

After K-Means (K=4):

Cluster 1 — "Big Spenders" (87 customers):
  Avg Spend: ₹50,000/year | Avg Purchases: 25/year
  → High value, loyal customers
Cluster 2 — "Occasional Buyers" (163 customers):
  Avg Spend: ₹8,000/year | Avg Purchases: 5/year
  → Infrequent but decent spend
Cluster 3 — "Bargain Hunters" (189 customers):
  Avg Spend: ₹15,000/year | Avg Purchases: 30/year
  → Frequent buyers, low per-purchase value
Cluster 4 — "Inactive" (61 customers):
  Avg Spend: ₹1,200/year | Avg Purchases: 1/year
  → Nearly dormant — need re-engagement

Business Action:
Cluster 1 → VIP loyalty program
Cluster 4 → Win-back email campaign
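A segmentation like this can be sketched on synthetic data (the numbers below are modeled loosely on the cluster averages above, not the article's real dataset). Note that the features are standardized first, since annual spend (tens of thousands) would otherwise completely dominate the Euclidean distance over purchase frequency (tens):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic customers: (annual spend, purchases/year) -- illustrative only
spend = np.concatenate([rng.normal(50000, 3000, 87),    # "Big Spenders"
                        rng.normal(8000, 1500, 163),    # "Occasional Buyers"
                        rng.normal(15000, 2000, 189),   # "Bargain Hunters"
                        rng.normal(1200, 300, 61)])     # "Inactive"
freq = np.concatenate([rng.normal(25, 3, 87), rng.normal(5, 1, 163),
                       rng.normal(30, 4, 189), rng.normal(1, 0.5, 61)])
X = np.column_stack([spend, freq])

# standardize: without this, spend would swamp frequency in the distance
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# minimal K-Means loop with K=4
k = 4
cents = Xs[rng.choice(len(Xs), k, replace=False)]
for _ in range(100):
    labels = np.linalg.norm(Xs[:, None] - cents[None], axis=2).argmin(axis=1)
    new = np.array([Xs[labels == j].mean(axis=0) if (labels == j).any()
                    else cents[j] for j in range(k)])
    if np.allclose(new, cents):
        break
    cents = new

for j in range(k):
    members = labels == j
    print(f"cluster {j}: {members.sum():3d} customers, "
          f"avg spend ₹{X[members, 0].mean():,.0f}/year")
```

Averaging the original (unscaled) features per cluster, as in the last loop, is how labels like "Big Spenders" get attached to the otherwise anonymous cluster IDs.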
Limitations of K-Means
┌──────────────────────────┬───────────────────────────────────────┐
│ Limitation               │ Explanation                           │
├──────────────────────────┼───────────────────────────────────────┤
│ K must be chosen upfront │ Requires domain knowledge or Elbow    │
│                          │ Method to determine K                 │
│ Sensitive to initial     │ Different random starts → different   │
│ centroid placement       │ results (use K-Means++ to improve)    │
│ Assumes spherical        │ Cannot find elongated or odd-shaped   │
│ clusters                 │ clusters                              │
│ Sensitive to outliers    │ One extreme outlier distorts centroids│
│ Needs scaled features    │ Large-range features dominate distance│
└──────────────────────────┴───────────────────────────────────────┘
K-Means++ Initialization
Standard K-Means: places initial centroids randomly
→ Risk: two centroids start near each other → poor clusters
K-Means++:
Step 1: Place first centroid randomly
Step 2: Each subsequent centroid is chosen far from the existing ones
        (sampled with probability proportional to the squared distance
        from the nearest already-chosen centroid)
Result: Better spread initial positions → faster convergence
and more consistent final clusters
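The seeding procedure above can be sketched in NumPy (a sketch of the idea only; scikit-learn's `init='k-means++'` is the production version):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-Means++ seeding: spread the initial centroids apart."""
    rng = np.random.default_rng(seed)
    # Step 1: place the first centroid uniformly at random
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen centroid so far
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        # Step 2: sample the next centroid with probability proportional to d2,
        # so far-away points are far more likely to be picked
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

# three well-separated groups: the seeds should land in different groups
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in [(0, 0), (6, 6), (12, 0)]])
seeds = kmeanspp_init(X, 3)
print(seeds.round(1))
```

These seeds then replace the purely random centroids in Step 2 of the standard algorithm; everything after that is unchanged.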
