AWS Elastic Load Balancing and Auto Scaling
Elastic Load Balancing (ELB) and Auto Scaling are two services that work together to deliver high availability and automatic capacity management for AWS applications. ELB distributes incoming traffic across multiple servers. Auto Scaling adjusts the number of servers based on current demand. Together, they ensure applications handle both low and peak traffic efficiently — with no manual intervention.
What Is Elastic Load Balancing?
A load balancer sits in front of multiple EC2 instances and distributes incoming requests among them. It runs periodic health checks on each instance; if an instance fails them, the load balancer stops routing to it and sends traffic to the healthy instances instead.
Analogy: Imagine 4 checkout counters at a supermarket. A manager (the load balancer) directs each customer to the counter with the shortest queue. If one counter closes, the manager sends all customers to the remaining three counters.
         Internet Traffic
                 |
      [Elastic Load Balancer]
         /       |       \
[EC2: Web1] [EC2: Web2] [EC2: Web3]
     |           |           |
          [RDS Database]
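The supermarket manager's rule corresponds to a "least outstanding requests" strategy. ALB's default routing algorithm is round robin, and least outstanding requests is an optional target group setting. The toy model below is not ALB's implementation: it assumes a hypothetical in-memory count of in-flight requests per instance and a set of instances currently passing health checks.

```python
def pick_least_outstanding(outstanding: dict[str, int], healthy: set[str]) -> str:
    """Return the healthy target with the fewest in-flight requests.

    `outstanding` maps a (hypothetical) target name to its in-flight request
    count; targets missing from `healthy` are skipped, like failed instances.
    """
    candidates = [target for target in outstanding if target in healthy]
    if not candidates:
        raise RuntimeError("no healthy targets available")
    # min() keeps the first target listed when counts tie.
    return min(candidates, key=lambda target: outstanding[target])


in_flight = {"Web1": 4, "Web2": 1, "Web3": 0}
print(pick_least_outstanding(in_flight, healthy={"Web1", "Web2", "Web3"}))  # Web3
print(pick_least_outstanding(in_flight, healthy={"Web1", "Web2"}))          # Web2
```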
Types of AWS Load Balancers
| Type | OSI Layer | Protocol | Best For |
|---|---|---|---|
| Application Load Balancer (ALB) | Layer 7 (HTTP/HTTPS) | HTTP, HTTPS, WebSocket | Web apps, REST APIs, microservices |
| Network Load Balancer (NLB) | Layer 4 (TCP/UDP) | TCP, UDP, TLS | Ultra-low latency, gaming, financial systems |
| Gateway Load Balancer (GWLB) | Layer 3 | IP | Third-party network appliances, firewalls |
| Classic Load Balancer | Layer 4/7 | HTTP, HTTPS, TCP | Legacy applications (not recommended for new projects) |
Application Load Balancer (ALB) — Deep Dive
ALB is the most commonly used load balancer for web applications. It routes traffic based on HTTP request content — URL path, hostname, HTTP headers, and query strings.
Path-Based Routing
ALB can route different URL paths to different groups of servers (Target Groups):
[ALB]
|-- /api/*    → Target Group: API Servers (EC2)
|-- /static/* → Target Group: Static File Servers (EC2)
|-- /admin/*  → Target Group: Admin App Servers (EC2)
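Listener rules are evaluated in priority order, and a request that matches no rule goes to the listener's default action. The sketch below imitates that evaluation with the paths from the diagram. The rule priorities and the `web-servers` default target group are hypothetical, and Python's `fnmatchcase` stands in for ALB's pattern matching (it also accepts `[...]` character classes, which ALB path patterns do not).

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target group).
# Lower priority numbers are evaluated first, as with ALB listener rules.
RULES = [
    (10, "/api/*", "api-servers"),
    (20, "/static/*", "static-servers"),
    (30, "/admin/*", "admin-servers"),
]
DEFAULT_TARGET_GROUP = "web-servers"  # hypothetical default action


def route_by_path(path: str) -> str:
    """Return the target group of the first rule whose pattern matches."""
    for _priority, pattern, target_group in sorted(RULES):
        # fnmatchcase is case-sensitive, like ALB path conditions.
        if fnmatchcase(path, pattern):
            return target_group
    return DEFAULT_TARGET_GROUP


print(route_by_path("/api/users/42"))    # api-servers
print(route_by_path("/static/app.css"))  # static-servers
print(route_by_path("/checkout"))        # web-servers (default action)
```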
Host-Based Routing
Multiple domains can be handled by one ALB:
[ALB]
|-- api.myapp.com   → Backend API servers
|-- app.myapp.com   → Frontend web servers
|-- admin.myapp.com → Admin panel servers
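Host conditions work the same way but match the request's Host header, which ALB compares case-insensitively. A minimal sketch using the hostnames above; the target group names and the fallback are hypothetical, and the port-stripping shortcut assumes ordinary DNS names rather than IPv6 literals.

```python
# Hypothetical host-header rules mapping hostnames to target groups.
HOST_RULES = {
    "api.myapp.com": "backend-api",
    "app.myapp.com": "frontend-web",
    "admin.myapp.com": "admin-panel",
}


def route_by_host(host_header: str, default: str = "frontend-web") -> str:
    """Pick a target group from the Host header.

    The host name is lowercased and any ":port" suffix is dropped before
    the lookup; unknown hosts fall back to `default`.
    """
    hostname = host_header.split(":", 1)[0].strip().lower()
    return HOST_RULES.get(hostname, default)


print(route_by_host("API.myapp.com"))        # backend-api
print(route_by_host("admin.myapp.com:443"))  # admin-panel
```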
Health Checks
ALB continuously checks the health of registered instances by sending periodic requests to a health check path (e.g., /health). When an instance fails a set number of consecutive checks (the unhealthy threshold), ALB marks it unhealthy and automatically stops sending it traffic. Once it passes a set number of consecutive checks (the healthy threshold), ALB marks it healthy again and resumes sending traffic.
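In a target group these thresholds are the `HealthyThresholdCount` and `UnhealthyThresholdCount` settings, configured alongside the check interval and path. The state machine below is a simplified model of how consecutive results flip a target between healthy and unhealthy; the threshold values are illustrative, not a statement of AWS defaults.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Tracks one target's health from a stream of health-check results."""

    healthy_threshold: int = 5    # consecutive passes needed to become healthy
    unhealthy_threshold: int = 2  # consecutive failures needed to become unhealthy
    healthy: bool = True
    # Consecutive results that disagree with the current state.
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, check_passed: bool) -> bool:
        """Feed one check result and return the (possibly updated) state."""
        if check_passed == self.healthy:
            # Result agrees with the current state: any pending flip is reset.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


monitor = TargetHealth(healthy_threshold=5, unhealthy_threshold=2)
states = [monitor.record(ok) for ok in [False, False, True, True, True, True, True]]
print(states)  # [True, False, False, False, False, False, True]
```

A single failed check does not remove the target; only a run of failures does, which keeps one slow response from taking an instance out of rotation.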
What Is Auto Scaling?
Auto Scaling automatically adjusts the number of EC2 instances in response to demand, measured through metrics such as average CPU utilization or anticipated through schedules. When demand increases, new instances are added; when it drops, extra instances are removed. This keeps capacity close to what the workload needs, avoiding both over-provisioning (wasted money) and under-provisioning (poor performance).
Auto Scaling Core Components
1. Launch Template
A Launch Template defines the configuration for new EC2 instances that Auto Scaling creates. It specifies the AMI, instance type, key pair, security groups, IAM instance profile (which carries the IAM role), and User Data script. Every new instance launched by Auto Scaling uses this template.
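To illustrate what a launch template carries, the dictionary below uses the field names of the EC2 `CreateLaunchTemplate` request data (`ImageId`, `InstanceType`, `KeyName`, `SecurityGroupIds`, `IamInstanceProfile`, `UserData`). All IDs and names are placeholders, and the User Data script is a hypothetical web-server bootstrap. Launch templates expect User Data to be base64-encoded.

```python
import base64
import json

# Hypothetical bootstrap script run on first boot of each new instance.
USER_DATA = """#!/bin/bash
yum install -y httpd
systemctl enable --now httpd
"""

# Placeholder values: substitute your own AMI, key pair, security group,
# and instance profile.
launch_template_data = {
    "ImageId": "ami-0123456789abcdef0",
    "InstanceType": "t3.micro",
    "KeyName": "my-key-pair",
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "IamInstanceProfile": {"Name": "my-web-instance-profile"},
    # Launch templates take User Data as base64-encoded text.
    "UserData": base64.b64encode(USER_DATA.encode("utf-8")).decode("ascii"),
}

print(json.dumps(launch_template_data, indent=2))
```

After replacing the placeholders, the printed JSON could be saved as `data.json` and passed to `aws ec2 create-launch-template --launch-template-name <name> --launch-template-data file://data.json`.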
2. Auto Scaling Group (ASG)
An Auto Scaling Group is the logical container that manages a collection of EC2 instances. Key settings:
- Minimum size: The fewest instances that should ever run (protects against scaling to zero). Example: 2.
- Desired capacity: The target number of instances under normal conditions. Example: 3.
- Maximum size: The most instances that can run (protects against unlimited scaling costs). Example: 10.
Auto Scaling Group
+---------------------------------------+
| Min: 2  |  Desired: 3  |  Max: 10     |
|                                       |
|   [EC2-1]    [EC2-2]    [EC2-3]       |
|                                       |
|   Spread across: AZ-a, AZ-b, AZ-c     |
+---------------------------------------+
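The sizing rules and the AZ spread can be modelled in a few lines. This sketch makes simplifying assumptions: a request outside the bounds is clamped (as when a scaling policy asks for more than the maximum), and each new instance goes to the AZ that currently has the fewest instances, with ties broken by list order rather than by the real service's logic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GroupSize:
    min_size: int
    desired: int
    max_size: int

    def __post_init__(self) -> None:
        # Desired capacity must sit between the minimum and maximum.
        if not 0 <= self.min_size <= self.desired <= self.max_size:
            raise ValueError("require 0 <= min <= desired <= max")

    def request(self, new_desired: int) -> int:
        """Apply a scaling request, clamped to [min_size, max_size]."""
        self.desired = max(self.min_size, min(new_desired, self.max_size))
        return self.desired


def place_instances(count: int, zones: list[str]) -> Counter:
    """Assign `count` launches, each to the zone with the fewest instances."""
    placement: Counter = Counter()
    for _ in range(count):
        # min() returns the first zone listed when counts tie.
        zone = min(zones, key=lambda z: placement[z])
        placement[zone] += 1
    return placement


group = GroupSize(min_size=2, desired=3, max_size=10)
print(group.request(15))  # 10: capped at the maximum
print(group.request(0))   # 2: never below the minimum
print(dict(place_instances(7, ["AZ-a", "AZ-b", "AZ-c"])))
# {'AZ-a': 3, 'AZ-b': 2, 'AZ-c': 2}
```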
3. Scaling Policies
Scaling policies define when and how the Auto Scaling Group adds or removes instances:
| Policy Type | How It Works | Example |
|---|---|---|
| Target Tracking | Keeps a chosen metric at a target value. Recommended by AWS for most workloads. | Keep average CPU at 50%, adding or removing instances as needed |
| Step Scaling | Scale by different amounts at different alarm thresholds | CPU 70–80%: add 1. CPU 80–90%: add 2. CPU 90%+: add 3 |
| Simple Scaling | Add or remove a fixed number when an alarm triggers | CPU > 80%: add 1 instance |
| Scheduled Scaling | Changes capacity settings at defined times | Raise desired capacity from 3 to 8 every Friday at 6 PM (peak time) |
| Predictive Scaling | Uses machine learning on historical load to forecast demand and launch capacity ahead of time | Launches instances before a recurring Monday-morning traffic spike |
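The arithmetic behind the first two rows can be sketched as follows. AWS does not document Target Tracking as a single formula; the proportional calculation here (current capacity scaled by the ratio of the metric to its target) is an approximation that assumes load spreads evenly and ignores warm-up, cooldown, and the service's more cautious scale-in. The step function encodes the table's CPU bands, assuming each band includes its lower bound and excludes its upper bound.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float) -> int:
    """Approximate capacity that brings the per-instance average to `target`.

    If 4 instances average 75% CPU, the same load at 50% per instance needs
    4 * 75 / 50 = 6 instances. Rounding up keeps the average at or below target.
    """
    return math.ceil(current * metric / target)


# (lower bound inclusive, upper bound exclusive, instances to add), from the table.
STEP_ADJUSTMENTS = [(70.0, 80.0, 1), (80.0, 90.0, 2), (90.0, math.inf, 3)]


def step_scaling_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for the current CPU reading."""
    for lower, upper, add in STEP_ADJUSTMENTS:
        if lower <= cpu_percent < upper:
            return add
    return 0  # below the first threshold: no step applies


print(target_tracking_capacity(current=4, metric=75.0, target=50.0))  # 6
print(step_scaling_adjustment(85.0))  # 2
print(step_scaling_adjustment(65.0))  # 0
```

In a real group, either result would still be clamped to the group's minimum and maximum size.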
ELB + Auto Scaling — Combined Architecture
          [Users]
             |
[Application Load Balancer]
             |
   [Auto Scaling Group]
     |       |       |
  [EC2-1] [EC2-2] [EC2-3]
     |       |       |
      [RDS Database]
Scale-Out Event (CPU > 70%):
[EC2-1] [EC2-2] [EC2-3] + [EC2-4] [EC2-5] ← new instances added
The ASG registers the new instances with the ALB's target group, and ALB starts routing to them once they pass their health checks.
Scale-In Event (CPU < 30%):
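[EC2-1] [EC2-2] ← instances remaining after the others are terminated
Before terminating an instance, the ASG deregisters it from the target group. ALB stops sending it new requests and lets in-flight requests finish (the deregistration delay) before the instance is terminated.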
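The registration lifecycle in these two events can be modelled with three of the target states ALB reports (`initial`, `healthy`, and `draining`). This toy model only tracks which instances may receive new requests; in practice the ASG makes the registration and deregistration calls, and the instance names are the illustrative ones from the diagram.

```python
class TargetGroup:
    """Toy model of target states in an ALB target group."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}

    def register(self, instance: str) -> None:
        # Newly registered targets stay "initial" until health checks pass.
        self.state[instance] = "initial"

    def health_check_passed(self, instance: str) -> None:
        if self.state.get(instance) == "initial":
            self.state[instance] = "healthy"

    def deregister(self, instance: str) -> None:
        # Draining targets receive no new requests; in-flight ones may finish.
        self.state[instance] = "draining"

    def drain_complete(self, instance: str) -> None:
        # After the deregistration delay the target leaves the group
        # and the instance can be terminated.
        self.state.pop(instance, None)

    def routable(self) -> list[str]:
        return sorted(i for i, s in self.state.items() if s == "healthy")


tg = TargetGroup()
for name in ("EC2-1", "EC2-2", "EC2-3"):
    tg.register(name)
    tg.health_check_passed(name)

# Scale-out: two instances register but are not routable until healthy.
for name in ("EC2-4", "EC2-5"):
    tg.register(name)
print(tg.routable())  # ['EC2-1', 'EC2-2', 'EC2-3']
for name in ("EC2-4", "EC2-5"):
    tg.health_check_passed(name)
print(tg.routable())  # ['EC2-1', 'EC2-2', 'EC2-3', 'EC2-4', 'EC2-5']

# Scale-in: three instances are deregistered, drained, then removed.
for name in ("EC2-3", "EC2-4", "EC2-5"):
    tg.deregister(name)
print(tg.routable())  # ['EC2-1', 'EC2-2']
for name in ("EC2-3", "EC2-4", "EC2-5"):
    tg.drain_complete(name)
```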
Sticky Sessions
By default, ALB sends each request to any healthy instance, so the same user may hit different servers on consecutive requests. For applications that store session data locally (not in a shared store), sticky sessions keep a user's requests on the same instance: ALB issues a cookie and routes requests that carry it to the same instance until the cookie expires or that instance becomes unhealthy.
A better practice is to make applications stateless — store session data in ElastiCache or DynamoDB — so any instance can handle any request. This makes scaling simpler and more reliable.
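ALB implements duration-based stickiness with a load-balancer-generated cookie named `AWSALB` whose value is opaque. The sketch below is a simplified model, not ALB's algorithm: a plain instance name stands in for the cookie value, new clients are assigned round robin, and a client whose sticky instance becomes unhealthy is reassigned. That reassignment is exactly where locally stored session data would be lost.

```python
from typing import Optional


class StickyRouter:
    """Round-robin router that honors a sticky cookie while its target is healthy."""

    def __init__(self, targets: list[str]) -> None:
        self.targets = list(targets)
        self.healthy = set(targets)
        self._next = 0

    def _round_robin(self) -> str:
        # Walk the list at most once, skipping unhealthy targets.
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")

    def route(self, cookie: Optional[str]) -> tuple[str, str]:
        """Return (chosen target, cookie value the client should keep)."""
        if cookie is not None and cookie in self.healthy:
            return cookie, cookie
        target = self._round_robin()
        return target, target


router = StickyRouter(["EC2-1", "EC2-2", "EC2-3"])
target, cookie = router.route(None)  # first visit: EC2-1
print(router.route(cookie)[0])       # EC2-1 again, thanks to the cookie
router.healthy.discard("EC2-1")      # EC2-1 fails its health checks
print(router.route(cookie)[0])       # EC2-2: reassigned, local session lost
```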
Cross-Zone Load Balancing
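Cross-zone load balancing distributes traffic evenly across all registered instances in all enabled AZs, regardless of which AZ's load balancer node receives the request. Without it, each AZ's node splits its share of traffic only among the instances in its own AZ. If each AZ receives half of the traffic, and AZ-a has 4 instances while AZ-b has 1, AZ-b's single instance handles 50% of all traffic. With cross-zone load balancing, traffic is spread evenly across all 5 instances. Cross-zone load balancing is enabled by default on ALB and disabled by default on NLB.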
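The percentages follow from a short calculation, assuming each AZ's load balancer node receives an equal share of incoming traffic and every AZ has at least one instance.

```python
def per_instance_share(instances_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic that each single instance in an AZ receives."""
    if cross_zone:
        # Every instance in every AZ gets the same share.
        total = sum(instances_per_az.values())
        return {az: 1 / total for az in instances_per_az}
    # Otherwise each AZ's node splits its equal share among local instances only.
    per_node = 1 / len(instances_per_az)
    return {az: per_node / n for az, n in instances_per_az.items()}


zones = {"AZ-a": 4, "AZ-b": 1}
print(per_instance_share(zones, cross_zone=False))  # {'AZ-a': 0.125, 'AZ-b': 0.5}
print(per_instance_share(zones, cross_zone=True))   # {'AZ-a': 0.2, 'AZ-b': 0.2}
```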
Real-World Example — Flash Sale on an E-Commerce Site
An e-commerce website normally runs on 3 EC2 instances. During a flash sale, traffic jumps 10x in minutes.
- Average CPU rises above the 60% target set in the Target Tracking policy, and Auto Scaling begins adding capacity.
- New EC2 instances launch automatically using the Launch Template — 3, 5, 8, 10 instances within minutes.
- As each new instance passes its health checks, the ALB starts sending it a share of the traffic.
- After the sale, traffic drops and Auto Scaling scales back in to 3 instances over roughly 15 minutes. Target tracking scales in more gradually than it scales out, so instances are not terminated the moment traffic falls.
Summary
- Elastic Load Balancing distributes traffic across multiple instances for availability and performance.
- ALB routes HTTP/HTTPS traffic and supports path-based and host-based routing. NLB handles TCP/UDP at ultra-low latency.
- Auto Scaling adjusts EC2 instance count based on real demand — scaling out during high traffic and in during low traffic.
- Launch Templates define instance configuration. Auto Scaling Groups define min, desired, and max instance counts.
- Target Tracking Scaling is the simplest policy to configure and the one AWS recommends as a starting point for most workloads.
