AWS Elastic Load Balancing and Auto Scaling

Elastic Load Balancing (ELB) and Auto Scaling are two services that work together to deliver high availability and automatic capacity management for AWS applications. ELB distributes incoming traffic across multiple servers. Auto Scaling adjusts the number of servers based on current demand. Together, they let applications handle both quiet periods and traffic peaks efficiently, with little or no manual intervention.

What Is Elastic Load Balancing?

A load balancer sits in front of multiple EC2 instances and distributes incoming requests among them. If one instance becomes overwhelmed or fails, the load balancer routes traffic to the healthy ones.

Analogy: Imagine 4 checkout counters at a supermarket. A manager (the load balancer) directs each customer to the counter with the shortest queue. If one counter closes, the manager sends all customers to the remaining three counters.

           Internet Traffic
                 |
        [Elastic Load Balancer]
        /         |          \
[EC2: Web1] [EC2: Web2] [EC2: Web3]
      |            |           |
              [RDS Database]
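
The behavior in the diagram above can be sketched in a few lines of Python. This is a toy model rather than how ELB is implemented: the `RoundRobinBalancer` class, its method names, and the choice of plain round-robin rotation are assumptions made for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates through instances, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._rotation = cycle(self.instances)

    def mark_unhealthy(self, instance):
        self.healthy.discard(instance)

    def route(self):
        # Try each instance at most once per call so the loop always ends.
        for _ in range(len(self.instances)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["Web1", "Web2", "Web3"])
first_round = [lb.route() for _ in range(3)]
lb.mark_unhealthy("Web2")                      # simulate a failed instance
after_failure = [lb.route() for _ in range(4)]
print(first_round)
print(after_failure)
```

Once `Web2` is marked unhealthy, it no longer appears in the routing results and the remaining two instances share the requests.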

Types of AWS Load Balancers

Type	OSI Layer	Protocol	Best For
Application Load Balancer (ALB)	Layer 7 (HTTP/HTTPS)	HTTP, HTTPS, WebSocket	Web apps, REST APIs, microservices
Network Load Balancer (NLB)	Layer 4 (TCP/UDP)	TCP, UDP, TLS	Ultra-low latency, gaming, financial systems
Gateway Load Balancer (GWLB)	Layer 3	IP	Third-party network appliances, firewalls
Classic Load Balancer	Layer 4/7	HTTP, HTTPS, TCP	Legacy applications (not recommended for new projects)

Application Load Balancer (ALB) — Deep Dive

ALB is the usual choice for web applications. It routes traffic based on the content of each HTTP request, such as the URL path, hostname, HTTP headers, and query strings.

Path-Based Routing

ALB can route different URL paths to different groups of servers (Target Groups):

[ALB]
  |-- /api/*        → Target Group: API Servers (EC2)
  |-- /static/*     → Target Group: Static File Servers (EC2)
  |-- /admin/*      → Target Group: Admin App Servers (EC2)
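
As a rough illustration of how the rules above are matched, the sketch below checks a request path against each pattern in order and returns the first match. The target group names and the `default-web-servers` fallback are hypothetical, and `fnmatchcase` only approximates ALB's wildcard matching.

```python
from fnmatch import fnmatchcase

# Rules are checked in order, similar to listener rules evaluated by priority.
RULES = [
    ("/api/*", "api-servers"),
    ("/static/*", "static-file-servers"),
    ("/admin/*", "admin-app-servers"),
]
DEFAULT_TARGET_GROUP = "default-web-servers"  # hypothetical fallback group


def pick_target_group(path):
    """Return the target group for the first pattern that matches the path."""
    for pattern, target_group in RULES:
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return target_group
    return DEFAULT_TARGET_GROUP


print(pick_target_group("/api/users/42"))
print(pick_target_group("/static/css/site.css"))
print(pick_target_group("/checkout"))
```

Requests that match no rule fall through to the default group, which is why a listener normally needs a default action.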

Host-Based Routing

Multiple domains can be handled by one ALB:

[ALB]
  |-- api.myapp.com     → Backend API servers
  |-- app.myapp.com     → Frontend web servers
  |-- admin.myapp.com   → Admin panel servers
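
Host-based routing amounts to a lookup on the request's `Host` header. The following sketch assumes exact hostname matches only (no wildcards); the target group names and the default are hypothetical.

```python
HOST_RULES = {
    "api.myapp.com": "backend-api-servers",
    "app.myapp.com": "frontend-web-servers",
    "admin.myapp.com": "admin-panel-servers",
}


def target_for_host(host_header, default="frontend-web-servers"):
    """Map a Host header to a target group (default is an assumed fallback)."""
    # A Host header may include a port ("api.myapp.com:443"),
    # and hostnames are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return HOST_RULES.get(hostname, default)


print(target_for_host("API.myapp.com:443"))
print(target_for_host("admin.myapp.com"))
print(target_for_host("unknown.myapp.com"))
```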

Health Checks

ALB continuously checks the health of registered instances by sending periodic requests to a health check endpoint (e.g., /health). If an instance fails a configured number of consecutive checks, ALB marks it unhealthy and stops sending it traffic. Once the instance passes enough consecutive checks again, ALB resumes routing to it.
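
Health status usually changes only after several consecutive results, not after a single check. The sketch below models that with two thresholds; the `HealthTracker` class and the threshold values are illustrative assumptions, not AWS defaults.

```python
class HealthTracker:
    """Tracks one instance's health using consecutive-result thresholds."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold      # passes needed to recover
        self.unhealthy_threshold = unhealthy_threshold  # failures needed to fail
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
results = [True, False, False, True, True, True]
states = [tracker.record(r) for r in results]
print(states)
```

In this run the instance is taken out of rotation after the second consecutive failure and only returns after three consecutive passes.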

What Is Auto Scaling?

Auto Scaling automatically adjusts the number of EC2 instances based on actual demand. When traffic increases, new instances are added. When traffic drops, extra instances are removed. This keeps capacity close to what the workload needs, avoiding both over-provisioning (wasted money) and under-provisioning (poor performance).

Auto Scaling Core Components

1. Launch Template

A Launch Template defines the configuration for new EC2 instances that Auto Scaling creates. It includes the AMI, instance type, key pair, security groups, IAM role (attached through an instance profile), and User Data script. Every new instance launched by Auto Scaling uses this template.
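
To make the list of settings concrete, here is a sketch that assembles them into a dictionary shaped like the EC2 launch template data structure. The key names follow that API as best understood here and should be checked against the current documentation; every value is a placeholder, and the helper `build_launch_template_data` is hypothetical. Nothing is sent to AWS.

```python
import base64


def build_launch_template_data(ami_id, instance_type, key_name,
                               security_group_ids, instance_profile_name,
                               user_data_script):
    """Assemble launch template settings into one dictionary (no AWS calls)."""
    encoded = base64.b64encode(user_data_script.encode("utf-8")).decode("ascii")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "IamInstanceProfile": {"Name": instance_profile_name},
        # Launch templates expect user data as base64-encoded text.
        "UserData": encoded,
    }


template_data = build_launch_template_data(
    ami_id="ami-0123456789abcdef0",               # placeholder AMI ID
    instance_type="t3.micro",
    key_name="my-key-pair",                       # placeholder key pair
    security_group_ids=["sg-0123456789abcdef0"],  # placeholder security group
    instance_profile_name="web-server-profile",   # placeholder profile
    user_data_script="#!/bin/bash\necho hello",
)
print(sorted(template_data))
```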

2. Auto Scaling Group (ASG)

An Auto Scaling Group is the logical container that manages a collection of EC2 instances. Key settings:

Minimum size: The fewest instances that should ever run (protects against scaling to zero). Example: 2.
Desired capacity: The target number of instances under normal conditions. Example: 3.
Maximum size: The most instances that can run (protects against unlimited scaling costs). Example: 10.

Auto Scaling Group
+---------------------------------------+
| Min: 2  |  Desired: 3  |  Max: 10     |
|                                       |
| [EC2-1]  [EC2-2]  [EC2-3]            |
|                                       |
| Spread across: AZ-a, AZ-b, AZ-c      |
+---------------------------------------+
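
The minimum and maximum sizes act as hard bounds on whatever capacity a policy asks for. A minimal sketch, using the example values above (the function name `clamp_capacity` is made up for illustration):

```python
def clamp_capacity(requested, min_size=2, max_size=10):
    """Keep a requested instance count within the group's min and max."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


print(clamp_capacity(0))    # below the minimum
print(clamp_capacity(3))    # within bounds
print(clamp_capacity(25))   # above the maximum
```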

3. Scaling Policies

Scaling policies define when and how the Auto Scaling Group adds or removes instances:

Policy Type	How It Works	Example
Target Tracking	Maintain a target metric value. Most recommended.	Keep average CPU at 50% — add/remove instances as needed
Step Scaling	Scale by different amounts at different alarm thresholds	CPU 70–80%: add 1. CPU 80–90%: add 2. CPU 90%+: add 3
Simple Scaling	Add or remove a fixed number when an alarm triggers	CPU > 80%: add 1 instance
Scheduled Scaling	Scale at defined times	Add 5 instances every Friday at 6 PM (peak time)
Predictive Scaling	Use ML to forecast and pre-scale	Automatically anticipates Monday morning traffic
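
The Step Scaling example in the table can be written as a small lookup. The sketch assumes each band includes its lower bound and excludes its upper bound; the table does not say which band a boundary value such as exactly 80% belongs to.

```python
# (lower bound inclusive, upper bound exclusive, instances to add)
STEP_ADJUSTMENTS = [
    (70.0, 80.0, 1),
    (80.0, 90.0, 2),
    (90.0, float("inf"), 3),
]


def step_scaling_adjustment(cpu_percent):
    """Return how many instances to add for the given average CPU."""
    for lower, upper, instances_to_add in STEP_ADJUSTMENTS:
        if lower <= cpu_percent < upper:
            return instances_to_add
    return 0  # below the first threshold: no scale-out


for cpu in (65, 75, 85, 97):
    print(cpu, step_scaling_adjustment(cpu))
```

Simple Scaling corresponds to the same table with a single row.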

ELB + Auto Scaling — Combined Architecture

[Users]
   |
[Application Load Balancer]
   |
[Auto Scaling Group]
   |         |         |
[EC2-1]   [EC2-2]   [EC2-3]
   |         |         |
         [RDS Database]

Scale-Out Event (CPU > 70%):
[EC2-1] [EC2-2] [EC2-3] + [EC2-4] [EC2-5] ← new instances added
ALB automatically routes to new instances

Scale-In Event (CPU < 30%):
[EC2-1] [EC2-2] ← remaining instances (extra instances terminated)
ALB deregisters instances before they terminate and stops routing to them
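
The key point of the combined architecture is that the load balancer's target list follows the Auto Scaling Group's instance list. The toy classes below model that link; the class names, instance IDs, and the newest-first termination order are assumptions for illustration only.

```python
class TargetGroup:
    """Holds the instances the load balancer may route to."""

    def __init__(self):
        self.targets = []

    def register(self, instance_id):
        self.targets.append(instance_id)

    def deregister(self, instance_id):
        self.targets.remove(instance_id)


class AutoScalingGroup:
    """Launches or terminates instances and keeps the target group in sync."""

    def __init__(self, target_group, min_size, max_size):
        self.target_group = target_group
        self.min_size = min_size
        self.max_size = max_size
        self.instances = []
        self._next_id = 1

    def set_desired(self, desired):
        desired = max(self.min_size, min(desired, self.max_size))
        while len(self.instances) < desired:       # scale out
            instance_id = f"EC2-{self._next_id}"
            self._next_id += 1
            self.instances.append(instance_id)
            self.target_group.register(instance_id)
        while len(self.instances) > desired:       # scale in
            instance_id = self.instances.pop()     # newest first (assumption)
            # Deregister so no new requests reach the instance being removed.
            self.target_group.deregister(instance_id)


tg = TargetGroup()
asg = AutoScalingGroup(tg, min_size=2, max_size=10)
asg.set_desired(3)
asg.set_desired(5)           # scale-out event
scaled_out = list(tg.targets)
asg.set_desired(2)           # scale-in event
scaled_in = list(tg.targets)
print(scaled_out)
print(scaled_in)
```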

Sticky Sessions

By default, ALB distributes each request to any available instance, so the same user may hit different servers on consecutive requests. For applications that store session data locally (not in a shared database), sticky sessions use a cookie to keep sending a user's requests to the same instance for as long as that instance stays healthy.
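
The idea behind sticky sessions can be sketched as follows. This toy `StickyRouter` keeps a cookie-to-instance table in memory, which is an assumption made for simplicity and not a description of how ALB stores stickiness.

```python
import secrets
from itertools import cycle


class StickyRouter:
    """Routes requests carrying a known cookie back to the same instance."""

    def __init__(self, instances):
        self._rotation = cycle(instances)
        self._sessions = {}  # cookie value -> instance

    def route(self, cookie=None):
        """Return (instance, cookie); unknown or missing cookies get a new one."""
        if cookie in self._sessions:
            return self._sessions[cookie], cookie
        instance = next(self._rotation)
        cookie = secrets.token_hex(8)
        self._sessions[cookie] = instance
        return instance, cookie


router = StickyRouter(["EC2-1", "EC2-2", "EC2-3"])
first_instance, user_cookie = router.route()          # first visit, no cookie
repeat_visits = [router.route(user_cookie)[0] for _ in range(5)]
other_instance, _ = router.route()                    # a different user
print(first_instance, repeat_visits, other_instance)
```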

A better practice is to make applications stateless — store session data in ElastiCache or DynamoDB — so any instance can handle any request. This makes scaling simpler and more reliable.

Cross-Zone Load Balancing

Cross-zone load balancing distributes traffic evenly across all registered instances in all AZs, regardless of which AZ the traffic arrives in. Without it, traffic is split evenly between AZs first: if AZ-a has 4 instances and AZ-b has 1 instance, AZ-b's single instance receives 50% of traffic while each AZ-a instance receives only 12.5%. With cross-zone balancing, each of the 5 instances receives 20%.
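
The arithmetic for this example can be checked with a short function. It assumes incoming traffic is split evenly between the AZs that contain instances; the function name `per_instance_share` is hypothetical.

```python
def per_instance_share(instances_per_az, cross_zone):
    """Return the percentage of total traffic each instance gets, keyed by AZ."""
    active = {az: n for az, n in instances_per_az.items() if n > 0}
    total_instances = sum(active.values())
    shares = {}
    for az, count in active.items():
        if cross_zone:
            # Every instance gets an equal slice of all traffic.
            shares[az] = 100.0 / total_instances
        else:
            # Each AZ gets an equal slice, divided among its own instances.
            shares[az] = 100.0 / len(active) / count
    return shares


layout = {"AZ-a": 4, "AZ-b": 1}
print(per_instance_share(layout, cross_zone=False))
print(per_instance_share(layout, cross_zone=True))
```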

Real-World Example — Flash Sale on an E-Commerce Site

An e-commerce website normally runs on 3 EC2 instances. During a flash sale, traffic jumps 10x in minutes.

Average CPU climbs above the 60% target set in the Target Tracking policy, and Auto Scaling responds.
New EC2 instances launch automatically from the Launch Template: 3, 5, 8, then 10 instances within minutes.
As each new instance passes its health checks, the ALB starts sending it traffic.
After the sale, traffic drops. Auto Scaling scales in gradually, returning to 3 instances over about 15 minutes, so instances are not terminated the moment traffic dips.
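
One way to see why capacity grows in stages is a small simulation. It assumes a proportional rule (new count = current count × observed CPU ÷ target CPU, rounded up), which only approximates Target Tracking, and it uses hypothetical numbers: 3 instances at 20% CPU before the sale, then 10x the load, with a maximum group size of 10.

```python
import math


def simulate_scale_out(total_load, start_instances, target_cpu, max_size, rounds):
    """Simulate successive scale-out decisions and return the instance counts.

    total_load is in CPU-percent units: 600 means six fully busy instances.
    """
    history = [start_instances]
    count = start_instances
    for _ in range(rounds):
        # A saturated instance cannot report more than 100% CPU.
        observed_cpu = min(100.0, total_load / count)
        wanted = math.ceil(count * observed_cpu / target_cpu)
        count = min(max_size, max(count, wanted))  # scale-out only in this sketch
        history.append(count)
    return history


# 3 instances x 20% CPU = 60 units before the sale; 10x traffic = 600 units.
history = simulate_scale_out(total_load=600, start_instances=3,
                             target_cpu=60.0, max_size=10, rounds=4)
print(history)
```

Under these assumptions the group grows 3, 5, 9, 10 and then holds at 10. Growth is staged because saturated instances report at most 100% CPU, so each evaluation underestimates the true demand.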

Summary

Elastic Load Balancing distributes traffic across multiple instances for availability and performance.
ALB routes HTTP/HTTPS traffic and supports path-based and host-based routing. NLB handles TCP/UDP at ultra-low latency.
Auto Scaling adjusts EC2 instance count based on real demand — scaling out during high traffic and in during low traffic.
Launch Templates define instance configuration. Auto Scaling Groups define min, desired, and max instance counts.
Target Tracking is the simplest policy to configure and a sound default for most use cases.
