System Design: Load Balancing
Load balancing is the process of distributing incoming network traffic across multiple servers so no single server gets overwhelmed. Instead of one server handling all requests alone, a load balancer acts as a traffic controller that directs each request to the most suitable server at that moment.
Picture a highway with multiple toll booths. Without load balancing, all cars (requests) pile up at one booth while others sit empty. A load balancer opens all booths and directs cars evenly, keeping traffic flowing smoothly.
Why Load Balancing Is Essential
A single server has limits — memory, CPU, and network bandwidth. When traffic exceeds these limits, the server slows down or crashes. Load balancing solves this by spreading the work across many servers, enabling the system to handle far more traffic than any single machine could manage.
Load balancing also improves reliability. If one server fails, the load balancer automatically routes requests to the remaining healthy servers, so users experience no downtime.
How a Load Balancer Works
```
                                     +----------+
                               +---> |  Server  |
                               |     |    A     |
                               |     +----------+
+----------+    +----------+   |     +----------+
|  Client  |    |   Load   |   +---> |  Server  |
| Requests |--->| Balancer |---+     |    B     |
+----------+    +----------+   |     +----------+
                               |     +----------+
                               +---> |  Server  |
                                     |    C     |
                                     +----------+
```
- All client requests arrive at the load balancer's single IP address.
- The load balancer evaluates which server should handle the request.
- The request gets forwarded to the selected server.
- The server processes the request and responds directly to the client (or through the load balancer).
Load Balancing Algorithms
Load balancers use different algorithms to decide which server gets each request:
1. Round Robin
Requests are distributed to servers in a fixed circular order: Server A gets request 1, Server B gets request 2, Server C gets request 3, then it is back to Server A for request 4.
```
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (back to start)
Request 5 → Server B
```
Best for: Servers with equal capacity handling similar workloads.
Problem: It ignores each server's current load. A slow or overloaded server still receives the same share of requests.
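A minimal Python sketch of the idea, using `itertools.cycle` (the server names are placeholders):

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)  # fixed circular order

def route() -> str:
    # Each call hands back the next server, regardless of its load.
    return next(rotation)

for i in range(1, 6):
    print(f"Request {i} -> {route()}")
# Request 1 -> server-a ... Request 4 -> server-a (back to start)
```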
2. Weighted Round Robin
Each server gets a weight based on its capacity. Powerful servers receive more requests proportional to their weight.
```
Server A (weight 3): handles 3 requests per cycle
Server B (weight 1): handles 1 request per cycle

Cycle: A, A, A, B, A, A, A, B ...
```
Best for: Server clusters where machines have different specs.
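A simple sketch of the same idea, where each server occupies as many slots in the rotation as its weight (the weights here are illustrative):

```python
from itertools import cycle

weights = {"server-a": 3, "server-b": 1}  # weight = slots per cycle
rotation = cycle([name for name, w in weights.items() for _ in range(w)])

print([next(rotation) for _ in range(8)])
# ['server-a', 'server-a', 'server-a', 'server-b',
#  'server-a', 'server-a', 'server-a', 'server-b']
```

Real balancers often interleave the picks instead (Nginx's "smooth" weighted round robin yields A, A, B, A rather than A, A, A, B) so no server receives long bursts, but the per-cycle proportions are the same.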
3. Least Connections
The load balancer always routes the next request to the server with the fewest active connections at that moment.
```
Server A: 50 active connections
Server B: 20 active connections  ← Next request goes here
Server C: 35 active connections
```
Best for: Long-running connections such as WebSockets or database sessions.
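A sketch using the connection counts from the example above (a real balancer updates these counts as connections open and close):

```python
active = {"server-a": 50, "server-b": 20, "server-c": 35}

def route() -> str:
    # Pick whichever server has the fewest open connections right now.
    server = min(active, key=active.get)
    active[server] += 1  # the new connection is now open on it
    return server

def release(server: str) -> None:
    active[server] -= 1  # call when a connection closes

print(route())  # server-b
```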
4. IP Hash
The client's IP address determines which server handles its requests. The same client always reaches the same server.
```
Client IP 192.168.1.10 → Always → Server A
Client IP 192.168.1.20 → Always → Server B
```
Best for: Stateful sessions where the server needs to remember the user (shopping cart stored in server memory).
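A sketch of the mapping: hash the client IP and take it modulo the number of servers. `hashlib` is used here because Python's built-in `hash()` is randomized per process:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def route(client_ip: str) -> str:
    # A stable hash keeps the client → server mapping identical across restarts.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(route("192.168.1.10"))  # the same IP always lands on the same server
```

Note that adding or removing a server changes the modulus and reshuffles most clients; consistent hashing is the usual fix when the pool changes often.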
5. Random
A server is selected at random for each request. It is simple to implement, and the distribution tends to even out over many requests.
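The whole algorithm fits in one line of Python:

```python
import random

servers = ["server-a", "server-b", "server-c"]

def route() -> str:
    return random.choice(servers)  # uniform pick; evens out over many requests
```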
6. Resource-Based (Adaptive)
The load balancer checks each server's real-time CPU, memory, and response time, then routes to the least loaded server. This requires active health monitoring of all servers.
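A toy sketch of the routing decision. The utilization numbers stand in for whatever a monitoring agent would report, and the scoring formula is an assumption for illustration:

```python
metrics = {
    "server-a": {"cpu": 0.85, "mem": 0.60},
    "server-b": {"cpu": 0.30, "mem": 0.40},
    "server-c": {"cpu": 0.55, "mem": 0.70},
}

def route() -> str:
    # Lower combined utilization = less loaded right now.
    return min(metrics, key=lambda s: metrics[s]["cpu"] + metrics[s]["mem"])

print(route())  # server-b
```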
Types of Load Balancers
Layer 4 Load Balancer (Transport Layer)
Operates at the TCP/UDP level. Routes traffic based on IP address and port number without inspecting the actual content of the request. Very fast because it does minimal processing.
```
Decision based on:
- Source IP: 192.168.1.5
- Destination Port: 443 (HTTPS)
→ Route to Server A
```
Example tools: AWS Network Load Balancer, HAProxy in TCP mode
Layer 7 Load Balancer (Application Layer)
Operates at the HTTP/HTTPS level. Inspects the actual request content — URL, headers, cookies — and routes intelligently based on this information.
```
Decision based on:
- URL path: /api/images    → Route to Image Server
- URL path: /api/payments  → Route to Payment Server
- Cookie: session_id=abc   → Route to same server as previous request
```
Example tools: Nginx, HAProxy in HTTP mode, AWS Application Load Balancer
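A toy sketch of path-based routing. In practice this logic lives in the balancer's configuration (an Nginx `location` block, an ALB listener rule) rather than in application code; the paths and pool names follow the example above:

```python
ROUTES = {
    "/api/images": "image-server-pool",
    "/api/payments": "payment-server-pool",
}

def route(path: str) -> str:
    # First matching URL prefix wins; unmatched paths go to a default pool.
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/api/images/42"))  # image-server-pool
```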
| Feature | Layer 4 Load Balancer | Layer 7 Load Balancer |
|---|---|---|
| Speed | Faster (less processing) | Slower (reads request content) |
| Intelligence | Basic (IP + port only) | Smart (URL, headers, cookies) |
| SSL Termination | Not typically | Yes |
| Content-based Routing | No | Yes |
| Use Case | Raw TCP traffic, gaming, VoIP | Web apps, APIs, microservices |
Health Checks
Load balancers continuously monitor servers using health checks. If a server fails a check, the load balancer stops sending traffic to it until it recovers.
```
Load Balancer pings each server every 5 seconds:

Server A: Response 200 OK → Healthy ✓ (receives traffic)
Server B: Response 200 OK → Healthy ✓ (receives traffic)
Server C: No response     → Unhealthy ✗ (removed from rotation)

Server C recovers after 30 seconds:

Server C: Response 200 OK → Healthy ✓ (added back to rotation)
```
Health checks can be simple (ping the server) or detailed (call a specific health endpoint that checks database connectivity, cache availability, etc.).
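A minimal health-check sketch using only the standard library, assuming each server exposes a `/health` endpoint that returns HTTP 200 when healthy (the endpoint name and addresses are placeholders):

```python
import urllib.request

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # timeout, refused connection, non-2xx response, ...
        return False

servers = ["http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3"]
healthy_pool = [s for s in servers if is_healthy(s)]
# Re-run every few seconds; route traffic only to servers in healthy_pool.
```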
Session Persistence (Sticky Sessions)
Some applications store user session data on the server. If a user gets routed to a different server on each request, the session is lost. Sticky sessions solve this by ensuring a user always reaches the same server.
```
User logs in        → Load Balancer assigns to Server A
User's next request → Load Balancer checks cookie → Routes to Server A again
```
Drawback: Sticky sessions reduce the effectiveness of load balancing. If Server A is overloaded, users stuck on it still suffer poor performance. A better solution is to store sessions in a shared cache like Redis instead of on individual servers.
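A sketch of that shared-session approach, assuming the redis-py client and a reachable Redis instance (the key names and TTL are illustrative):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_session(session_id: str, data: dict) -> None:
    # Any server behind the balancer can write the session...
    r.set(f"session:{session_id}", json.dumps(data), ex=3600)  # 1-hour TTL

def load_session(session_id: str) -> dict | None:
    # ...and any other server can read it back on the next request.
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```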
Global Load Balancing
For systems serving users worldwide, load balancing happens at the geographic level. DNS-based load balancing routes users to the data center nearest to them.
```
User in India   → DNS resolves to Mumbai data center
User in Germany → DNS resolves to Frankfurt data center
User in USA     → DNS resolves to Virginia data center
```
This reduces latency because data travels a shorter distance. It also provides disaster recovery — if one data center goes offline, DNS routes all traffic to another region.
Load Balancer vs Reverse Proxy
| Feature | Load Balancer | Reverse Proxy |
|---|---|---|
| Primary Purpose | Distribute traffic across servers | Accept and forward requests on behalf of servers |
| Number of Backends | Multiple servers | Can be one or many servers |
| SSL Termination | Sometimes | Yes, commonly |
| Caching | Rarely | Commonly |
| Example | AWS ALB distributing API traffic | Nginx serving as a secure front door |
In practice, many tools (like Nginx and HAProxy) act as both a load balancer and a reverse proxy simultaneously.
Redundant Load Balancers
A single load balancer is itself a single point of failure: if it goes down, the entire system becomes unreachable. Production systems therefore run at least two load balancers, one active and one standby. If the active load balancer fails, the standby takes over immediately using a technique called failover, sketched after the diagram below.
```
+------------------+
|    Active LB     |  ← All traffic goes here
+------------------+
         |
         |  Heartbeat (monitors health)
         |
+------------------+
|    Standby LB    |  ← Takes over if Active fails
+------------------+
```
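A toy sketch of the standby's heartbeat loop. Real deployments typically use a tool like keepalived (VRRP) with a shared virtual IP; the address and thresholds below are assumptions:

```python
import socket
import time

ACTIVE_ADDR = ("10.0.0.10", 8080)  # hypothetical active LB address
MISS_LIMIT = 3                     # consecutive misses before failover

def active_is_alive(timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection(ACTIVE_ADDR, timeout=timeout):
            return True
    except OSError:
        return False

misses = 0
while misses < MISS_LIMIT:
    misses = 0 if active_is_alive() else misses + 1
    time.sleep(5)  # heartbeat interval

print("Active LB unreachable: standby promoting itself (failover)")
```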
Summary
Load balancing is what allows a system to serve millions of users by spreading work across many servers. Choosing the right algorithm — round robin, least connections, or IP hash — depends on the application's nature and traffic patterns. Health checks ensure failed servers are removed automatically. For stateful applications, shared session storage eliminates the need for sticky sessions. Load balancers are one of the most important components in any scalable system architecture.
