System Design: Load Balancing
Load balancing is the process of distributing incoming network traffic across multiple servers so no single server gets overwhelmed. Instead of one server handling all requests alone, a load balancer acts as a traffic controller that directs each request to the most suitable server at that moment.
Picture a highway with multiple toll booths. Without load balancing, all cars (requests) pile up at one booth while others sit empty. A load balancer opens all booths and directs cars evenly, keeping traffic flowing smoothly.
Why Load Balancing Is Essential
A single server has limits — memory, CPU, and network bandwidth. When traffic exceeds these limits, the server slows down or crashes. Load balancing solves this by spreading the work across many servers, enabling the system to handle far more traffic than any single machine could manage.
Load balancing also improves reliability. If one server fails, the load balancer automatically routes requests to the remaining healthy servers, so users experience no downtime.
How a Load Balancer Works
```
                                     +----------+
                               +---> |  Server  |
                               |     |    A     |
                               |     +----------+
+----------+    +----------+   |     +----------+
|  Client  |    |   Load   |   +---> |  Server  |
| Requests |--->| Balancer |---+     |    B     |
+----------+    +----------+   |     +----------+
                               |     +----------+
                               +---> |  Server  |
                                     |    C     |
                                     +----------+
```
- All client requests arrive at the load balancer's single IP address.
- The load balancer evaluates which server should handle the request.
- The request gets forwarded to the selected server.
- The server processes the request and responds directly to the client (or through the load balancer).
Load Balancing Algorithms
Load balancers use different algorithms to decide which server gets each request:
1. Round Robin
Requests are distributed to servers in a fixed circular order: Server A gets request 1, Server B gets request 2, Server C gets request 3, then it is back to Server A for request 4.
```
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (back to start)
Request 5 → Server B
```
Best for: Servers with equal capacity handling similar workloads.
Problem: It ignores each server's current load. A slow or overloaded server still receives the same share of requests.
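A minimal Python sketch of the idea, using `itertools.cycle` (the server names are placeholders):

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)  # fixed circular order

def route() -> str:
    # Each call hands back the next server, regardless of its load.
    return next(rotation)

for i in range(1, 6):
    print(f"Request {i} -> {route()}")
# Request 1 -> server-a ... Request 4 -> server-a (back to start)
```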
2. Weighted Round Robin
Each server gets a weight based on its capacity. Powerful servers receive more requests proportional to their weight.
```
Server A (weight 3): handles 3 requests per cycle
Server B (weight 1): handles 1 request per cycle

Cycle: A, A, A, B, A, A, A, B ...
```
Best for: Server clusters where machines have different specs.
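A simple sketch of the same idea, where each server occupies as many slots in the rotation as its weight (the weights here are illustrative):

```python
from itertools import cycle

weights = {"server-a": 3, "server-b": 1}  # weight = slots per cycle
rotation = cycle([name for name, w in weights.items() for _ in range(w)])

print([next(rotation) for _ in range(8)])
# ['server-a', 'server-a', 'server-a', 'server-b',
#  'server-a', 'server-a', 'server-a', 'server-b']
```

Real balancers often interleave the picks instead (Nginx's "smooth" weighted round robin yields A, A, B, A rather than A, A, A, B) so no server receives long bursts, but the per-cycle proportions are the same.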
3. Least Connections
The load balancer always routes the next request to the server with the fewest active connections at that moment.
```
Server A: 50 active connections
Server B: 20 active connections  ← Next request goes here
Server C: 35 active connections
```
Best for: Long-running connections such as WebSockets or database sessions.
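A sketch using the connection counts from the example above (a real balancer updates these counts as connections open and close):

```python
active = {"server-a": 50, "server-b": 20, "server-c": 35}

def route() -> str:
    # Pick whichever server has the fewest open connections right now.
    server = min(active, key=active.get)
    active[server] += 1  # the new connection is now open on it
    return server

def release(server: str) -> None:
    active[server] -= 1  # call when a connection closes

print(route())  # server-b
```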
4. IP Hash
The client's IP address determines which server handles its requests. The same client always reaches the same server.
```
Client IP 192.168.1.10 → Always → Server A
Client IP 192.168.1.20 → Always → Server B
```
Best for: Stateful sessions where the server needs to remember the user (shopping cart stored in server memory).
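A sketch of the mapping: hash the client IP and take it modulo the number of servers. `hashlib` is used here because Python's built-in `hash()` is randomized per process:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def route(client_ip: str) -> str:
    # A stable hash keeps the client → server mapping identical across restarts.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(route("192.168.1.10"))  # the same IP always lands on the same server
```

Note that adding or removing a server changes the modulus and reshuffles most clients; consistent hashing is the usual fix when the pool changes often.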
5. Random
A server is selected at random for each request. It is simple to implement, and the distribution tends to even out over many requests.
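The whole algorithm fits in one line of Python:

```python
import random

servers = ["server-a", "server-b", "server-c"]

def route() -> str:
    return random.choice(servers)  # uniform pick; evens out over many requests
```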
6. Resource-Based (Adaptive)
The load balancer checks each server's real-time CPU, memory, and response time, then routes to the least loaded server. This requires active health monitoring of all servers.
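A toy sketch of the routing decision. The utilization numbers stand in for whatever a monitoring agent would report, and the scoring formula is an assumption for illustration:

```python
metrics = {
    "server-a": {"cpu": 0.85, "mem": 0.60},
    "server-b": {"cpu": 0.30, "mem": 0.40},
    "server-c": {"cpu": 0.55, "mem": 0.70},
}

def route() -> str:
    # Lower combined utilization = less loaded right now.
    return min(metrics, key=lambda s: metrics[s]["cpu"] + metrics[s]["mem"])

print(route())  # server-b
```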
Types of Load Balancers
Layer 4 Load Balancer (Transport Layer)
Operates at the TCP/UDP level. Routes traffic based on IP address and port number without inspecting the actual content of the request. Very fast because it does minimal processing.
```
Decision based on:
- Source IP: 192.168.1.5
- Destination Port: 443 (HTTPS)
→ Route to Server A
```
Example tools: AWS Network Load Balancer, HAProxy in TCP mode
Layer 7 Load Balancer (Application Layer)
Operates at the HTTP/HTTPS level. Inspects the actual request content — URL, headers, cookies — and routes intelligently based on this information.
```
Decision based on:
- URL path: /api/images    → Route to Image Server
- URL path: /api/payments  → Route to Payment Server
- Cookie: session_id=abc   → Route to same server as previous request
```
Example tools: Nginx, HAProxy in HTTP mode, AWS Application Load Balancer
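A toy sketch of path-based routing. In practice this logic lives in the balancer's configuration (an Nginx `location` block, an ALB listener rule) rather than in application code; the paths and pool names follow the example above:

```python
ROUTES = {
    "/api/images": "image-server-pool",
    "/api/payments": "payment-server-pool",
}

def route(path: str) -> str:
    # First matching URL prefix wins; unmatched paths go to a default pool.
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/api/images/42"))  # image-server-pool
```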
| Feature | Layer 4 Load Balancer | Layer 7 Load Balancer |
|---|---|---|
| Speed | Faster (less processing) | Slower (reads request content) |
| Intelligence | Basic (IP + port only) | Smart (URL, headers, cookies) |
| SSL Termination | Not typically | Yes |
| Content-based Routing | No | Yes |
| Use Case | Raw TCP traffic, gaming, VoIP | Web apps, APIs, microservices |
Health Checks
Load balancers continuously monitor servers using health checks. If a server fails a check, the load balancer stops sending traffic to it until it recovers.
```
Load Balancer pings each server every 5 seconds:

Server A: Response 200 OK → Healthy ✓ (receives traffic)
Server B: Response 200 OK → Healthy ✓ (receives traffic)
Server C: No response     → Unhealthy ✗ (removed from rotation)

Server C recovers after 30 seconds:

Server C: Response 200 OK → Healthy ✓ (added back to rotation)
```
Health checks can be simple (ping the server) or detailed (call a specific health endpoint that checks database connectivity, cache availability, etc.).
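A minimal health-check sketch using only the standard library, assuming each server exposes a `/health` endpoint that returns HTTP 200 when healthy (the endpoint name and addresses are placeholders):

```python
import urllib.request

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # timeout, refused connection, non-2xx response, ...
        return False

servers = ["http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3"]
healthy_pool = [s for s in servers if is_healthy(s)]
# Re-run every few seconds; route traffic only to servers in healthy_pool.
```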
Session Persistence (Sticky Sessions)
Some applications store user session data on the server. If a user gets routed to a different server on each request, the session is lost. Sticky sessions solve this by ensuring a user always reaches the same server.
```
User logs in        → Load Balancer assigns to Server A
User's next request → Load Balancer checks cookie → Routes to Server A again
```
Drawback: Sticky sessions reduce the effectiveness of load balancing. If Server A is overloaded, users stuck on it still suffer poor performance. A better solution is to store sessions in a shared cache like Redis instead of on individual servers.
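A sketch of that shared-session approach, assuming the redis-py client and a reachable Redis instance (the key names and TTL are illustrative):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_session(session_id: str, data: dict) -> None:
    # Any server behind the balancer can write the session...
    r.set(f"session:{session_id}", json.dumps(data), ex=3600)  # 1-hour TTL

def load_session(session_id: str) -> dict | None:
    # ...and any other server can read it back on the next request.
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```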
Global Load Balancing
For systems serving users worldwide, load balancing happens at the geographic level. DNS-based load balancing routes users to the data center nearest to them.
```
User in India   → DNS resolves to Mumbai data center
User in Germany → DNS resolves to Frankfurt data center
User in USA     → DNS resolves to Virginia data center
```
This reduces latency because data travels a shorter distance. It also provides disaster recovery — if one data center goes offline, DNS routes all traffic to another region.
Load Balancer vs Reverse Proxy
| Feature | Load Balancer | Reverse Proxy |
|---|---|---|
| Primary Purpose | Distribute traffic across servers | Accept and forward requests on behalf of servers |
| Number of Backends | Multiple servers | Can be one or many servers |
| SSL Termination | Sometimes | Yes, commonly |
| Caching | Rarely | Commonly |
| Example | AWS ALB distributing API traffic | Nginx serving as a secure front door |
In practice, many tools (like Nginx and HAProxy) act as both a load balancer and a reverse proxy simultaneously.
Redundant Load Balancers
A single load balancer is itself a single point of failure: if it goes down, the entire system becomes unreachable. Production systems therefore run at least two load balancers, one active and one standby. If the active load balancer fails, the standby takes over immediately using a technique called failover, sketched after the diagram below.
```
+------------------+
|    Active LB     |  ← All traffic goes here
+------------------+
         |
         |  Heartbeat (monitors health)
         |
+------------------+
|    Standby LB    |  ← Takes over if Active fails
+------------------+
```
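A toy sketch of the standby's heartbeat loop. Real deployments typically use a tool like keepalived (VRRP) with a shared virtual IP; the address and thresholds below are assumptions:

```python
import socket
import time

ACTIVE_ADDR = ("10.0.0.10", 8080)  # hypothetical active LB address
MISS_LIMIT = 3                     # consecutive misses before failover

def active_is_alive(timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection(ACTIVE_ADDR, timeout=timeout):
            return True
    except OSError:
        return False

misses = 0
while misses < MISS_LIMIT:
    misses = 0 if active_is_alive() else misses + 1
    time.sleep(5)  # heartbeat interval

print("Active LB unreachable: standby promoting itself (failover)")
```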
Summary
Load balancing is what allows a system to serve millions of users by spreading work across many servers. Choosing the right algorithm — round robin, least connections, or IP hash — depends on the application's nature and traffic patterns. Health checks ensure failed servers are removed automatically. For stateful applications, shared session storage eliminates the need for sticky sessions. Load balancers are one of the most important components in any scalable system architecture.
