Zoom-in: Load Balancer

One domain. Millions of requests per day, with peaks far above the average. One server may not keep up at peak, and even if it can, a single server is a single point of failure.
Zoom in on how traffic gets distributed.
Layer 1 — The problem: single point of failure
Without a load balancer, every request goes to one server.
graph LR
U1["👤 User 1"] -->|"request"| S["🖥️ Server"]
U2["👤 User 2"] -->|"request"| S
U3["👤 User 3"] -->|"request"| S
S -.->|"💥 down"| X[" "]
style U1 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style U2 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style U3 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style S fill:#7f1d1d,stroke:#ef4444,color:#fca5a5
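Server down → entire service down. New deploy → downtime. Traffic spike → server overloaded. A load balancer addresses all three by sitting in front of multiple servers and distributing requests: a failed server is skipped, servers can be updated one at a time, and spikes are spread across the pool.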
Layer 2 — Distribution algorithms
Three common algorithms, three different trade-offs.
graph TD
LB["⚖️ Load Balancer"]
LB -->|"Round-robin: in turn"| S1["🖥️ Server 1"]
LB -->|"Round-robin: in turn"| S2["🖥️ Server 2"]
LB -->|"Round-robin: in turn"| S3["🖥️ Server 3"]
style LB fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style S1 fill:#1a3a2a,stroke:#22c55e,color:#86efac
style S2 fill:#1a3a2a,stroke:#22c55e,color:#86efac
style S3 fill:#1a3a2a,stroke:#22c55e,color:#86efac
Round-robin: each request goes to the next server in sequence. Simple: the balancer only has to remember whose turn it is. Downside: it ignores current load, so Server 1 can receive a new request while still busy with a heavy one and Server 2 sits idle.
Least connections: request goes to the server with the fewest active connections. Better suited when request processing time varies — for example, some DB queries run much longer than others.
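IP hash: a hash of the client's IP determines which server receives the request. The same IP keeps landing on the same server as long as the server pool does not change, which gives session affinity without cookie-based sticky sessions at the LB layer. Caveat: users behind the same NAT or proxy share an IP, so they all land on the same server.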
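The three algorithms can be sketched in a few lines each. This is a minimal illustration, not how any particular load balancer implements them: the class names, the `pick` and `release` methods, and the server labels are all hypothetical, and the IP hash uses a plain modulo (an assumption) rather than consistent hashing.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, client_ip=None):
        return next(self._rotation)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        # min() returns the first server on ties, so list order breaks ties.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the count stays accurate.
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a server with a stable hash."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        # hashlib is used instead of hash() because str hashes are
        # randomized per process in Python.
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


servers = ["s1", "s2", "s3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']
```

Note that with a plain modulo, adding or removing a server changes the mapping for most clients, which is why the affinity only holds while the pool is stable.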
Layer 3 — Health checks: only route to healthy servers
The load balancer periodically checks each backend server.
sequenceDiagram
participant LB as ⚖️ Load Balancer
participant S1 as 🖥️ Server 1 (healthy)
participant S2 as 🖥️ Server 2 (down)
loop Every 10 seconds
LB->>S1: GET /health
S1-->>LB: 200 OK
LB->>S2: GET /health
Note over S2: timeout
LB->>S2: GET /health (retry)
Note over S2: timeout again
Note over LB: Mark S2 unhealthy
end
Note over LB: Only route traffic to S1
Health checks can be TCP checks (opening a connection to verify the port accepts connections) or HTTP requests to a /health endpoint (verifying the app actually responds). HTTP health checks are more accurate: a server can accept TCP connections while the app behind the port is hung or returning errors.
When a server recovers, the load balancer automatically returns it to the pool after several consecutive successful health checks.
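The mark-down and recovery logic can be sketched as a small counter-based state machine. The thresholds here are assumptions: two consecutive failures to mark a server unhealthy (matching the probe plus one retry in the diagram) and three consecutive successes to return it to the pool (the text only says "several"). `HealthTracker` and `http_probe` are hypothetical names, and real load balancers expose these as configuration rather than code.

```python
import http.client
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True only if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx/5xx).
        return False


class HealthTracker:
    """Track one backend's health from a stream of probe results."""

    def __init__(self, fail_threshold=2, rise_threshold=3):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, probe_ok):
        """Feed in one probe result and return the current health state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.rise_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.fail_threshold:
                self.healthy = False
        return self.healthy


tracker = HealthTracker()
probes = [True, False, False, True, True, True]
print([tracker.record(ok) for ok in probes])
# [True, True, False, False, False, True]
```

Requiring several consecutive results in each direction keeps a single slow response or a single lucky one from flipping a server in and out of the pool.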
Layer 4 — L4 vs L7: two layers, two capabilities
graph LR
subgraph "Layer 4 LB (TCP)"
LB4["⚖️ L4 LB"] -->|"by IP:port"| S4A["🖥️ Server A"]
LB4 -->|"by IP:port"| S4B["🖥️ Server B"]
end
subgraph "Layer 7 LB (HTTP)"
LB7["⚖️ L7 LB"] -->|"/api/* → API server"| SA["🖥️ API Server"]
LB7 -->|"/static/* → CDN origin"| SB["🖥️ Static Server"]
LB7 -->|"Host: admin.* → admin"| SC["🖥️ Admin Server"]
end
style LB4 fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style LB7 fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style S4A fill:#1a3a2a,stroke:#22c55e,color:#86efac
style S4B fill:#1a3a2a,stroke:#22c55e,color:#86efac
style SA fill:#1a3a2a,stroke:#22c55e,color:#86efac
style SB fill:#1a3a2a,stroke:#22c55e,color:#86efac
style SC fill:#1a3a2a,stroke:#22c55e,color:#86efac
L4 load balancer (AWS NLB, HAProxy in TCP mode): operates at the transport layer. Fast, low overhead, but only sees IP and port — cannot route by URL path or HTTP headers.
L7 load balancer (AWS ALB, nginx, Traefik): operates at the application layer. Can read HTTP headers, URLs, and cookies — enabling content-based routing. Can also terminate TLS, inject headers, perform A/B testing, and rate-limit by path.
Most web services use L7 for the flexibility. L4 fits when ultra-low latency is required or the protocol isn't HTTP.
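Content-based routing at L7 boils down to matching request fields against an ordered rule list. The sketch below mirrors the three rules in the diagram; the pool names, the `"default"` fallback, and the rule order (host rule first) are assumptions for illustration. An L4 balancer could not evaluate any of these rules, since it never parses the Host header or the path.

```python
# Each rule is (predicate, pool name); the first matching rule wins.
RULES = [
    (lambda host, path: host.startswith("admin."), "admin"),
    (lambda host, path: path.startswith("/api/"), "api"),
    (lambda host, path: path.startswith("/static/"), "static"),
]


def route(host, path, default="default"):
    """Pick a backend pool from the Host header and the URL path."""
    for matches, pool in RULES:
        if matches(host, path):
            return pool
    return default


print(route("example.com", "/api/users"))        # api
print(route("admin.example.com", "/api/users"))  # admin
print(route("example.com", "/index.html"))       # default
```

Because the first match wins, rule order is part of the configuration: swapping the first two rules would send admin API calls to the API pool instead.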
Layer 5 — Sticky sessions: pinning users to a server
Sometimes the same user must always reach the same server.
sequenceDiagram
participant U as 👤 User
participant LB as ⚖️ L7 Load Balancer
participant S1 as 🖥️ Server 1
participant S2 as 🖥️ Server 2
U->>LB: First request
LB->>S1: Forward request
S1-->>LB: Response + Set-Cookie: SERVERID=s1
LB-->>U: Response + cookie
U->>LB: Second request (cookie: SERVERID=s1)
Note over LB: Cookie → route to S1
LB->>S1: Forward request
S1-->>LB: Response
LB-->>U: Response
Sticky sessions solve the stateful session problem — but create a new one: if S1 goes down, all sessions pinned to S1 are lost. The better solution is to externalize state to Redis or a shared DB — any server can then read any user's session, eliminating the need for stickiness.
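Both approaches can be sketched side by side. `pick_sticky` honors the `SERVERID` cookie from the diagram only while the pinned server is healthy, and `AppServer` shows why externalized state removes the need for pinning. Everything apart from the cookie name is hypothetical, and a plain dict stands in for Redis or a shared DB.

```python
def pick_sticky(cookies, healthy_servers, fallback):
    """Return (server, needs_new_cookie) for one incoming request."""
    pinned = cookies.get("SERVERID")
    if pinned in healthy_servers:
        return pinned, False
    # First visit, or the pinned server is gone: choose again and re-pin.
    return fallback(), True


class AppServer:
    """Toy backend that keeps a per-session cart in the store it is given."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        self.store.setdefault(session_id, []).append(item)

    def cart(self, session_id):
        return list(self.store.get(session_id, []))


# Local state: each server has its own dict, so S2 cannot see S1's session.
s1, s2 = AppServer("s1", {}), AppServer("s2", {})
s1.add_to_cart("sess-42", "book")
print(s2.cart("sess-42"))  # []

# Externalized state: one shared dict stands in for Redis.
shared = {}
s1, s2 = AppServer("s1", shared), AppServer("s2", shared)
s1.add_to_cart("sess-42", "book")
print(s2.cart("sess-42"))  # ['book']
```

In the first half, re-pinning a user after S1 fails still loses the cart; in the second half, the balancer can send the request anywhere.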
Full picture
graph TD
U1["👤 Users"] -->|"requests"| LB
subgraph "Load Balancer"
LB["⚖️ L7 LB\n(nginx / ALB)"]
LB --> HC["🔍 Health Check\n(every 10s)"]
end
LB -->|"round-robin / least-conn"| S1["🖥️ Server 1\n✓ healthy"]
LB -->|"round-robin / least-conn"| S2["🖥️ Server 2\n✓ healthy"]
LB -.->|"🚫 skip"| S3["🖥️ Server 3\n✗ unhealthy"]
S1 & S2 --> DB[("💾 Shared DB\n+ Redis")]
style LB fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style S1 fill:#1a3a2a,stroke:#22c55e,color:#86efac
style S2 fill:#1a3a2a,stroke:#22c55e,color:#86efac
style S3 fill:#7f1d1d,stroke:#ef4444,color:#fca5a5
style DB fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
Takeaway
A load balancer isn't just "split requests evenly." It's the decision layer: which servers are alive, where each request goes, how sessions are handled. Choosing L4 or L7 is an architectural decision, not just a config choice.
Session state held in one server's memory is among the most common reasons horizontal scaling becomes hard: two consecutive requests from the same user may land on two different servers, and the second server knows nothing about the session. Externalizing state to shared storage is what lets the load balancer route each request to any healthy server.
This post was assisted by Amy 🌸 - AI Assistant. Content has been reviewed by the author.