Zoom-in: Load Balancer

One domain. Millions of requests per day. As traffic grows, a single server eventually can't keep up, and even before it hits that limit, a single server is a single point of failure.

Zoom in on how traffic gets distributed.

Layer 1 — The problem: single point of failure

Without a load balancer, every request goes to one server.

graph LR
    U1["👤 User 1"] -->|"request"| S["🖥️ Server"]
    U2["👤 User 2"] -->|"request"| S
    U3["👤 User 3"] -->|"request"| S
    S -.->|"💥 down"| X[" "]
    style U1 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style U2 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style U3 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style S fill:#7f1d1d,stroke:#ef4444,color:#fca5a5

Server down → entire service down. New deploy → downtime. Traffic spike → server overloaded. A load balancer solves all three by sitting in front of multiple servers and distributing requests.
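Here is a back-of-the-envelope way to see why multiple servers help. Assume each server is up a fraction `a` of the time and that failures are independent. The service is then down only when all `n` servers are down at once, so availability is `1 - (1 - a)^n`. Independence is an optimistic assumption, since a bad deploy or a shared database can take every server down together. The sketch also treats the load balancer itself as never failing.

```python
def service_availability(per_server: float, n: int) -> float:
    """Availability of n redundant servers behind a load balancer.

    Simplifying assumptions: server failures are independent, and the
    load balancer always routes around a failed server and never fails itself.
    """
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be between 0 and 1")
    if n < 1:
        raise ValueError("need at least one server")
    all_down = (1.0 - per_server) ** n  # probability every server is down at once
    return 1.0 - all_down


for n in (1, 2, 3):
    print(n, f"{service_availability(0.99, n):.6f}")
# 1 0.990000
# 2 0.999900
# 3 0.999999
```

Under these assumptions, going from one server to two turns 99% into 99.99%. In practice, correlated failures usually make the real gain smaller.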

→ Problem remaining: multiple backend servers are now in place — but which server does each request go to? A distribution algorithm is needed.

Layer 2 — Distribution algorithms

Three common algorithms, three different trade-offs.

graph TD
    LB["⚖️ Load Balancer"]

    LB -->|"Round-robin: in turn"| S1["🖥️ Server 1"]
    LB -->|"Round-robin: in turn"| S2["🖥️ Server 2"]
    LB -->|"Round-robin: in turn"| S3["🖥️ Server 3"]

    style LB fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style S1 fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style S2 fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style S3 fill:#1a3a2a,stroke:#22c55e,color:#86efac

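Round-robin: each request goes to the next server in sequence. Simple and cheap: the only state is a pointer to the next server, and no load information is needed. Downside: it ignores load, so Server 1 still gets its turn while it is busy with a heavy request, even if Server 2 sits idle.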
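Below is a minimal round-robin picker, written as a sketch. The server names are hypothetical, and the class is not thread-safe. Its only state is its position in the rotation. That is what makes it cheap, and it is also why it cannot notice that one server is busy.

```python
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._ring = cycle(servers)  # endless iterator: s1, s2, s3, s1, ...

    def pick(self) -> str:
        return next(self._ring)


lb = RoundRobin(["server-1", "server-2", "server-3"])
print([lb.pick() for _ in range(5)])
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2']
```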

Least connections: request goes to the server with the fewest active connections. Better suited when request processing time varies — for example, some DB queries run much longer than others.
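Here is a least-connections sketch under the same assumptions: hypothetical server names and a single thread. The balancer has to be told when a request finishes. That is the extra bookkeeping round-robin avoids.

```python
class LeastConnections:
    """Sends each request to the server with the fewest in-flight requests."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.active = {s: 0 for s in servers}  # server -> open connections

    def acquire(self) -> str:
        # min() returns the first minimum in insertion order,
        # so ties go to the earliest-listed server.
        server = min(self.active, key=lambda s: self.active[s])
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request finishes so the count reflects real load.
        self.active[server] -= 1


lb = LeastConnections(["server-1", "server-2"])
slow = lb.acquire()   # server-1 starts a long DB query
fast = lb.acquire()   # server-2 gets the next request
lb.release(fast)      # server-2 finishes quickly
print(lb.acquire())   # server-2: it has 0 open connections, server-1 still has 1
```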

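IP hash: a hash of the client's IP determines which server receives the request. The same client always lands on the same server, which gives session affinity without cookie-based sticky sessions. There are two caveats. Many clients behind one NAT or corporate proxy share an IP, so they all land on the same server. And adding or removing a server changes the mapping for many clients.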
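Below is an IP-hash sketch: hash the client IP with a fixed function and take it modulo the number of servers. SHA-256 and the `pick_by_ip` name are illustrative choices, not what any particular load balancer uses. Real products have their own hash functions, and some offer consistent hashing to reduce how many clients move when the pool changes.

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server with a stable hash: hash(ip) mod N."""
    # Python's built-in hash() is randomized per process for str, so use a
    # fixed digest to get the same answer on every LB instance and restart.
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-1", "server-2", "server-3"]
# The same IP always lands on the same server...
assert pick_by_ip("203.0.113.7", servers) == pick_by_ip("203.0.113.7", servers)
# ...but shrinking the pool changes the modulus, so a client may move.
print(pick_by_ip("203.0.113.7", servers), pick_by_ip("203.0.113.7", servers[:2]))
```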

→ Problem remaining: what if the algorithm sends a request to a server that's down? The load balancer needs to know which servers are alive.

Layer 3 — Health checks: only route to healthy servers

The load balancer periodically checks each backend server.

sequenceDiagram
    participant LB as ⚖️ Load Balancer
    participant S1 as 🖥️ Server 1 (healthy)
    participant S2 as 🖥️ Server 2 (down)

    loop Every 10 seconds
        LB->>S1: GET /health
        S1-->>LB: 200 OK
        LB->>S2: GET /health
        Note over S2: timeout
        LB->>S2: GET /health (retry)
        Note over S2: timeout again
        Note over LB: Mark S2 unhealthy
    end

    Note over LB: Only route traffic to S1

Health checks can be TCP checks, which just verify that the port accepts a connection, or HTTP requests to a /health endpoint, which verify that the app actually responds. HTTP health checks are more accurate: a server can complete TCP handshakes while the app inside is hung or broken.
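The two kinds of check can be sketched with Python's standard library. The function names and the 2-second default timeout are assumptions for illustration. In this sketch, a check passes only if the probe finishes before the timeout.

```python
import http.client
import socket
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Healthy if the TCP handshake succeeds. Says nothing about the app."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Healthy only if the app itself answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    # HTTPError (4xx/5xx) and URLError are OSError subclasses; malformed
    # responses raise http.client.HTTPException instead.
    except (OSError, http.client.HTTPException):
        return False
```

Consider a process that has bound the port and is listening but never answers requests. It passes `tcp_check` and fails `http_check`, which is exactly the gap described above.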

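Both transitions use thresholds. A server is marked unhealthy only after several consecutive failed checks (two in the diagram above), so a single slow response doesn't eject it. When it recovers, the load balancer automatically returns it to the pool after several consecutive successful checks.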
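The threshold logic can be sketched as a small per-server state machine. The `HealthState` name and the default thresholds (2 failures to remove, 3 successes to restore) are hypothetical values chosen to match the diagram, not any product's defaults.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Tracks one backend's health using consecutive-result thresholds.

    Threshold defaults are illustrative assumptions, not product defaults.
    """
    unhealthy_threshold: int = 2  # consecutive failures before removal
    healthy_threshold: int = 3    # consecutive successes before re-adding
    healthy: bool = True
    _streak: int = 0              # consecutive results contradicting the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one check result; return whether the server is routable."""
        if check_passed == self.healthy:
            self._streak = 0      # result agrees with current state: reset
        else:
            self._streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= limit:
                self.healthy = not self.healthy  # flip state after enough evidence
                self._streak = 0
        return self.healthy


s2 = HealthState()
for passed in [False, False, True, True, True]:  # timeout, retry timeout, then recovery
    print(s2.record(passed))
# prints True, False, False, False, True (one per line)
```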

→ Problem remaining: which layer of the network stack does the load balancer operate at? The answer determines what it can route on.

Layer 4 — L4 vs L7: two layers, two capabilities

graph LR
    subgraph "Layer 4 LB (TCP)"
        LB4["⚖️ L4 LB"] -->|"by IP:port"| S4A["🖥️ Server A"]
        LB4 -->|"by IP:port"| S4B["🖥️ Server B"]
    end

    subgraph "Layer 7 LB (HTTP)"
        LB7["⚖️ L7 LB"] -->|"/api/* → API server"| SA["🖥️ API Server"]
        LB7 -->|"/static/* → CDN origin"| SB["🖥️ Static Server"]
        LB7 -->|"Host: admin.* → admin"| SC["🖥️ Admin Server"]
    end

    style LB4 fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style LB7 fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style S4A fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style S4B fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style SA fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style SB fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style SC fill:#1a3a2a,stroke:#22c55e,color:#86efac

L4 load balancer (AWS NLB, HAProxy in TCP mode): operates at the transport layer. Fast, low overhead, but only sees IP and port — cannot route by URL path or HTTP headers.

L7 load balancer (AWS ALB, nginx, Traefik): operates at the application layer. Can read HTTP headers, URLs, and cookies — enabling content-based routing. Can also terminate TLS, inject headers, perform A/B testing, and rate-limit by path.
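The content-based routing from the diagram can be sketched as a function over the Host header and URL path. These are two fields an L4 balancer never sees. The pool names, the default pool, and the first-match-wins ordering are assumptions for illustration.

```python
def route_l7(host: str, path: str) -> str:
    """Pick a backend pool from HTTP fields (rules checked in order, first match wins)."""
    host = host.lower()  # Host header values are case-insensitive
    if host.startswith("admin."):
        return "admin-pool"
    if path.startswith("/api/"):
        return "api-pool"
    if path.startswith("/static/"):
        return "static-pool"
    return "default-pool"  # hypothetical catch-all


print(route_l7("example.com", "/api/users"))  # api-pool
print(route_l7("Admin.example.com", "/"))     # admin-pool
```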

Most web services use L7 for the flexibility. L4 fits when ultra-low latency is required or the protocol isn't HTTP.

→ Problem remaining: when an app holds session state on the server (WebSocket, in-memory cart), round-robin will send consecutive requests from the same user to different servers, and the next server has no record of that state.

Layer 5 — Sticky sessions: pinning users to a server

Sometimes the same user must always reach the same server.

sequenceDiagram
    participant U as 👤 User
    participant LB as ⚖️ L7 Load Balancer
    participant S1 as 🖥️ Server 1
    participant S2 as 🖥️ Server 2

    U->>LB: First request
    LB->>S1: Forward request
    S1-->>LB: Response + Set-Cookie: SERVERID=s1
    LB-->>U: Response + cookie

    U->>LB: Second request (cookie: SERVERID=s1)
    Note over LB: Cookie → route to S1
    LB->>S1: Forward request
    S1-->>LB: Response
    LB-->>U: Response

Sticky sessions solve the stateful-session problem but create new ones. If S1 goes down, every session pinned to S1 is lost. And a few heavy users pinned to one server can overload it while the others sit idle. The better solution is to externalize state to Redis or a shared DB. Any server can then read any user's session, and stickiness is no longer needed.
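Below is a sketch of cookie-based stickiness layered on round-robin. The `SERVERID` cookie name follows the diagram. The class, the server ids, and the choice to re-pin a user to the next healthy server when the pinned one is down are assumptions for illustration. Here the balancer picks the cookie value itself.

```python
from itertools import cycle


class StickyBalancer:
    """Cookie-based stickiness on top of round-robin (a sketch, not a product API)."""

    def __init__(self, servers: list[str]) -> None:
        self.healthy = set(servers)  # kept up to date by health checks
        self._ring = cycle(servers)
        self._size = len(servers)

    def _next_healthy(self) -> str:
        # Walk the rotation at most once, skipping unhealthy servers.
        for _ in range(self._size):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

    def route(self, cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
        """Return (server, cookies to set on the response)."""
        pinned = cookies.get("SERVERID")
        if pinned in self.healthy:
            return pinned, {}              # sticky hit: no new cookie needed
        server = self._next_healthy()      # first visit, or pinned server is gone
        return server, {"SERVERID": server}


lb = StickyBalancer(["s1", "s2"])
print(lb.route({}))                   # ('s1', {'SERVERID': 's1'})
print(lb.route({"SERVERID": "s1"}))   # ('s1', {})
lb.healthy.discard("s1")              # health check marks s1 down
print(lb.route({"SERVERID": "s1"}))   # ('s2', {'SERVERID': 's2'})
```

The last call shows the trade-off described above. The user is re-pinned to `s2` and keeps being served, but any in-memory state that lived on `s1` is gone.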

Full picture

graph TD
    U1["👤 Users"] -->|"requests"| LB

    subgraph "Load Balancer"
        LB["⚖️ L7 LB\n(nginx / ALB)"]
        LB --> HC["🔍 Health Check\n(every 10s)"]
    end

    LB -->|"round-robin / least-conn"| S1["🖥️ Server 1\n✓ healthy"]
    LB -->|"round-robin / least-conn"| S2["🖥️ Server 2\n✓ healthy"]
    LB -.->|"🚫 skip"| S3["🖥️ Server 3\n✗ unhealthy"]

    S1 & S2 --> DB[("💾 Shared DB\n+ Redis")]

    style LB fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style S1 fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style S2 fill:#1a3a2a,stroke:#22c55e,color:#86efac
    style S3 fill:#7f1d1d,stroke:#ef4444,color:#fca5a5
    style DB fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd

Takeaway

A load balancer isn't just "split requests evenly." It's the decision layer: which servers are alive, where each request goes, how sessions are handled. Choosing L4 or L7 is an architectural decision, not just a config choice.

Server-side session state is one of the most common reasons horizontal scaling becomes hard: two consecutive requests from the same user may land on two different servers. Externalizing state to shared storage is the prerequisite for a load balancer to work as intended.

This post was assisted by Amy 🌸 - AI Assistant. Content has been reviewed by the author.
