Zoom-in: Rate Limiter

You send too many requests in a short time to an API, and the system returns a sharp error: “429 Too Many Requests”.

graph LR
    C(["💻 Client"]) -->|"Rapid Requests"| RL["🛡️ Rate Limiter"]
    RL -->|"Limit Exceeded"| Reject["❌ 429 Too Many Requests"]
    RL -->|"Valid"| API["🖥️ API Service"]
    style C fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style RL fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style Reject fill:#1e293b,stroke:#e11d48,color:#fda4af
    style API fill:#1a3a2a,stroke:#22c55e,color:#86efac

Let's zoom in on this gatekeeper.

Layer 1 — Token Bucket: The Simple Counting Algorithm

To block excess requests, the system needs an algorithm that counts request frequency efficiently using minimal memory. The most common choice is the Token Bucket algorithm.

graph TD
    Refill["💧 Refill rate: +2 tokens/sec"] --> Bucket{"🪣 Token Bucket\n(Max: 5 tokens)"}
    Req["📥 New Request"] --> Check{"Check Bucket"}
    Check -->|"Token available (Consume 1)"| Allow["✅ Allow through"]
    Check -->|"No tokens left"| Reject["❌ Reject with 429"]
    Bucket -.-> Check

    style Refill fill:#1e293b,stroke:#475569,color:#cbd5e1
    style Bucket fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
    style Req fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd

Each client (identified by IP or API Key) is assigned a virtual "bucket" that holds tokens. The bucket has a maximum capacity (e.g., 5 tokens) and refills at a constant rate (e.g., 2 tokens per second).

Every incoming request consumes 1 token. If the bucket runs out of tokens, the request is immediately rejected. This algorithm is memory-efficient and allows the system to handle short bursts of traffic when the bucket is full.

→ Remaining problem: simple Fixed Window counters can be bypassed if an attacker sends twice the allowed requests at the boundary line between two time windows.

Layer 2 — Sliding Window Counter: Precision Over Time

To solve the boundary issue of Fixed Window algorithms (e.g., allowing 10 requests between 1:00 and 2:00, then resetting at 2:00), the Rate Limiter uses a Sliding Window Counter.

graph LR
    subgraph Previous Window: 10 requests
        W1["Old Window (last min)"]
    end
    subgraph Current Window: 2 requests
        W2["New Window (this min)"]
    end
    W1 -.->|"Slid 30% into new min"| SW["Actual Sliding Window"]
    W2 -.-> SW
    style W1 fill:#1e293b,stroke:#334155,color:#94a3b8
    style W2 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style SW fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d

Instead of resetting the request count to zero at the start of a new minute, the algorithm estimates the request rate by sliding the time window.

For example, if we are 18 seconds into the current minute (30% elapsed), the Rate Limiter calculates the active count as: requests in old window * 70% + requests in new window. This calculation prevents attackers from bypassing limits by grouping requests around time-window resets.

→ Remaining problem: when scaling out across multiple application servers, how do we synchronize client request counts without causing database write conflicts?

Layer 3 — Distributed Limiting: Atomic Actions With Redis and Lua Scripts

When your application runs on a cluster of multiple servers, tracking request counts in local server memory is ineffective. You need a centralized store like Redis.

sequenceDiagram
    participant S1 as App Server 1
    participant R as Redis (Shared)
    participant S2 as App Server 2

    Note over S1,S2: Simultaneous requests from IP 1.2.3.4
    S1->>R: Execute Lua Script (Check & decrement)
    Note over R: Single-threaded (Atomic)<br/>Decrements 1 token successfully
    R-->>S1: Approve (Allow request)

    S2->>R: Execute Lua Script (Check & decrement)
    Note over R: No tokens left!
    R-->>S2: Reject (Return 429)

If two servers read a count from Redis, add one, and write it back at the same time, a race condition occurs, leading to incorrect counts.

To solve this, developers execute the check-and-update logic as a single Lua Script on Redis. Because Redis is single-threaded, it runs the script atomically, preventing concurrent requests from corrupting the count data.

Full picture

sequenceDiagram
    participant C as Client
    participant LB as Load Balancer
    participant App as App Server
    participant Redis as Redis Cache

    C->>LB: GET /api/resource (IP: 1.2.3.4)
    LB->>App: Forward Request
    App->>Redis: EVALSHA rate_limit.lua (Key: limit:1.2.3.4)
    Note over Redis: Run Lua Script atomically:<br/>1. Read current tokens<br/>2. If remaining, decrement and update TTL
    alt Tokens available in bucket
        Redis-->>App: Return: [1, remaining_tokens]
        App->>App: Process business logic
        App-->>C: 200 OK
    else No tokens left
        Redis-->>App: Return: [0, 0]
        App-->>C: 429 Too Many Requests (Retry-After: 30s)
    end

Takeaway

A Rate Limiter is the primary defense that protects an API from application-layer overload. Designing a distributed rate-limiting system requires balancing accuracy and performance: running Lua scripts on a centralized Redis cache ensures absolute consistency, but it can also turn your Redis cache into a new architectural bottleneck under massive loads.

This post was assisted by Amy 🌸 - AI Assistant. Content has been reviewed by the author.

Zoom-in: Rate Limiter

Layer 1 — Token Bucket: The Simple Counting Algorithm

Layer 2 — Sliding Window Counter: Precision Over Time

Layer 3 — Distributed Limiting: Atomic Actions With Redis and Lua Scripts

Full picture

Takeaway

Related Posts

Zoom-in: WebSocket

Zoom-in: Virtual Memory

Zoom-in: Tokenizer