Zoom-in: Rate Limiter

You send too many requests in a short time to an API, and the system returns a sharp error: “429 Too Many Requests”.
graph LR
C(["💻 Client"]) -->|"Rapid Requests"| RL["🛡️ Rate Limiter"]
RL -->|"Limit Exceeded"| Reject["❌ 429 Too Many Requests"]
RL -->|"Valid"| API["🖥️ API Service"]
style C fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style RL fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style Reject fill:#1e293b,stroke:#e11d48,color:#fda4af
style API fill:#1a3a2a,stroke:#22c55e,color:#86efac
Let's zoom in on this gatekeeper.
Layer 1 — Token Bucket: The Simple Counting Algorithm
To block excess requests, the system needs an algorithm that counts request frequency efficiently using minimal memory. The most common choice is the Token Bucket algorithm.
graph TD
Refill["💧 Refill rate: +2 tokens/sec"] --> Bucket{"🪣 Token Bucket\n(Max: 5 tokens)"}
Req["📥 New Request"] --> Check{"Check Bucket"}
Check -->|"Token available (Consume 1)"| Allow["✅ Allow through"]
Check -->|"No tokens left"| Reject["❌ Reject with 429"]
Bucket -.-> Check
style Refill fill:#1e293b,stroke:#475569,color:#cbd5e1
style Bucket fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
style Req fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
Each client (identified by IP or API Key) is assigned a virtual "bucket" that holds tokens. The bucket has a maximum capacity (e.g., 5 tokens) and refills at a constant rate (e.g., 2 tokens per second).
Every incoming request consumes 1 token. If the bucket runs out of tokens, the request is immediately rejected. This algorithm is memory-efficient and allows the system to handle short bursts of traffic when the bucket is full.
Layer 2 — Sliding Window Counter: Precision Over Time
To solve the boundary issue of Fixed Window algorithms (e.g., allowing 10 requests between 1:00 and 2:00, then resetting at 2:00), the Rate Limiter uses a Sliding Window Counter.
graph LR
subgraph Previous Window: 10 requests
W1["Old Window (last min)"]
end
subgraph Current Window: 2 requests
W2["New Window (this min)"]
end
W1 -.->|"Slid 30% into new min"| SW["Actual Sliding Window"]
W2 -.-> SW
style W1 fill:#1e293b,stroke:#334155,color:#94a3b8
style W2 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style SW fill:#3b2a1a,stroke:#f59e0b,color:#fcd34d
Instead of resetting the request count to zero at the start of a new minute, the algorithm estimates the request rate by sliding the time window.
For example, if we are 18 seconds into the current minute (30% elapsed), the Rate Limiter calculates the active count as: requests in old window * 70% + requests in new window. This calculation prevents attackers from bypassing limits by grouping requests around time-window resets.
Layer 3 — Distributed Limiting: Atomic Actions With Redis and Lua Scripts
When your application runs on a cluster of multiple servers, tracking request counts in local server memory is ineffective. You need a centralized store like Redis.
sequenceDiagram
participant S1 as App Server 1
participant R as Redis (Shared)
participant S2 as App Server 2
Note over S1,S2: Simultaneous requests from IP 1.2.3.4
S1->>R: Execute Lua Script (Check & decrement)
Note over R: Single-threaded (Atomic)<br/>Decrements 1 token successfully
R-->>S1: Approve (Allow request)
S2->>R: Execute Lua Script (Check & decrement)
Note over R: No tokens left!
R-->>S2: Reject (Return 429)
If two servers read a count from Redis, add one, and write it back at the same time, a race condition occurs, leading to incorrect counts.
To solve this, developers execute the check-and-update logic as a single Lua Script on Redis. Because Redis is single-threaded, it runs the script atomically, preventing concurrent requests from corrupting the count data.
Full picture
sequenceDiagram
participant C as Client
participant LB as Load Balancer
participant App as App Server
participant Redis as Redis Cache
C->>LB: GET /api/resource (IP: 1.2.3.4)
LB->>App: Forward Request
App->>Redis: EVALSHA rate_limit.lua (Key: limit:1.2.3.4)
Note over Redis: Run Lua Script atomically:<br/>1. Read current tokens<br/>2. If remaining, decrement and update TTL
alt Tokens available in bucket
Redis-->>App: Return: [1, remaining_tokens]
App->>App: Process business logic
App-->>C: 200 OK
else No tokens left
Redis-->>App: Return: [0, 0]
App-->>C: 429 Too Many Requests (Retry-After: 30s)
end
Takeaway
A Rate Limiter is the primary defense that protects an API from application-layer overload. Designing a distributed rate-limiting system requires balancing accuracy and performance: running Lua scripts on a centralized Redis cache ensures absolute consistency, but it can also turn your Redis cache into a new architectural bottleneck under massive loads.
This post was assisted by Amy 🌸 - AI Assistant. Content has been reviewed by the author.
Related Posts
Zoom-in: WebSocket
Your chat app updates instantly without reloading the page. How WebSocket breaks free from the one-way limits of HTTP.
Zoom-in: Virtual Memory
Run multiple apps at once, and each one acts like it owns all of your RAM. How does the operating system isolate memory space so securely?
Zoom-in: Tokenizer
The model doesn't read letters like you think. Zoom in on how text is sliced and how it impacts both API costs and context memory.