Rate Limiting — Core Concepts
Interview Relevance: Very High — "How would you prevent API abuse?" appears in nearly every FAANG system design or deep-dive round. Know the algorithms and where to enforce them.
Rate limiting controls how many requests a client can make in a given time window, protecting your system from:
- 🚨 DDoS attacks — malicious flooding
- 🤖 Bot scraping — automated abuse
- 🐛 Runaway clients — buggy code in a loop
- 💰 Cost overruns — unbounded API usage in pay-per-use models
Where to Enforce Rate Limiting
Enforce at the API Gateway for most systems. It centralizes logic, keeps services clean, and works across all clients.
The 5 Rate Limiting Algorithms
Algorithm 1 — Fixed Window Counter
Divide time into fixed windows. Count requests per window. Reject when over the limit.
✅ Simple — just an integer counter per user per window
✅ Very fast — single Redis INCR + EXPIRE
❌ Boundary attack: burst 2× the limit at window edgesRedis implementation:
INCR rate:user:42:1701388800 ← key = user + unix minute timestamp
EXPIRE rate:user:42:1701388800 60
IF count > limit THEN rejectAlgorithm 2 — Sliding Window Log
Store a timestamp for every request. Count how many timestamps fall in the last N seconds.
✅ Perfectly accurate — no boundary attack
❌ High memory: must store every request timestamp
❌ O(N) cleanup: must remove old timestamps per request
Use: When precision is critical and traffic is lowRedis implementation:
ZADD rate:user:42 <timestamp> <request_id>
ZREMRANGEBYSCORE rate:user:42 0 <now - 60s>
count = ZCARD rate:user:42
EXPIRE rate:user:42 60Algorithm 3 — Sliding Window Counter ⭐ Best Balance
Hybrid approach: uses fixed windows but approximates a sliding window using the previous window's count.
✅ Approximates sliding window accuracy
✅ Only 2 counters per user (prev + curr window)
✅ Much lower memory than Sliding Window Log
❌ Approximation (not 100% exact — but good enough)
Best for: Most production rate limiters (Cloudflare, Nginx use this)Algorithm 4 — Token Bucket ⭐ Most Commonly Used
A bucket holds tokens. Each request consumes a token. Tokens refill at a fixed rate.
✅ Allows bursting (use saved-up tokens for traffic spikes)
✅ Smooth refill — doesn't allow sudden boundary attacks
✅ Simple to implement with Redis
✅ Used by AWS API Gateway, Stripe, Twitter API
❌ Harder to reason about exact request counts per windowToken Bucket with Redis:
-- Lua script (atomic in Redis)
local tokens = redis.call('GET', key) or capacity
local now = tonumber(ARGV[1])
local last = redis.call('GET', key..':last') or now
local delta = (now - last) * refill_rate
tokens = math.min(capacity, tokens + delta)
if tokens >= 1 then
tokens = tokens - 1
redis.call('SET', key, tokens)
return 1 -- allow
else
return 0 -- reject
endAlgorithm 5 — Leaky Bucket
Requests enter a queue (bucket). They are processed at a constant rate, regardless of how fast they arrive. Excess requests overflow and are dropped.
✅ Guarantees smooth, predictable output rate to the server
✅ Prevents any burst from overwhelming downstream
❌ Bursty legitimate traffic gets queued or dropped
❌ Not great for user-facing APIs where bursts are expected
Best for: Rate-smoothing at infrastructure level (network packets, DB writes)Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Support | Complexity | Best For |
|---|---|---|---|---|---|
| Fixed Window | ✅ O(1) | ❌ Boundary attack | ❌ No | ✅ Trivial | Simple internal limits |
| Sliding Window Log | ❌ O(N) | ✅ Exact | ✅ Natural | Medium | Low-traffic, precise billing |
| Sliding Window Counter | ✅ O(1) | ✅ ~Exact | ⚠️ Limited | Low | Most production APIs |
| Token Bucket | ✅ O(1) | ✅ Good | ✅ Yes | Medium | User-facing APIs, SDKs |
| Leaky Bucket | O(queue) | ✅ Good | ❌ Drops bursts | Medium | Traffic shaping, infrastructure |
Distributed Rate Limiting
A single Redis node is a bottleneck for rate limiting at scale. Here's how to distribute it.
Strategies for distributed rate limiting:
| Strategy | How | Trade-off |
|---|---|---|
| Centralized Redis | All GW nodes share one Redis counter | Accurate, but Redis is bottleneck + SPOF |
| Redis Cluster | Shard counters across Redis nodes | High availability, consistent |
| Local approximation | Each node tracks limit / N, syncs periodically | Slightly inaccurate, very fast |
| Sliding window at edge | CDN (Cloudflare) handles before traffic hits origin | Stops DDoS at edge, no infra load |
HTTP Response Headers for Rate Limiting
Always return these headers so clients can self-throttle:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100 ← max requests per window
X-RateLimit-Remaining: 47 ← requests remaining in current window
X-RateLimit-Reset: 1701388860 ← Unix timestamp when window resets
HTTP/1.1 429 Too Many Requests
Retry-After: 30 ← seconds until client can retry
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0Worked Example: Rate Limiting the URL Shortener API
Rate limit tiers for the URL Shortener:
| Tier | Limit | Algorithm | Key | Reason |
|---|---|---|---|---|
| Anonymous reads | 1,000 req/min/IP | Token Bucket | rate:ip:<ip> | Prevent scraping |
| Authenticated shortens | 100/day/user | Sliding Window Counter | rate:user:<id>:day | Quota enforcement |
| API key (paid tier) | 10,000/day/key | Token Bucket | rate:key:<api_key> | Billing tiers |
| Viral short code | 50,000 req/min | Token Bucket | rate:code:<code> | Hot link protection |
Interview Cheat Sheet
One-Line Summaries
Fixed Window: Count per time window — simple, boundary attack risk
Sliding Window Log: Exact but O(N) memory — timestamp per request
Sliding Window Counter: Approximate sliding with 2 counters — best balance
Token Bucket: Tokens refill at rate — allows bursting, widely used
Leaky Bucket: Smooth output rate — good for infra, bad for user APIs
Distributed RL: Local approx + periodic Redis sync — scale across GW nodesThe Interview Phrase
"I'd implement rate limiting at the API Gateway using a Token Bucket
algorithm backed by Redis. Token Bucket is ideal because it allows
legitimate traffic bursts while still enforcing an average rate.
Each user gets a Redis key holding their current token count; a Lua
script atomically checks and decrements tokens to avoid race conditions.
For distributed deployments across multiple gateway instances, I'd
shard the Redis keys by user_id to distribute load. I'd return
X-RateLimit-Remaining headers so clients can self-throttle, and
429 Too Many Requests with a Retry-After header on rejection."Red Flags vs. Green Flags
| 🔴 Red Flag | 🟢 Green Flag |
|---|---|
| "Just reject requests above the limit" | Specify the algorithm and justify it |
| Enforce rate limits in every individual service | Centralize at API Gateway |
| Ignore distributed rate limiting | Address multi-instance rate limit counters |
| No 429 / Retry-After headers | Return proper HTTP 429 with Retry-After |
| Use Fixed Window without mentioning boundary attack | Know the limitation and mention Sliding Window or Token Bucket |
| One rate limit for all clients | Different tiers: anonymous, authenticated, paid API key |
IMPORTANT
Use a Lua script in Redis for atomic token bucket operations. A non-atomic GET + SET sequence has a race condition: two requests can both read "1 token remaining" and both succeed, bypassing the limit.
TIP
Mentioning different rate limit tiers (anonymous vs. authenticated vs. paid API key) and returning proper HTTP headers (X-RateLimit-Remaining, Retry-After) shows real production experience.
