Rate Limiting — Core Concepts

Interview Relevance: Very High — "How would you prevent API abuse?" appears in nearly every FAANG system design or deep-dive round. Know the algorithms and where to enforce them.

Rate limiting controls how many requests a client can make in a given time window, protecting your system from:

🚨 DDoS attacks — malicious flooding
🤖 Bot scraping — automated abuse
🐛 Runaway clients — buggy code in a loop
💰 Cost overruns — unbounded API usage in pay-per-use models

Where to Enforce Rate Limiting

Enforce at the API Gateway for most systems. It centralizes logic, keeps services clean, and works across all clients.

The 5 Rate Limiting Algorithms

Algorithm 1 — Fixed Window Counter

Divide time into fixed windows. Count requests per window. Reject when over the limit.

✅ Simple — just an integer counter per user per window
✅ Very fast — single Redis INCR + EXPIRE
❌ Boundary attack: burst 2× the limit at window edges

Redis implementation:

INCR rate:user:42:1701388800   ← key = user + unix minute timestamp
EXPIRE rate:user:42:1701388800 60
IF count > limit THEN reject

Algorithm 2 — Sliding Window Log

Store a timestamp for every request. Count how many timestamps fall in the last N seconds.

✅ Perfectly accurate — no boundary attack
❌ High memory: must store every request timestamp
❌ O(N) cleanup: must remove old timestamps per request
Use: When precision is critical and traffic is low

Redis implementation:

ZADD rate:user:42 <timestamp> <request_id>
ZREMRANGEBYSCORE rate:user:42 0 <now - 60s>
count = ZCARD rate:user:42
EXPIRE rate:user:42 60

Algorithm 3 — Sliding Window Counter ⭐ Best Balance

Hybrid approach: uses fixed windows but approximates a sliding window using the previous window's count.

✅ Approximates sliding window accuracy
✅ Only 2 counters per user (prev + curr window)
✅ Much lower memory than Sliding Window Log
❌ Approximation (not 100% exact — but good enough)
Best for: Most production rate limiters (Cloudflare, Nginx use this)

Algorithm 4 — Token Bucket ⭐ Most Commonly Used

A bucket holds tokens. Each request consumes a token. Tokens refill at a fixed rate.

✅ Allows bursting (use saved-up tokens for traffic spikes)
✅ Smooth refill — doesn't allow sudden boundary attacks
✅ Simple to implement with Redis
✅ Used by AWS API Gateway, Stripe, Twitter API
❌ Harder to reason about exact request counts per window

Token Bucket with Redis:

lua

-- Lua script (atomic in Redis)
local tokens = redis.call('GET', key) or capacity
local now = tonumber(ARGV[1])
local last = redis.call('GET', key..':last') or now
local delta = (now - last) * refill_rate
tokens = math.min(capacity, tokens + delta)
if tokens >= 1 then
    tokens = tokens - 1
    redis.call('SET', key, tokens)
    return 1  -- allow
else
    return 0  -- reject
end

Algorithm 5 — Leaky Bucket

Requests enter a queue (bucket). They are processed at a constant rate, regardless of how fast they arrive. Excess requests overflow and are dropped.

✅ Guarantees smooth, predictable output rate to the server
✅ Prevents any burst from overwhelming downstream
❌ Bursty legitimate traffic gets queued or dropped
❌ Not great for user-facing APIs where bursts are expected
Best for: Rate-smoothing at infrastructure level (network packets, DB writes)

Algorithm Comparison

Algorithm	Memory	Accuracy	Burst Support	Complexity	Best For
Fixed Window	✅ O(1)	❌ Boundary attack	❌ No	✅ Trivial	Simple internal limits
Sliding Window Log	❌ O(N)	✅ Exact	✅ Natural	Medium	Low-traffic, precise billing
Sliding Window Counter	✅ O(1)	✅ ~Exact	⚠️ Limited	Low	Most production APIs
Token Bucket	✅ O(1)	✅ Good	✅ Yes	Medium	User-facing APIs, SDKs
Leaky Bucket	O(queue)	✅ Good	❌ Drops bursts	Medium	Traffic shaping, infrastructure

Distributed Rate Limiting

A single Redis node is a bottleneck for rate limiting at scale. Here's how to distribute it.

Strategies for distributed rate limiting:

Strategy	How	Trade-off
Centralized Redis	All GW nodes share one Redis counter	Accurate, but Redis is bottleneck + SPOF
Redis Cluster	Shard counters across Redis nodes	High availability, consistent
Local approximation	Each node tracks `limit / N`, syncs periodically	Slightly inaccurate, very fast
Sliding window at edge	CDN (Cloudflare) handles before traffic hits origin	Stops DDoS at edge, no infra load

HTTP Response Headers for Rate Limiting

Always return these headers so clients can self-throttle:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100         ← max requests per window
X-RateLimit-Remaining: 47      ← requests remaining in current window
X-RateLimit-Reset: 1701388860  ← Unix timestamp when window resets

HTTP/1.1 429 Too Many Requests
Retry-After: 30                ← seconds until client can retry
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0

Worked Example: Rate Limiting the URL Shortener API

Rate limit tiers for the URL Shortener:

Tier	Limit	Algorithm	Key	Reason
Anonymous reads	1,000 req/min/IP	Token Bucket	`rate:ip:<ip>`	Prevent scraping
Authenticated shortens	100/day/user	Sliding Window Counter	`rate:user:<id>:day`	Quota enforcement
API key (paid tier)	10,000/day/key	Token Bucket	`rate:key:<api_key>`	Billing tiers
Viral short code	50,000 req/min	Token Bucket	`rate:code:<code>`	Hot link protection

Interview Cheat Sheet

One-Line Summaries

Fixed Window:           Count per time window — simple, boundary attack risk
Sliding Window Log:     Exact but O(N) memory — timestamp per request
Sliding Window Counter: Approximate sliding with 2 counters — best balance
Token Bucket:           Tokens refill at rate — allows bursting, widely used
Leaky Bucket:           Smooth output rate — good for infra, bad for user APIs
Distributed RL:         Local approx + periodic Redis sync — scale across GW nodes

The Interview Phrase

"I'd implement rate limiting at the API Gateway using a Token Bucket
 algorithm backed by Redis. Token Bucket is ideal because it allows
 legitimate traffic bursts while still enforcing an average rate.
 Each user gets a Redis key holding their current token count; a Lua
 script atomically checks and decrements tokens to avoid race conditions.
 For distributed deployments across multiple gateway instances, I'd
 shard the Redis keys by user_id to distribute load. I'd return
 X-RateLimit-Remaining headers so clients can self-throttle, and
 429 Too Many Requests with a Retry-After header on rejection."

Red Flags vs. Green Flags

🔴 Red Flag	🟢 Green Flag
"Just reject requests above the limit"	Specify the algorithm and justify it
Enforce rate limits in every individual service	Centralize at API Gateway
Ignore distributed rate limiting	Address multi-instance rate limit counters
No 429 / Retry-After headers	Return proper HTTP 429 with Retry-After
Use Fixed Window without mentioning boundary attack	Know the limitation and mention Sliding Window or Token Bucket
One rate limit for all clients	Different tiers: anonymous, authenticated, paid API key

IMPORTANT

Use a Lua script in Redis for atomic token bucket operations. A non-atomic GET + SET sequence has a race condition: two requests can both read "1 token remaining" and both succeed, bypassing the limit.

TIP

Mentioning different rate limit tiers (anonymous vs. authenticated vs. paid API key) and returning proper HTTP headers (X-RateLimit-Remaining, Retry-After) shows real production experience.

Rate Limiting — Core Concepts ​

Where to Enforce Rate Limiting ​

The 5 Rate Limiting Algorithms ​

Algorithm 1 — Fixed Window Counter ​

Algorithm 2 — Sliding Window Log ​

Algorithm 3 — Sliding Window Counter ⭐ Best Balance ​

Algorithm 4 — Token Bucket ⭐ Most Commonly Used ​

Algorithm 5 — Leaky Bucket ​

Algorithm Comparison ​

Distributed Rate Limiting ​

HTTP Response Headers for Rate Limiting ​

Worked Example: Rate Limiting the URL Shortener API ​

Interview Cheat Sheet ​

One-Line Summaries ​

The Interview Phrase ​

Red Flags vs. Green Flags ​