Skip to content

Rate Limiting — Core Concepts

Interview Relevance: Very High — "How would you prevent API abuse?" appears in nearly every FAANG system design or deep-dive round. Know the algorithms and where to enforce them.

Rate limiting controls how many requests a client can make in a given time window, protecting your system from:

  • 🚨 DDoS attacks — malicious flooding
  • 🤖 Bot scraping — automated abuse
  • 🐛 Runaway clients — buggy code in a loop
  • 💰 Cost overruns — unbounded API usage in pay-per-use models

Where to Enforce Rate Limiting

Enforce at the API Gateway for most systems. It centralizes logic, keeps services clean, and works across all clients.


The 5 Rate Limiting Algorithms

Algorithm 1 — Fixed Window Counter

Divide time into fixed windows. Count requests per window. Reject when over the limit.

✅ Simple — just an integer counter per user per window
✅ Very fast — single Redis INCR + EXPIRE
❌ Boundary attack: burst 2× the limit at window edges

Redis implementation:

INCR rate:user:42:1701388800   ← key = user + unix minute timestamp
EXPIRE rate:user:42:1701388800 60
IF count > limit THEN reject

Algorithm 2 — Sliding Window Log

Store a timestamp for every request. Count how many timestamps fall in the last N seconds.

✅ Perfectly accurate — no boundary attack
❌ High memory: must store every request timestamp
❌ O(N) cleanup: must remove old timestamps per request
Use: When precision is critical and traffic is low

Redis implementation:

ZADD rate:user:42 <timestamp> <request_id>
ZREMRANGEBYSCORE rate:user:42 0 <now - 60s>
count = ZCARD rate:user:42
EXPIRE rate:user:42 60

Algorithm 3 — Sliding Window Counter ⭐ Best Balance

Hybrid approach: uses fixed windows but approximates a sliding window using the previous window's count.

✅ Approximates sliding window accuracy
✅ Only 2 counters per user (prev + curr window)
✅ Much lower memory than Sliding Window Log
❌ Approximation (not 100% exact — but good enough)
Best for: Most production rate limiters (Cloudflare, Nginx use this)

Algorithm 4 — Token Bucket ⭐ Most Commonly Used

A bucket holds tokens. Each request consumes a token. Tokens refill at a fixed rate.

✅ Allows bursting (use saved-up tokens for traffic spikes)
✅ Smooth refill — doesn't allow sudden boundary attacks
✅ Simple to implement with Redis
✅ Used by AWS API Gateway, Stripe, Twitter API
❌ Harder to reason about exact request counts per window

Token Bucket with Redis:

lua
-- Lua script (atomic in Redis)
local tokens = redis.call('GET', key) or capacity
local now = tonumber(ARGV[1])
local last = redis.call('GET', key..':last') or now
local delta = (now - last) * refill_rate
tokens = math.min(capacity, tokens + delta)
if tokens >= 1 then
    tokens = tokens - 1
    redis.call('SET', key, tokens)
    return 1  -- allow
else
    return 0  -- reject
end

Algorithm 5 — Leaky Bucket

Requests enter a queue (bucket). They are processed at a constant rate, regardless of how fast they arrive. Excess requests overflow and are dropped.

✅ Guarantees smooth, predictable output rate to the server
✅ Prevents any burst from overwhelming downstream
❌ Bursty legitimate traffic gets queued or dropped
❌ Not great for user-facing APIs where bursts are expected
Best for: Rate-smoothing at infrastructure level (network packets, DB writes)

Algorithm Comparison

AlgorithmMemoryAccuracyBurst SupportComplexityBest For
Fixed Window✅ O(1)❌ Boundary attack❌ No✅ TrivialSimple internal limits
Sliding Window Log❌ O(N)✅ Exact✅ NaturalMediumLow-traffic, precise billing
Sliding Window Counter✅ O(1)✅ ~Exact⚠️ LimitedLowMost production APIs
Token Bucket✅ O(1)✅ Good✅ YesMediumUser-facing APIs, SDKs
Leaky BucketO(queue)✅ Good❌ Drops burstsMediumTraffic shaping, infrastructure

Distributed Rate Limiting

A single Redis node is a bottleneck for rate limiting at scale. Here's how to distribute it.

Strategies for distributed rate limiting:

StrategyHowTrade-off
Centralized RedisAll GW nodes share one Redis counterAccurate, but Redis is bottleneck + SPOF
Redis ClusterShard counters across Redis nodesHigh availability, consistent
Local approximationEach node tracks limit / N, syncs periodicallySlightly inaccurate, very fast
Sliding window at edgeCDN (Cloudflare) handles before traffic hits originStops DDoS at edge, no infra load

HTTP Response Headers for Rate Limiting

Always return these headers so clients can self-throttle:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100         ← max requests per window
X-RateLimit-Remaining: 47      ← requests remaining in current window
X-RateLimit-Reset: 1701388860  ← Unix timestamp when window resets

HTTP/1.1 429 Too Many Requests
Retry-After: 30                ← seconds until client can retry
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0

Worked Example: Rate Limiting the URL Shortener API

Rate limit tiers for the URL Shortener:

TierLimitAlgorithmKeyReason
Anonymous reads1,000 req/min/IPToken Bucketrate:ip:<ip>Prevent scraping
Authenticated shortens100/day/userSliding Window Counterrate:user:<id>:dayQuota enforcement
API key (paid tier)10,000/day/keyToken Bucketrate:key:<api_key>Billing tiers
Viral short code50,000 req/minToken Bucketrate:code:<code>Hot link protection

Interview Cheat Sheet

One-Line Summaries

Fixed Window:           Count per time window — simple, boundary attack risk
Sliding Window Log:     Exact but O(N) memory — timestamp per request
Sliding Window Counter: Approximate sliding with 2 counters — best balance
Token Bucket:           Tokens refill at rate — allows bursting, widely used
Leaky Bucket:           Smooth output rate — good for infra, bad for user APIs
Distributed RL:         Local approx + periodic Redis sync — scale across GW nodes

The Interview Phrase

"I'd implement rate limiting at the API Gateway using a Token Bucket
 algorithm backed by Redis. Token Bucket is ideal because it allows
 legitimate traffic bursts while still enforcing an average rate.
 Each user gets a Redis key holding their current token count; a Lua
 script atomically checks and decrements tokens to avoid race conditions.
 For distributed deployments across multiple gateway instances, I'd
 shard the Redis keys by user_id to distribute load. I'd return
 X-RateLimit-Remaining headers so clients can self-throttle, and
 429 Too Many Requests with a Retry-After header on rejection."

Red Flags vs. Green Flags

🔴 Red Flag🟢 Green Flag
"Just reject requests above the limit"Specify the algorithm and justify it
Enforce rate limits in every individual serviceCentralize at API Gateway
Ignore distributed rate limitingAddress multi-instance rate limit counters
No 429 / Retry-After headersReturn proper HTTP 429 with Retry-After
Use Fixed Window without mentioning boundary attackKnow the limitation and mention Sliding Window or Token Bucket
One rate limit for all clientsDifferent tiers: anonymous, authenticated, paid API key

IMPORTANT

Use a Lua script in Redis for atomic token bucket operations. A non-atomic GET + SET sequence has a race condition: two requests can both read "1 token remaining" and both succeed, bypassing the limit.

TIP

Mentioning different rate limit tiers (anonymous vs. authenticated vs. paid API key) and returning proper HTTP headers (X-RateLimit-Remaining, Retry-After) shows real production experience.

Released under the ISC License.