🔗 System Design: URL Shortener (like bit.ly) ​
A classic beginner system design problem. Master it first.
Step 1: Clarify Requirements ​
Functional Requirements ​
- Given a long URL, generate a short URL
- Given a short URL, redirect to the original URL
- URLs expire after X days (optional)
- Custom aliases (e.g., short.ly/my-brand)
Non-Functional Requirements ​
- High availability (99.99% uptime)
- Low latency redirects (< 10ms)
- Scale: 100M URLs created/day, 10B redirects/day
Step 2: Estimate Scale ​
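The figures in this step are plain arithmetic. As a minimal sketch, the script below reproduces them from the stated assumptions (500 bytes per record, 5 years of retention); the function and field names are illustrative.

```javascript
// Back-of-envelope capacity estimate for the stated workload.
// All names are illustrative; the inputs are the assumptions used in this step.
const SECONDS_PER_DAY = 86_400;

function estimateScale({ writesPerDay, readsPerDay, bytesPerRecord, retentionYears }) {
  const writesPerSec = writesPerDay / SECONDS_PER_DAY;
  const readsPerSec = readsPerDay / SECONDS_PER_DAY;
  const totalRecords = writesPerDay * 365 * retentionYears;
  return {
    writesPerSec,
    readsPerSec,
    readWriteRatio: readsPerDay / writesPerDay,
    totalRecords,
    storageTB: (totalRecords * bytesPerRecord) / 1e12, // decimal terabytes
    readBandwidthMBps: (readsPerSec * bytesPerRecord) / 1e6, // decimal megabytes per second
  };
}

const estimate = estimateScale({
  writesPerDay: 100e6, // 100M URLs created per day
  readsPerDay: 10e9, // 10B redirects per day
  bytesPerRecord: 500,
  retentionYears: 5,
});

console.log(estimate);
```

Running it yields about 1,157 writes/sec, 115,741 reads/sec, 91.25 TB of storage, and 57.9 MB/s of read bandwidth, which round to the figures in this step.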
text
Write (URL creation):
  100M URLs/day = 100M / 86,400 = ~1,160 writes/sec
Read (URL redirect):
  10B redirects/day = ~116,000 reads/sec
  Read:Write ratio = 100:1
Storage:
  Each URL record: ~500 bytes
  100M/day × 365 × 5 years = ~182 billion records
  182B × 500 bytes = ~91 TB
Bandwidth:
  Reads: 116,000/sec × 500 bytes = ~58 MB/s

Step 3: High-Level Design
Example: URL Creation (Express.js) ​
javascript
const express = require("express");
const app = express();
app.use(express.json());

// Mock database and ID generator (in-memory stand-ins for a real DB)
let nextId = 12345678;
const db = new Map();

// Base62 encoding: digits, then lowercase, then uppercase
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

app.post("/shorten", async (req, res) => {
  // Fall back to {} so a missing body yields a 400 instead of a thrown error
  const { longUrl } = req.body ?? {};
  if (!longUrl) return res.status(400).json({ error: "longUrl is required" });

  // 1. Get unique auto-incremented ID
  const id = nextId++;
  // 2. Convert ID to Base62 shortcode
  const shortCode = encodeBase62(id);
  // 3. Save to database, keyed by short code so the redirect path can look it up
  db.set(shortCode, { id, longUrl, createdAt: new Date() });

  res.json({ shortUrl: `http://short.ly/${shortCode}`, longUrl });
});

Step 4: URL Shortening Algorithm
Option A: Hash + Truncate ​
text
long_url = "https://www.example.com/very/long/path"
hash = MD5(long_url) = "1a2b3c4d5e6f..."
short_code = base62(hash[:8]) = "dX3kR9mQ"

Problem: Hash collisions! Two different URLs could produce the same short code, so every insert needs a uniqueness check.
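To make the collision risk concrete, here is a minimal sketch of Option A using Node's built-in `crypto` module. It keeps the first 8 hex characters (32 bits) of the MD5 digest, which is one reading of `hash[:8]`. The helper names, the in-memory `Map`, and the retry-with-counter strategy are assumptions for illustration, not part of the original design.

```javascript
const crypto = require("crypto");

const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

// Hash the URL, keep the first 32 bits of the digest, and Base62-encode them.
function hashToShortCode(longUrl, attempt = 0) {
  // On a retry, append the attempt number so the digest changes (hypothetical strategy).
  const input = attempt === 0 ? longUrl : `${longUrl}#${attempt}`;
  const hex = crypto.createHash("md5").update(input).digest("hex");
  return encodeBase62(parseInt(hex.slice(0, 8), 16));
}

// In-memory stand-in for the database: shortCode -> longUrl.
const codes = new Map();

function shortenWithHash(longUrl) {
  for (let attempt = 0; attempt < 5; attempt++) {
    const code = hashToShortCode(longUrl, attempt);
    const existing = codes.get(code);
    if (existing === undefined) {
      codes.set(code, longUrl); // code is free: claim it
      return code;
    }
    if (existing === longUrl) return code; // same URL submitted again
    // A different URL already owns this code: loop and retry with a new input.
  }
  throw new Error("Could not find a free short code");
}

console.log(shortenWithHash("https://www.example.com/very/long/path"));
```

With only 32 bits of hash, the birthday bound puts a 50% chance of at least one collision at roughly 77,000 stored URLs, which is why the lookup before each insert cannot be skipped at the scale estimated in Step 2.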
Option B: Auto-increment ID + Base62 Encode ✅ Recommended ​
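As a minimal sketch of this option, the encoder can be paired with a decoder so that a short code maps straight back to its numeric ID. The `decodeBase62` helper is an illustrative addition, and the alphabet order (digits, then lowercase, then uppercase) is assumed to match the Express example above.

```javascript
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative integer ID as a Base62 string.
function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

// Decode a Base62 string back to its integer ID (illustrative helper).
function decodeBase62(code) {
  let num = 0;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`Invalid Base62 character: ${ch}`);
    num = num * 62 + digit;
  }
  return num;
}

console.log(encodeBase62(12345678)); // "PNFQ"
console.log(decodeBase62("PNFQ")); // 12345678
console.log(62 ** 7); // 3521614606208, about 3.5 trillion 7-character codes
```

Because the mapping is reversible, the redirect path could decode the short code and look up the row by primary key. Plain `Number` arithmetic is exact only up to 2^53 - 1, so 64-bit IDs would need a `BigInt` variant of both functions.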
text
DB auto-increment: id = 12345678
base62(12345678) = "PNFQ"   (alphabet order: 0-9, a-z, A-Z)
Characters: [0-9, a-z, A-Z] = 62 characters
7-char code: 62^7 = ~3.5 trillion unique URLs

Option C: Snowflake ID (Distributed)
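A minimal sketch of a Snowflake-style generator, assuming the common layout of 1 unused sign bit, a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence. The class name, the custom epoch, and the spin-wait on sequence overflow are illustrative choices rather than a specific library's API. `BigInt` is used because 64-bit values exceed JavaScript's safe integer range.

```javascript
// Snowflake-style ID generator (illustrative sketch, not a library API).
const CUSTOM_EPOCH = 1704067200000n; // 2024-01-01T00:00:00Z, an arbitrary assumed epoch
const MACHINE_ID_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_MACHINE_ID = (1n << MACHINE_ID_BITS) - 1n; // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

class SnowflakeGenerator {
  constructor(machineId) {
    const id = BigInt(machineId);
    if (id < 0n || id > MAX_MACHINE_ID) {
      throw new RangeError(`machineId must be between 0 and ${MAX_MACHINE_ID}`);
    }
    this.machineId = id;
    this.lastTimestamp = -1n;
    this.sequence = 0n;
  }

  nextId() {
    let now = BigInt(Date.now());
    if (now < this.lastTimestamp) {
      throw new Error("Clock moved backwards; refusing to generate IDs");
    }
    if (now === this.lastTimestamp) {
      // Same millisecond: bump the sequence, wrapping at 4096.
      this.sequence = (this.sequence + 1n) & MAX_SEQUENCE;
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the clock advances.
        while (now <= this.lastTimestamp) now = BigInt(Date.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastTimestamp = now;
    // Pack the three fields: timestamp | machine ID | sequence.
    return (
      ((now - CUSTOM_EPOCH) << (MACHINE_ID_BITS + SEQUENCE_BITS)) |
      (this.machineId << SEQUENCE_BITS) |
      this.sequence
    );
  }
}

const generator = new SnowflakeGenerator(7);
const firstId = generator.nextId();
const secondId = generator.nextId();
console.log(firstId < secondId); // true: IDs from one generator are strictly increasing
```

A 63-bit value can need up to 11 Base62 characters, so short codes derived from Snowflake IDs are longer than the 7-character codes of Option B, and the `short_code` column width has to allow for that.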
text
64-bit ID = [unused sign bit 1 bit][timestamp 41 bits][machine_id 10 bits][sequence 12 bits]
Machine ID ensures no collision across distributed ID generators

Step 5: Database Schema
sql
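CREATE TABLE urls (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  -- Binary collation keeps "aB" and "Ab" distinct; MySQL's default collations are case-insensitive
  short_code VARCHAR(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL UNIQUE,
  long_url TEXT NOT NULL,
  user_id BIGINT,
  created_at TIMESTAMP DEFAULT NOW(),
  expires_at TIMESTAMP NULL, -- NULL is assumed to mean the URL never expires
  click_count BIGINT DEFAULT 0
);
-- The UNIQUE constraint on short_code already creates an index, so a separate CREATE INDEX is redundant.

Step 6: Redirect Flow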
Example: Redirect Logic (Express.js & Redis) ​
javascript
const Redis = require("ioredis");
const redis = new Redis();

// `app` is the Express app from the URL-creation example;
// getUrlFromDb is a placeholder for the real database lookup.
app.get("/:shortCode", async (req, res) => {
  const { shortCode } = req.params;

  // 1. Check Redis cache
  const cachedUrl = await redis.get(`url:${shortCode}`);
  if (cachedUrl) {
    // Cache HIT
    // Use 302 (Found) if we want to track every click in analytics
    // Use 301 (Moved Permanently) to let the browser cache it and offload the server
    return res.redirect(302, cachedUrl);
  }

  // 2. Cache MISS: query the database
  const dbRecord = await getUrlFromDb(shortCode); // Mock DB call
  if (!dbRecord) {
    return res.status(404).json({ error: "URL not found" });
  }

  // 3. Save to Redis for next time (e.g., expire in 24 hours)
  await redis.setex(`url:${shortCode}`, 86400, dbRecord.longUrl);

  // 4. Redirect
  res.redirect(302, dbRecord.longUrl);
});

Note on HTTP 301 vs 302:
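- 301 (Permanent): The browser caches the redirect, so repeat visits skip the server. This lowers load, but those clicks cannot be tracked.
- 302 (Temporary): The browser does not cache the redirect by default, so every visit hits the server and can be counted for analytics.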
Step 7: Full Architecture ​
Example: Async Analytics Logging via Kafka ​
javascript
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "url-shortener",
  brokers: ["localhost:9092"],
});
const producer = kafka.producer();

// Call once at startup: a kafkajs producer must connect() before it can send().
// (initAnalytics is an illustrative name.)
async function initAnalytics() {
  await producer.connect();
}

// Inside the GET /:shortCode route (after finding the URL), call this without
// awaiting it so the redirect is not delayed: logClickEvent(shortCode, req);
async function logClickEvent(shortCode, req) {
  try {
    await producer.send({
      topic: "url-clicks",
      messages: [
        {
          key: shortCode,
          value: JSON.stringify({
            shortCode,
            timestamp: Date.now(),
            ip: req.ip,
            userAgent: req.headers["user-agent"],
          }),
        },
      ],
    });
  } catch (err) {
    // Log error, but DO NOT fail the redirect if analytics fails
    console.error("Failed to log analytics:", err);
  }
}

Step 8: Trade-offs & Edge Cases
| Concern | Solution |
|---|---|
| Spam/malicious URLs | URL scanning on creation |
| Same URL submitted twice | Check existing short code (idempotency) |
| URL expiration | Background job deletes expired entries |
| High read load | Redis cache layer |
| ID generation at scale | Distributed Snowflake IDs |
| Analytics | Async write to Kafka, consumer updates counts |
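Two rows of the table lend themselves to a short sketch: returning the existing short code when the same URL is submitted twice, and treating expired entries as missing until a background job removes them. The in-memory `Map`s stand in for database tables, and every name below (`shortenIdempotent`, `resolve`, `purgeExpired`) is hypothetical.

```javascript
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

let nextId = 12345678;
const byCode = new Map(); // shortCode -> { longUrl, expiresAt }
const byLongUrl = new Map(); // longUrl -> shortCode (reverse index for idempotency)

// Return the long URL, or null if the code is unknown or expired.
function resolve(shortCode, now = Date.now()) {
  const record = byCode.get(shortCode);
  if (!record) return null;
  if (record.expiresAt !== null && record.expiresAt <= now) return null;
  return record.longUrl;
}

// Reuse the existing code for a URL that is already stored and still valid.
function shortenIdempotent(longUrl, ttlMs = null, now = Date.now()) {
  const existing = byLongUrl.get(longUrl);
  if (existing !== undefined && resolve(existing, now) !== null) return existing;
  const shortCode = encodeBase62(nextId++);
  byCode.set(shortCode, { longUrl, expiresAt: ttlMs === null ? null : now + ttlMs });
  byLongUrl.set(longUrl, shortCode);
  return shortCode;
}

// Background-job body: delete expired entries and return how many were removed.
function purgeExpired(now = Date.now()) {
  let removed = 0;
  for (const [shortCode, record] of byCode) {
    if (record.expiresAt !== null && record.expiresAt <= now) {
      byCode.delete(shortCode);
      // Only drop the reverse entry if it still points at this code.
      if (byLongUrl.get(record.longUrl) === shortCode) byLongUrl.delete(record.longUrl);
      removed++;
    }
  }
  return removed;
}
```

Checking expiry at read time means a late or failed cleanup job never serves a stale redirect; the job only reclaims storage. In a real database the reverse index would typically be a unique index on a hash of `long_url`, which is an assumption beyond what the table specifies.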
📊 Summary ​
text
Scale: 100M creates/day, 10B redirects/day
Core: Base62(auto-increment ID) for short codes
Cache: Redis (hot URLs), targeting a 99%+ cache hit ratio
DB: MySQL (sharded by short_code range if needed)
Redirect: HTTP 302 for analytics, 301 for performance