
🔗 System Design: URL Shortener (like bit.ly)

A classic beginner system design problem. Master this one first.


Step 1: Clarify Requirements

Functional Requirements

  • Given a long URL, generate a short URL
  • Given a short URL, redirect to the original URL
  • URLs expire after X days (optional)
  • Custom aliases (e.g., short.ly/my-brand)

Non-Functional Requirements

  • High availability (99.99% uptime)
  • Low latency redirects (< 10ms)
  • Scale: 100M URLs created/day, 10B redirects/day

Step 2: Estimate Scale

```text
Write (URL creation):
  100M URLs/day = 100M / 86,400 = ~1,160 writes/sec

Read (URL redirect):
  10B redirects/day = ~116,000 reads/sec
  Read:Write ratio = 100:1

Storage:
  Each URL record: ~500 bytes
  100M/day × 365 × 5 years = ~182 billion records
  182B × 500 bytes = ~91 TB

Bandwidth:
  Reads: 116,000/sec × 500 bytes = ~58 MB/s
```
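The arithmetic above can be reproduced in a few lines, which makes it easy to rerun with different traffic or record-size assumptions. This is a sketch: `estimate` is a hypothetical helper, and it uses decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) to match the rounded figures above.

```javascript
// Back-of-envelope capacity estimate (hypothetical helper, decimal units).
const SECONDS_PER_DAY = 86_400;

function estimate({ writesPerDay, readsPerDay, bytesPerRecord, years }) {
  const writesPerSec = writesPerDay / SECONDS_PER_DAY;
  const readsPerSec = readsPerDay / SECONDS_PER_DAY;
  const totalRecords = writesPerDay * 365 * years;
  return {
    writesPerSec: Math.round(writesPerSec),
    readsPerSec: Math.round(readsPerSec),
    readWriteRatio: readsPerDay / writesPerDay,
    totalRecords,
    storageTB: (totalRecords * bytesPerRecord) / 1e12, // 1 TB = 10^12 bytes
    readBandwidthMBps: (readsPerSec * bytesPerRecord) / 1e6, // 1 MB = 10^6 bytes
  };
}

const e = estimate({
  writesPerDay: 100e6,
  readsPerDay: 10e9,
  bytesPerRecord: 500,
  years: 5,
});
console.log(e);
```

Running it gives about 1,157 writes/sec, 115,741 reads/sec, 91.25 TB of storage, and 57.9 MB/s of read bandwidth, which round to the figures above.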

Step 3: High-Level Design

Example: URL Creation (Express.js)

```javascript
const express = require("express");
const app = express();
app.use(express.json());

// Mock database and ID generator (in-memory, single process only).
// In production the ID comes from the DB's AUTO_INCREMENT column
// or a distributed ID service such as Snowflake (Option C).
let nextId = 12345678;
const db = new Map(); // shortCode -> record, so redirects can look up by code

// Base62 encoding with the alphabet 0-9, a-z, A-Z
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

app.post("/shorten", async (req, res) => {
  const { longUrl } = req.body;
  if (!longUrl) return res.status(400).json({ error: "longUrl is required" });

  // 1. Get a unique auto-incremented ID
  const id = nextId++;

  // 2. Convert the ID to a Base62 short code (12345678 -> "PNFQ")
  const shortCode = encodeBase62(id);

  // 3. Save to the database, keyed by short code for redirect lookups
  db.set(shortCode, { id, shortCode, longUrl, createdAt: new Date() });

  res.json({ shortUrl: `http://short.ly/${shortCode}`, longUrl });
});
```

Step 4: URL Shortening Algorithm

Option A: Hash + Truncate

```text
long_url = "https://www.example.com/very/long/path"
hash = MD5(long_url) = "1a2b3c4d5e6f..."
short_code = base62(hash)[:8] = "dX3kR9mQ"
```

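Problem: Hash collisions! Truncation throws away most of the hash, so two different URLs can produce the same short code. Every insert must check for an existing code and retry on a conflict.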
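A minimal sketch of Option A with collision handling, assuming Node's built-in `crypto` module. `shortenWithHash`, the `#attempt` salt, and the `Map` store are hypothetical stand-ins for a table with a unique `short_code`; the code length of 8 matches the example above.

```javascript
const crypto = require("crypto");

const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const CODE_LENGTH = 8;

// Base62-encode a non-negative BigInt (the 128-bit MD5 digest).
function base62FromBigInt(n) {
  if (n === 0n) return ALPHABET[0];
  let out = "";
  while (n > 0n) {
    out = ALPHABET[Number(n % 62n)] + out;
    n /= 62n; // BigInt division truncates
  }
  return out;
}

// Hash + truncate. If the code is taken by a *different* URL,
// append a counter to the input and rehash.
function shortenWithHash(longUrl, store, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const input = attempt === 0 ? longUrl : `${longUrl}#${attempt}`;
    const digest = crypto.createHash("md5").update(input).digest("hex");
    const code = base62FromBigInt(BigInt("0x" + digest))
      .padStart(CODE_LENGTH, "0")
      .slice(0, CODE_LENGTH);

    const existing = store.get(code);
    if (existing === undefined) {
      store.set(code, longUrl); // free slot: claim it
      return code;
    }
    if (existing === longUrl) return code; // same URL: reuse its code
  }
  throw new Error("Too many collisions; lengthen the code or give up");
}
```

With a real database, the `get`/`set` pair should be a single INSERT that fails on the `UNIQUE` constraint, because two servers can race between the check and the write.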

Option B: Base62 Encoding of an Auto-Increment ID

```text
DB auto-increment: id = 12345678
base62(12345678) = "PNFQ"

Characters: [0-9, a-z, A-Z] = 62 characters
7-char code: 62^7 ≈ 3.5 trillion unique URLs
```
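Because Base62 encoding is a bijection, the short code can be decoded back to the numeric ID. The sketch below repeats the encoder from the Step 3 example and adds a hypothetical `decodeBase62` inverse to confirm the values above. Plain JavaScript numbers are exact only up to 2^53, which is far more than any 7-character code needs.

```javascript
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative integer ID as a Base62 string.
function encodeBase62(num) {
  if (num === 0) return ALPHABET[0];
  let res = "";
  while (num > 0) {
    res = ALPHABET[num % 62] + res;
    num = Math.floor(num / 62);
  }
  return res;
}

// Decode a Base62 string back to its integer ID (inverse of encodeBase62).
function decodeBase62(code) {
  let num = 0;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`Invalid Base62 character: ${ch}`);
    num = num * 62 + digit;
  }
  return num;
}

console.log(encodeBase62(12345678)); // "PNFQ"
console.log(decodeBase62("PNFQ")); // 12345678
console.log(62 ** 7); // 3521614606208
```

At 100M new URLs/day, 62^7 IDs last roughly 96 years. Sequential IDs also produce sequential codes, so anyone can enumerate URLs by incrementing a code. If that matters, scramble the ID before encoding.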

Option C: Snowflake ID (Distributed)

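```text
64-bit ID = [sign 1 bit][timestamp 41 bits][machine_id 10 bits][sequence 12 bits]
```

The machine ID prevents collisions between distributed ID generators, and the 12-bit sequence allows up to 4,096 IDs per millisecond on each machine.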
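A minimal Snowflake-style generator following the bit layout above. It uses `BigInt` because a 64-bit ID exceeds the 2^53 precision of JavaScript numbers. The custom epoch (2020-01-01 UTC) and the injectable `now` clock are assumptions chosen for illustration, and `SnowflakeGenerator` is a hypothetical name, not a library API.

```javascript
const CUSTOM_EPOCH = 1577836800000n; // 2020-01-01T00:00:00Z (assumed epoch)
const MACHINE_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

class SnowflakeGenerator {
  constructor(machineId, now = () => BigInt(Date.now())) {
    if (!Number.isInteger(machineId) || machineId < 0 || machineId > 1023) {
      throw new RangeError("machineId must be an integer in [0, 1023]");
    }
    this.machineId = BigInt(machineId);
    this.now = now; // millisecond clock, injectable for testing
    this.lastTimestamp = -1n;
    this.sequence = 0n;
  }

  nextId() {
    let ts = this.now();
    if (ts < this.lastTimestamp) throw new Error("Clock moved backwards");

    if (ts === this.lastTimestamp) {
      this.sequence = (this.sequence + 1n) & MAX_SEQUENCE;
      if (this.sequence === 0n) {
        // All 4,096 IDs for this millisecond are used: wait for the next one.
        while (ts <= this.lastTimestamp) ts = this.now();
      }
    } else {
      this.sequence = 0n;
    }
    this.lastTimestamp = ts;

    // [0][41-bit timestamp][10-bit machine ID][12-bit sequence]
    return (
      ((ts - CUSTOM_EPOCH) << (MACHINE_BITS + SEQUENCE_BITS)) |
      (this.machineId << SEQUENCE_BITS) |
      this.sequence
    );
  }
}
```

A 41-bit millisecond timestamp covers about 69 years from the chosen epoch. Note that a 63-bit ID encodes to up to 11 Base62 characters, so Snowflake codes are longer than the 7-character codes from Option B.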

Step 5: Database Schema

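```sql
CREATE TABLE urls (
  id           BIGINT PRIMARY KEY AUTO_INCREMENT,
  -- Binary collation: Base62 codes are case-sensitive ("abc" != "ABC")
  short_code   VARCHAR(10) CHARACTER SET ascii COLLATE ascii_bin NOT NULL UNIQUE,
  long_url     TEXT NOT NULL,
  user_id      BIGINT,
  created_at   TIMESTAMP DEFAULT NOW(),
  expires_at   TIMESTAMP NULL,
  click_count  BIGINT DEFAULT 0
);

-- The UNIQUE constraint on short_code already creates an index, so a separate
-- idx_short_code is redundant. Index expires_at for the expiration cleanup job.
CREATE INDEX idx_expires_at ON urls(expires_at);
```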

Step 6: Redirect Flow

Example: Redirect Logic (Express.js & Redis)

```javascript
const Redis = require("ioredis");
const redis = new Redis(); // connects to localhost:6379 by default

// Continues the Express `app` from the URL creation example.
app.get("/:shortCode", async (req, res) => {
  const { shortCode } = req.params;

  // 1. Check the Redis cache
  const cachedUrl = await redis.get(`url:${shortCode}`);

  if (cachedUrl) {
    // Cache HIT
    // Use 302 (Found) if we want to track every click in analytics.
    // Use 301 (Moved Permanently) to let browsers cache it and offload the server.
    return res.redirect(302, cachedUrl);
  }

  // 2. Cache MISS: query the database.
  // getUrlFromDb is a mock, e.g. SELECT long_url FROM urls WHERE short_code = ?
  const dbRecord = await getUrlFromDb(shortCode);

  if (!dbRecord) {
    return res.status(404).json({ error: "URL not found" });
  }

  // 3. Save to Redis for next time (expire after 24 hours = 86,400 s)
  await redis.setex(`url:${shortCode}`, 86400, dbRecord.longUrl);

  // 4. Redirect
  res.redirect(302, dbRecord.longUrl);
});
```
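Caching every URL for a fixed 24 hours can keep an expiring link alive in Redis after its `expires_at` has passed. A minimal sketch of a TTL capped by the link's expiry, assuming the DB record exposes `expiresAt` as a `Date` or `null` (an assumed field name; the mock `getUrlFromDb` above does not define it). `cacheTtlSeconds` is a hypothetical helper.

```javascript
const DEFAULT_CACHE_TTL_SECONDS = 86_400; // 24 hours, as in the redirect example

// Redis TTL for a URL record: the default, capped by the time left until expiry.
// Returns 0 when the link has already expired (do not cache it).
function cacheTtlSeconds(expiresAt, now = new Date()) {
  if (!expiresAt) return DEFAULT_CACHE_TTL_SECONDS; // no expiry set
  const remaining = Math.floor((expiresAt.getTime() - now.getTime()) / 1000);
  if (remaining <= 0) return 0;
  return Math.min(DEFAULT_CACHE_TTL_SECONDS, remaining);
}
```

In the cache-miss branch this would be used as `const ttl = cacheTtlSeconds(dbRecord.expiresAt); if (ttl > 0) await redis.setex(key, ttl, dbRecord.longUrl);`. The guard matters because Redis rejects a `SETEX` with a TTL of 0, and an expired record should get a 404 or 410 instead of a redirect.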

Note on HTTP 301 vs 302:

  • 301 (Permanent): Browsers cache the redirect, so repeat visits never reach the server: less load, but those clicks can't be tracked.
  • 302 (Temporary): Not cached by default, so every visit hits the server and can be counted for analytics.

Step 7: Full Architecture

Example: Async Analytics Logging via Kafka

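```javascript
const { Kafka } = require("kafkajs");
const kafka = new Kafka({
  clientId: "url-shortener",
  brokers: ["localhost:9092"],
});
const producer = kafka.producer();

// Connect once at startup; send() fails on a disconnected producer.
producer.connect().catch((err) => console.error("Kafka connect failed:", err));

// Call from the GET /:shortCode route after finding the URL, without awaiting,
// so analytics never delays the redirect: logClickEvent(shortCode, req);
async function logClickEvent(shortCode, req) {
  try {
    await producer.send({
      topic: "url-clicks",
      messages: [
        {
          key: shortCode, // same code -> same partition, preserving per-URL order
          value: JSON.stringify({
            shortCode,
            timestamp: Date.now(),
            ip: req.ip,
            userAgent: req.headers["user-agent"],
          }),
        },
      ],
    });
  } catch (err) {
    // Log the error, but DO NOT fail the redirect if analytics fails
    console.error("Failed to log analytics:", err);
  }
}
```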
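The Kafka example covers only the producer; the "consumer updates counts" half from the trade-offs table is not shown. One reasonable shape for it, sketched here as an assumption: collapse each batch of click events into per-code counts so the database sees one `UPDATE` per short code instead of one per click. `aggregateClicks` and `buildCountUpdates` are hypothetical names, the SQL targets the `urls` table from Step 5, and the Kafka consumer wiring is omitted so the sketch runs on its own.

```javascript
// Count clicks per short code in one batch of parsed event objects.
function aggregateClicks(events) {
  const counts = new Map();
  for (const { shortCode } of events) {
    counts.set(shortCode, (counts.get(shortCode) ?? 0) + 1);
  }
  return counts;
}

// One parameterized UPDATE per short code against the urls table.
function buildCountUpdates(counts) {
  return [...counts].map(([shortCode, n]) => ({
    sql: "UPDATE urls SET click_count = click_count + ? WHERE short_code = ?",
    params: [n, shortCode],
  }));
}
```

Batching trades a little count freshness for far fewer writes, which matters at ~116,000 redirects/sec.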

Step 8: Trade-offs & Edge Cases

| Concern | Solution |
| --- | --- |
| Spam/malicious URLs | URL scanning on creation |
| Same URL submitted twice | Check existing short code (idempotency) |
| URL expiration | Background job deletes expired entries |
| High read load | Redis cache layer |
| ID generation at scale | Distributed Snowflake IDs |
| Analytics | Async write to Kafka, consumer updates counts |
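The first two rows both start with parsing the submitted URL. A minimal sketch, assuming Node's built-in WHATWG `URL` class and a hypothetical in-memory `codeByUrl` map standing in for a database lookup on `long_url`. It only rejects non-http(s) input and normalizes scheme, host case, and default ports; real spam or malware scanning is out of scope here.

```javascript
// Hypothetical index: normalized long URL -> existing short code.
const codeByUrl = new Map();

// Accept only absolute http(s) URLs; URL#href lower-cases the scheme
// and host and drops default ports, so trivially different inputs match.
function normalizeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return url.href;
}

// Idempotent create: submitting the same URL twice returns the same code.
function getOrCreateCode(input, createCode) {
  const normalized = normalizeUrl(input);
  if (normalized === null) throw new Error("Invalid URL");
  const existing = codeByUrl.get(normalized);
  if (existing) return existing;
  const code = createCode(normalized); // e.g. Base62 of a new ID
  codeByUrl.set(normalized, code);
  return code;
}
```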

📊 Summary

```text
Scale: 100M creates/day, 10B redirects/day
Core: Base62(auto-increment ID) for short codes
Cache: Redis (hot URLs), targeting a 99%+ cache hit ratio
DB: MySQL (shard by hash of short_code if needed; range shards on sequential codes create write hotspots)
Redirect: HTTP 302 for analytics, 301 for performance
```

Released under the ISC License.