
💬 System Design: WhatsApp / Chat Application

Real-time messaging at 2 billion user scale.


Step 1: Requirements

Functional

  • 1:1 messaging
  • Group chats (up to 256 members)
  • Message delivery receipts (sent ✓, delivered ✓✓, read 🔵✓✓)
  • Media sharing (images, video, voice)
  • Last seen / online status
  • End-to-end encryption

Non-Functional

  • 2 billion users, 100B messages/day
  • Delivery latency < 100ms
  • High availability 99.99%
  • No message loss

Step 2: Core Protocol (WebSocket)

txt
Why WebSocket over HTTP polling?

HTTP Polling (bad):
  Client asks every 5 seconds: "Any new messages?"
  2B users × 1 req/5sec = 400M req/sec wasted!

WebSocket (good):
  Persistent TCP connection
  Server pushes messages instantly
  2B users × 1 connection = 2B open connections to maintain
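The polling figure can be checked against the traffic estimate from Step 1. The sketch below is plain arithmetic with no networking. It assumes a fixed 5-second poll interval and uses the 100B messages/day average, so it ignores peaks.

```javascript
// Back-of-envelope comparison using the Step 1 numbers (averages, no peaks).
const USERS = 2e9;
const POLL_INTERVAL_SEC = 5;
const MESSAGES_PER_DAY = 100e9;
const SECONDS_PER_DAY = 86_400;

// Every client polls once per interval, whether or not anything is waiting
const pollQps = USERS / POLL_INTERVAL_SEC;

// Average rate of messages that actually need delivering
const msgQps = MESSAGES_PER_DAY / SECONDS_PER_DAY;

// Even if every message were picked up by a different poll, at most
// msgQps / pollQps of the polls could return anything
const wastedFraction = 1 - msgQps / pollQps;

console.log(pollQps);                   // 400000000
console.log(Math.round(msgQps));        // 1157407
console.log(wastedFraction.toFixed(4)); // 0.9971
```

So at least 99.7% of polls would come back empty. A WebSocket, by contrast, only carries traffic when there is something to deliver (plus keep-alives).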

[Diagram: Apache Kafka workflow]


Step 3: High-Level Architecture

Example: Basic WebSocket Server for Chat Connections

javascript
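const { WebSocketServer } = require("ws");
const Redis = require("ioredis");

const wss = new WebSocketServer({ port: 8080 });
const redis = new Redis();

// Keep track of connected users locally on this server instance
const activeConnections = new Map(); // userId -> ws

wss.on("connection", function connection(ws, req) {
  // 1. Authenticate and extract userId
  //    (extractUserId is an app-specific helper, e.g. verifying a token in the request)
  const userId = extractUserId(req);
  activeConnections.set(userId, ws);

  // 2. Attach listeners before any await, so frames sent right after the
  //    handshake are not emitted while no "message" listener exists yet
  ws.on("message", function message(data) {
    let msg;
    try {
      msg = JSON.parse(data.toString());
    } catch {
      return; // Drop malformed frames instead of crashing the process
    }
    handleIncomingMessage(userId, msg).catch((err) =>
      console.error("failed to handle message", err)
    );

    // Refresh presence heartbeat (clients are assumed to send periodic heartbeat frames)
    redis.expire(`presence:${userId}`, 30);
  });

  ws.on("close", () => {
    // Skip cleanup if this user has already reconnected on a newer socket
    if (activeConnections.get(userId) !== ws) return;
    activeConnections.delete(userId);
    // Mark offline immediately (otherwise the 30s TTL would expire it)
    redis.del(`presence:${userId}`);
  });

  // 3. Mark the user online in Redis (TTL-based heartbeat)
  redis.set(`presence:${userId}`, "online", "EX", 30);
});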
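The TTL-based presence logic can be modeled without Redis. The `PresenceStore` class below is a hypothetical in-memory stand-in.

- `heartbeat()` plays the role of `SET presence:{id} "online" EX 30`.
- `disconnect()` plays the role of `redis.del(...)`.
- Both methods also record a "last seen" timestamp, which the Redis key alone does not keep once it expires.

Time is passed in explicitly so the behavior is easy to test.

```javascript
// Hypothetical in-memory stand-in for the Redis presence keys above.
// A user counts as online while now < expiresAt; lastSeen survives expiry.
class PresenceStore {
  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
    this.expiresAt = new Map();  // userId -> epoch ms when "online" lapses
    this.lastSeenAt = new Map(); // userId -> epoch ms of latest activity
  }

  heartbeat(userId, now = Date.now()) {
    this.expiresAt.set(userId, now + this.ttlMs); // like SET ... EX 30
    this.lastSeenAt.set(userId, now);
  }

  disconnect(userId, now = Date.now()) {
    this.expiresAt.delete(userId); // like redis.del(...)
    this.lastSeenAt.set(userId, now);
  }

  isOnline(userId, now = Date.now()) {
    const exp = this.expiresAt.get(userId);
    return exp !== undefined && now < exp;
  }

  lastSeen(userId) {
    // ?? (not ||) so a timestamp of 0 is still returned
    return this.lastSeenAt.get(userId) ?? null;
  }
}

// Usage
const presence = new PresenceStore();
presence.heartbeat("bob", 0);
console.log(presence.isOnline("bob", 10_000)); // true
console.log(presence.isOnline("bob", 31_000)); // false (missed heartbeats)
console.log(presence.lastSeen("bob"));          // 0
```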

Step 4: Message Delivery Flow

Example: Publishing a Message to Kafka

javascript
const { Kafka } = require("kafkajs");
const kafka = new Kafka({ clientId: "chat-server", brokers: ["kafka1:9092"] });
const producer = kafka.producer();
// kafkajs requires connect() before send(); connect once at startup and reuse the promise
const producerReady = producer.connect();

async function handleIncomingMessage(senderId, msgPayload) {
  const { receiverId, content, messageId } = msgPayload;

  // 1. Validate and prep the message payload
  const messageEvent = {
    messageId,
    senderId,
    receiverId,
    content, // In reality, this is an encrypted blob (Signal protocol)
    timestamp: Date.now(),
    status: "SENT",
  };

  // 2. Publish to Kafka topic
  // We partition by receiverId so all messages FOR a user go to the same partition
  await producerReady;
  await producer.send({
    topic: "chat-messages",
    messages: [
      { key: String(receiverId), value: JSON.stringify(messageEvent) },
    ],
  });

  // Note: A separate consumer writes these sequentially to Cassandra
}
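Keying by `receiverId` is what keeps one user's inbound messages in order. Kafka guarantees ordering only within a partition, and equal keys hash to the same partition.

The sketch below simulates that with in-memory arrays. Kafka's default partitioner actually hashes keys with murmur2. FNV-1a is swapped in here only to avoid dependencies, so the partition numbers will not match a real cluster.

```javascript
// Illustrative stand-in for key-based partitioning (FNV-1a, not Kafka's murmur2).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

function partitionFor(key, numPartitions) {
  return fnv1a(key) % numPartitions;
}

// Simulated topic: one append-only log (array) per partition
function publish(topic, key, value) {
  const p = partitionFor(key, topic.length);
  topic[p].push({ key, value });
  return p;
}

const topic = Array.from({ length: 8 }, () => []);
publish(topic, "bob", "m1");
publish(topic, "alice", "m2");
publish(topic, "bob", "m3");

// Every message keyed "bob" sits in one partition, in send order
const bobPartition = topic[partitionFor("bob", topic.length)];
console.log(bobPartition.filter((m) => m.key === "bob").map((m) => m.value)); // [ 'm1', 'm3' ]
```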

Step 5: Message Delivery Receipts

Example: Client-side Receipt Handling

javascript
// On Bob's device (Browser or Mobile Client)
const ws = new WebSocket("wss://chat.whatsapp.com");

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === "NEW_MESSAGE") {
    // 1. Message arrived on device
    displayMessageInBackground(data.message);

    // 2. Immediately send DELIVERED receipt back to server
    ws.send(
      JSON.stringify({
        type: "RECEIPT",
        messageId: data.message.messageId, // field name used in the server's messageEvent
        status: "DELIVERED",
        senderId: data.message.senderId,
      })
    );
  }
};

// When Bob actually opens the chat screen
function onChatOpened(activeChatId, unreadMessageIds) {
  unreadMessageIds.forEach((msgId) => {
    // 3. Send READ receipt
    ws.send(
      JSON.stringify({
        type: "RECEIPT",
        messageId: msgId,
        status: "READ",
        senderId: activeChatId,
      })
    );
  });
}
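Receipts travel over the network independently, so the sender can see them out of order. For example, a READ can arrive before its DELIVERED after a reconnect.

A minimal sender-side sketch follows. It assumes statuses may only move forward, matching the 1=sent, 2=delivered, 3=read codes in Step 7's schema. `applyReceipt` is a hypothetical helper.

```javascript
// Sender-side tick state, assuming statuses only ever move forward.
const STATUS_RANK = { SENT: 1, DELIVERED: 2, READ: 3 }; // mirrors 1/2/3 in the schema
const TICKS = { SENT: "✓", DELIVERED: "✓✓", READ: "🔵✓✓" };

function applyReceipt(current, incoming) {
  // Never downgrade: keep whichever status is further along
  return STATUS_RANK[incoming] > STATUS_RANK[current] ? incoming : current;
}

const statuses = new Map([["msg-1", "SENT"]]);
for (const receipt of ["READ", "DELIVERED"]) { // arrives out of order
  statuses.set("msg-1", applyReceipt(statuses.get("msg-1"), receipt));
}
console.log(TICKS[statuses.get("msg-1")]); // 🔵✓✓
```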

Step 6: Group Messages

Example: Group Message Receipt Processing

javascript
// Group Service handling incoming receipts for a group message
async function processGroupReceipt(messageId, groupId, memberId, newStatus) {
  // 1. Fetch every recipient's status (sender excluded)
  // Example: { "member1": "READ", "member2": "DELIVERED", ... }
  const allStatuses = await getMessageStatuses(messageId);
  const before = aggregateStatus(allStatuses);

  // 2. Update this member's status, never downgrading READ back to DELIVERED
  if (allStatuses[memberId] === "READ") return;
  allStatuses[memberId] = newStatus;
  await updateMemberReceiptStatus(messageId, memberId, newStatus);

  // 3. Notify the sender only when the group-wide status changes, so the
  //    remaining READ receipts don't re-send "DELIVERED" every time
  //    (concurrent receipts for one message must be serialized; not shown)
  const after = aggregateStatus(allStatuses);
  if (after !== before) {
    emitGroupStatusToSender(messageId, after); // ✓✓ or 🔵✓✓
  }
}

// The group tick follows the slowest member: READ only once everyone has read
function aggregateStatus(statuses) {
  const values = Object.values(statuses);
  if (values.every((s) => s === "READ")) return "READ";
  if (values.every((s) => s === "DELIVERED" || s === "READ")) return "DELIVERED";
  return "SENT";
}
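Receipts are only half of group messaging. The message itself also has to be fanned out to every member. Below is a sketch of fan-out on write, built on two assumptions:

- `connections` is a hypothetical map of live sockets, like `activeConnections` in Step 3.
- `offlineQueue` is an in-memory stand-in for storage plus a push notification.

In a multi-server deployment, each member's socket would first have to be located on whichever gateway holds it.

```javascript
// Hypothetical fan-out for one group message: one copy per recipient, pushed
// over a live socket if connected, otherwise queued for later delivery.
function fanOutGroupMessage(message, members, connections, offlineQueue) {
  const result = { pushed: [], queued: [] };
  for (const memberId of members) {
    if (memberId === message.senderId) continue; // sender already has it
    const socket = connections.get(memberId);
    if (socket) {
      socket.send(JSON.stringify({ type: "NEW_MESSAGE", message }));
      result.pushed.push(memberId);
    } else {
      if (!offlineQueue.has(memberId)) offlineQueue.set(memberId, []);
      offlineQueue.get(memberId).push(message); // would also trigger APNS/FCM
      result.queued.push(memberId);
    }
  }
  return result;
}

// Usage with fake sockets that just record what they were sent
const sent = [];
const fakeSocket = (id) => ({ send: (payload) => sent.push([id, payload]) });
const connections = new Map([["bob", fakeSocket("bob")]]);
const offlineQueue = new Map();
const msg = { messageId: "g1", senderId: "alice", groupId: "team", content: "hi" };
console.log(fanOutGroupMessage(msg, ["alice", "bob", "carol"], connections, offlineQueue));
// { pushed: [ 'bob' ], queued: [ 'carol' ] }
```

With 256 members this loop performs up to 255 sends per message, which is why the summary calls group fan-out one of the hardest parts.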

Step 7: Database Schema

sql
-- Messages (Cassandra — append-only, high write)
CREATE TABLE messages (
  chat_id     UUID,
  message_id  TIMEUUID,  -- sortable by time
  sender_id   BIGINT,
  content     BLOB,      -- ciphertext only (see Step 8); the server cannot read it
  media_url   TEXT,
  status      TINYINT,  -- 1=sent, 2=delivered, 3=read
  created_at  TIMESTAMP,
  PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- User presence (Redis — fast TTL-based)
SET presence:{user_id} "online" EX 30  -- Expires in 30s if no heartbeat

-- Push tokens (MySQL)
CREATE TABLE push_tokens (
  user_id   BIGINT,
  token     VARCHAR(255),
  platform  ENUM('ios', 'android'),
  PRIMARY KEY (user_id)
);
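The messages table is partitioned by `chat_id`, so both participants of a 1:1 chat must agree on the same ID. The design above does not specify how that ID is chosen.

One possible approach, shown here as an assumption, is to hash the two user IDs in sorted order. The first 16 bytes of the SHA-1 digest are then formatted as an RFC 4122-style UUID string. This is version-5-style, but without a namespace. `directChatId` is a hypothetical helper.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: same chat_id for a 1:1 chat regardless of who is "A" or "B".
function directChatId(userA, userB) {
  const [lo, hi] = [String(userA), String(userB)].sort(); // order-independent
  const bytes = createHash("sha1").update(`${lo}:${hi}`).digest().subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version nibble = 5 (name-based, SHA-1)
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant bits
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

console.log(directChatId(42, 7) === directChatId(7, 42)); // true
```

Group chats would instead get a randomly generated `chat_id` when the group is created.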

Step 8: End-to-End Encryption

txt
WhatsApp uses Signal Protocol:

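Key Exchange:
  Each device uploads public keys (identity key + prekeys) to the server
  Alice fetches Bob's public keys and derives a shared secret with him (X3DH)
  Server NEVER sees private keys

Encryption:
  Messages are encrypted with symmetric keys derived from that shared secret
  Only Bob's device, holding the matching private keys, can derive them and decrypt

Double Ratchet Algorithm:
  A new message key is derived for every message
  A leaked key exposes only its own message: earlier messages stay safe
  (forward secrecy), and later ones recover after the next DH ratchet step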

Result: Server stores ENCRYPTED blobs — cannot read messages
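To make "a new key for every message" concrete, the sketch below uses Node's built-in `crypto` module (Node 15+ for `hkdfSync`). It is heavily simplified and is not the Signal Protocol:

- A single X25519 exchange stands in for X3DH.
- Only the symmetric-key chain of the Double Ratchet is shown. There are no DH ratchet steps, headers or skipped-message handling.
- AES-256-GCM is used for brevity rather than Signal's exact message encryption.

The chain step follows the Double Ratchet spec's suggestion of HMAC-SHA256 with the byte 0x01 for the message key and 0x02 for the next chain key.

```javascript
const crypto = require("crypto");

// One X25519 exchange stands in for X3DH (simplification).
const alice = crypto.generateKeyPairSync("x25519");
const bob = crypto.generateKeyPairSync("x25519");
const aliceShared = crypto.diffieHellman({ privateKey: alice.privateKey, publicKey: bob.publicKey });
const bobShared = crypto.diffieHellman({ privateKey: bob.privateKey, publicKey: alice.publicKey });

// Derive an initial 32-byte chain key from the shared secret ("chat-demo-chain" is an arbitrary label)
function rootChainKey(shared) {
  return Buffer.from(crypto.hkdfSync("sha256", shared, Buffer.alloc(32), "chat-demo-chain", 32));
}

// KDF chain step: message key = HMAC(ck, 0x01), next chain key = HMAC(ck, 0x02)
function ratchetStep(chainKey) {
  const messageKey = crypto.createHmac("sha256", chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = crypto.createHmac("sha256", chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

function encrypt(messageKey, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", messageKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(messageKey, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", messageKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Each side advances its own copy of the chain; each message key is used once
let aliceCk = rootChainKey(aliceShared);
let bobCk = rootChainKey(bobShared);
const received = [];
for (const text of ["hi bob", "are you there?"]) {
  const a = ratchetStep(aliceCk);
  aliceCk = a.nextChainKey;
  const envelope = encrypt(a.messageKey, text); // all the server ever stores
  const b = ratchetStep(bobCk);
  bobCk = b.nextChainKey;
  received.push(decrypt(b.messageKey, envelope));
}
console.log(received); // [ 'hi bob', 'are you there?' ]
```

Because HMAC is one-way, holding the current chain key does not reveal earlier message keys. That is the forward-secrecy property described above.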

📊 Summary

| Component | Technology |
|---|---|
| Client-Server | WebSocket (persistent) |
| Message Storage | Cassandra (chat_id as partition key) |
| Message Queue | Kafka |
| Presence | Redis (TTL-based) |
| Push Notifications | APNS (iOS), FCM (Android) |
| Media Storage | S3 + CDN |
| Encryption | Signal Protocol (E2E) |

Key insight: The hardest parts are maintaining billions of WebSocket connections and the group message fan-out. WhatsApp famously handled 1 million concurrent connections on a single Erlang-based server.
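The 1-million-connections figure also gives a rough gateway count. Below is a back-of-envelope sketch. The peak share of users online and the headroom factor are assumed inputs, not published numbers, and `gatewayServersNeeded` is a hypothetical helper.

```javascript
// Rough count of WebSocket gateway servers needed at peak.
function gatewayServersNeeded(users, peakOnlineFraction, connectionsPerServer, headroom = 1) {
  const concurrent = users * peakOnlineFraction; // simultaneous open sockets
  return Math.ceil((concurrent * headroom) / connectionsPerServer);
}

console.log(gatewayServersNeeded(2e9, 1.0, 1e6));     // 2000 (every user connected at once)
console.log(gatewayServersNeeded(2e9, 0.25, 1e6, 2)); // 1000 (assumed 25% online, 2x headroom)
```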

Released under the ISC License.