
🏛️ System Design Interview Patterns

In a system design interview, you are often asked how to handle failures, distributed consistency, and infrastructure decoupling. These architectural patterns provide standardized solutions to these macro-level problems.


1. Circuit Breaker Pattern

Goal: Prevent a failure in one service from cascading and bringing down the entire system.

🔌 The Concept

Like an electrical circuit breaker, this pattern "trips" when a service starts failing. Instead of wasting resources calling a dead service, the caller gets an immediate error (or fallback), allowing the failing service time to recover.
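A breaker is usually described as a three-state machine: closed (calls pass through), open (calls fail fast), and half-open (a trial call is allowed). The sketch below writes those transitions as a lookup table. The state names match the class in the code example; the event names (`THRESHOLD_REACHED`, `TIMEOUT_ELAPSED`, `SUCCESS`, `FAILURE`) are made up for illustration.

```javascript
// Transition table: current state -> event -> next state.
// Event names are hypothetical labels, not part of any library.
const TRANSITIONS = {
  CLOSED: { THRESHOLD_REACHED: "OPEN" },
  OPEN: { TIMEOUT_ELAPSED: "HALF-OPEN" },
  "HALF-OPEN": { SUCCESS: "CLOSED", FAILURE: "OPEN" },
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state, event) {
  return TRANSITIONS[state]?.[event] ?? state;
}

console.log(nextState("CLOSED", "THRESHOLD_REACHED")); // OPEN
console.log(nextState("OPEN", "TIMEOUT_ELAPSED")); // HALF-OPEN
console.log(nextState("HALF-OPEN", "FAILURE")); // OPEN
```

Events that do not apply to a state leave it unchanged, so a `SUCCESS` while `CLOSED` is a no-op.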

📊 Diagram: State Machine

💻 Code Example (Node.js)

javascript
class CircuitBreaker {
  constructor(service, threshold, timeout) {
    this.service = service;
    this.threshold = threshold; // Max failures before tripping
    this.timeout = timeout; // Time to wait before half-open
    this.failures = 0;
    this.state = "CLOSED";
    this.lastFailureTime = null;
  }

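  async call(args) {
    if (this.state === "OPEN") {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        // Cool-down elapsed: let a trial request through
        this.state = "HALF-OPEN";
      } else {
        // Fail fast without touching the failing service
        throw new Error("Circuit is OPEN (Service Unavailable)");
      }
    }

    try {
      const result = await this.service(args);
      this.reset(); // Any success closes the circuit and clears the failure count
      return result;
    } catch (err) {
      this.onFailure(); // A failed HALF-OPEN trial re-opens the circuit (count is still at threshold)
      throw err;
    }
  }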

  onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = "OPEN";
      this.lastFailureTime = Date.now();
    }
  }

  reset() {
    this.failures = 0;
    this.state = "CLOSED";
  }
}
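The concept mentions returning a fallback instead of an error. One way to do that, sketched under assumptions, is a small wrapper around any object that exposes an async `call(args)` method like the `CircuitBreaker` class. The helper name `callWithFallback` and the stubbed `openBreaker` are hypothetical.

```javascript
// Hypothetical helper: try the breaker, fall back on any error
// (including the fail-fast error thrown while the circuit is OPEN).
async function callWithFallback(breaker, args, fallback) {
  try {
    return await breaker.call(args);
  } catch (err) {
    return fallback(err);
  }
}

// Stand-in for a tripped breaker; any object with an async call(args) works.
const openBreaker = {
  async call() {
    throw new Error("Circuit is OPEN (Service Unavailable)");
  },
};

callWithFallback(openBreaker, { userId: 42 }, () => ({ items: [], stale: true }))
  .then((result) => console.log(result)); // { items: [], stale: true }
```

A typical fallback is a cached or default response. Whether serving stale data is acceptable depends on the endpoint.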

2. Saga Pattern

Goal: Manage a transaction that spans multiple microservices without a blocking two-phase commit (2PC), trading strict atomicity for eventual consistency.

🔄 The Concept

A Saga is a sequence of local transactions. If one local transaction fails, the Saga executes a series of compensating transactions to undo the changes made by previous steps.

📊 Diagram: Orchestration vs Choreography

💻 Code Example (Simplified Orchestrator)

javascript
async function createOrderSaga(orderData) {
  const steps = [
    {
      action: () => orderService.create(orderData),
      undo: (id) => orderService.cancel(id),
    },
    {
      action: () => paymentService.charge(orderData.amount),
      undo: () => paymentService.refund(orderData.amount),
    },
    {
      action: () => shippingService.ship(orderData),
      undo: () => shippingService.cancelShipment(orderData),
    },
  ];

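  const completedSteps = [];
  try {
    for (const step of steps) {
      const result = await step.action();
      completedSteps.push({ step, result });
    }
  } catch (err) {
    console.error("Saga failed, starting compensation...", err);
    // Compensate in reverse order; a failed undo is logged so the remaining undos still run
    for (const { step, result } of completedSteps.reverse()) {
      try {
        await step.undo(result?.id);
      } catch (undoErr) {
        console.error("Compensation step failed, manual intervention needed:", undoErr);
      }
    }
    // Keep the original failure attached for debugging
    throw new Error("Transaction Failed and Rolled Back", { cause: err });
  }
}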
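The orchestrator above keeps all control flow in one function. In the choreography style there is no coordinator: each service subscribes to events and emits its own. The sketch below uses a tiny in-memory `EventBus` as a stand-in for a message broker. The event names, the `wireOrderSaga` helper, and the `chargeSucceeds` flag are hypothetical and exist only to keep the example runnable.

```javascript
// Minimal synchronous event bus (stand-in for a real message broker).
class EventBus {
  constructor() {
    this.handlers = {};
  }
  on(type, handler) {
    if (!this.handlers[type]) this.handlers[type] = [];
    this.handlers[type].push(handler);
  }
  emit(type, payload) {
    for (const handler of this.handlers[type] ?? []) handler(payload);
  }
}

// Each "service" only knows which events it listens to and which it emits.
function wireOrderSaga(bus, { chargeSucceeds }) {
  const log = [];
  // Payment service reacts to new orders
  bus.on("OrderCreated", (order) => {
    if (chargeSucceeds) {
      log.push("payment charged");
      bus.emit("PaymentCharged", order);
    } else {
      log.push("payment failed");
      bus.emit("PaymentFailed", order);
    }
  });
  // Shipping service reacts to successful payments
  bus.on("PaymentCharged", () => log.push("shipment scheduled"));
  // Order service compensates its own local transaction on payment failure
  bus.on("PaymentFailed", () => log.push("order cancelled"));
  return log;
}

const bus = new EventBus();
const log = wireOrderSaga(bus, { chargeSucceeds: false });
bus.emit("OrderCreated", { id: "o-1", amount: 50 });
console.log(log); // [ 'payment failed', 'order cancelled' ]
```

Choreography removes the central point of control, but the overall flow becomes harder to trace because it is spread across subscribers.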

3. Retry Pattern with Exponential Backoff

Goal: Handle transient failures (network blips, temporary service overload) by retrying with increasing delays.

📊 Diagram: Backoff Strategy

💻 Code Example

javascript
async function retryRequest(fn, maxRetries = 3, baseDelay = 100) {
  // maxRetries is the total number of attempts, including the first call
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === maxRetries - 1) throw err; // Out of attempts: surface the last error

      const delay = baseDelay * Math.pow(2, i); // 100, 200, 400ms...
      console.log(`Attempt ${i + 1} failed. Retrying in ${delay}ms...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
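The delay used by `retryRequest` doubles without limit, and every client that fails at the same moment retries at the same moment. A common refinement, sketched here as an assumption rather than as part of the original example, is to cap the delay and optionally add random jitter. The helper name `backoffDelay` and its options are hypothetical.

```javascript
// Delay in ms to wait before retry number `attempt` (0-based).
// `random` is injectable so the jittered result can be made deterministic.
function backoffDelay(
  attempt,
  { baseDelay = 100, maxDelay = 2000, jitter = false, random = Math.random } = {}
) {
  const capped = Math.min(baseDelay * 2 ** attempt, maxDelay);
  // "Full jitter": pick a random delay between 0 and the capped value.
  return jitter ? Math.floor(random() * capped) : capped;
}

const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelay(attempt));
console.log(schedule); // [ 100, 200, 400, 800, 1600, 2000 ]
```

To use it, replace the `baseDelay * Math.pow(2, i)` expression in the retry loop with `backoffDelay(i, { jitter: true })`.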

4. Sidecar Pattern

Goal: Offload common infrastructure tasks (logging, monitoring, security, proxying) to a separate container/process.

📊 Diagram: Application + Sidecar

Interview Use Case: Reach for this when asked how to add service-mesh features (such as mTLS or observability) to a legacy application without changing its code.


5. CQRS (Command Query Responsibility Segregation)

Goal: Separate the data model for writing (commands) from the data model for reading (queries).

📖 The Concept

In many systems, the read and write workloads are vastly different. CQRS allows you to optimize them independently. For example, use a relational DB for writes and an optimized search index (Elasticsearch) or a denormalized cache (Redis) for reads. Because the read store is usually updated asynchronously from the write store, reads can be briefly stale.
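A minimal in-memory sketch of the split, assuming an order domain: commands go to a write model that validates and stores orders, and each change is published as an event that a separate read model folds into a denormalized view. The class names and the `OrderPlaced` event are hypothetical, and the synchronous callback stands in for what would normally be an asynchronous queue or change stream.

```javascript
// Write side: validates commands and stores normalized records.
class OrderWriteModel {
  constructor(onEvent) {
    this.orders = new Map();
    this.onEvent = onEvent; // stand-in for publishing to a queue
  }
  placeOrder(id, customer, total) {
    if (this.orders.has(id)) throw new Error(`Order ${id} already exists`);
    this.orders.set(id, { id, customer, total });
    this.onEvent({ type: "OrderPlaced", id, customer, total });
  }
}

// Read side: a denormalized view shaped for one query (spend per customer).
class CustomerSpendReadModel {
  constructor() {
    this.totals = new Map();
  }
  apply(event) {
    if (event.type === "OrderPlaced") {
      const current = this.totals.get(event.customer) ?? 0;
      this.totals.set(event.customer, current + event.total);
    }
  }
  totalSpent(customer) {
    return this.totals.get(customer) ?? 0;
  }
}

const readModel = new CustomerSpendReadModel();
const writeModel = new OrderWriteModel((event) => readModel.apply(event));

writeModel.placeOrder("o-1", "alice", 30);
writeModel.placeOrder("o-2", "alice", 12);
console.log(readModel.totalSpent("alice")); // 42
```

The query never touches the write store, so the two sides can use different schemas and scale independently.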

📊 Diagram: Segregated Paths


6. Bulkhead Pattern

Goal: Isolate resources to prevent a failure in one area from exhausting all resources (like threads or memory).

🚢 The Concept

The pattern is named after the partitions in a ship's hull: if one section is breached, the bulkheads keep the water from flooding the entire ship. In system design, you might use separate thread pools, connection pools, or service instances for different types of requests, so that one saturated dependency cannot starve the others.
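Node.js does not give each request its own thread, so a rough equivalent of a per-dependency thread pool is a per-dependency concurrency limit. The sketch below assumes that approach. The `Bulkhead` class is hypothetical, and it rejects excess calls immediately instead of queueing them.

```javascript
// Caps the number of in-flight calls to one dependency.
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
  }

  async run(task) {
    if (this.active >= this.maxConcurrent) {
      // Compartment is full: reject instead of piling up more work
      throw new Error("Bulkhead full: request rejected");
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--; // Free the slot on success or failure
    }
  }
}

// One compartment per dependency, sized independently.
const paymentsBulkhead = new Bulkhead(2);
const searchBulkhead = new Bulkhead(10);

const hang = () => new Promise(() => {}); // simulates a dependency that never answers
paymentsBulkhead.run(hang);
paymentsBulkhead.run(hang);
paymentsBulkhead.run(hang).catch((err) => console.log(err.message)); // Bulkhead full: request rejected
searchBulkhead.run(async () => "results").then((r) => console.log(r)); // results
```

The hung payment calls fill only their own compartment, so search requests are still accepted.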

📊 Diagram: Resource Isolation


💡 Interview Cheat Sheet

| Pattern | Use When... | Interview Keyphrase |
| --- | --- | --- |
| Circuit Breaker | A service is slow or down | "Prevent cascading failure" |
| Saga | Atomic action spans multiple DBs | "Distributed transactions / Eventual consistency" |
| Retry | Network is flaky | "Handle transient errors with exponential backoff" |
| Sidecar | Need logging/auth without changing code | "Offload cross-cutting concerns" |
| Bulkhead | One slow API shouldn't block others | "Resource isolation" |
| CQRS | Read and Write loads are asymmetric | "Optimize reads vs writes independently" |


Released under the ISC License.