🏛️ System Design Interview Patterns
System design interviews often ask how you would handle failures, keep data consistent across services, and decouple infrastructure concerns from application code. The architectural patterns below are standard answers to those system-level problems.
1. Circuit Breaker Pattern
Goal: Prevent a failure in one service from cascading and bringing down the entire system.
🔌 The Concept
Like an electrical circuit breaker, this pattern "trips" when a service starts failing. Instead of wasting resources calling a dead service, the caller gets an immediate error (or fallback), allowing the failing service time to recover.
📊 Diagram: State Machine
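The breaker moves between three states: CLOSED (calls pass through), OPEN (calls fail fast), and HALF-OPEN (trial calls are allowed after the timeout). As a minimal sketch, the transitions can be written as a lookup table. The event names here are hypothetical labels chosen for illustration, not part of any library.

```javascript
// Transition table for the three circuit breaker states.
// Event names (THRESHOLD_REACHED, TIMEOUT_ELAPSED, SUCCESS, FAILURE)
// are illustrative assumptions.
const TRANSITIONS = {
  CLOSED: { THRESHOLD_REACHED: "OPEN" },
  OPEN: { TIMEOUT_ELAPSED: "HALF-OPEN" },
  "HALF-OPEN": { SUCCESS: "CLOSED", FAILURE: "OPEN" },
};

// Returns the next state, or the current state if the event
// does not trigger a transition.
function nextState(state, event) {
  return TRANSITIONS[state]?.[event] ?? state;
}

console.log(nextState("CLOSED", "THRESHOLD_REACHED")); // OPEN
console.log(nextState("OPEN", "TIMEOUT_ELAPSED")); // HALF-OPEN
console.log(nextState("HALF-OPEN", "FAILURE")); // OPEN
```

Keeping the transitions in one table makes it easy to explain in an interview why a failed trial call sends the breaker straight back to OPEN.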
💻 Code Example (Node.js)
```javascript
class CircuitBreaker {
  constructor(service, threshold, timeout) {
    this.service = service; // Async function to protect
    this.threshold = threshold; // Max consecutive failures before tripping
    this.timeout = timeout; // Milliseconds to wait before half-open
    this.failures = 0;
    this.state = "CLOSED";
    this.lastFailureTime = null;
  }

  async call(...args) {
    if (this.state === "OPEN") {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = "HALF-OPEN"; // Allow trial requests through
      } else {
        throw new Error("Circuit is OPEN (Service Unavailable)");
      }
    }
    try {
      const result = await this.service(...args);
      this.reset();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onFailure() {
    this.failures++;
    // A failed HALF-OPEN trial also lands here. The count is still at or
    // above the threshold, so the circuit re-opens immediately.
    if (this.failures >= this.threshold) {
      this.state = "OPEN";
      this.lastFailureTime = Date.now();
    }
  }

  reset() {
    this.failures = 0;
    this.state = "CLOSED";
  }
}
```
2. Saga Pattern
Goal: Manage a transaction that spans multiple microservices without Two-Phase Commit (2PC), which is slow because participants hold locks while a coordinator waits for every service to vote.
🔄 The Concept
A Saga is a sequence of local transactions. If one local transaction fails, the Saga executes a series of compensating transactions to undo the changes made by previous steps.
📊 Diagram: Orchestration vs Choreography
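In orchestration, a central coordinator calls each service in turn. In choreography, there is no coordinator: each service reacts to events published by the others. The following is a minimal in-process sketch of choreography, assuming Node's built-in `EventEmitter` as a stand-in for a message broker. The event names, the `orders` map, and the rule that charges above 1000 are declined are all hypothetical.

```javascript
const { EventEmitter } = require("node:events");

const bus = new EventEmitter(); // Stand-in for a real message broker
const orders = new Map(); // Order service's local state

// Order service: performs its local transaction, then publishes an event.
function createOrder(id, amount) {
  orders.set(id, { status: "PENDING", amount });
  bus.emit("order.created", { id, amount });
}

// Order service: reacts to payment outcomes.
bus.on("payment.succeeded", ({ id }) => {
  orders.get(id).status = "CONFIRMED";
});
bus.on("payment.failed", ({ id }) => {
  orders.get(id).status = "CANCELLED"; // Compensating transaction
});

// Payment service: reacts to new orders (hypothetical decline rule).
bus.on("order.created", ({ id, amount }) => {
  const outcome = amount > 1000 ? "payment.failed" : "payment.succeeded";
  bus.emit(outcome, { id });
});

createOrder("order-1", 50);
createOrder("order-2", 5000);
console.log(orders.get("order-1").status); // CONFIRMED
console.log(orders.get("order-2").status); // CANCELLED
```

Choreography avoids a single coordinator but spreads the workflow across event handlers, which makes the overall flow harder to trace than in an orchestrator.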
💻 Code Example (Simplified Orchestrator)
```javascript
// orderService, paymentService, and shippingService are assumed to be
// service clients defined elsewhere.
async function createOrderSaga(orderData) {
  const steps = [
    {
      action: () => orderService.create(orderData),
      undo: (id) => orderService.cancel(id),
    },
    {
      action: () => paymentService.charge(orderData.amount),
      undo: () => paymentService.refund(orderData.amount),
    },
    {
      action: () => shippingService.ship(orderData),
      undo: () => shippingService.cancelShipment(orderData),
    },
  ];
  const completedSteps = [];
  try {
    for (const step of steps) {
      const result = await step.action();
      completedSteps.push({ step, result });
    }
  } catch (err) {
    console.error("Saga failed, starting compensation...");
    // Compensate in reverse order of completion
    for (const { step, result } of completedSteps.reverse()) {
      await step.undo(result?.id);
    }
    // Keep the original failure attached for debugging
    throw new Error("Transaction Failed and Rolled Back", { cause: err });
  }
}
```
3. Retry Pattern with Exponential Backoff
Goal: Handle transient failures (network blips, temporary service overload) by retrying with increasing delays.
📊 Diagram: Backoff Strategy
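With exponential backoff, the wait doubles after every failed attempt. As a rough sketch, the schedule of waits can be computed up front. The helper name `backoffDelays` is hypothetical, and it assumes that `maxRetries` counts total attempts, so there is one wait between each pair of consecutive attempts.

```javascript
// Returns the list of waits (in ms) between consecutive attempts.
// Assumes maxRetries is the total number of attempts, so there are
// maxRetries - 1 waits.
function backoffDelays(maxRetries = 3, baseDelay = 100) {
  return Array.from({ length: maxRetries - 1 }, (_, i) => baseDelay * 2 ** i);
}

console.log(backoffDelays(3, 100)); // [ 100, 200 ]
console.log(backoffDelays(5, 100)); // [ 100, 200, 400, 800 ]
```

Doubling the wait gives an overloaded service progressively more room to recover instead of hitting it at a fixed rate.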
💻 Code Example
```javascript
// maxRetries is the total number of attempts, including the first call.
async function retryRequest(fn, maxRetries = 3, baseDelay = 100) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === maxRetries - 1) throw err; // Out of attempts
      const delay = baseDelay * Math.pow(2, i); // 100ms, 200ms, 400ms, ...
      console.log(`Attempt ${i + 1} failed. Waiting ${delay}ms...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```
4. Sidecar Pattern
Goal: Offload common infrastructure tasks (logging, monitoring, security, proxying) to a separate container/process.
📊 Diagram: Application + Sidecar
Interview Use Case: Bring this up when asked how to add service-mesh features (such as mTLS or observability) to a legacy application without changing its code.
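To make the idea concrete, here is a minimal sketch of a logging sidecar written with Node's built-in `http` module. It assumes the legacy application listens on a local port and that traffic is routed to the sidecar instead. The function name `createSidecar`, its options, and the port numbers in the usage note are hypothetical.

```javascript
const http = require("node:http");

// Hypothetical sidecar: accepts requests, forwards them unchanged to the
// application on localhost, and logs method, path, status, and latency.
function createSidecar({ appPort, log = console.log }) {
  return http.createServer((clientReq, clientRes) => {
    const startedAt = Date.now();
    const label = `${clientReq.method} ${clientReq.url}`;

    const upstreamReq = http.request(
      {
        host: "127.0.0.1",
        port: appPort,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      },
      (upstreamRes) => {
        // Relay the application's response back to the caller.
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
        upstreamRes.on("end", () => {
          log(`${label} -> ${upstreamRes.statusCode} (${Date.now() - startedAt}ms)`);
        });
      }
    );

    // If the application is unreachable, answer with 502 and log it.
    upstreamReq.on("error", (err) => {
      log(`${label} -> upstream error: ${err.message}`);
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end("Bad Gateway");
    });

    clientReq.pipe(upstreamReq); // Stream the request body through
  });
}
```

For example, `createSidecar({ appPort: 3000 }).listen(8080)` would expose the application on port 8080 with request logging added, while the application itself stays untouched on port 3000.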
5. CQRS (Command Query Responsibility Segregation)
Goal: Separate the data model for writing (commands) from the data model for reading (queries).
📖 The Concept
In many systems, the read and write workloads are vastly different. CQRS allows you to optimize them independently. For example, use a relational DB for writes and an optimized search index (Elasticsearch) or a denormalized cache (Redis) for reads.
📊 Diagram: Segregated Paths
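The two paths can be sketched in a few lines. This is an in-memory illustration only: the `Map` objects stand in for the relational write store and the denormalized read store, and the names `placeOrder`, `project`, and `getCustomerSummary` are hypothetical.

```javascript
// Write side: normalized source of truth (stand-in for a relational DB).
const ordersTable = new Map();
// Read side: denormalized per-customer view (stand-in for Redis or a search index).
const customerSummaries = new Map();

// Command handler: validates and records the write.
function placeOrder({ orderId, customerId, amount }) {
  if (ordersTable.has(orderId)) {
    throw new Error(`Order ${orderId} already exists`);
  }
  const order = { orderId, customerId, amount };
  ordersTable.set(orderId, order);
  // Called inline here for simplicity. In a real system the read model is
  // usually updated asynchronously, so reads can briefly lag behind writes.
  project(order);
}

// Projection: folds each write into the read-optimized view.
function project({ customerId, amount }) {
  const current = customerSummaries.get(customerId) ?? { orderCount: 0, totalSpent: 0 };
  customerSummaries.set(customerId, {
    orderCount: current.orderCount + 1,
    totalSpent: current.totalSpent + amount,
  });
}

// Query handler: reads only from the read model, never the write store.
function getCustomerSummary(customerId) {
  return customerSummaries.get(customerId) ?? { orderCount: 0, totalSpent: 0 };
}

placeOrder({ orderId: "o1", customerId: "c1", amount: 40 });
placeOrder({ orderId: "o2", customerId: "c1", amount: 30 });
console.log(getCustomerSummary("c1")); // { orderCount: 2, totalSpent: 70 }
```

The query never scans the orders table, which is the point of the pattern: the read model is shaped for the question being asked.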
6. Bulkhead Pattern
Goal: Isolate resources to prevent a failure in one area from exhausting all resources (like threads or memory).
🚢 The Concept
The pattern is named after the partitions in a ship's hull: if one section is breached, the bulkheads keep the water from flooding the entire ship. In system design, you might use separate thread pools or separate service instances for different types of requests, so that one overloaded request type cannot starve the others.
📊 Diagram: Resource Isolation
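Node.js has no per-request thread pools, so a common stand-in is a concurrency limit per dependency. The following is a minimal sketch under that assumption. The class name `Bulkhead` and its reject-when-full policy are illustrative choices; a real implementation might queue excess calls instead.

```javascript
// Limits how many calls to one dependency may be in flight at once.
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
  }

  async run(task) {
    if (this.active >= this.maxConcurrent) {
      // Fail fast instead of letting this dependency consume more capacity.
      throw new Error("Bulkhead full: request rejected");
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--; // Free the slot whether the task succeeded or failed
    }
  }
}

// Each dependency gets its own compartment (hypothetical limits).
const reportsBulkhead = new Bulkhead(2);
const paymentsBulkhead = new Bulkhead(10);
```

If slow report queries fill `reportsBulkhead`, further report requests are rejected, while calls made through `paymentsBulkhead` still have their own slots available.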
💡 Interview Cheat Sheet
| Pattern | Use When... | Interview Keyphrase |
|---|---|---|
| Circuit Breaker | A service is slow or down | "Prevent cascading failure" |
| Saga | A business transaction spans multiple services or DBs | "Distributed transactions / Eventual consistency" |
| Retry | Network is flaky | "Handle transient errors with exponential backoff" |
| Sidecar | Need logging/auth without changing code | "Offload cross-cutting concerns" |
| Bulkhead | One slow API shouldn't block others | "Resource isolation" |
| CQRS | Read and Write loads are asymmetric | "Optimize reads vs writes independently" |
