
Amazon E-commerce System Design: Scaling to Millions of SKUs

Amazon is the gold standard for global e-commerce. Its architecture is built on a massive web of microservices that handle everything from product search to global logistics.


1. Requirements

Functional

  • Product Search: Users can find products by keywords, categories, or filters.
  • Shopping Cart: Users can add/remove items and persist them across sessions.
  • Checkout & Payments: Securely process orders and payments.
  • Inventory Management: Track stock levels across thousands of warehouses.
  • Order Tracking: Real-time status updates of shipments.

Non-Functional

  • High Availability: The "Buy" button must always work.
  • Partition Tolerance: The system must handle network failures across global regions.
  • Consistency vs. Availability: Eventual consistency is acceptable for the catalog, but inventory and payments require strong consistency.
  • Low Latency: Page loads should stay under 100 ms for a good UX.

2. High-Level Architecture

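Amazon uses a "Service-Oriented Architecture" (SOA) in which each team owns its service and that service's data. Other teams reach the data only through the service's API, never by reading its database directly.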


3. Technical Deep Dives

A. Shopping Cart: The Write-Heavy Challenge

The shopping cart must stay available under a heavy write load. Amazon often uses DynamoDB (or a similar NoSQL store) because:

  • Scalability: It handles millions of concurrent "add to cart" operations without locking.
  • Key-Value Nature: A cart is essentially user_id -> list of items.
  • Availability: It prioritizes being "always up," even if a cart update is slightly delayed in syncing across regions.
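
To make the key-value shape and the "slightly delayed sync" concrete, here is a minimal in-memory sketch. The names (CartLine, CartStore, mergeCarts, visibleItems) are hypothetical, and the reconciliation rule (keep the most recent write per SKU, with quantity 0 acting as a removal marker) is an assumption chosen for illustration; a real store may resolve conflicts differently.

```typescript
// Hypothetical types: illustrative only, not Amazon's actual schema.
interface CartLine {
  sku: string;
  quantity: number;  // 0 marks a removed item, so a delete survives a merge
  updatedAt: number; // timestamp of the last write to this line
}

// A cart is a small map keyed by SKU.
type Cart = Record<string, CartLine>;

// In-memory stand-in for one region's key-value table: user_id -> cart.
class CartStore {
  private carts: Record<string, Cart> = {};

  setQuantity(userId: string, sku: string, quantity: number, now: number): void {
    const cart = this.carts[userId] ?? {};
    cart[sku] = { sku, quantity, updatedAt: now };
    this.carts[userId] = cart;
  }

  get(userId: string): Cart {
    return this.carts[userId] ?? {};
  }
}

// Reconcile two replicas of the same cart: per SKU, keep the most recent write.
function mergeCarts(a: Cart, b: Cart): Cart {
  const merged: Cart = { ...a };
  for (const sku of Object.keys(b)) {
    const existing = merged[sku];
    if (!existing || b[sku].updatedAt > existing.updatedAt) {
      merged[sku] = b[sku];
    }
  }
  return merged;
}

// Lines shown to the user: removal markers (quantity 0) are filtered out.
function visibleItems(cart: Cart): CartLine[] {
  return Object.keys(cart)
    .map((sku) => cart[sku])
    .filter((line) => line.quantity > 0);
}
```

Each region accepts writes locally without coordinating with the others, which is what keeps "add to cart" available; the merge runs later, when replicas exchange state.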

B. Distributed Transactions: The Saga Pattern

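In a microservice world, placing an order touches several services, each with its own database (Payment, Inventory, Shipping). We avoid a "2-Phase Commit" (2PC) at this scale because every participant holds its locks until the coordinator decides, so one slow or unavailable service stalls the whole transaction. Instead, we use a Saga: a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.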

The Workflow:

  1. Order Service creates a "Pending" order.
  2. Payment Service reserves the funds. (If this fails, the Saga aborts and the order is marked "Failed".)
  3. Inventory Service "locks" the stock. (If this fails, the payment is refunded.)
  4. Shipping Service schedules delivery. (If this fails, the stock is released and the payment is refunded.)
  5. Order Service marks the order as "Completed."

C. Inventory Management & Race Conditions

When 1,000 people try to buy the last 5 items (e.g., during Prime Day), we face a race condition.

  • Solution: We use Distributed Locking (Redis/ZooKeeper) or Database Constraints (for example, a conditional update that decrements stock only when enough units remain) to ensure stock never goes below zero.
  • Optimization: We can "soft-reserve" an item for 15 minutes once it's in the checkout flow to prevent users from seeing "Out of Stock" at the final payment step.
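
Both ideas above can be sketched in a few lines. This is a single-process, in-memory model under stated assumptions: the class and method names (Inventory, softReserve, confirm, releaseExpired) are hypothetical, and the atomic check-and-decrement relies on JavaScript running each method to completion. In a real deployment that step would be a conditional write in the database or a section guarded by a distributed lock.

```typescript
// Hypothetical in-memory model of stock plus time-limited holds.
interface Hold {
  sku: string;
  quantity: number;
  expiresAt: number; // epoch milliseconds
}

const HOLD_TTL_MS = 15 * 60 * 1000; // the 15-minute soft reservation

class Inventory {
  private available: Record<string, number> = {};
  private holds: Record<string, Hold> = {}; // keyed by orderId

  setStock(sku: string, quantity: number): void {
    this.available[sku] = quantity;
  }

  availableFor(sku: string): number {
    return this.available[sku] ?? 0;
  }

  // Check and decrement in one step, so stock can never drop below zero.
  softReserve(orderId: string, sku: string, quantity: number, now: number): boolean {
    this.releaseExpired(now);
    const stock = this.available[sku] ?? 0;
    if (quantity <= 0 || stock < quantity || this.holds[orderId]) {
      return false;
    }
    this.available[sku] = stock - quantity;
    this.holds[orderId] = { sku, quantity, expiresAt: now + HOLD_TTL_MS };
    return true;
  }

  // Payment succeeded in time: the hold becomes a sale and stock stays decremented.
  confirm(orderId: string, now: number): boolean {
    this.releaseExpired(now);
    if (!this.holds[orderId]) {
      return false; // never reserved, or the hold already expired
    }
    delete this.holds[orderId];
    return true;
  }

  // Return stock from holds whose checkout window has elapsed.
  releaseExpired(now: number): void {
    for (const orderId of Object.keys(this.holds)) {
      const hold = this.holds[orderId];
      if (hold.expiresAt <= now) {
        this.available[hold.sku] = (this.available[hold.sku] ?? 0) + hold.quantity;
        delete this.holds[orderId];
      }
    }
  }
}
```

With 5 units in stock and 1,000 buyers each requesting one, only the first five softReserve calls succeed; abandoned holds flow back into available stock once the 15 minutes pass.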

4. Implementation Example: Order Saga Orchestrator

This code demonstrates how an Order Orchestrator manages the distributed steps of a checkout.

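```typescript
type OrderStatus = "PENDING" | "PAID" | "STOCKED" | "SHIPPED" | "COMPLETED" | "FAILED";

// Assumed order shape, for illustration only.
interface OrderData {
  amount: number;
  items: { sku: string; quantity: number }[];
}

// Assumed service contracts; in a real system each is a remote call.
interface OrderService {
  create(data: OrderData): Promise<string>; // returns the new order's id
  updateStatus(orderId: string, status: OrderStatus): Promise<void>;
}
interface PaymentService {
  charge(orderId: string, amount: number): Promise<void>;
  refund(orderId: string): Promise<void>;
}
interface InventoryService {
  reserve(orderId: string, items: OrderData["items"]): Promise<void>;
  release(orderId: string, items: OrderData["items"]): Promise<void>;
}
interface ShippingService {
  schedule(orderId: string): Promise<void>;
}

class OrderOrchestrator {
  constructor(
    private orderService: OrderService,
    private paymentService: PaymentService,
    private inventoryService: InventoryService,
    private shippingService: ShippingService
  ) {}

  async placeOrder(orderData: OrderData): Promise<OrderStatus> {
    const orderId = await this.orderService.create(orderData);
    // Track which steps finished so compensation only undoes real work.
    let paid = false;
    let reserved = false;

    try {
      // Step 1: Payment
      await this.paymentService.charge(orderId, orderData.amount);
      paid = true;
      console.log("Payment Successful");

      // Step 2: Inventory
      await this.inventoryService.reserve(orderId, orderData.items);
      reserved = true;
      console.log("Inventory Reserved");

      // Step 3: Shipping
      await this.shippingService.schedule(orderId);
      console.log("Shipping Scheduled");

      await this.orderService.updateStatus(orderId, "COMPLETED");
      return "COMPLETED";
    } catch (error) {
      console.error("Order Failed, initiating Compensation Logic...", error);
      await this.compensate(orderId, orderData, paid, reserved);
      return "FAILED";
    }
  }

  /**
   * Compensation Logic (The 'Undo' buttons), applied in reverse order
   * and only for the steps that actually completed.
   */
  private async compensate(orderId: string, orderData: OrderData, paid: boolean, reserved: boolean) {
    // If inventory was reserved but a later step failed, release stock
    if (reserved) await this.inventoryService.release(orderId, orderData.items);
    // If payment was made but a later step failed, refund payment
    if (paid) await this.paymentService.refund(orderId);

    await this.orderService.updateStatus(orderId, "FAILED");
  }
}
```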
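
The orchestrator above hard-codes its three steps. As a more general sketch, each step can be paired with its own compensating action so that a failure rolls back only the steps that completed, in reverse order. The names SagaStep, SagaResult, and runSaga are hypothetical, and this version assumes compensations themselves succeed; a production system would typically also retry them and make them idempotent.

```typescript
// A step pairs a forward action with the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;   // set only when a step threw
  compensated: string[]; // names of rolled-back steps, in rollback order
}

// Run steps in order; on the first failure, undo completed steps newest-first.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch {
      const compensated: string[] = [];
      for (const done of completed.reverse()) {
        await done.compensate();
        compensated.push(done.name);
      }
      return { ok: false, failedStep: step.name, compensated };
    }
  }
  return { ok: true, compensated: [] };
}
```

Called with payment, inventory, and shipping steps, a failure in the inventory step triggers only the payment compensation, because inventory never completed and shipping never started.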

5. Summary: Key Architecture Trade-offs

| Component | Choice | Rationale |
|---|---|---|
| Catalog | Elasticsearch | Best for multi-faceted search (price range, brand, color). |
| Transactions | Saga Pattern | Decouples services and avoids long-lived locks, ensuring high scale. |
| Inventory | Strong Consistency | We cannot "eventually" find out we sold more items than we have. |
| Communication | Event-Driven (Kafka) | Decouples order placement from non-critical tasks like notifications or analytics. |
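
To illustrate the multi-faceted search row, the sketch below builds an Elasticsearch Query DSL request body that combines a keyword match with brand, color, and price-range filters, plus term aggregations for facet counts. The helper name buildFacetedQuery and the field names (title, brand, color, price) are assumptions about a hypothetical index mapping, with brand and color assumed to be keyword fields.

```typescript
// Hypothetical filter set coming from the search page.
interface SearchFilters {
  keywords: string;
  brand?: string;
  color?: string;
  minPrice?: number;
  maxPrice?: number;
}

// Build a request body for the search API. Filters sit in the bool "filter"
// clause, so they narrow results without affecting relevance scoring.
function buildFacetedQuery(f: SearchFilters) {
  const filter: object[] = [];
  if (f.brand !== undefined) filter.push({ term: { brand: f.brand } });
  if (f.color !== undefined) filter.push({ term: { color: f.color } });
  if (f.minPrice !== undefined || f.maxPrice !== undefined) {
    const range: { gte?: number; lte?: number } = {};
    if (f.minPrice !== undefined) range.gte = f.minPrice;
    if (f.maxPrice !== undefined) range.lte = f.maxPrice;
    filter.push({ range: { price: range } });
  }

  return {
    query: {
      bool: {
        must: [{ match: { title: f.keywords } }],
        filter,
      },
    },
    // Facet counts shown next to the results (for example, items per brand).
    aggs: {
      brands: { terms: { field: "brand" } },
      colors: { terms: { field: "color" } },
    },
  };
}
```

The resulting object would be sent as the JSON body of a search request; only the filters the user actually selected appear in the query.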

Released under the ISC License.