Amazon E-commerce System Design: Scaling to Millions of SKUs
Amazon is the gold standard for global e-commerce. Its architecture is built on a massive web of microservices that handle everything from product search to global logistics.
1. Requirements
Functional
- Product Search: Users can find products by keywords, categories, or filters.
- Shopping Cart: Users can add/remove items and persist them across sessions.
- Checkout & Payments: Securely process orders and payments.
- Inventory Management: Track stock levels across thousands of warehouses.
- Order Tracking: Real-time status updates of shipments.
Non-Functional
- High Availability: The "Buy" button must always work.
- Partition Tolerance: The system must handle network failures across global regions.
- Consistency vs Availability: Eventual Consistency is acceptable for the catalog, but inventory and payments require Strong Consistency.
- Low Latency: Page loads must be < 100ms for a good UX.
2. High-Level Architecture
Amazon uses a "Service-Oriented Architecture" (SOA) where each team owns its service and the data behind it.
3. Technical Deep Dives
A. Shopping Cart: The Write-Heavy Challenge
The shopping cart is a high-availability component. Amazon often uses DynamoDB (or similar NoSQL) because:
- Scalability: It handles millions of concurrent "add to cart" operations without locking.
- Key-Value Nature: A cart is essentially user_id -> list of items.
- Availability: It prioritizes being "always up," even if a cart update is slightly delayed in syncing across regions.
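The key-value shape described above can be sketched in a few lines. This is a minimal in-memory stand-in, not DynamoDB client code: `CartStore`, `CartItem`, and the method names are hypothetical, chosen only to illustrate the user_id -> list of items mapping.

```typescript
// Hypothetical item shape for illustration; a real cart would carry more fields.
interface CartItem {
  sku: string;
  quantity: number;
}

// In-memory stand-in for a key-value store: one entry per user_id.
class CartStore {
  private carts = new Map<string, CartItem[]>();

  // Returns the user's items, or an empty list if no cart exists yet.
  getCart(userId: string): CartItem[] {
    return this.carts.get(userId) || [];
  }

  // Adds to the quantity if the SKU is already in the cart, else appends it.
  addItem(userId: string, sku: string, quantity: number): void {
    const items = this.getCart(userId);
    const existing = items.find((item) => item.sku === sku);
    if (existing) {
      existing.quantity += quantity;
    } else {
      items.push({ sku, quantity });
    }
    this.carts.set(userId, items);
  }

  // Removes every entry for the given SKU from the user's cart.
  removeItem(userId: string, sku: string): void {
    const remaining = this.getCart(userId).filter((item) => item.sku !== sku);
    this.carts.set(userId, remaining);
  }
}

const store = new CartStore();
store.addItem("user-1", "sku-book", 1);
store.addItem("user-1", "sku-book", 2);
store.addItem("user-1", "sku-lamp", 1);
store.removeItem("user-1", "sku-lamp");
const cart = store.getCart("user-1");
```

Every operation reads or writes a single key, which is why this access pattern partitions cleanly across nodes: no operation needs to touch two users' carts at once.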
B. Distributed Transactions: The Saga Pattern
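In a microservice world, placing an order touches multiple databases (Payment, Inventory, Shipping). A "2-Phase Commit" (2PC) is impractical at this scale: every participant holds its locks until the coordinator has collected all votes, so one slow service stalls the whole transaction. Instead, we use a Saga: a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.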
The Workflow:
- Order Service creates a "Pending" order.
- Payment Service charges the customer. If this fails, the Saga aborts and the order is marked "Failed."
- Inventory Service reserves the stock. If this fails, the payment is refunded.
- Shipping Service schedules delivery. If this fails, the stock is released and the payment is refunded.
- Order Service marks the order as "Completed."
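The general mechanism behind this workflow can be sketched as a small runner: each step carries its own undo action, and a failure triggers the undo actions of the completed steps in reverse order. The names `SagaStep` and `runSaga` are hypothetical, and the steps below only append to a log rather than call real services.

```typescript
// A saga step pairs a forward action with the compensation that undoes it.
interface SagaStep {
  name: string; // label only, useful for logging
  run: () => Promise<void>;
  undo: () => Promise<void>;
}

// Runs steps in order. On the first failure, undoes the steps that already
// completed, most recent first, and reports false.
async function runSaga(steps: SagaStep[]): Promise<boolean> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.undo();
      }
      return false;
    }
  }
  return true;
}

// Simulated checkout in which shipping fails after payment and inventory succeed.
const log: string[] = [];
const record = (entry: string) => async () => {
  log.push(entry);
};
const checkoutSteps: SagaStep[] = [
  { name: "payment", run: record("charge"), undo: record("refund") },
  { name: "inventory", run: record("reserve"), undo: record("release") },
  {
    name: "shipping",
    run: async () => {
      throw new Error("no carrier available");
    },
    undo: async () => {},
  },
];
const sagaResult = runSaga(checkoutSteps);
```

Here `sagaResult` resolves to `false` and `log` ends up as charge, reserve, release, refund: the stock is released before the payment is refunded, mirroring the order in which the steps were applied. A production saga would also need to persist its progress and retry failed compensations, which this sketch leaves out.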
C. Inventory Management & Race Conditions
When 1,000 people try to buy the last 5 items (e.g., during Prime Day), we face a race condition.
- Solution: We use Distributed Locking (Redis/ZooKeeper) or Database Constraints (for example, an update that only decrements stock when enough units remain) to ensure stock never goes below zero.
- Optimization: We can "soft-reserve" an item for 15 minutes once it's in the checkout flow to prevent users from seeing "Out of Stock" at the final payment step.
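Both ideas above can be illustrated together in a minimal in-memory sketch. The `Inventory` class, its method names, and the `Hold` shape are hypothetical. The check-and-decrement is atomic here only because JavaScript runs it on a single thread; across multiple servers the same guarantee would have to come from a conditional database update or a distributed lock. Time is passed in explicitly so the 15-minute expiry can be demonstrated without waiting.

```typescript
interface Hold {
  sku: string;
  quantity: number;
  expiresAt: number; // epoch milliseconds
}

class Inventory {
  private stock = new Map<string, number>();
  private holds = new Map<string, Hold>(); // keyed by orderId

  constructor(private holdTtlMs: number = 15 * 60 * 1000) {}

  setStock(sku: string, quantity: number): void {
    this.stock.set(sku, quantity);
  }

  // Units not sold and not currently held by a checkout in progress.
  available(sku: string, now: number): number {
    this.expireHolds(now);
    return this.stock.get(sku) || 0;
  }

  // Check-and-decrement in one step, so stock can never go below zero.
  softReserve(orderId: string, sku: string, quantity: number, now: number): boolean {
    this.expireHolds(now);
    const current = this.stock.get(sku) || 0;
    if (quantity <= 0 || current < quantity || this.holds.has(orderId)) {
      return false;
    }
    this.stock.set(sku, current - quantity);
    this.holds.set(orderId, { sku, quantity, expiresAt: now + this.holdTtlMs });
    return true;
  }

  // Payment succeeded: the hold becomes a sale and the stock stays decremented.
  confirm(orderId: string, now: number): boolean {
    this.expireHolds(now);
    return this.holds.delete(orderId);
  }

  // Returns stock from holds whose checkout window has elapsed.
  private expireHolds(now: number): void {
    this.holds.forEach((hold, orderId) => {
      if (hold.expiresAt <= now) {
        this.stock.set(hold.sku, (this.stock.get(hold.sku) || 0) + hold.quantity);
        this.holds.delete(orderId);
      }
    });
  }
}

// 1,000 buyers compete for the last 5 units.
const inventory = new Inventory();
inventory.setStock("sku-console", 5);
const t0 = 0;
const outcomes: boolean[] = [];
for (let i = 0; i < 1000; i++) {
  outcomes.push(inventory.softReserve(`order-${i}`, "sku-console", 1, t0));
}
const reservedCount = outcomes.filter(Boolean).length;
```

Only the first five reservations succeed, and the rest are rejected without the stock ever going negative. If a buyer abandons checkout, the hold expires and the unit returns to the pool the next time the inventory is read.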
4. Implementation Example: Order Saga Orchestrator
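This code demonstrates how an Order Orchestrator manages the distributed steps of a checkout. The service interfaces are illustrative placeholders for the real service clients.

```typescript
type OrderStatus = "PENDING" | "PAID" | "STOCKED" | "SHIPPED" | "COMPLETED" | "FAILED";

// Illustrative shapes: the example does not specify the real payloads or clients.
interface OrderData { amount: number; items: unknown[]; }
interface OrderService {
  create(orderData: OrderData): Promise<string>;
  updateStatus(orderId: string, status: OrderStatus): Promise<void>;
}
interface PaymentService {
  charge(orderId: string, amount: number): Promise<void>;
  refund(orderId: string): Promise<void>;
}
interface InventoryService {
  reserve(orderId: string, items: unknown[]): Promise<void>;
  release(orderId: string, items: unknown[]): Promise<void>;
}
interface ShippingService {
  schedule(orderId: string): Promise<void>;
}

class OrderOrchestrator {
  constructor(
    private orderService: OrderService,
    private paymentService: PaymentService,
    private inventoryService: InventoryService,
    private shippingService: ShippingService,
  ) {}

  async placeOrder(orderData: OrderData): Promise<void> {
    const orderId = await this.orderService.create(orderData);
    // Track completed steps so compensation only undoes what actually happened.
    const done = { paid: false, reserved: false };
    try {
      // Step 1: Payment
      await this.paymentService.charge(orderId, orderData.amount);
      done.paid = true;
      console.log("Payment Successful");
      // Step 2: Inventory
      await this.inventoryService.reserve(orderId, orderData.items);
      done.reserved = true;
      console.log("Inventory Reserved");
      // Step 3: Shipping
      await this.shippingService.schedule(orderId);
      console.log("Shipping Scheduled");
      await this.orderService.updateStatus(orderId, "COMPLETED");
    } catch (error) {
      console.error("Order Failed, initiating Compensation Logic...", error);
      await this.compensate(orderId, orderData, done);
    }
  }

  /**
   * Compensation Logic (The 'Undo' buttons), applied in reverse order.
   */
  private async compensate(orderId: string, orderData: OrderData, done: { paid: boolean; reserved: boolean }) {
    // If inventory was reserved but shipping failed, release stock
    if (done.reserved) await this.inventoryService.release(orderId, orderData.items);
    // If payment was made but a later step failed, refund payment
    if (done.paid) await this.paymentService.refund(orderId);
    await this.orderService.updateStatus(orderId, "FAILED");
  }
}
```
5. Summary: Key Architecture Trade-offs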
| Component | Choice | Rationale |
|---|---|---|
| Catalog | Elasticsearch | Best for multi-faceted search (price range, brand, color). |
| Transactions | Saga Pattern | Decouples services and avoids long-lived locks, ensuring high scale. |
| Inventory | Strong Consistency | We cannot "eventually" find out we sold more items than we have. |
| Communication | Event-Driven (Kafka) | Decouples order placement from non-critical tasks like notifications or analytics. |
