ποΈ Data Storage Patterns in System Design β
How you store data is as critical as how you process it. The right storage pattern determines your system's scalability, reliability, and performance.
Overview β
1. π¦ Normalization vs Denormalization β
Normalization β Eliminate Redundancy β
Organize data to reduce duplication. Data is split into multiple related tables.
β When to Use: Write-heavy systems, financial applications, when data integrity is critical.
-- Normalized Query: Join required
SELECT u.name, p.name, oi.quantity
FROM users u
JOIN orders o ON u.user_id = o.user_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE u.user_id = 42;Denormalization β Optimize for Reads β
Intentionally duplicate data to avoid expensive JOINs.
β When to Use: Read-heavy workloads, analytics dashboards, reporting systems.
// MongoDB denormalized document β no joins needed
const order = {
orderId: "ord_123",
user: { id: 42, name: "Alice", email: "alice@example.com" },
items: [
{ productId: "p1", name: "Laptop", price: 999.99, qty: 1 },
{ productId: "p2", name: "Mouse", price: 29.99, qty: 2 },
],
totalAmount: 1059.97,
createdAt: new Date(),
};| Normalization | Denormalization | |
|---|---|---|
| Storage | Less (no duplication) | More (data duplicated) |
| Read Speed | Slower (JOINs needed) | Faster (single query) |
| Write Speed | Faster (one place) | Slower (update many places) |
| Consistency | Easier | Harder |
2. π Key-Value Store Pattern β
Data stored as simple key β value pairs. Extremely fast lookups by key.
Real-world example: Session Management
// Store session β O(1) write
await redis.setex(
`session:${userId}`,
3600,
JSON.stringify({
userId,
role: "admin",
loginAt: Date.now(),
})
);
// Retrieve session β O(1) read
const session = JSON.parse(await redis.get(`session:${userId}`));
// Rate limiting with atomic increment
const hits = await redis.incr(`rate:${ip}:${minute}`);
if (hits > 100) throw new Error("Rate limit exceeded");β
Best For: Sessions, caching, rate limiting, feature flags, leaderboards
β Not For: Complex queries, relationships, full-text search
3. π Document Store Pattern β
Stores self-contained JSON/BSON documents. No rigid schema required.
Real-world example: E-commerce Product Catalog
// Products can have completely different structures β schema-flexible
const smartphone = {
_id: "prod_001",
name: "iPhone 15",
category: "electronics",
specs: { storage: "256GB", camera: "48MP", battery: "3877mAh" },
variants: [
{ color: "Black", stock: 50 },
{ color: "White", stock: 30 },
],
price: 999,
tags: ["apple", "5g", "flagship"],
};
const book = {
_id: "prod_002",
name: "Clean Code",
category: "books",
author: "Robert C. Martin",
isbn: "978-0132350884",
pages: 464,
price: 35,
};
// Query with embedded arrays β no JOIN needed
const results = await db.products.find({
category: "electronics",
"variants.stock": { $gt: 0 },
tags: "5g",
});β
Best For: Catalogs, CMS, user profiles, event logs
β Not For: Multi-entity transactions, complex reporting with aggregations
4. ποΈ Wide-Column Store Pattern β
Rows have dynamic columns grouped into column families. Optimized for time-series and heavy writes.
Real-world example: IoT Sensor Time-Series
// Cassandra data model β partition by device, sort by time
// Table: CREATE TABLE sensor_data (
// device_id text,
// recorded_at timestamp,
// temperature float,
// humidity float,
// PRIMARY KEY (device_id, recorded_at)
// ) WITH CLUSTERING ORDER BY (recorded_at DESC);
// Insert β extremely fast, append-only
await client.execute(
`INSERT INTO sensor_data (device_id, recorded_at, temperature, humidity)
VALUES (?, ?, ?, ?)`,
["device_abc", new Date(), 23.5, 61.2],
{ prepare: true }
);
// Range query on one partition β millisecond response even at billions of rows
const result = await client.execute(
`SELECT * FROM sensor_data
WHERE device_id = 'device_abc'
AND recorded_at > ? AND recorded_at < ?`,
[yesterday, now],
{ prepare: true }
);β
Best For: Time-series, IoT, analytics, write-heavy append logs
β Not For: Ad-hoc queries, joins, secondary filtering
5. πΈοΈ Graph Database Pattern β
Entities are nodes; relationships are edges with properties. Perfect for connected data.
Real-world example: Social Network Friend Recommendations
// Neo4j Cypher Query β Find friends-of-friends (2nd degree connections)
MATCH (me:User {id: "alice"})
-[:FOLLOWS]->(:User)
-[:FOLLOWS]->(recommended:User)
WHERE NOT (me)-[:FOLLOWS]->(recommended)
AND me <> recommended
RETURN recommended.name, COUNT(*) AS mutualConnections
ORDER BY mutualConnections DESC
LIMIT 10;// Node.js with Neo4j driver
const session = driver.session();
const result = await session.run(
`
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)
<-[:PURCHASED]-(similar:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(*) AS score
ORDER BY score DESC LIMIT 5
`,
{ userId: "user_42" }
);
const recommendations = result.records.map((r) => r.get("rec.name"));β
Best For: Social graphs, fraud detection, recommendation engines, knowledge graphs
β Not For: Simple key lookups, bulk analytics, high-throughput writes
6. β‘ CQRS Pattern (Command Query Responsibility Segregation) β
Separate the write model (Commands) from the read model (Queries).
Real-world example: Order Management System
// COMMAND β Write side (strict validation, normalized)
class PlaceOrderCommand {
constructor(
public readonly userId: string,
public readonly items: OrderItem[]
) {}
}
class OrderCommandHandler {
async handle(cmd: PlaceOrderCommand) {
const order = Order.create(cmd.userId, cmd.items); // domain logic
await this.orderRepo.save(order); // save to write DB
await this.eventBus.publish(new OrderPlacedEvent(order)); // notify
}
}
// QUERY β Read side (pre-aggregated, fast)
class GetUserOrdersQuery {
constructor(public readonly userId: string) {}
}
class OrderQueryHandler {
async handle(query: GetUserOrdersQuery) {
// Read from pre-projected, denormalized read model
return this.readModel.findByUserId(query.userId);
}
}
// READ PROJECTOR β keeps read DB in sync
class OrderProjector {
async on(event: OrderPlacedEvent) {
await this.readModel.upsert({
orderId: event.orderId,
userName: await this.userService.getName(event.userId),
itemCount: event.items.length,
totalAmount: event.items.reduce((s, i) => s + i.price, 0),
status: "pending",
createdAt: event.timestamp,
});
}
}β
Benefits: Independent scaling of reads/writes, optimized query models, audit trail
β οΈ Tradeoff: Eventual consistency, added complexity
7. π Event Sourcing Pattern β
Store state as a sequence of events, not current values. Reconstruct state by replaying events.
Real-world example: Bank Account
// Events β immutable facts that happened
type AccountEvent =
| { type: "AccountOpened"; initialBalance: number; timestamp: Date }
| { type: "MoneyDeposited"; amount: number; timestamp: Date }
| { type: "MoneyWithdrawn"; amount: number; timestamp: Date };
// Reconstruct state by replaying events (pure function)
function replayEvents(events: AccountEvent[]): { balance: number } {
return events.reduce(
(state, event) => {
switch (event.type) {
case "AccountOpened":
return { balance: event.initialBalance };
case "MoneyDeposited":
return { balance: state.balance + event.amount };
case "MoneyWithdrawn":
return { balance: state.balance - event.amount };
}
},
{ balance: 0 }
);
}
// Usage
const events = await eventStore.getEvents("account_42");
const currentState = replayEvents(events);
// { balance: 300 }
// Time travel β what was balance on Jan 10?
const jan10Events = events.filter((e) => e.timestamp <= new Date("2024-01-10"));
const historicState = replayEvents(jan10Events);β
Benefits: Full audit log, time-travel queries, event replay for debugging
β οΈ Tradeoff: Larger storage, complex to query current state
8. ποΈ Data Lake vs Data Warehouse Pattern β
| Data Lake | Data Warehouse | |
|---|---|---|
| Schema | Schema-on-read | Schema-on-write |
| Format | Raw (JSON, CSV, Parquet) | Structured, columnar |
| Cost | Cheap storage (S3) | Expensive compute |
| Users | Data Scientists | Analysts, BI |
| Latency | Hours/batch | Seconds/interactive |
| Example | AWS S3 + Athena | BigQuery, Snowflake |
9. π Replication Patterns β
Primary-Replica (Read Scaling) β
Multi-Primary (Write Scaling + High Availability) β
Quorum-Based Replication (Consistency Control) β
Formula: R + W > N (Quorum)
- N = total replicas (e.g., 3)
- W = write acknowledgments required (e.g., 2)
- R = read replicas to consult (e.g., 2)
- R + W > N = 4 > 3 β
Consistent reads guaranteed10. πΎ Caching Storage Patterns β
Cache-Aside (Most Common) β
async function getUser(userId: string) {
// 1. Check cache first
const cached = await redis.get(`user:${userId}`);
if (cached) return JSON.parse(cached); // Cache HIT
// 2. Cache MISS β fetch from DB
const user = await db.users.findOne({ id: userId });
// 3. Populate cache for next time (TTL: 1 hour)
await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
return user;
}
// On update β invalidate cache
async function updateUser(userId: string, data: Partial<User>) {
await db.users.updateOne({ id: userId }, data);
await redis.del(`user:${userId}`); // Invalidate!
}Write-Through β
async function saveUserProfile(userId: string, profile: UserProfile) {
// Write to BOTH cache and DB simultaneously
await Promise.all([
redis.setex(`user:${userId}`, 3600, JSON.stringify(profile)),
db.users.updateOne({ id: userId }, profile)
]);
}
// β
Cache always fresh β οΈ Higher write latencyWrite-Behind (Write-Back) β
// Write to cache immediately, flush to DB asynchronously
async function updateCounter(key: string, value: number) {
await redis.set(key, value); // Instant return
await queue.push({ type: "DB_WRITE", key, value }); // Async flush
}
// Background worker flushes every N seconds
setInterval(async () => {
const batch = await queue.drain();
await db.bulkWrite(batch); // Efficient bulk write
}, 5000);
// β
Very fast writes β οΈ Risk of data loss if cache crashes11. π§© Polyglot Persistence Pattern β
Use multiple database technologies for different parts of the same system.
Why polyglot? No single database is best for everything. Optimize each component with the right tool.
| Component | Storage | Reason |
|---|---|---|
| User Auth | PostgreSQL | ACID, relational integrity |
| Product Catalog | MongoDB | Flexible schema, nested docs |
| Shopping Cart | Redis | Sub-millisecond speed, TTL |
| Search | Elasticsearch | Full-text, faceted search |
| Analytics | ClickHouse | Columnar, OLAP queries |
| Media | S3 + CDN | Cheap blob storage, global edge |
| Recommendations | Neo4j | Graph traversal |
12. π Lambda vs Kappa Architecture β
Lambda Architecture (Batch + Streaming) β
Kappa Architecture (Streaming Only β Simpler) β
| Lambda | Kappa | |
|---|---|---|
| Complexity | High (two systems) | Lower (one system) |
| Accuracy | High (batch reprocessing) | Depends on stream |
| Latency | Mixed (batch=hours, stream=ms) | Low (ms) |
| Reprocessing | Easy (batch layer) | Replay from Kafka |
13. πΊοΈ Pattern Selection Guide β
β Checklist β
- [ ] I understand when to normalize vs denormalize
- [ ] I can explain Key-Value, Document, Wide-Column, and Graph stores
- [ ] I know how CQRS separates reads from writes
- [ ] I can explain Event Sourcing and its benefits (audit trail, time-travel)
- [ ] I understand Primary-Replica vs Multi-Primary replication
- [ ] I know the Quorum formula: R + W > N
- [ ] I can explain Cache-Aside, Write-Through, and Write-Behind
- [ ] I understand why Polyglot Persistence is used in large systems
- [ ] I can distinguish Lambda from Kappa architecture
- [ ] I can pick the right storage pattern for a given use case
π Related Topics β
β‘οΈ Next: Level 4 β Caching
