Skip to content

πŸ—ƒοΈ Data Storage Patterns in System Design ​

How you store data is as critical as how you process it. The right storage pattern determines your system's scalability, reliability, and performance.


Overview ​


1. πŸ“¦ Normalization vs Denormalization ​

Normalization β€” Eliminate Redundancy ​

Organize data to reduce duplication. Data is split into multiple related tables.

βœ… When to Use: Write-heavy systems, financial applications, when data integrity is critical.

sql
-- Normalized Query: Join required
SELECT u.name, p.name, oi.quantity
FROM users u
JOIN orders o ON u.user_id = o.user_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE u.user_id = 42;

Denormalization β€” Optimize for Reads ​

Intentionally duplicate data to avoid expensive JOINs.

βœ… When to Use: Read-heavy workloads, analytics dashboards, reporting systems.

javascript
// MongoDB denormalized document β€” no joins needed
const order = {
  orderId: "ord_123",
  user: { id: 42, name: "Alice", email: "alice@example.com" },
  items: [
    { productId: "p1", name: "Laptop", price: 999.99, qty: 1 },
    { productId: "p2", name: "Mouse", price: 29.99, qty: 2 },
  ],
  totalAmount: 1059.97,
  createdAt: new Date(),
};
NormalizationDenormalization
StorageLess (no duplication)More (data duplicated)
Read SpeedSlower (JOINs needed)Faster (single query)
Write SpeedFaster (one place)Slower (update many places)
ConsistencyEasierHarder

2. πŸ”‘ Key-Value Store Pattern ​

Data stored as simple key β†’ value pairs. Extremely fast lookups by key.

Real-world example: Session Management

javascript
// Store session β€” O(1) write
await redis.setex(
  `session:${userId}`,
  3600,
  JSON.stringify({
    userId,
    role: "admin",
    loginAt: Date.now(),
  })
);

// Retrieve session β€” O(1) read
const session = JSON.parse(await redis.get(`session:${userId}`));

// Rate limiting with atomic increment
const hits = await redis.incr(`rate:${ip}:${minute}`);
if (hits > 100) throw new Error("Rate limit exceeded");

βœ… Best For: Sessions, caching, rate limiting, feature flags, leaderboards
❌ Not For: Complex queries, relationships, full-text search


3. πŸ“„ Document Store Pattern ​

Stores self-contained JSON/BSON documents. No rigid schema required.

Real-world example: E-commerce Product Catalog

javascript
// Products can have completely different structures β€” schema-flexible
const smartphone = {
  _id: "prod_001",
  name: "iPhone 15",
  category: "electronics",
  specs: { storage: "256GB", camera: "48MP", battery: "3877mAh" },
  variants: [
    { color: "Black", stock: 50 },
    { color: "White", stock: 30 },
  ],
  price: 999,
  tags: ["apple", "5g", "flagship"],
};

const book = {
  _id: "prod_002",
  name: "Clean Code",
  category: "books",
  author: "Robert C. Martin",
  isbn: "978-0132350884",
  pages: 464,
  price: 35,
};

// Query with embedded arrays β€” no JOIN needed
const results = await db.products.find({
  category: "electronics",
  "variants.stock": { $gt: 0 },
  tags: "5g",
});

βœ… Best For: Catalogs, CMS, user profiles, event logs
❌ Not For: Multi-entity transactions, complex reporting with aggregations


4. πŸ›οΈ Wide-Column Store Pattern ​

Rows have dynamic columns grouped into column families. Optimized for time-series and heavy writes.

Real-world example: IoT Sensor Time-Series

javascript
// Cassandra data model β€” partition by device, sort by time
// Table: CREATE TABLE sensor_data (
//   device_id text,
//   recorded_at timestamp,
//   temperature float,
//   humidity float,
//   PRIMARY KEY (device_id, recorded_at)
// ) WITH CLUSTERING ORDER BY (recorded_at DESC);

// Insert β€” extremely fast, append-only
await client.execute(
  `INSERT INTO sensor_data (device_id, recorded_at, temperature, humidity)
   VALUES (?, ?, ?, ?)`,
  ["device_abc", new Date(), 23.5, 61.2],
  { prepare: true }
);

// Range query on one partition β€” millisecond response even at billions of rows
const result = await client.execute(
  `SELECT * FROM sensor_data
   WHERE device_id = 'device_abc'
   AND recorded_at > ? AND recorded_at < ?`,
  [yesterday, now],
  { prepare: true }
);

βœ… Best For: Time-series, IoT, analytics, write-heavy append logs
❌ Not For: Ad-hoc queries, joins, secondary filtering


5. πŸ•ΈοΈ Graph Database Pattern ​

Entities are nodes; relationships are edges with properties. Perfect for connected data.

Real-world example: Social Network Friend Recommendations

cypher
// Neo4j Cypher Query β€” Find friends-of-friends (2nd degree connections)
MATCH (me:User {id: "alice"})
      -[:FOLLOWS]->(:User)
      -[:FOLLOWS]->(recommended:User)
WHERE NOT (me)-[:FOLLOWS]->(recommended)
  AND me <> recommended
RETURN recommended.name, COUNT(*) AS mutualConnections
ORDER BY mutualConnections DESC
LIMIT 10;
javascript
// Node.js with Neo4j driver
const session = driver.session();
const result = await session.run(
  `
  MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)
        <-[:PURCHASED]-(similar:User)-[:PURCHASED]->(rec:Product)
  WHERE NOT (u)-[:PURCHASED]->(rec)
  RETURN rec.name, COUNT(*) AS score
  ORDER BY score DESC LIMIT 5
`,
  { userId: "user_42" }
);

const recommendations = result.records.map((r) => r.get("rec.name"));

βœ… Best For: Social graphs, fraud detection, recommendation engines, knowledge graphs
❌ Not For: Simple key lookups, bulk analytics, high-throughput writes


6. ⚑ CQRS Pattern (Command Query Responsibility Segregation) ​

Separate the write model (Commands) from the read model (Queries).

Real-world example: Order Management System

typescript
// COMMAND β€” Write side (strict validation, normalized)
class PlaceOrderCommand {
  constructor(
    public readonly userId: string,
    public readonly items: OrderItem[]
  ) {}
}

class OrderCommandHandler {
  async handle(cmd: PlaceOrderCommand) {
    const order = Order.create(cmd.userId, cmd.items); // domain logic
    await this.orderRepo.save(order); // save to write DB
    await this.eventBus.publish(new OrderPlacedEvent(order)); // notify
  }
}

// QUERY β€” Read side (pre-aggregated, fast)
class GetUserOrdersQuery {
  constructor(public readonly userId: string) {}
}

class OrderQueryHandler {
  async handle(query: GetUserOrdersQuery) {
    // Read from pre-projected, denormalized read model
    return this.readModel.findByUserId(query.userId);
  }
}

// READ PROJECTOR β€” keeps read DB in sync
class OrderProjector {
  async on(event: OrderPlacedEvent) {
    await this.readModel.upsert({
      orderId: event.orderId,
      userName: await this.userService.getName(event.userId),
      itemCount: event.items.length,
      totalAmount: event.items.reduce((s, i) => s + i.price, 0),
      status: "pending",
      createdAt: event.timestamp,
    });
  }
}

βœ… Benefits: Independent scaling of reads/writes, optimized query models, audit trail
⚠️ Tradeoff: Eventual consistency, added complexity


7. πŸ“œ Event Sourcing Pattern ​

Store state as a sequence of events, not current values. Reconstruct state by replaying events.

Real-world example: Bank Account

typescript
// Events β€” immutable facts that happened
type AccountEvent =
  | { type: "AccountOpened"; initialBalance: number; timestamp: Date }
  | { type: "MoneyDeposited"; amount: number; timestamp: Date }
  | { type: "MoneyWithdrawn"; amount: number; timestamp: Date };

// Reconstruct state by replaying events (pure function)
function replayEvents(events: AccountEvent[]): { balance: number } {
  return events.reduce(
    (state, event) => {
      switch (event.type) {
        case "AccountOpened":
          return { balance: event.initialBalance };
        case "MoneyDeposited":
          return { balance: state.balance + event.amount };
        case "MoneyWithdrawn":
          return { balance: state.balance - event.amount };
      }
    },
    { balance: 0 }
  );
}

// Usage
const events = await eventStore.getEvents("account_42");
const currentState = replayEvents(events);
// { balance: 300 }

// Time travel β€” what was balance on Jan 10?
const jan10Events = events.filter((e) => e.timestamp <= new Date("2024-01-10"));
const historicState = replayEvents(jan10Events);

βœ… Benefits: Full audit log, time-travel queries, event replay for debugging
⚠️ Tradeoff: Larger storage, complex to query current state


8. πŸ—οΈ Data Lake vs Data Warehouse Pattern ​

Data LakeData Warehouse
SchemaSchema-on-readSchema-on-write
FormatRaw (JSON, CSV, Parquet)Structured, columnar
CostCheap storage (S3)Expensive compute
UsersData ScientistsAnalysts, BI
LatencyHours/batchSeconds/interactive
ExampleAWS S3 + AthenaBigQuery, Snowflake

9. πŸ”„ Replication Patterns ​

Primary-Replica (Read Scaling) ​

Multi-Primary (Write Scaling + High Availability) ​

Quorum-Based Replication (Consistency Control) ​

text
Formula: R + W > N (Quorum)
- N = total replicas (e.g., 3)
- W = write acknowledgments required (e.g., 2)
- R = read replicas to consult (e.g., 2)
- R + W > N = 4 > 3 βœ… Consistent reads guaranteed

10. πŸ’Ύ Caching Storage Patterns ​

Cache-Aside (Most Common) ​

javascript
async function getUser(userId: string) {
  // 1. Check cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached); // Cache HIT

  // 2. Cache MISS β€” fetch from DB
  const user = await db.users.findOne({ id: userId });

  // 3. Populate cache for next time (TTL: 1 hour)
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));

  return user;
}

// On update β€” invalidate cache
async function updateUser(userId: string, data: Partial<User>) {
  await db.users.updateOne({ id: userId }, data);
  await redis.del(`user:${userId}`); // Invalidate!
}

Write-Through ​

javascript
async function saveUserProfile(userId: string, profile: UserProfile) {
  // Write to BOTH cache and DB simultaneously
  await Promise.all([
    redis.setex(`user:${userId}`, 3600, JSON.stringify(profile)),
    db.users.updateOne({ id: userId }, profile)
  ]);
}
// βœ… Cache always fresh  ⚠️ Higher write latency

Write-Behind (Write-Back) ​

javascript
// Write to cache immediately, flush to DB asynchronously
async function updateCounter(key: string, value: number) {
  await redis.set(key, value); // Instant return
  await queue.push({ type: "DB_WRITE", key, value }); // Async flush
}

// Background worker flushes every N seconds
setInterval(async () => {
  const batch = await queue.drain();
  await db.bulkWrite(batch); // Efficient bulk write
}, 5000);
// βœ… Very fast writes  ⚠️ Risk of data loss if cache crashes

11. 🧩 Polyglot Persistence Pattern ​

Use multiple database technologies for different parts of the same system.

Why polyglot? No single database is best for everything. Optimize each component with the right tool.

ComponentStorageReason
User AuthPostgreSQLACID, relational integrity
Product CatalogMongoDBFlexible schema, nested docs
Shopping CartRedisSub-millisecond speed, TTL
SearchElasticsearchFull-text, faceted search
AnalyticsClickHouseColumnar, OLAP queries
MediaS3 + CDNCheap blob storage, global edge
RecommendationsNeo4jGraph traversal

12. 🌊 Lambda vs Kappa Architecture ​

Lambda Architecture (Batch + Streaming) ​

Kappa Architecture (Streaming Only β€” Simpler) ​

LambdaKappa
ComplexityHigh (two systems)Lower (one system)
AccuracyHigh (batch reprocessing)Depends on stream
LatencyMixed (batch=hours, stream=ms)Low (ms)
ReprocessingEasy (batch layer)Replay from Kafka

13. πŸ—ΊοΈ Pattern Selection Guide ​


βœ… Checklist ​

  • [ ] I understand when to normalize vs denormalize
  • [ ] I can explain Key-Value, Document, Wide-Column, and Graph stores
  • [ ] I know how CQRS separates reads from writes
  • [ ] I can explain Event Sourcing and its benefits (audit trail, time-travel)
  • [ ] I understand Primary-Replica vs Multi-Primary replication
  • [ ] I know the Quorum formula: R + W > N
  • [ ] I can explain Cache-Aside, Write-Through, and Write-Behind
  • [ ] I understand why Polyglot Persistence is used in large systems
  • [ ] I can distinguish Lambda from Kappa architecture
  • [ ] I can pick the right storage pattern for a given use case


➑️ Next: Level 4 β€” Caching

Released under the ISC License.