Skip to content

Database Scaling — Core Concepts

Interview Relevance: Critical — "How would you scale this to 10× traffic?" is asked in every FAANG interview. Know replication, sharding, and consistent hashing cold.

A single database server has hard limits. At scale, you need strategies to spread load across multiple machines. There are two fundamental dimensions:

  • Read Scaling → Replication (more readers)
  • Write Scaling → Sharding / Partitioning (more writers)

The Scaling Ladder

Interview tip: Always go through this ladder. Jumping straight to sharding is over-engineering and a red flag.


Strategy 1 — Master-Slave Replication

How It Works

One node (Master) accepts all writes. Changes are streamed to one or more Slave (Replica) nodes which serve reads only.

Synchronous vs. Asynchronous Replication

ModeDurabilityWrite LatencyUse Case
Synchronous✅ Zero data lossHigher (waits for replica ACK)Financial transactions, critical data
Asynchronous⚠️ Small replication lagLower (no wait)Most web apps, analytics
Semi-synchronous✅ At least 1 replica ACKModerateMiddle ground (MySQL default option)

Failover: What Happens When Master Dies?

Tools for automatic failover:

  • MySQL: Orchestrator, MHA (Master High Availability)
  • PostgreSQL: Patroni, repmgr
  • Redis: Redis Sentinel, Redis Cluster

Master-Slave Trade-offs

✅ PROS:
  - Read throughput scales linearly (add more replicas)
  - Read replicas can serve reporting/analytics without impacting production
  - Simple to set up (built into MySQL, PostgreSQL, MongoDB)
  - Replicas can be used for backups and disaster recovery

❌ CONS:
  - Master is still a single point of failure for WRITES
  - Replication lag → reads from replica may return stale data
  - If master dies and async replicas are behind → potential data loss
  - All write traffic still funnels through one node

Strategy 2 — Multi-Master Replication

Every node can accept both reads and writes. Changes are replicated between all masters.

The Core Problem: Write Conflicts

Conflict Resolution Strategies:

StrategyHow It WorksRisk
Last-Write-Wins (LWW)Timestamp determines winnerClock skew → wrong value wins
Application-levelApp code detects and mergesComplex logic required
CRDTData structures designed for mergingLimited to specific data types
Avoid-conflictsRoute same entity to same masterPartially loses multi-master benefit
Multi-Master is RARELY the right answer in interviews.
Use it only for:
  ✅ Active-active geo-distributed writes (global apps)
  ✅ When regional write autonomy is required
  ✅ When you can accept eventual consistency with conflict resolution

For most systems: Prefer Master-Slave + Sharding instead.

Strategy 3 — Sharding (Horizontal Partitioning)

Split the dataset itself across multiple independent database servers. Each server (shard) owns a subset of the data.

Without Sharding vs. With Sharding

Shard Key Selection (Critical)

The shard key (also called partition key) determines which shard stores a piece of data. This is the most important sharding decision.

Good shard key properties:

  • High cardinality — many distinct values (user_id, order_id, URL hash)
  • Evenly distributed — no hot spots
  • Immutable — never changes (moving data between shards is expensive)
  • ❌ Avoid: timestamp, status, country/region (low cardinality → hot shards)

Consistent Hashing for Sharding

Simple hash(key) % N breaks catastrophically when N changes (add/remove a shard).

Simple Modulo Hashing:
  3 shards: hash("alice") % 3 = 0 → Shard 0
  Add shard: hash("alice") % 4 = 3 → Shard 3  ← MOVED!

  On adding 1 shard to 3: ~75% of all keys move!
  → Massive data migration + cache invalidation

Consistent hashing solves this by arranging servers on a virtual ring:

Consistent Hashing in the Real World

SystemUses Consistent Hashing For
CassandraToken ring — each node owns a token range
DynamoDBPartition key hashed to virtual node
Redis Cluster16,384 hash slots distributed across nodes
Chord / DynamoAcademic foundations of modern DHTs
MemcachedClient-side consistent hashing

Sharding Strategies

Range-Based Sharding

✅ Range queries are efficient (e.g., "get all orders from Jan")
✅ Simple to reason about
❌ Hot shards: recent data always hits the last shard
❌ Uneven distribution if data isn't uniformly spread

Hash-Based Sharding

✅ Even distribution by design
✅ No hot shards
❌ Range queries are expensive (must query all shards and merge)
❌ Adding/removing shards causes key redistribution (use consistent hashing!)

Directory-Based Sharding

A lookup table (shard map) stores which shard holds which data.

✅ Maximum flexibility — can move data between shards without changing key scheme
✅ Supports heterogeneous shard sizes
❌ Lookup table is itself a bottleneck and SPOF (must be cached)
❌ Extra network hop per request

The Sharding Challenges

Cross-Shard Joins

Solution: Design your shard key so that data that's queried together is stored together (co-location).

Resharding (Shard Split)

When one shard grows too large, it must be split:

Mitigation: Use consistent hashing from the start — adding nodes only remaps ~1/N keys.


Worked Example: URL Shortener Scaling Journey

Phase 3 Cassandra Details:

DecisionChoiceReason
Shard keyshort_code hashHigh cardinality, immutable, evenly distributed
Consistency levelQUORUM reads/writesBalance between consistency and availability
Replication factor3 per DCSurvive 1 node failure without data loss
Virtual nodes256 vnodes/nodeEven data distribution across heterogeneous nodes
CompactionLeveledCompactionOptimized for read-heavy redirect workload

Interview Cheat Sheet

One-Line Summaries

Master-Slave:       1 writer, N readers — scale reads linearly
Multi-Master:       N writers — avoids write SPOF, complex conflict resolution
Sharding:           Split data across servers — scale writes and storage
Range Sharding:     Data split by value ranges — great for scans, hot shard risk
Hash Sharding:      Data split by hash(key) — even distribution, no range queries
Directory Sharding: Lookup table maps keys to shards — flexible, extra hop
Consistent Hashing: Hash ring — only 1/N keys remapped when topology changes
Virtual Nodes:      Multiple ring positions per server — even load distribution

The Interview Phrase

"First I'd add read replicas and Redis caching — together these can
 handle 10x read traffic with zero schema changes. If writes become
 the bottleneck, I'd shard on user_id using consistent hashing so
 adding new shards only remaps ~1/N keys and avoids a full data
 migration. Cross-shard joins would be handled by denormalizing and
 co-locating related data on the same shard."

Red Flags vs. Green Flags

🔴 Red Flag🟢 Green Flag
Jump straight to shardingFollow the ladder: cache → replicas → sharding
Use created_at as shard keyPick high-cardinality, immutable key (user_id, UUID)
Forget cross-shard join problemAddress it with co-location or denormalization
Say "just add nodes"Explain consistent hashing to minimize resharding pain
Ignore replication lagState whether async/sync and consequences for reads
Multi-master for everythingReserve multi-master for active-active geo-distributed systems

IMPORTANT

In an interview, always explain why you chose the shard key. Say: "I'm sharding on user_id because it's high cardinality, immutable, and ensures all of a user's data is co-located, which avoids cross-shard joins for user-centric queries."

TIP

Mentioning the replication lag problem with Master-Slave and how you'd handle it (e.g., reading from master for critical paths, replica for non-critical) demonstrates real-world experience.

Released under the ISC License.