Candy Crush Saga System Design: 1 Billion Users at Scale
Candy Crush Saga by King is one of the highest-grossing mobile games ever made. At its peak, it serves over 200 million daily active users (DAU) with a registered base exceeding 1 billion. Designing this at scale requires solving problems in game state persistence, real-time leaderboards, lives/boosters, and live events.
1. Requirements
Functional
- Game Progress: Track each user's current level (1–12,000+) and completion state.
- Lives System: Users get 5 lives, each refills after 30 minutes.
- Leaderboards: Weekly "Sugar Crush Race" among Facebook friends.
- Boosters & In-App Purchases: Users can buy/earn power-ups.
- Live Events: Time-limited challenges (e.g., "Chocolate Box" event).
- Cross-Platform Sync: Progress synced across iOS, Android, and Web (Facebook).
Non-Functional
- 1 Billion Users: Support a vast user base with widely varying activity levels.
- High Availability: Game must be playable even during server outages.
- Low Latency: Game loads < 2s, state saves in < 200ms.
- Offline Play: Core gameplay works offline; state syncs on reconnect.
- Eventual Consistency: Leaderboards and social features can lag by seconds.
2. Scale Estimations
| Metric | Estimate |
|---|---|
| Total Registered Users | 1 Billion |
| Daily Active Users (DAU) | ~200 Million |
| Peak Concurrent Users | ~10 Million |
| Game Sessions/Day | ~500 Million |
| State Saves/Second (peak) | ~50,000 |
| Leaderboard Updates/Day | ~1 Billion |
| Storage per User | ~2 KB (game state) |
| Total Game State Storage | ~2 TB |
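The figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant names are hypothetical, 2 KB is taken as 2,000 bytes to keep the numbers round, and the per-second rates are plain daily averages rather than peak values.

```typescript
// Back-of-the-envelope check of the estimates in the table above.
// Constant names are illustrative; the inputs come from the table.
const REGISTERED_USERS = 1_000_000_000;
const STATE_BYTES_PER_USER = 2_000; // ~2 KB, decimal units assumed
const SESSIONS_PER_DAY = 500_000_000;
const LEADERBOARD_UPDATES_PER_DAY = 1_000_000_000;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Total game state: users x bytes per user, expressed in terabytes.
const totalStateTB = (REGISTERED_USERS * STATE_BYTES_PER_USER) / 1e12;

// Average rates, assuming load were spread evenly over the day.
const avgSessionsPerSecond = SESSIONS_PER_DAY / SECONDS_PER_DAY;
const avgLeaderboardUpdatesPerSecond =
  LEADERBOARD_UPDATES_PER_DAY / SECONDS_PER_DAY;

console.log(`Total game state: ${totalStateTB} TB`);
console.log(`Average sessions/s: ${Math.round(avgSessionsPerSecond)}`);
console.log(
  `Average leaderboard updates/s: ${Math.round(avgLeaderboardUpdatesPerSecond)}`
);
```

The averages come out at roughly 5,800 sessions per second and 11,600 leaderboard updates per second. The 50,000 peak state saves per second in the table is higher because a session typically produces several saves and traffic is not evenly distributed; the exact multiplier is not stated here.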
3. High-Level Architecture
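The design pairs a microservices backend with a strongly offline-first client: gameplay runs on the device, while backend services handle persistence, lives, leaderboards, and purchases.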
4. Technical Deep Dives
A. Game State: Offline-First Architecture
The most critical design decision is: the client is the source of truth during play.
- Local State: The game runs 100% locally on the device. No server call is needed to match 3 candies.
- Checkpoint Saves: When a level is completed or failed, the client sends a state delta to the server.
- Conflict Resolution: If the user plays on two devices, the server picks the highest level progress (no one wants to go backwards).
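The "highest progress wins" rule can be sketched as a small merge function. This is a minimal illustration, not the actual server logic: the `Progress` shape and the `mergeProgress` name are hypothetical, and merging per-level star ratings is an added assumption that goes beyond the single rule stated above.

```typescript
// Hypothetical shape of a player's progress record; field names are assumptions.
interface Progress {
  highestLevel: number; // highest level the player has completed
  bestStars: Record<number, number>; // level -> best star rating (1-3)
}

// Merge two copies of a player's progress so that nothing is ever lost:
// take the higher level and, for each level, the better star rating.
function mergeProgress(server: Progress, client: Progress): Progress {
  const bestStars: Record<number, number> = { ...server.bestStars };
  for (const [level, stars] of Object.entries(client.bestStars)) {
    const key = Number(level);
    bestStars[key] = Math.max(bestStars[key] ?? 0, stars);
  }
  return {
    highestLevel: Math.max(server.highestLevel, client.highestLevel),
    bestStars,
  };
}

// Example: the server has a tablet's sync at level 510, then a phone that
// played offline up to level 512 reconnects.
const merged = mergeProgress(
  { highestLevel: 510, bestStars: { 509: 3, 510: 2 } },
  { highestLevel: 512, bestStars: { 510: 3, 511: 1, 512: 3 } }
);
console.log(merged.highestLevel); // 512
console.log(merged.bestStars[510]); // 3
```

Because the merge only ever takes a maximum, it gives the same result regardless of the order in which devices sync, and replaying the same sync twice changes nothing. That property is what makes retried or delayed checkpoint saves safe.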
Level Completion Flow:
Client completes level 512
→ Sends: { userId, level: 512, stars: 3, score: 48000 }
→ PlayerService validates & persists to Cassandra
→ Returns: lives remaining, booster rewards, unlocked levels
B. Lives System: Time-Based Expiry with Redis
Lives regenerate at a fixed rate of one life every 30 minutes, up to a maximum of 5.
- Storage: Two values per user: `count` and a `lastRefillAt` timestamp.
- Calculation: When the client requests lives, the server calculates how many have regenerated since `lastRefillAt`, up to the maximum.
- Redis TTL: Setting the key in Redis with a TTL tied to the refill time means no per-user timer has to run on the server, which is how the system handles the "infinite timer" problem at scale.
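One possible reading of the TTL idea is to let the key expire at the moment the lives would be full again, so that a missing key can simply be treated as "full lives". The sketch below computes that expiry. The function name `secondsUntilFull` is hypothetical, and this interpretation is an assumption rather than something the design above spells out.

```typescript
const MAX_LIVES = 5;
const REFILL_INTERVAL_SECONDS = 30 * 60; // 30 minutes

// Seconds from `now` until the player is back at MAX_LIVES, given the stored
// count and the timestamp at which the current refill interval started.
// All timestamps are Unix seconds.
function secondsUntilFull(
  count: number,
  lastRefillAt: number,
  now: number
): number {
  if (count >= MAX_LIVES) return 0;
  const fullAt = lastRefillAt + (MAX_LIVES - count) * REFILL_INTERVAL_SECONDS;
  return Math.max(0, fullAt - now);
}

// Example: 3 lives stored, 10 minutes into the current interval.
// Two lives are missing, so full lives are 2 * 30 - 10 = 50 minutes away.
const ttl = secondsUntilFull(3, 1712650000, 1712650600);
console.log(ttl); // 3000
```

With a real Redis client, a positive result could be passed as the expiry when writing the key (the Redis `SET` command accepts an `EX seconds` option). A result of 0 means the player is already at full lives, in which case the key does not need to be stored at all.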
Key: lives:{userId}
Value: { count: 3, lastRefillAt: 1712650000 }
Logic on read:
elapsed = now - lastRefillAt
regained = floor(elapsed / 1800)
currentLives = min(5, count + regained)
C. Leaderboard: Redis Sorted Sets
The weekly "Sugar Crush Race" shows your score vs. Facebook friends.
- Data Structure: Redis `ZADD` stores `(score, userId)` pairs in a sorted set per "race group" (a group of up to 30 friends).
- Sharding: Each friend group is a separate Redis key, so no single key becomes a bottleneck.
- Updates: When a score is updated, a Kafka event triggers an async update to the leaderboard.
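To make the ranking semantics concrete, the sketch below models a single race group in memory. It is a stand-in for one Redis sorted set, not a Redis client: the class `RaceGroupLeaderboard` and the helper `leaderboardKey` are hypothetical names, and the model re-sorts on every read, which is acceptable for a group of about 30 members but is not how Redis works internally.

```typescript
// In-memory stand-in for one Redis sorted set (one race group).
// It models ordering semantics only; it does not talk to Redis.
class RaceGroupLeaderboard {
  private scores = new Map<string, number>();

  // Like ZADD: insert the member or overwrite its existing score.
  add(userId: string, score: number): void {
    this.scores.set(userId, score);
  }

  // Members ordered by score, highest first. Equal scores are ordered by
  // userId in reverse lexicographical order, as Redis does for reverse ranges.
  private ranked(): string[] {
    return Array.from(this.scores.entries())
      .sort(
        ([idA, a], [idB, b]) => b - a || (idA < idB ? 1 : idA > idB ? -1 : 0)
      )
      .map(([id]) => id);
  }

  // Like ZREVRANK: 0-based rank with the top score at 0, or null if absent.
  rank(userId: string): number | null {
    const index = this.ranked().indexOf(userId);
    return index === -1 ? null : index;
  }

  // Like ZREVRANGE key 0 (n - 1): the top n members.
  top(n: number): string[] {
    return this.ranked().slice(0, n);
  }
}

// Hypothetical helper that builds the per-week, per-group key format.
function leaderboardKey(weekId: number, groupId: number): string {
  return `leaderboard:week:${weekId}:group:${groupId}`;
}

const board = new RaceGroupLeaderboard();
board.add("user_abc", 48000);
board.add("user_def", 52000);
board.add("user_ghi", 31000);

console.log(leaderboardKey(14, 7)); // leaderboard:week:14:group:7
console.log(board.rank("user_abc")); // 1
console.log(board.top(2)); // [ 'user_def', 'user_abc' ]
```

Keeping each group in its own key means every read and write touches at most about 30 members, so the cost per operation stays small no matter how many groups exist in total.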
Key: leaderboard:week:{weekId}:group:{groupId}
Commands:
ZADD leaderboard:week:14:group:7 48000 "user_abc" → adds the user or updates their score
ZREVRANK leaderboard:week:14:group:7 "user_abc" → returns the user's 0-based rank (highest score first)
ZREVRANGE leaderboard:week:14:group:7 0 9 → returns the top 10
5. Implementation Example: Lives Service
This TypeScript service demonstrates the time-based lives calculation at scale.
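interface LivesState {
  count: number;
  lastRefillAt: number; // Unix seconds; start of the current refill interval
}

// Minimal subset of a Redis client used here (an assumption; adapt to your client library).
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

const MAX_LIVES = 5;
const REFILL_INTERVAL_SECONDS = 30 * 60; // 30 minutes

class LivesService {
  constructor(private redis: RedisLike) {}

  /**
   * Gets the current lives count for a user, accounting for time-based regeneration.
   * nextRefillIn is 0 when lives are already full.
   */
  async getLives(
    userId: string
  ): Promise<{ lives: number; nextRefillIn: number }> {
    const now = Math.floor(Date.now() / 1000);
    const state = await this.loadState(userId, now);
    if (state.count >= MAX_LIVES) {
      return { lives: MAX_LIVES, nextRefillIn: 0 };
    }
    // Time until the *next* life refills
    const nextRefillIn = REFILL_INTERVAL_SECONDS - (now - state.lastRefillAt);
    return { lives: state.count, nextRefillIn };
  }

  /**
   * Deducts a life when a user starts a level.
   * Note: this read-modify-write is not atomic; concurrent calls would need
   * a Lua script or a WATCH/MULTI transaction.
   */
  async spendLife(
    userId: string
  ): Promise<{ success: boolean; livesRemaining: number }> {
    const now = Math.floor(Date.now() / 1000);
    const state = await this.loadState(userId, now);
    if (state.count <= 0) {
      return { success: false, livesRemaining: 0 };
    }
    // Spending from a full bar starts the refill timer; otherwise keep the
    // running timer so partial progress toward the next life is not lost.
    const lastRefillAt = state.count >= MAX_LIVES ? now : state.lastRefillAt;
    await this.setLives(userId, state.count - 1, lastRefillAt);
    return { success: true, livesRemaining: state.count - 1 };
  }

  /**
   * Loads the stored state and applies any lives regenerated since lastRefillAt.
   * A missing key means a new user, who starts with full lives.
   */
  private async loadState(userId: string, now: number): Promise<LivesState> {
    const raw = await this.redis.get(`lives:${userId}`);
    if (!raw) {
      return { count: MAX_LIVES, lastRefillAt: now };
    }
    const stored: LivesState = JSON.parse(raw);
    const elapsed = Math.max(0, now - stored.lastRefillAt);
    const regained = Math.floor(elapsed / REFILL_INTERVAL_SECONDS);
    const count = Math.min(MAX_LIVES, stored.count + regained);
    // Advance the timer by whole intervals only, preserving the remainder.
    const lastRefillAt = stored.lastRefillAt + regained * REFILL_INTERVAL_SECONDS;
    return { count, lastRefillAt };
  }

  private async setLives(userId: string, count: number, lastRefillAt: number) {
    const state: LivesState = { count, lastRefillAt };
    await this.redis.set(`lives:${userId}`, JSON.stringify(state));
  }
}
6. Summary: Key Architecture Trade-offs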
| Component | Choice | Rationale |
|---|---|---|
| Game State | Cassandra | Wide-column DB scales horizontally; user progress is a perfect key-value lookup by userId. |
| Lives | Redis | Sub-millisecond reads/writes; TTL-based expiry is a perfect fit for time-based refills. |
| Leaderboard | Redis Sorted Sets | Built-in ranking operations (ZADD, ZREVRANK, ZREVRANGE) cover score updates, rank lookup, and top-N queries with no custom ranking logic. |
| Consistency | Offline-First Client | Decouples gameplay from network; ensures playability even on poor connections. |
| Transactions | PostgreSQL | Financial (IAP) data requires ACID guarantees; eventual consistency is unacceptable here. |
