
Candy Crush Saga System Design: 1 Billion Users at Scale

Candy Crush Saga by King is one of the highest-grossing mobile games ever made. At its peak, it serves over 200 million daily active users (DAU) with a registered base exceeding 1 billion. Designing this at scale requires solving problems in game state persistence, real-time leaderboards, lives/boosters, and live events.


1. Requirements

Functional

  • Game Progress: Track each user's current level (1–12,000+) and completion state.
  • Lives System: Users hold up to 5 lives; one life refills every 30 minutes.
  • Leaderboards: Weekly "Sugar Crush Race" among Facebook friends.
  • Boosters & In-App Purchases: Users can buy/earn power-ups.
  • Live Events: Time-limited challenges (e.g., "Chocolate Box" event).
  • Cross-Platform Sync: Progress synced across iOS, Android, and Web (Facebook).

Non-Functional

  • 1 Billion Users: Support vast user base with varying activity levels.
  • High Availability: Game must be playable even during server outages.
  • Low Latency: The game loads in under 2 s; state saves complete in under 200 ms.
  • Offline Play: Core gameplay works offline; state syncs on reconnect.
  • Eventual Consistency: Leaderboards and social features can lag by seconds.

2. Scale Estimations

| Metric | Estimate |
| --- | --- |
| Total Registered Users | 1 Billion |
| Daily Active Users (DAU) | ~200 Million |
| Peak Concurrent Users | ~10 Million |
| Game Sessions/Day | ~500 Million |
| State Saves/Second (peak) | ~50,000 |
| Leaderboard Updates/Day | ~1 Billion |
| Storage per User | ~2 KB (game state) |
| Total Game State Storage | ~2 TB |
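
The storage and throughput figures above follow from simple arithmetic. The sketch below reproduces them using only the table's estimates; the variable names are illustrative, and decimal units are assumed (1 TB = 10^12 bytes).

```typescript
// Back-of-envelope check of the scale estimates (decimal units assumed).
const registeredUsers = 1_000_000_000;
const bytesPerUser = 2_000; // ~2 KB of game state per user
const sessionsPerDay = 500_000_000;
const leaderboardUpdatesPerDay = 1_000_000_000;
const secondsPerDay = 24 * 60 * 60;

// Total game state: 1e9 users * 2e3 bytes = 2e12 bytes = 2 TB.
const totalStorageTB = (registeredUsers * bytesPerUser) / 1e12;

// Average rates over a full day; peak rates sit above these averages.
const avgSessionsPerSecond = Math.round(sessionsPerDay / secondsPerDay);
const avgLeaderboardUpdatesPerSecond = Math.round(
  leaderboardUpdatesPerDay / secondsPerDay
);

console.log({
  totalStorageTB,
  avgSessionsPerSecond,
  avgLeaderboardUpdatesPerSecond,
});
```

The average session rate comes out near 5,800 per second, so the ~50,000 saves/second peak is plausible only if each session produces several checkpoint saves and traffic is bursty around daily peaks.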

3. High-Level Architecture

This design assumes a microservices backend paired with a strongly offline-first client.


4. Technical Deep Dives

A. Game State: Offline-First Architecture

The most critical design decision is: the client is the source of truth during play.

  • Local State: The game runs 100% locally on the device. No server call is needed to match 3 candies.
  • Checkpoint Saves: When a level is completed or failed, the client sends a state delta to the server.
  • Conflict Resolution: If the user plays on two devices, the server picks the highest level progress (no one wants to go backwards).
Level Completion Flow:
Client completes level 512
  → Sends: { userId, level: 512, stars: 3, score: 48000 }
  → PlayerService validates & persists to Cassandra
  → Returns: lives remaining, booster rewards, unlocked levels
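
The validate-and-merge step in this flow can be sketched as a pure function. This is a minimal illustration, not King's implementation: the `Progress` shape, the `isPlausible` checks, and the rule that a player may only complete a level up to one past their current highest are all assumptions made for the example.

```typescript
// Hypothetical payload sent by the client at a checkpoint.
interface LevelResult {
  level: number;
  stars: number; // assumed range: 1-3
  score: number;
}

interface BestResult {
  stars: number;
  score: number;
}

// Hypothetical per-user record persisted by the server.
interface Progress {
  highestLevel: number;
  best: Partial<Record<number, BestResult>>;
}

// Basic sanity checks; assumes levels unlock one at a time.
function isPlausible(progress: Progress, r: LevelResult): boolean {
  return (
    Number.isInteger(r.level) &&
    r.level >= 1 &&
    r.level <= progress.highestLevel + 1 &&
    Number.isInteger(r.stars) &&
    r.stars >= 1 &&
    r.stars <= 3 &&
    Number.isFinite(r.score) &&
    r.score >= 0
  );
}

// "Highest progress wins": every field is merged with max(), so applying
// the same result twice, or two devices' results for the same level in
// either order, yields the same stored state.
function mergeResult(progress: Progress, r: LevelResult): Progress {
  if (!isPlausible(progress, r)) return progress; // reject, keep old state
  const prev = progress.best[r.level] ?? { stars: 0, score: 0 };
  return {
    highestLevel: Math.max(progress.highestLevel, r.level),
    best: {
      ...progress.best,
      [r.level]: {
        stars: Math.max(prev.stars, r.stars),
        score: Math.max(prev.score, r.score),
      },
    },
  };
}
```

Because the merge only ever takes maxima, a retried or duplicated save is harmless, which suits a client that may resend checkpoints after reconnecting.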

B. Lives System: Time-Based Expiry with Redis

Lives regenerate at a fixed rate of 1 life every 30 minutes (up to a max of 5).

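  • Storage: Two values per user: count (lives as of the last write) and lastRefillAt (the timestamp from which regeneration is measured).
  • Calculation: When the client requests lives, the server calculates how many have regenerated since lastRefillAt, up to the maximum.
  • Redis TTL: Because lives are computed on read, the server never runs a timer per user, which avoids the "infinite timer" problem at scale. A TTL on the key (for example, the time until lives are full again) lets Redis drop idle entries, since a missing key can be treated as full lives.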
Key: lives:{userId}
Value: { count: 3, lastRefillAt: 1712650000 }
Logic on read: elapsed = now - lastRefillAt
              regained = floor(elapsed / 1800)
              currentLives = min(5, count + regained)
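
One way to choose the TTL mentioned above is to expire the key at the moment the stored state would regenerate to full lives. The helper below is a sketch under that assumption; `secondsUntilFull` is a hypothetical name, and the constants restate the 5-life, 30-minute rules from this section.

```typescript
const MAX_LIVES = 5;
const REFILL_INTERVAL_SECONDS = 30 * 60; // 30 minutes

interface LivesState {
  count: number;
  lastRefillAt: number; // Unix timestamp in seconds
}

// Seconds from `now` until the stored state reaches MAX_LIVES.
// Returns 0 when the lives are already full (or overdue to be full).
function secondsUntilFull(state: LivesState, now: number): number {
  const missing = MAX_LIVES - state.count;
  if (missing <= 0) return 0;
  const fullAt = state.lastRefillAt + missing * REFILL_INTERVAL_SECONDS;
  return Math.max(0, fullAt - now);
}

// Using the example value from above: 3 lives stored at t = 1712650000.
// Ten minutes later, two lives are still missing and 50 minutes remain.
const ttl = secondsUntilFull(
  { count: 3, lastRefillAt: 1712650000 },
  1712650000 + 600
);
console.log(ttl); // 3000
```

A positive result could be passed as the expiry when writing the key (Redis `SET` accepts an `EX seconds` option). A result of 0 means there is nothing worth storing, so the key can simply be deleted; Redis rejects a zero expiry.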

C. Leaderboard: Redis Sorted Sets

The weekly "Sugar Crush Race" shows your score vs. Facebook friends.

  • Data Structure: Redis ZADD stores (score, userId) pairs in a sorted set per "race group" (a group of up to 30 friends).
  • Sharding: Each friend group is a separate Redis key, so no single key becomes a bottleneck.
  • Updates: When a score is updated, a Kafka event triggers an async update to the leaderboard.
Key: leaderboard:week:{weekId}:group:{groupId}
Commands:
  ZADD leaderboard:week:14:group:7 48000 "user_abc"  → sets user_abc's score to 48000
  ZREVRANK leaderboard:week:14:group:7 "user_abc"    → returns user's 0-based rank (0 = first place)
  ZREVRANGE leaderboard:week:14:group:7 0 9          → returns top 10
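
The key scheme and commands above can be wrapped in a small service. The sketch below is illustrative only: `SortedSetClient` is a hypothetical interface whose method names mirror ZADD, ZREVRANK, and ZREVRANGE rather than any particular Redis library, and `InMemorySortedSets` is a stand-in so the example runs without a server.

```typescript
// Hypothetical minimal client; a real Redis client would sit behind this.
interface SortedSetClient {
  zadd(key: string, score: number, member: string): Promise<void>;
  zrevrank(key: string, member: string): Promise<number | null>;
  zrevrange(key: string, start: number, stop: number): Promise<string[]>;
}

const raceKey = (weekId: number, groupId: number): string =>
  `leaderboard:week:${weekId}:group:${groupId}`;

class RaceLeaderboard {
  private client: SortedSetClient;
  constructor(client: SortedSetClient) {
    this.client = client;
  }

  // Like ZADD, this replaces the member's previous score.
  submitScore(weekId: number, groupId: number, userId: string, score: number) {
    return this.client.zadd(raceKey(weekId, groupId), score, userId);
  }

  // Converts the 0-based rank to a 1-based position; null if not in the race.
  async positionOf(weekId: number, groupId: number, userId: string) {
    const rank = await this.client.zrevrank(raceKey(weekId, groupId), userId);
    return rank === null ? null : rank + 1;
  }

  top(weekId: number, groupId: number, n = 10) {
    return this.client.zrevrange(raceKey(weekId, groupId), 0, n - 1);
  }
}

// In-memory stand-in: highest score first, ties broken by member name in
// reverse lexicographic order. Negative indices are not supported here.
class InMemorySortedSets implements SortedSetClient {
  private sets = new Map<string, Map<string, number>>();

  private ordered(key: string): string[] {
    const set = this.sets.get(key) ?? new Map<string, number>();
    return Array.from(set.entries())
      .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? 1 : a[0] > b[0] ? -1 : 0))
      .map(([member]) => member);
  }

  async zadd(key: string, score: number, member: string): Promise<void> {
    const set = this.sets.get(key) ?? new Map<string, number>();
    set.set(member, score);
    this.sets.set(key, set);
  }

  async zrevrank(key: string, member: string): Promise<number | null> {
    const index = this.ordered(key).indexOf(member);
    return index === -1 ? null : index;
  }

  async zrevrange(key: string, start: number, stop: number): Promise<string[]> {
    return this.ordered(key).slice(start, stop + 1);
  }
}
```

Keeping the client behind an interface also makes the leaderboard logic testable without Redis. With at most 30 members per group, each key stays tiny, so every command touches only a handful of entries.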

5. Implementation Example: Lives Service

This TypeScript service demonstrates the time-based lives calculation at scale.

typescript
interface LivesState {
  count: number;
  lastRefillAt: number; // Unix timestamp in seconds
}

const MAX_LIVES = 5;
const REFILL_INTERVAL_SECONDS = 30 * 60; // 30 minutes

class LivesService {
  constructor(private redis: any) {}

  /**
   * Gets the current lives count for a user, accounting for time-based regeneration.
   */
  async getLives(
    userId: string
  ): Promise<{ lives: number; nextRefillIn: number }> {
    const raw = await this.redis.get(`lives:${userId}`);
    const now = Math.floor(Date.now() / 1000);

    if (!raw) {
      // Default: a new user starts with full lives, so no refill is pending
      await this.setLives(userId, MAX_LIVES, now);
      return { lives: MAX_LIVES, nextRefillIn: 0 };
    }

    const state: LivesState = JSON.parse(raw);
    // Clamp at 0 in case the stored timestamp is slightly ahead of this clock
    const elapsedSeconds = Math.max(0, now - state.lastRefillAt);
    const regained = Math.floor(elapsedSeconds / REFILL_INTERVAL_SECONDS);
    const currentLives = Math.min(MAX_LIVES, state.count + regained);

    // Time until the *next* life refills; 0 when lives are already full
    const secondsIntoCurrentInterval = elapsedSeconds % REFILL_INTERVAL_SECONDS;
    const nextRefillIn =
      currentLives >= MAX_LIVES
        ? 0
        : REFILL_INTERVAL_SECONDS - secondsIntoCurrentInterval;

    return { lives: currentLives, nextRefillIn };
  }

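  /**
   * Deducts a life when a user starts a level.
   * Note: this read-then-write is not atomic. Concurrent calls for the same
   * user can race, so production code would need a transaction or script.
   */
  async spendLife(
    userId: string
  ): Promise<{ success: boolean; livesRemaining: number }> {
    const { lives, nextRefillIn } = await this.getLives(userId);

    if (lives <= 0) {
      return { success: false, livesRemaining: 0 };
    }

    const now = Math.floor(Date.now() / 1000);
    // If lives were full, the refill timer starts now. Otherwise keep the
    // progress already made toward the next life instead of resetting it.
    const lastRefillAt =
      lives >= MAX_LIVES
        ? now
        : now - (REFILL_INTERVAL_SECONDS - nextRefillIn);
    await this.setLives(userId, lives - 1, lastRefillAt);
    return { success: true, livesRemaining: lives - 1 };
  }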

  private async setLives(userId: string, count: number, lastRefillAt: number) {
    const state: LivesState = { count, lastRefillAt };
    await this.redis.set(`lives:${userId}`, JSON.stringify(state));
  }
}

6. Summary: Key Architecture Trade-offs

| Component | Choice | Rationale |
| --- | --- | --- |
| Game State | Cassandra | Wide-column DB scales horizontally; user progress is a simple key-value lookup by userId. |
| Lives | Redis | Sub-millisecond reads/writes; timestamp-based state with TTL expiry fits time-based refills. |
| Leaderboard | Redis Sorted Sets | Built-in ranking operations (ZADD, ZREVRANK, ZREVRANGE) need little custom logic. |
| Consistency | Offline-First Client | Decouples gameplay from network; ensures playability even on poor connections. |
| Transactions | PostgreSQL | Financial (IAP) data requires ACID guarantees; eventual consistency is unacceptable here. |

Released under the ISC License.