
YouTube System Design: Scaling Video to Exabytes

YouTube is one of the largest data platforms on Earth. Its primary challenge is not just storing video, but processing it and delivering it to billions of users with minimal startup delay and as little buffering as possible.


1. Requirements

Functional

  • Upload Video: Users can upload high-resolution video files.
  • Play Video: Users can stream videos in various qualities via adaptive bitrate streaming (ABR).
  • Search: Users can find videos by title/tags.
  • View Count: Real-time tracking of video popularity.
  • Comments/Likes: Interaction layer.

Non-Functional

  • Reliability: Uploaded videos should never be lost (99.999% durability).
  • Availability: High availability for playback.
  • Low Latency: Minimal start time and no mid-stream buffering.
  • Scalability: Handle 500+ hours of video uploaded every minute.
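
To make the 500 hours-per-minute figure concrete, the sketch below converts it into a daily ingest volume. The upload rate comes from the requirement above. The average source size of 1 GB per hour of video is purely an illustrative assumption, not a published figure, and the function and field names are hypothetical.

```typescript
// Back-of-envelope ingest estimate.
// ASSUMPTION: avgGbPerVideoHour is an illustrative value, not a measured one.
interface IngestEstimate {
  videoHoursPerDay: number;
  rawTerabytesPerDay: number;
}

function estimateDailyIngest(
  uploadHoursPerMinute: number,
  avgGbPerVideoHour: number
): IngestEstimate {
  const minutesPerDay = 60 * 24;
  const videoHoursPerDay = uploadHoursPerMinute * minutesPerDay;
  // Decimal units: 1 TB = 1000 GB.
  const rawTerabytesPerDay = (videoHoursPerDay * avgGbPerVideoHour) / 1000;
  return { videoHoursPerDay, rawTerabytesPerDay };
}

const estimate = estimateDailyIngest(500, 1);
// estimate.videoHoursPerDay   → 720000
// estimate.rawTerabytesPerDay → 720
```

Every transcoded rendition adds to this raw figure, which is why both storage and transcoding capacity have to scale horizontally.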

2. High-Level Architecture

YouTube separates the Write Path (Upload/Process) from the Read Path (Stream/Search).


3. Technical Deep Dives

A. The Transcoding Pipeline

Raw video files are massive. To serve them efficiently, the pipeline performs the following steps:

  1. Chunking: The original video is split into small chunks aligned to GOP (Group of Pictures) boundaries, so each chunk starts on a keyframe and can be transcoded independently.
  2. Parallel Processing: Different workers transcode these chunks simultaneously into multiple formats (MP4, WebM) and resolutions (360p, 720p, 1080p, 4K).
  3. Merging: Transcoded chunks are assembled into segments for each rendition, and manifest files are generated to index them.
  4. Adaptive Bitrate Streaming (ABR): YouTube uses protocols like DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming). The player estimates the available bandwidth and requests each chunk at the appropriate resolution.
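
The player-side choice in step 4 can be sketched as follows. This is a simplified throughput-based heuristic, not the algorithm used by YouTube's player or one mandated by DASH or HLS. The bitrate values in the ladder, the `safetyFactor` parameter, and the `pickRendition` name are all assumptions made for illustration.

```typescript
interface Rendition {
  label: string;
  bitrateKbps: number; // ASSUMPTION: illustrative values only
}

// Hypothetical rendition ladder, sorted from lowest to highest bitrate.
const ladder: Rendition[] = [
  { label: "360p", bitrateKbps: 800 },
  { label: "720p", bitrateKbps: 2500 },
  { label: "1080p", bitrateKbps: 5000 },
  { label: "4K", bitrateKbps: 16000 },
];

// Pick the highest rendition whose bitrate fits within a fraction
// (safetyFactor) of the measured throughput. If none fits, fall back
// to the lowest rendition so playback can still start.
function pickRendition(
  renditions: Rendition[],
  measuredKbps: number,
  safetyFactor: number = 0.8
): Rendition {
  if (renditions.length === 0) throw new Error("empty rendition ladder");
  const budget = measuredKbps * safetyFactor;
  let choice = renditions[0];
  for (const r of renditions) {
    if (r.bitrateKbps <= budget) choice = r;
  }
  return choice;
}

const onFastWifi = pickRendition(ladder, 25000); // 4K
const onMobile = pickRendition(ladder, 4000); // 720p
const onWeakSignal = pickRendition(ladder, 500); // 360p (fallback)
```

The player would re-run a decision like this before each chunk request, which is what lets quality shift mid-stream without restarting playback.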

B. High Scale View Counting

For a viral video, thousands of views arrive every second.

  • The Problem: Updating a single database row 1,000 times per second causes massive contention and locking.
  • The Solution: Buffering & Aggregation.
    1. Requests are sent to a high-throughput bus (Kafka).
    2. An Accumulator Service pulls from Kafka and aggregates views in memory for 10-60 seconds.
    3. The aggregated count (e.g., +500 views) is written to the database in a single batch update.
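
The buffering steps above can be sketched with an in-memory accumulator. This is a minimal illustration, not YouTube's implementation: the Kafka consumer and the database are replaced by plain method calls and a callback, and the `ViewAccumulator`, `record`, `flush`, and `BatchWriter` names are hypothetical.

```typescript
// Callback standing in for the batched database write (hypothetical).
type BatchWriter = (increments: Array<[string, number]>) => void;

class ViewAccumulator {
  private pending: Map<string, number> = new Map();
  private writeBatch: BatchWriter;

  constructor(writeBatch: BatchWriter) {
    this.writeBatch = writeBatch;
  }

  // Called once per view event consumed from the bus.
  record(videoId: string): void {
    this.pending.set(videoId, (this.pending.get(videoId) ?? 0) + 1);
  }

  // Called on a timer (e.g. every 10-60 seconds): emit one increment per
  // video, then reset the in-memory counters.
  flush(): void {
    if (this.pending.size === 0) return;
    const increments: Array<[string, number]> = [];
    this.pending.forEach((count, videoId) => {
      increments.push([videoId, count]);
    });
    this.pending = new Map();
    this.writeBatch(increments);
  }
}

// Usage: 500 views of one video collapse into a single +500 write.
const writes: Array<[string, number]> = [];
const accumulator = new ViewAccumulator((batch) => {
  batch.forEach((entry) => writes.push(entry));
});
for (let i = 0; i < 500; i++) accumulator.record("video-abc");
accumulator.record("video-xyz");
accumulator.flush();
// writes now holds [["video-abc", 500], ["video-xyz", 1]]
```

The trade-off is that counts held in memory are lost if the process crashes before a flush. Committing consumer offsets only after the batch write succeeds is one common way to replay those events instead of dropping them.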

C. Scaling Metadata with Vitess

YouTube traditionally used MySQL. To scale it horizontally, its engineers built Vitess, a sharding and clustering layer that sits on top of MySQL:

  • Sharding: Automatically shards data across multiple MySQL instances.
  • Connection Pooling: Handles thousands of concurrent connections efficiently.
  • Query Routing: Routes SQL queries to the correct shard based on the video_id or user_id.
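
The routing idea can be illustrated with a minimal hash-based sketch. This is not how Vitess is implemented: Vitess maps keys to keyspace IDs through configurable vindexes and assigns each shard a range of those IDs. The FNV-1a hash, the modulo placement, and the shard names below are assumptions chosen only to keep the example short.

```typescript
// FNV-1a 32-bit hash over the string's UTF-16 code units.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical shard names.
const shards: string[] = ["shard-0", "shard-1", "shard-2", "shard-3"];

// Map a sharding key (video_id or user_id) to exactly one shard.
function routeToShard(shardingKey: string, shardNames: string[]): string {
  if (shardNames.length === 0) throw new Error("no shards configured");
  return shardNames[fnv1a(shardingKey) % shardNames.length];
}

// The same key always lands on the same shard, so a query filtered by
// video_id only needs to touch one MySQL instance.
const firstLookup = routeToShard("video-123", shards);
const secondLookup = routeToShard("video-123", shards);
```

Plain modulo placement moves most keys when the shard count changes, which is one reason production systems prefer range-based or consistent-hash schemes.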

4. Implementation Example: Transcoding Orchestrator

This example sketches how an orchestrator service might coordinate a video transcoding job.

typescript
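type JobStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "FAILED";

interface TranscodingJob {
  videoId: string;
  resolutions: string[]; // e.g. ['360p', '720p', '1080p']
  status: JobStatus;
}

// Collaborating services are injected. These interfaces are illustrative,
// not a real API.
interface VideoProcessor {
  splitToChunks(videoId: string): Promise<string[]>;
}
interface WorkerService {
  process(chunk: string, resolution: string): Promise<void>;
}
interface MetadataService {
  updateStatus(videoId: string, status: string): Promise<void>;
}

class TranscodingOrchestrator {
  constructor(
    private videoProcessor: VideoProcessor,
    private workerService: WorkerService,
    private metadataService: MetadataService
  ) {}

  async startJob(videoId: string): Promise<TranscodingJob> {
    const job: TranscodingJob = {
      videoId,
      resolutions: ["360p", "720p", "1080p"],
      status: "QUEUED",
    };

    try {
      // 1. Chunk the video (conceptual)
      const chunks = await this.videoProcessor.splitToChunks(videoId);
      console.log(`Video ${videoId} split into ${chunks.length} chunks.`);
      job.status = "PROCESSING";

      // 2. Fan out one task per (chunk, resolution) pair
      const transcodingTasks = chunks.flatMap((chunk) =>
        job.resolutions.map((res) => this.workerService.process(chunk, res))
      );

      // 3. Wait for every task; Promise.all rejects on the first failure
      await Promise.all(transcodingTasks);
      console.log(`All chunks transcoded for video ${videoId}`);
      await this.finalizeVideo(videoId);
      job.status = "COMPLETED";
    } catch (error) {
      console.error("Transcoding failed:", error);
      job.status = "FAILED";
      await this.handleFailure(videoId);
    }
    return job;
  }

  private async finalizeVideo(videoId: string): Promise<void> {
    // Generate DASH/HLS manifest files and update DB
    await this.metadataService.updateStatus(videoId, "READY");
  }

  private async handleFailure(videoId: string): Promise<void> {
    // Mark the video as failed so it can be retried or reported to the uploader
    await this.metadataService.updateStatus(videoId, "FAILED");
  }
}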
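
The orchestrator above starts every (chunk, resolution) task at once through `Promise.all`. For a long video that can mean thousands of simultaneous requests, so a real orchestrator would likely cap the number of tasks in flight. The helper below is a minimal sketch of one way to do that. `runWithLimit` and the stand-in tasks are hypothetical and not part of the example above.

```typescript
// Run async tasks with at most `limit` in flight; results keep input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each runner repeatedly claims the next unstarted task index.
  async function runner(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, tasks.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) runners.push(runner());
  await Promise.all(runners);
  return results;
}

// Usage: 3 chunks x 2 resolutions = 6 tasks, at most 2 running at a time.
const chunkIds = ["c0", "c1", "c2"];
const targetResolutions = ["360p", "720p"];
const transcodeTasks: Array<() => Promise<string>> = [];
for (const chunk of chunkIds) {
  for (const res of targetResolutions) {
    // Stand-in for a real worker call.
    transcodeTasks.push(async () => `${chunk}@${res}`);
  }
}
const done: Promise<string[]> = runWithLimit(transcodeTasks, 2);
// resolves to ["c0@360p", "c0@720p", "c1@360p", "c1@720p", "c2@360p", "c2@720p"]
```

As with `Promise.all`, the returned promise rejects on the first failing task, although tasks that other runners have already started are not cancelled.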

5. Summary: Key Architecture Trade-offs

| Component | Choice | Rationale |
| --- | --- | --- |
| Performance | Video Chunking | Essential for parallel transcoding and low-latency start times. |
| Delivery | Global CDN (GGC) | Caching popular videos close to users reduces egress costs and buffering. |
| Consistency | Eventual Consistency | View counts don't need to be 100% precise in real-time (Buffered updates). |
| Storage | Object Storage | Blob stores (S3/GCS) are the only way to scale to exabytes of binary data. |

Released under the ISC License.