YouTube System Design: Scaling Video to Exabytes

YouTube is one of the largest data platforms on Earth. Its primary challenge is not just storing video, but processing it and delivering it to billions of users with as little buffering as possible.

1. Requirements

Functional

Upload Video: Users can upload high-resolution video files.
Play Video: Users can stream videos in various qualities (ABR).
Search: Users can find videos by title/tags.
View Count: Near-real-time tracking of video popularity.
Comments/Likes: Interaction layer.

Non-Functional

Reliability: Uploaded videos should effectively never be lost (99.999% durability target).
Availability: High availability for playback.
Low Latency: Minimal start time and no mid-stream buffering.
Scalability: Handle 500+ hours of video uploaded every minute.
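
To get a feel for what the upload rate above implies, here is a rough back-of-envelope sketch. Only the 500 hours per minute comes from the requirement; the average source size per hour and the rendition overhead are illustrative assumptions, not published figures, and `dailyIngest` is a hypothetical helper.

```typescript
// Back-of-envelope ingest estimate.
// Every constant except the upload rate is an illustrative assumption.
const HOURS_UPLOADED_PER_MINUTE = 500; // from the scalability requirement
const ASSUMED_GB_PER_SOURCE_HOUR = 1;  // assumption: average size of one hour of source video
const ASSUMED_RENDITION_OVERHEAD = 2;  // assumption: all transcoded outputs total ~2x the source

interface IngestEstimate {
  hoursPerDay: number;
  sourceTbPerDay: number;
  totalTbPerDay: number;
}

function dailyIngest(
  hoursPerMinute: number,
  gbPerHour: number,
  renditionOverhead: number
): IngestEstimate {
  const hoursPerDay = hoursPerMinute * 60 * 24;
  const sourceTbPerDay = (hoursPerDay * gbPerHour) / 1000; // decimal TB
  const totalTbPerDay = sourceTbPerDay * (1 + renditionOverhead);
  return { hoursPerDay, sourceTbPerDay, totalTbPerDay };
}

const estimate = dailyIngest(
  HOURS_UPLOADED_PER_MINUTE,
  ASSUMED_GB_PER_SOURCE_HOUR,
  ASSUMED_RENDITION_OVERHEAD
);
console.log(estimate);
```

Under these assumptions the system ingests 720,000 hours of video per day, which is why storage and transcoding have to scale horizontally rather than on a single machine.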

2. High-Level Architecture

YouTube separates the Write Path (Upload/Process) from the Read Path (Stream/Search).

3. Technical Deep Dives

A. The Transcoding Pipeline

Raw video files are massive. To serve them efficiently, the pipeline works in four stages:

Chunking: The original video is split into small chunks along GOP (Group of Pictures) boundaries, so each chunk can be decoded independently.
Parallel Processing: Workers transcode these chunks simultaneously into multiple formats (MP4, WebM) and resolutions (360p, 720p, 1080p, 4K).
Merging: Transcoded chunks are assembled into per-resolution streams, and manifest files are generated that list the available renditions and their segments.
Adaptive Bitrate Streaming (ABR): YouTube uses protocols like DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming). The player continuously estimates the available bandwidth and requests each next segment at the highest resolution the connection can sustain.
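
The client-side half of ABR can be sketched as a simple throughput-based rule: pick the highest-bitrate rendition that fits within a safety margin of the measured bandwidth. This is a minimal illustration, not YouTube's actual player logic; the `LADDER` bitrates, the `pickRendition` name, and the 0.8 safety factor are all assumptions.

```typescript
interface Rendition {
  label: string;       // e.g. "720p"
  bitrateKbps: number; // average encoded bitrate of this rendition
}

// Hypothetical bitrate ladder; real ladders vary by codec and content.
const LADDER: Rendition[] = [
  { label: "360p", bitrateKbps: 700 },
  { label: "720p", bitrateKbps: 2500 },
  { label: "1080p", bitrateKbps: 5000 },
  { label: "4K", bitrateKbps: 16000 },
];

// Returns the highest-bitrate rendition that fits within
// measuredKbps * safetyFactor, or the lowest rendition if none fit.
function pickRendition(
  ladder: Rendition[],
  measuredKbps: number,
  safetyFactor = 0.8 // assumption: leave 20% headroom for bandwidth jitter
): Rendition {
  const sorted = [...ladder].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const lowest = sorted[0];
  if (lowest === undefined) throw new Error("ladder must not be empty");

  const budget = measuredKbps * safetyFactor;
  let choice = lowest;
  for (const rendition of sorted) {
    if (rendition.bitrateKbps <= budget) choice = rendition;
  }
  return choice;
}

console.log(pickRendition(LADDER, 4000).label); // budget is 3200 kbps
```

A real player would re-run this decision for every segment and would usually also consider buffer occupancy, so a brief bandwidth dip does not immediately force a quality drop.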

B. High Scale View Counting

For a viral video, thousands of views arrive every second.

The Problem: Updating a single database row 1,000 times per second causes massive contention and locking.
The Solution: Buffering & Aggregation.
1. View events are published to a high-throughput message bus (Kafka).
2. An Accumulator Service pulls from Kafka and aggregates views in memory for 10-60 seconds.
3. The aggregated count (e.g., +500 views) is written to the database in a single batch update.
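
The aggregation step can be sketched as an in-memory counter that is periodically swapped out and written as one batch. This is a minimal sketch under stated assumptions: `ViewAccumulator` and `FlushFn` are hypothetical names, the Kafka consumer is replaced by direct `record` calls, and the database write is a callback supplied by the caller.

```typescript
// Callback that persists one batch of per-video increments (hypothetical).
type FlushFn = (batch: Map<string, number>) => Promise<void>;

class ViewAccumulator {
  private counts = new Map<string, number>();
  private readonly flushToDb: FlushFn;

  constructor(flushToDb: FlushFn) {
    this.flushToDb = flushToDb;
  }

  // Called once per view event consumed from the bus.
  record(videoId: string): void {
    this.counts.set(videoId, (this.counts.get(videoId) ?? 0) + 1);
  }

  // Number of views buffered for a video but not yet flushed.
  pending(videoId: string): number {
    return this.counts.get(videoId) ?? 0;
  }

  // Writes all buffered increments as a single batch.
  async flush(): Promise<void> {
    if (this.counts.size === 0) return;
    const batch = this.counts;
    this.counts = new Map(); // swap first so events arriving during the write are kept
    try {
      await this.flushToDb(batch);
    } catch (err) {
      // Merge the failed batch back so it is retried on the next flush.
      batch.forEach((n, id) => {
        this.counts.set(id, (this.counts.get(id) ?? 0) + n);
      });
      throw err;
    }
  }
}
```

In a running service, `flush` would be driven by a timer in the 10-60 second range described above, for example `setInterval(() => acc.flush().catch(console.error), 30_000)`. The trade-off is that views buffered in memory are lost if the process crashes before a flush, unless the consumer offsets are only committed after a successful write.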

C. Scaling Metadata with Vitess

YouTube traditionally stored its metadata in MySQL. To scale MySQL horizontally, its engineers built Vitess, a sharding layer that sits in front of the database:

Sharding: Automatically shards data across multiple MySQL instances.
Connection Pooling: Multiplexes thousands of concurrent client connections onto a much smaller pool of MySQL connections.
Query Routing: Routes SQL queries to the correct shard based on the video_id or user_id.
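
The routing idea can be illustrated with a small hash-and-range lookup: hash the sharding key, then find the shard whose key range covers the result. This is a conceptual sketch only. The shard names imitate Vitess-style range names, but the FNV-1a hash, the `SHARDS` layout, and the `shardFor` helper are assumptions made for illustration and do not reproduce Vitess's own vindex functions.

```typescript
interface Shard {
  name: string;  // range-style name, e.g. "-40" or "c0-"
  start: number; // inclusive lower bound on the top byte of the hash
  end: number;   // exclusive upper bound
}

// Hypothetical four-way split of the key space by top byte.
const SHARDS: Shard[] = [
  { name: "-40", start: 0x00, end: 0x40 },
  { name: "40-80", start: 0x40, end: 0x80 },
  { name: "80-c0", start: 0x80, end: 0xc0 },
  { name: "c0-", start: 0xc0, end: 0x100 },
];

// 32-bit FNV-1a over UTF-16 code units (matches byte-wise FNV-1a for ASCII keys).
// Chosen only because it is short and deterministic.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Maps a sharding key (video_id or user_id) to the shard that owns it.
function shardFor(key: string, shards: Shard[] = SHARDS): Shard {
  const topByte = fnv1a(key) >>> 24;
  const shard = shards.find((s) => topByte >= s.start && topByte < s.end);
  if (!shard) throw new Error(`no shard covers hash byte ${topByte}`);
  return shard;
}

console.log(shardFor("video-123").name);
```

Routing on ranges of a hash, rather than on `hash % shardCount`, means a shard can later be split in two by dividing its range, without remapping keys that belong to other shards.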

4. Implementation Example: Transcoding Orchestrator

This example sketches how a coordinator service might orchestrate a video transcoding job.

typescript

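interface TranscodingJob {
  videoId: string;
  resolutions: string[]; // e.g. ['360p', '720p', '1080p']
  status: "QUEUED" | "PROCESSING" | "COMPLETED" | "FAILED";
}

// Collaborators the orchestrator depends on. These interfaces are
// illustrative; concrete implementations are out of scope here.
interface VideoProcessor {
  splitToChunks(videoId: string): Promise<string[]>; // resolves to chunk IDs
}
interface WorkerService {
  process(chunkId: string, resolution: string): Promise<void>;
}
interface MetadataService {
  updateStatus(videoId: string, status: string): Promise<void>;
}

class TranscodingOrchestrator {
  constructor(
    private readonly videoProcessor: VideoProcessor,
    private readonly workerService: WorkerService,
    private readonly metadataService: MetadataService
  ) {}

  async startJob(videoId: string): Promise<TranscodingJob> {
    const job: TranscodingJob = {
      videoId,
      resolutions: ["360p", "720p", "1080p"],
      status: "QUEUED",
    };

    try {
      // 1. Chunk the video (conceptual)
      job.status = "PROCESSING";
      const chunks = await this.videoProcessor.splitToChunks(videoId);
      console.log(`Video ${videoId} split into ${chunks.length} chunks.`);

      // 2. Fan out: one task per (chunk, resolution) pair
      const transcodingTasks = chunks.flatMap((chunk) =>
        job.resolutions.map((res) => this.workerService.process(chunk, res))
      );

      // Promise.all rejects as soon as any task fails; the remaining
      // tasks keep running but their results are ignored.
      await Promise.all(transcodingTasks);
      console.log(`All chunks transcoded for video ${videoId}`);

      // 3. Publish the result
      await this.finalizeVideo(videoId);
      job.status = "COMPLETED";
    } catch (error) {
      console.error("Transcoding failed:", error);
      job.status = "FAILED";
      await this.handleFailure(videoId);
    }
    return job;
  }

  private async finalizeVideo(videoId: string): Promise<void> {
    // Generate DASH/HLS manifest files (omitted) and mark the video playable
    await this.metadataService.updateStatus(videoId, "READY");
  }

  private async handleFailure(videoId: string): Promise<void> {
    // Record the failure so the upload can be retried or surfaced to the user
    await this.metadataService.updateStatus(videoId, "FAILED");
  }
}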

5. Summary: Key Architecture Trade-offs

Concern	Choice	Rationale
Performance	Video Chunking	Essential for parallel transcoding and low-latency start times.
Delivery	Global CDN (GGC, Google Global Cache)	Caching popular videos close to users reduces egress costs and buffering.
Consistency	Eventual Consistency	View counts do not need to be exact in real time, so buffered updates are acceptable.
Storage	Object Storage	Blob stores (S3/GCS) are the practical way to scale to exabytes of binary data.
