YouTube System Design: Scaling Video to Exabytes
YouTube is one of the largest data platforms on Earth. Its primary challenge is not just storing video, but processing it and delivering it to billions of users with minimal buffering.
1. Requirements
Functional
- Upload Video: Users can upload high-resolution video files.
- Play Video: Users can stream videos in various qualities (ABR).
- Search: Users can find videos by title/tags.
- View Count: Near-real-time tracking of video popularity.
- Comments/Likes: Interaction layer.
Non-Functional
- Reliability: Uploaded videos should never be lost (99.999% durability).
- Availability: High availability for playback.
- Low Latency: Minimal start time and no mid-stream buffering.
- Scalability: Handle 500+ hours of video uploaded every minute.
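To get a feel for what the scalability requirement implies, here is a rough back-of-envelope sketch. Only the 500 hours per minute figure comes from the requirements above; the average source size per hour and the rendition overhead are illustrative assumptions, not measured values, and the function names are hypothetical.

```typescript
// Back-of-envelope ingest estimate. Only the 500 hours/minute figure comes
// from the requirements; every other number is an illustrative assumption.
const UPLOAD_HOURS_PER_MINUTE = 500;

// Assumption: average size of one hour of source video, in gigabytes.
const ASSUMED_SOURCE_GB_PER_HOUR = 1;
// Assumption: the stored renditions add this multiple of the source size.
const ASSUMED_RENDITION_OVERHEAD = 2;

// Hours of video uploaded per day at a given per-minute rate.
function uploadHoursPerDay(hoursPerMinute: number): number {
  return hoursPerMinute * 60 * 24;
}

// Daily storage growth in decimal terabytes (source plus renditions).
function storageTbPerDay(
  hoursPerMinute: number,
  gbPerHour: number,
  renditionOverhead: number
): number {
  const sourceGb = uploadHoursPerDay(hoursPerMinute) * gbPerHour;
  return (sourceGb * (1 + renditionOverhead)) / 1000;
}

console.log(uploadHoursPerDay(UPLOAD_HOURS_PER_MINUTE)); // 720000
console.log(
  storageTbPerDay(
    UPLOAD_HOURS_PER_MINUTE,
    ASSUMED_SOURCE_GB_PER_HOUR,
    ASSUMED_RENDITION_OVERHEAD
  )
); // 2160
```

Under these assumptions the platform grows by roughly two petabytes per day, which is why storage and transcoding capacity dominate the design.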
2. High-Level Architecture
YouTube separates the Write Path (Upload/Process) from the Read Path (Stream/Search).
3. Technical Deep Dives
A. The Transcoding Pipeline
Raw video files are massive. To serve them efficiently, YouTube runs each upload through a pipeline:
- Chunking: The original video is split into small chunks at GOP (Group of Pictures) boundaries, so each chunk can be decoded independently.
- Parallel Processing: Different workers transcode these chunks into multiple formats (MP4, WebM) and resolutions (360p, 720p, 1080p, 4K) simultaneously.
- Merging: Transcoded chunks are stitched back into complete renditions, which are indexed by a set of manifest files.
- Adaptive Bitrate Streaming (ABR): YouTube uses protocols like DASH (Dynamic Adaptive Streaming over HTTP) and HLS. The player automatically detects your bandwidth and requests the appropriate resolution chunk.
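The player-side half of ABR can be sketched as a simple throughput-based selection. This is a minimal illustration, not how any real DASH or HLS player is implemented: the bitrate ladder values, the `selectRendition` function, and the 0.8 safety factor are all assumptions made for the example.

```typescript
interface Rendition {
  label: string; // e.g. "720p"
  bitrateKbps: number; // bandwidth needed to play this rendition smoothly
}

// Hypothetical bitrate ladder; real values depend on codec and content.
const LADDER: Rendition[] = [
  { label: "360p", bitrateKbps: 1000 },
  { label: "720p", bitrateKbps: 5000 },
  { label: "1080p", bitrateKbps: 8000 },
  { label: "4K", bitrateKbps: 35000 },
];

// Pick the highest rendition whose bitrate fits within a safety fraction of
// the measured throughput; fall back to the lowest rendition if none fit.
function selectRendition(
  ladder: Rendition[],
  measuredKbps: number,
  safetyFactor: number = 0.8
): Rendition {
  if (ladder.length === 0) {
    throw new Error("ladder must not be empty");
  }
  const sorted = [...ladder].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const budget = measuredKbps * safetyFactor;
  let choice = sorted[0];
  for (const rendition of sorted) {
    if (rendition.bitrateKbps <= budget) {
      choice = rendition;
    }
  }
  return choice;
}

console.log(selectRendition(LADDER, 7000).label); // "720p"
console.log(selectRendition(LADDER, 500).label); // "360p"
```

A player would re-run this decision before each chunk request, which is why the chunked, multi-rendition output of the transcoding pipeline matters: quality can change at any chunk boundary.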
B. High Scale View Counting
For a viral video, thousands of views arrive every second.
- The Problem: Updating a single database row 1,000 times per second causes massive contention and locking.
- The Solution: Buffering & Aggregation.
  - Requests are sent to a high-throughput bus (Kafka).
  - An Accumulator Service pulls from Kafka and aggregates views in memory for 10-60 seconds.
  - The aggregated count (e.g., +500 views) is written to the database in a single batch update.
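The buffering step can be sketched as a small in-memory accumulator. This is a simplified illustration under stated assumptions: the `ViewAccumulator` class and its `writeBatch` callback are hypothetical names, the Kafka consumer is omitted (each consumed event would simply call `record`), and a real service would call `flush` on a timer every 10-60 seconds.

```typescript
// Callback that applies one aggregated increment to the database (hypothetical).
type WriteBatchFn = (videoId: string, delta: number) => Promise<void>;

class ViewAccumulator {
  private pending = new Map<string, number>();
  private readonly writeBatch: WriteBatchFn;

  constructor(writeBatch: WriteBatchFn) {
    this.writeBatch = writeBatch;
  }

  // Called once per view event consumed from the bus.
  record(videoId: string): void {
    this.pending.set(videoId, (this.pending.get(videoId) ?? 0) + 1);
  }

  // Swap out the buffer, then issue one increment per video instead of one
  // write per view. New events keep accumulating in the fresh buffer.
  async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = new Map<string, number>();
    const writes: Promise<void>[] = [];
    batch.forEach((delta, videoId) => {
      writes.push(this.writeBatch(videoId, delta));
    });
    await Promise.all(writes);
  }
}

// Usage: a Map stands in for the database table of view counts.
const viewCounts = new Map<string, number>();
const accumulator = new ViewAccumulator(async (videoId, delta) => {
  viewCounts.set(videoId, (viewCounts.get(videoId) ?? 0) + delta);
});

for (let i = 0; i < 500; i++) {
  accumulator.record("video-1");
}
accumulator.flush().then(() => console.log(viewCounts.get("video-1"))); // 500
```

The trade-off is durability of the buffer: views held in memory are lost if the process crashes before a flush, so this sketch would need offset management on the consumer side to replay unflushed events.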
C. Scaling Metadata with Vitess
YouTube traditionally used MySQL. To scale horizontally, they built Vitess:
- Sharding: Automatically shards data across multiple MySQL instances.
- Connection Pooling: Handles thousands of concurrent connections efficiently.
- Query Routing: Routes SQL queries to the correct shard based on the `video_id` or `user_id`.
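The idea behind key-based routing can be illustrated with plain hash-modulo sharding. This is deliberately not Vitess's API or algorithm: Vitess maps column values to keyspace IDs through vindexes and assigns ranges of those IDs to shards. The `fnv1a` and `shardFor` helpers below are hypothetical and only show why every query for the same key lands on the same MySQL instance.

```typescript
// FNV-1a 32-bit hash over the key's UTF-16 code units (fine for ASCII IDs).
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a sharding key (e.g. a video_id or user_id) to a shard index.
function shardFor(shardingKey: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a(shardingKey) % shardCount;
}

// The same key always routes to the same shard.
const shard = shardFor("video-abc123", 16);
console.log(shard === shardFor("video-abc123", 16)); // true
```

A known weakness of modulo hashing is that changing the shard count remaps most keys, which is one reason range-based schemes are preferred when shards need to be split online.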
4. Implementation Example: Transcoding Orchestrator
This example demonstrates how a Master service might orchestrate a video transcoding job.
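```typescript
type JobStatus = "QUEUED" | "PROCESSING" | "COMPLETED" | "FAILED";

interface TranscodingJob {
  videoId: string;
  resolutions: string[]; // e.g. ['360p', '720p', '1080p']
  status: JobStatus;
}

// Injected collaborators. These interfaces are illustrative, not a real API.
interface VideoProcessor {
  splitToChunks(videoId: string): Promise<string[]>; // returns chunk IDs
}
interface WorkerService {
  process(chunkId: string, resolution: string): Promise<void>;
}
interface MetadataService {
  updateStatus(videoId: string, status: string): Promise<void>;
}

class TranscodingOrchestrator {
  constructor(
    private readonly videoProcessor: VideoProcessor,
    private readonly workerService: WorkerService,
    private readonly metadataService: MetadataService
  ) {}

  async startJob(videoId: string): Promise<TranscodingJob> {
    const job: TranscodingJob = {
      videoId,
      resolutions: ["360p", "720p", "1080p"],
      status: "QUEUED",
    };

    try {
      // 1. Chunk the video (conceptual)
      const chunks = await this.videoProcessor.splitToChunks(videoId);
      console.log(`Video ${videoId} split into ${chunks.length} chunks.`);
      job.status = "PROCESSING";

      // 2. Distribute to workers: one task per (chunk, resolution) pair
      const transcodingTasks = chunks.flatMap((chunk) =>
        job.resolutions.map((res) => this.workerService.process(chunk, res))
      );
      // Promise.all rejects as soon as any single task fails.
      await Promise.all(transcodingTasks);
      console.log(`All chunks transcoded for video ${videoId}`);

      await this.finalizeVideo(videoId);
      job.status = "COMPLETED";
    } catch (error) {
      console.error("Transcoding failed:", error);
      job.status = "FAILED";
      await this.handleFailure(videoId);
    }
    return job;
  }

  private async finalizeVideo(videoId: string): Promise<void> {
    // Generate DASH/HLS manifest files and update DB
    await this.metadataService.updateStatus(videoId, "READY");
  }

  private async handleFailure(videoId: string): Promise<void> {
    // Assumption: a failed job is simply marked so it can be retried later.
    await this.metadataService.updateStatus(videoId, "FAILED");
  }
}
```
5. Summary: Key Architecture Trade-offs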
| Concern | Choice | Rationale |
|---|---|---|
| Performance | Video Chunking | Essential for parallel transcoding and low-latency start times. |
| Delivery | Global CDN (GGC) | Caching popular videos close to users reduces egress costs and buffering. |
| Consistency | Eventual Consistency | View counts don't need to be 100% precise in real-time (Buffered updates). |
| Storage | Object Storage | Blob stores (e.g., S3 or GCS) are the practical way to scale to exabytes of binary data. |
