Instagram System Design: Scaling Media and Feeds
Instagram is a massive-scale social media platform focused on photos and videos. Designing it requires solving two major challenges: high-volume media storage and efficient feed generation for billions of users.
1. Requirements
Functional
- Post Photos/Videos: Users can upload media with captions.
- Follow/Unfollow: Users can follow other users.
- News Feed: Users see a timeline of posts from people they follow.
- Likes & Comments: Users can interact with posts.
- Stories: Ephemeral 24-hour posts.
Non-Functional
- High Availability: The system must be always accessible.
- Low Latency: Feed generation and photo viewing must be fast (< 200ms).
- Scalability: Handle 100k+ uploads per second during peak times.
- Durability: Photos must never be lost.
2. High-Level Architecture
The design splits the system into microservices so that media processing is decoupled from social graph management.
3. Technical Deep Dives
A. Media Upload Pipeline
At Instagram's scale, we don't process images synchronously.
- Upload: The client sends the photo to the Photo Service, which stores the original file in S3.
- Asynchronous Processing: An S3 event notification enqueues a processing job (e.g., on SQS or a Kafka topic), which a background worker consumes.
- Image Processor:
  - Resizes the image into multiple sizes (thumbnail, 720p, 1080p).
  - Applies filters if selected.
  - Stores the processed versions back in S3.
- Metadata Update: Once processing finishes, the worker updates the Metadata DB with the new photo URLs.
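As a rough sketch of the Image Processor's resize step, the function below plans the output variants for one uploaded original. It assumes "720p" and "1080p" mean maximum widths of 720 and 1080 px, plus a 150 px thumbnail; the variant names and the `processed/<photoId>/<variant>.jpg` key layout are hypothetical. Reading the queue message, the actual pixel resizing, and the S3 writes are left out so the logic stays self-contained.

```typescript
// Hypothetical variant spec: a name and a maximum output width in pixels.
interface VariantSpec {
  name: string;
  maxWidth: number;
}

interface ResizeJob {
  variant: string;
  width: number;
  height: number;
  targetKey: string; // where the processed file would be written in S3 (assumed layout)
}

// Assumed output formats, mirroring "thumbnail, 720p, 1080p" above.
const VARIANTS: VariantSpec[] = [
  { name: "thumbnail", maxWidth: 150 },
  { name: "720", maxWidth: 720 },
  { name: "1080", maxWidth: 1080 },
];

/**
 * Plans one resize job per variant for a single original upload.
 * Preserves the aspect ratio and never upscales past the original width.
 */
function planResizeJobs(
  photoId: string,
  originalWidth: number,
  originalHeight: number,
  variants: VariantSpec[] = VARIANTS
): ResizeJob[] {
  return variants.map(({ name, maxWidth }) => {
    const width = Math.min(maxWidth, originalWidth);
    const height = Math.round((originalHeight * width) / originalWidth);
    return { variant: name, width, height, targetKey: `processed/${photoId}/${name}.jpg` };
  });
}
```

For a 4000x3000 original, this yields jobs of 150x113, 720x540, and 1080x810. Keeping the plan as a pure function makes the worker easy to retry: if it crashes midway, re-running it on the same message produces the same target keys, so rewrites simply overwrite earlier output.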
B. News Feed: The Hybrid Fan-out Model
Generating feeds for over a billion users is arguably the hardest problem in this design. The solution combines two models, chosen per author based on popularity (follower count):
Push Model (Fan-out on Write):
- Used for "Regular" users.
- When you post, we push your post ID into the pre-computed feed caches (Redis) of all your followers.
- Pro: Viewing the feed is extremely fast (just a Redis read).
- Con: Expensive for celebrities: a single post from an account with 50M+ followers would require 50M+ cache writes.
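The push model's write path, including the skip for celebrity accounts, can be sketched in memory. Here a `Map` of arrays stands in for the per-follower Redis lists (in Redis this would typically be an `LPUSH` followed by an `LTRIM`). The 1,000,000-follower threshold and the 500-entry feed cap are assumed values for illustration, not Instagram's.

```typescript
// Assumed tuning values (hypothetical, not Instagram's real numbers).
const CELEBRITY_THRESHOLD = 1_000_000; // above this, skip fan-out and let followers pull
const FEED_CAP = 500; // max post IDs kept in each follower's cached feed

// In-memory stand-in for Redis lists keyed by `feed:<userId>`.
const feedCache = new Map<string, string[]>();

/**
 * Fan-out on write: prepend postId to every follower's cached feed.
 * Returns the number of feeds written (0 when the author counts as a celebrity).
 */
function fanOutOnWrite(postId: string, followerIds: string[]): number {
  if (followerIds.length > CELEBRITY_THRESHOLD) {
    // Celebrity author: followers will pull this post at read time instead.
    return 0;
  }
  for (const followerId of followerIds) {
    const key = `feed:${followerId}`;
    const feed = feedCache.get(key) ?? [];
    feed.unshift(postId); // newest first, like LPUSH
    if (feed.length > FEED_CAP) feed.length = FEED_CAP; // drop the oldest, like LTRIM
    feedCache.set(key, feed);
  }
  return followerIds.length;
}
```

In a real deployment, the celebrity check would read a stored follower counter rather than materializing the follower list, and the per-follower writes would be batched (for example, with Redis pipelining) and spread across background workers.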
Pull Model (Fan-out on Load):
- Used for "Celebrities" (e.g., Cristiano Ronaldo).
- We do not push his posts to 500M+ people.
- Instead, when a follower loads their feed, we "pull" the latest posts from any celebrities they follow and merge them into the feed.
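Because the precomputed feed and each celebrity's recent posts are already ordered newest-first, the merge can walk the lists in step instead of concatenating and re-sorting everything. The sketch below assumes each input list is sorted by descending `timestamp`; the `FeedItem` shape, the `limit` parameter, and the duplicate-skipping rule are illustrative choices.

```typescript
interface FeedItem {
  id: string;
  timestamp: number; // epoch milliseconds
}

/**
 * Merges k lists, each sorted newest-first, into one newest-first list
 * of at most `limit` items, skipping post IDs that were already emitted.
 * Cost is O(limit * k), which is fine when k (followed celebrities + 1) is small.
 */
function mergeNewestFirst(lists: FeedItem[][], limit: number): FeedItem[] {
  const cursors = lists.map(() => 0);
  const seen = new Set<string>();
  const out: FeedItem[] = [];
  while (out.length < limit) {
    // Pick the list whose current head is the newest.
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      const head = lists[i][cursors[i]];
      if (head && (best === -1 || head.timestamp > lists[best][cursors[best]].timestamp)) {
        best = i;
      }
    }
    if (best === -1) break; // every list is exhausted
    const item = lists[best][cursors[best]++];
    if (!seen.has(item.id)) {
      seen.add(item.id);
      out.push(item);
    }
  }
  return out;
}
```

Stopping at `limit` means the service only touches as many posts as one page needs; with many followed celebrities, a binary heap keyed on each list's head would reduce the per-item cost from O(k) to O(log k).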
C. Sharding Strategy: Scalable Metadata
Instagram uses PostgreSQL but scales it through custom sharding.
- Data is sharded by user_id.
- Each physical shard hosts many "logical shards", so individual logical shards can be moved to new servers as volume grows without re-sharding the data.
- ID Generation: A custom Snowflake-like ID generator ensures unique IDs across shards without a central bottleneck.
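Instagram has publicly described IDs that pack a 41-bit millisecond timestamp, a 13-bit logical shard ID, and a 10-bit per-shard sequence into 64 bits. The sketch below follows that layout and ties it to the sharding scheme above: a user maps to a logical shard, logical shards map to physical hosts, and any ID can be routed back to its shard by reading its shard bits. The custom epoch, the 8,192 logical shards (the full 13-bit range), the modulo assignment, the contiguous-range placement, and the host names are all assumptions for illustration.

```typescript
// Assumed custom epoch (2020-01-01T00:00:00Z); a real system picks its own.
const EPOCH_MS = 1_577_836_800_000n;
const SHARD_BITS = 13n;
const SEQ_BITS = 10n;
const LOGICAL_SHARDS = 1 << 13; // 8192 logical shards

// Hypothetical placement: contiguous ranges of logical shards per physical host.
// A real system would keep this in a lookup table so one logical shard can be moved.
const PHYSICAL_HOSTS = ["pg-0", "pg-1", "pg-2", "pg-3"];
function hostForLogicalShard(shard: number): string {
  const perHost = LOGICAL_SHARDS / PHYSICAL_HOSTS.length;
  return PHYSICAL_HOSTS[Math.floor(shard / perHost)];
}

// Users are assigned to a logical shard by user_id (simple modulo here).
function logicalShardForUser(userId: bigint): number {
  return Number(userId % BigInt(LOGICAL_SHARDS));
}

// Per-shard sequence counters; in production this would be a per-shard DB sequence.
const sequences = new Map<number, bigint>();

/** Builds a 64-bit ID: [41 bits ms since epoch][13 bits shard][10 bits sequence]. */
function generateId(shard: number, nowMs: number = Date.now()): bigint {
  // The sequence wraps at 1024, so at most 1024 unique IDs per shard per millisecond.
  const seq = ((sequences.get(shard) ?? 0n) + 1n) % (1n << SEQ_BITS);
  sequences.set(shard, seq);
  const elapsed = BigInt(nowMs) - EPOCH_MS;
  return (elapsed << (SHARD_BITS + SEQ_BITS)) | (BigInt(shard) << SEQ_BITS) | seq;
}

/** Recovers the logical shard from an ID so reads can be routed without a lookup. */
function shardFromId(id: bigint): number {
  return Number((id >> SEQ_BITS) & ((1n << SHARD_BITS) - 1n));
}
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which lets feeds and profile pages order posts by ID. Because the shard is embedded, a post ID alone is enough to find the database that holds it.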
4. Implementation Example: Feed Service
This example demonstrates how a Feed Service might handle the hybrid pulling of celebrity posts.
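```typescript
import { Redis } from "ioredis";

interface Post {
  id: string;
  userId: string;
  mediaUrl: string;
  timestamp: number; // epoch milliseconds
}

// Hypothetical collaborators; their names and signatures are illustrative.
interface FollowService {
  getFollowedCelebrities(userId: string): Promise<string[]>;
}
interface PostService {
  getLatestPosts(userId: string, limit: number): Promise<Post[]>;
}
interface PostCache {
  getMany(ids: string[]): Promise<Post[]>;
}

class FeedService {
  constructor(
    private readonly redis: Redis,
    private readonly followService: FollowService,
    private readonly postService: PostService,
    private readonly postCache: PostCache,
    private readonly pageSize = 50
  ) {}

  /**
   * Retrieves the combined feed for a user: posts pushed by regular accounts
   * plus posts pulled at read time from followed celebrities.
   */
  async getFeed(userId: string): Promise<Post[]> {
    // 1. Fetch pre-computed feed IDs from Redis (regular users' posts), newest first
    const regularPostIds = await this.redis.lrange(`feed:${userId}`, 0, this.pageSize - 1);
    // 2. Identify celebrities the user follows
    const celebrities = await this.followService.getFollowedCelebrities(userId);
    // 3. Hydrate regular posts and pull celebrity posts in parallel (fan-out on load)
    const [regularPosts, celebrityPosts] = await Promise.all([
      this.getPostsByIds(regularPostIds),
      Promise.all(celebrities.map((celebId) => this.postService.getLatestPosts(celebId, 5))),
    ]);
    // 4. Merge, sort newest first, and trim to one page
    return [...regularPosts, ...celebrityPosts.flat()]
      .sort((a, b) => b.timestamp - a.timestamp)
      .slice(0, this.pageSize);
  }

  private async getPostsByIds(ids: string[]): Promise<Post[]> {
    // Fetch full post objects from the cache (DB fallback is up to PostCache)
    if (ids.length === 0) return [];
    return this.postCache.getMany(ids);
  }
}
```
5. Summary: Key Trade-offs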
| Feature | Design Choice | Trade-off |
|---|---|---|
| Feed | Hybrid Push/Pull | Pulling celebrity posts avoids massive write fan-out but adds read-time work and complexity to feed retrieval. |
| Consistency | Eventual Consistency | Like and comment counts may lag by a few seconds in exchange for high availability. |
| Storage | S3 + CDN | Increases infrastructure cost but ensures global low-latency viewing. |
