Storage Systems — Core Concepts

Interview Relevance: High — Any system that stores media, files, or large data blobs requires you to choose the right storage type. "Design YouTube", "Design Dropbox", "Design S3" all depend on this.

The Three Storage Types

Comparison Table

Feature	Block Storage	File Storage	Object Storage
Access method	Raw I/O (OS/kernel)	File system (NFS/SMB)	HTTP REST API
Hierarchy	None (raw blocks)	Directories & files	Flat (bucket + key)
Mutability	✅ In-place update	✅ In-place update	❌ Replace whole object
Performance	✅ Lowest latency	Medium	Higher latency (network)
Scalability	Limited (attached to VM)	Medium	✅ Effectively unlimited (exabytes)
Cost	Highest ($/GB)	Medium	✅ Cheapest ($/GB)
Sharing	❌ Usually a single VM	✅ Multi-client mount	✅ Global HTTP access
Durability	✅ RAID / replication	✅ RAID / replication	✅ 11 nines (S3)
Best for	DBs, VM boot disk	Shared config, legacy	Media, backups, data lake
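
The "Flat (bucket + key)" and "Replace whole object" rows can be made concrete with a small sketch. Assumption: `FlatBucket` below is a hypothetical in-memory stand-in, not a real S3 client. It only illustrates that "folders" in an object store are a listing convention over key prefixes, and that an overwrite replaces the entire object.

```python
class FlatBucket:
    """Hypothetical in-memory stand-in for an object-store bucket (not a real S3 client)."""

    def __init__(self):
        self._objects = {}  # flat mapping: full key -> bytes

    def put(self, key: str, data: bytes) -> None:
        # Objects are replaced whole; there is no in-place partial update.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "", delimiter: str = "/"):
        """Return (keys, common_prefixes) directly under `prefix`, emulating folders."""
        keys, prefixes = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                # Deeper keys are collapsed into one "folder-like" common prefix.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)


bucket = FlatBucket()
bucket.put("videos/raw/cat.mp4", b"v1")
bucket.put("videos/raw/dog.mp4", b"v1")
bucket.put("videos/thumb.jpg", b"img")
bucket.put("videos/raw/cat.mp4", b"v2")  # overwrite = replace the whole object
```

Listing `"videos/"` here yields one key plus the common prefix `videos/raw/`, even though no directory object was ever created.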

Object Storage Deep Dive (S3)

Object storage is the storage type most discussed in interviews: it underpins systems like YouTube, Netflix, Dropbox, Instagram, and more.

Core Concepts

Uploading Large Files — Multipart Upload

Multipart upload benefits:

✅ Upload parts in parallel → much faster for large files
✅ Retry only failed parts (not the whole file)
✅ Required for files > 5GB (S3 limit on single PUT)
✅ Can pause and resume
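
The first two benefits (parallel parts, per-part retry) can be sketched as follows. This is a minimal illustration under stated assumptions: `fake_upload_part` is a hypothetical flaky backend standing in for the storage service, the part size is a few bytes rather than megabytes, and none of the names are part of any real SDK.

```python
from concurrent.futures import ThreadPoolExecutor


def split_parts(data: bytes, part_size: int):
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


def upload_with_retry(upload_part, part_number, chunk, max_attempts=3):
    """Retry a single part; other parts are unaffected by this part's failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return part_number, upload_part(part_number, chunk)
        except OSError:
            if attempt == max_attempts:
                raise


def multipart_upload(data, part_size, upload_part, workers=4):
    """Upload all parts in parallel and return receipts ordered by part number."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_with_retry, upload_part, n, chunk)
                   for n, chunk in split_parts(data, part_size)]
        results = [f.result() for f in futures]
    return sorted(results)


# Hypothetical flaky backend: part 2 fails once, then succeeds.
stored, failures = {}, {2: 1}


def fake_upload_part(part_number, chunk):
    if failures.get(part_number, 0) > 0:
        failures[part_number] -= 1
        raise OSError(f"transient failure on part {part_number}")
    stored[part_number] = chunk
    return f"etag-{part_number}"


payload = bytes(range(10))
receipts = multipart_upload(payload, part_size=4, upload_part=fake_upload_part)
reassembled = b"".join(stored[n] for n, _ in receipts)
```

Only part 2 is sent twice; parts 1 and 3 are uploaded once each, which is the point of retrying at part granularity.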

Pre-signed URLs — Secure Direct Upload

How S3 Works Internally

S3 Durability: 99.999999999% (11 nines)

Data is split into chunks using erasure coding (similar in spirit to RAID 6: several chunks can be lost and the data can still be reconstructed)
Stored across ≥ 3 Availability Zones in a region
Automatic bit-rot detection and self-healing
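
To show the idea behind erasure coding, here is a deliberately simplified sketch. Assumption: it uses a single XOR parity chunk (closer to RAID 5), so it survives the loss of only one chunk. Production systems typically use codes that tolerate several losses, and the text does not specify S3's actual scheme. All function names are illustrative.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = bytes(chunk_len)
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def reconstruct(chunks, original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (marked None) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    if missing:
        chunk_len = len(next(c for c in chunks if c is not None))
        rebuilt = bytes(chunk_len)
        for c in chunks:
            if c is not None:
                # XOR of all surviving chunks equals the missing chunk.
                rebuilt = xor_bytes(rebuilt, c)
        chunks = chunks[:missing[0]] + [rebuilt] + chunks[missing[0] + 1:]
    return b"".join(chunks[:-1])[:original_len]


original = b"object storage payload"
stored = encode(original, k=4)
stored[2] = None  # simulate losing one chunk (e.g., a failed disk)
recovered = reconstruct(stored, len(original))
```

The storage overhead here is one extra chunk for every four data chunks, compared with two full extra copies under 3x replication.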

Storage for Common Interview Systems

Design YouTube — Video Storage Pipeline

Design Dropbox — File Sync

Dropbox deduplication insight:

File: cat.pdf (100MB) split into 4MB blocks
Block hashes: [sha256_1, sha256_2, sha256_3...]

If sha256_3 already exists in S3 → skip uploading that block
Same block used across different files (e.g., attached PDFs) → stored once
Result: 50%+ storage savings in favorable cases; the actual ratio depends on how much content is shared across files and users
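
The deduplication flow above can be sketched with a content-addressable block store. Assumptions: `BlockStore` is a hypothetical in-memory stand-in for the remote store, and the block size is 4 bytes rather than 4 MB so the example stays readable.

```python
import hashlib

BLOCK_SIZE = 4  # bytes, tiny for demonstration; the text uses 4 MB blocks


class BlockStore:
    """Hypothetical content-addressable store: block hash -> block bytes."""

    def __init__(self):
        self.blocks = {}
        self.uploads = 0  # how many blocks were actually transferred

    def put_file(self, data: bytes):
        """Store a file and return its manifest (ordered list of block hashes)."""
        manifest = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # skip blocks the store already has
                self.blocks[digest] = block
                self.uploads += 1
            manifest.append(digest)
        return manifest

    def get_file(self, manifest) -> bytes:
        """Reassemble a file from its manifest."""
        return b"".join(self.blocks[d] for d in manifest)


store = BlockStore()
manifest_a = store.put_file(b"AAAABBBBCCCC")  # 3 new blocks
manifest_b = store.put_file(b"AAAAXXXXCCCC")  # only XXXX is new
```

Two files of three blocks each cost four stored blocks instead of six, because the `AAAA` and `CCCC` blocks are shared.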

Storage Tiers — Cost vs. Access Speed

Lifecycle policy (automatic tiering):

```json
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

→ Hot for 30 days → cheaper for 60 more → archived → deleted after 1 year
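
The timeline implied by this policy can be checked with a few lines of code. This is only an illustrative model: the thresholds mirror the policy above, `storage_class_for_age` is a made-up helper, and a real service applies lifecycle rules asynchronously, so actual transitions may lag the nominal day.

```python
# Thresholds mirror the policy above; the function itself is illustrative only.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]  # (age in days, class)
EXPIRATION_DAYS = 365


def storage_class_for_age(age_days: int):
    """Return the storage class for an object of the given age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


timeline = {age: storage_class_for_age(age) for age in (0, 29, 30, 89, 90, 364, 365)}
```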

Interview Cheat Sheet

One-Line Summaries

Block Storage:    Raw disk blocks — fastest I/O, single VM, databases (EBS)
File Storage:     Shared file system — multi-client, NFS (EFS)
Object Storage:   HTTP key-value blobs — unlimited, cheap, global (S3)
Multipart Upload: Split large files, upload parts in parallel, retry failed parts
Pre-signed URL:   Short-lived S3 URL for direct client upload — no server proxy
Erasure coding:   Like RAID — store redundant chunks, survive node failures
Deduplication:    Store unique blocks by hash — Dropbox saves 50%+ storage
Storage tiers:    Hot (ms) → Warm (IA) → Cold (Glacier) — tiered by access freq
CDN + S3:         S3 as origin, CloudFront at edge — global low-latency delivery

The Interview Phrase

"For storing user-uploaded videos, I'd use S3 as the object store
 with a two-bucket approach: raw-videos for originals and
 transcoded-videos for processed output. Uploads go directly from
 the client to S3 via a pre-signed URL — this avoids proxying
 gigabytes of video through my app servers. For large files, I'd
 use S3 multipart upload with 100MB parts so chunks can be uploaded
 in parallel and failed parts retried individually. After upload,
 a VideoUploaded event is published to Kafka, which fans out to transcoding
 workers that produce 1080p/720p/480p variants. All variants are
 served through CloudFront so viewers get the nearest edge node.
 Raw originals are moved to S3 Glacier after 90 days via a
 lifecycle policy to cut storage costs."
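
A quick sanity check on the 100MB part size mentioned in the phrase. Assumptions: the two limits below (5 MiB minimum part size except for the last part, 10,000 parts per upload) are the commonly documented S3 multipart limits and should be confirmed against current AWS documentation; `plan_parts` is an illustrative helper, not an SDK function.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # assumed S3 minimum for every part except the last
MAX_PARTS = 10_000        # assumed S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int) -> int:
    """Return how many parts an upload needs, or raise if the plan breaks a limit."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size is below the 5 MiB minimum")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError(f"{parts} parts exceeds the {MAX_PARTS}-part limit")
    return parts


ten_gib_video = 10 * 1024 * MIB
parts_needed = plan_parts(ten_gib_video, part_size=100 * MIB)
largest_object_at_100_mib = MAX_PARTS * 100 * MIB  # ceiling for this part size
```

With 100 MiB parts, a 10 GiB video needs 103 parts, and the part-count limit caps a single upload at a little under 1 TiB, so very large files would call for a larger part size.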

Red Flags vs. Green Flags

🔴 Red Flag	🟢 Green Flag
Proxy uploads through app server	Use pre-signed URLs for direct client-to-S3 upload
Single PUT for 10GB video file	Multipart upload — parallel + resumable
Store everything in S3 Standard	Lifecycle policies to tier cold data to Glacier
Use block storage for media files	Object storage (S3) for unstructured blobs
Use object storage for a running DB	Block storage (EBS) for low-latency DB I/O
No deduplication for user file sync	Content-addressable storage (hash-keyed blocks)

IMPORTANT

Never proxy large file uploads through your app servers. It wastes CPU, memory, and bandwidth. Always generate a pre-signed URL and let the client upload directly to S3. Your server only processes the metadata.
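
The mechanism behind a pre-signed URL can be sketched with a plain HMAC signature. Loud assumption: this is not the AWS Signature Version 4 scheme that S3 actually uses, and in practice an SDK generates the URL for you (for example boto3's `generate_presigned_url`). The host `storage.example.com`, the `SECRET` key, and the `presign`/`verify` helpers are all hypothetical; the sketch only shows why a client can upload directly without ever holding credentials.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical key shared by the backend and the storage service, never the client.
SECRET = b"server-side-secret"


def _signature(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, now: int, ttl: int = 900) -> str:
    """Backend: issue a short-lived URL granting one method on one object path."""
    expires = now + ttl
    query = urlencode({"expires": expires, "sig": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: int) -> bool:
    """Storage service: accept the request only if unexpired and untampered."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(method, parsed.path, expires)
    return now <= expires and hmac.compare_digest(sig, expected)


url = presign("PUT", "/raw-videos/user42/cat.mp4", now=1_000)
```

Because the method, path, and expiry are all covered by the signature, the client cannot reuse the URL for a different object, a different operation, or after the TTL has passed.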

TIP

For "Design Dropbox" questions, the key insight is block-level deduplication: split files into 4MB chunks, hash each chunk, only store unique chunks. This is what makes Dropbox storage-efficient and sync fast — only changed blocks are re-uploaded.
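
The "only changed blocks are re-uploaded" part of this tip can be sketched as a hash comparison between two versions of a file. Assumptions: the helpers are illustrative, the block size is 4 bytes instead of 4 MB, and fixed-size blocks are used as in the text.

```python
import hashlib

BLOCK_SIZE = 4  # bytes for the demo; the text uses 4 MB


def block_hashes(data: bytes):
    """Hash each fixed-size block of the file, in order."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def blocks_to_upload(old_hashes, new_data: bytes):
    """Return indexes of blocks in new_data whose content the server lacks."""
    known = set(old_hashes)
    return [i for i, h in enumerate(block_hashes(new_data)) if h not in known]


v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBZZZZDDDD"   # one block edited in place
changed = blocks_to_upload(block_hashes(v1), v2)
shifted = blocks_to_upload(block_hashes(v1), b"X" + v1)  # one byte inserted at the front
```

An in-place edit touches a single block, but inserting one byte at the front shifts every fixed-size boundary, so all blocks look new. Content-defined chunking is the usual way to reduce that effect, and it is worth mentioning as a follow-up in an interview.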
