Storage Systems — Core Concepts
Interview Relevance: High — Any system that stores media, files, or large data blobs requires you to choose the right storage type. "Design YouTube", "Design Dropbox", "Design S3" all depend on this.
The Three Storage Types
Comparison Table
| Feature | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Access method | Raw I/O (OS/kernel) | File system (NFS/SMB) | HTTP REST API |
| Hierarchy | None (raw blocks) | Directories & files | Flat (bucket + key) |
| Mutability | ✅ In-place update | ✅ In-place update | ❌ Replace whole object |
| Performance | ✅ Lowest latency | Medium | Higher latency (network) |
| Scalability | Limited (attached to VM) | Medium | ✅ Unlimited (exabytes) |
| Cost | Highest ($/GB) | Medium | ✅ Cheapest ($/GB) |
| Sharing | ❌ Usually one VM at a time | ✅ Multi-client mount | ✅ Global HTTP access |
| Durability | ✅ RAID / replication | ✅ RAID / replication | ✅ 11 nines (S3) |
| Best for | DBs, VM boot disk | Shared config, legacy | Media, backups, data lake |
Object Storage Deep Dive (S3)
Object storage is the storage type that comes up most often in interviews: it underpins the media pipelines of services such as YouTube, Netflix, Dropbox, and Instagram.
Core Concepts
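The flat bucket-plus-key model can be illustrated with a toy in-memory store. This is only a sketch: `ToyBucket` and its methods are hypothetical names, not an S3 client API. It shows two properties from the comparison table: keys live in one flat namespace where "directories" are just shared prefixes, and a write replaces the whole object.

```python
from typing import Dict, List


class ToyBucket:
    """In-memory stand-in for a bucket: a flat map from key to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # A PUT replaces the whole object; there is no partial in-place update.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket()
bucket.put("videos/2024/cat.mp4", b"v1")
bucket.put("videos/2024/dog.mp4", b"v1")
bucket.put("thumbnails/cat.jpg", b"jpg")
bucket.put("videos/2024/cat.mp4", b"v2")  # overwrites the entire object
```

Listing with the prefix `videos/` returns only the two video keys, even though no directory object was ever created.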
Uploading Large Files — Multipart Upload
Multipart upload benefits:
- ✅ Upload parts in parallel → much faster for large files
- ✅ Retry only failed parts (not the whole file)
- ✅ Required for objects larger than 5 GB (the S3 limit for a single PUT)
- ✅ Can pause and resume
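The part-splitting and selective-retry ideas above can be sketched without any network calls. The helper names `plan_parts` and `parts_to_retry` are hypothetical; the constants reflect the commonly documented S3 limits (5 MiB minimum part size except for the last part, at most 10,000 parts), which are worth double-checking against current AWS documentation. A real client would pass each planned range to the SDK's part-upload call.

```python
from typing import Iterable, List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # assumed S3 minimum for every part except the last
MAX_PARTS = 10_000        # assumed S3 maximum number of parts per upload

Part = Tuple[int, int, int]  # (part_number, start_offset, length)


def plan_parts(file_size: int, part_size: int = 100 * MIB) -> List[Part]:
    """Return (part_number, start_offset, length) for each part of a file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need more than MAX_PARTS.
    min_size_for_limit = -(-file_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_size_for_limit)

    parts: List[Part] = []
    offset = 0
    part_number = 1  # part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


def parts_to_retry(parts: List[Part], failed_numbers: Iterable[int]) -> List[Part]:
    """Select only the parts whose upload failed, so the rest are not re-sent."""
    failed = set(failed_numbers)
    return [p for p in parts if p[0] in failed]
```

Because every part carries its own offset and length, parts can be handed to a worker pool in any order, and a resumed upload only needs the list of part numbers that have not yet succeeded.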
Pre-signed URLs — Secure Direct Upload
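The idea behind a pre-signed URL is that the app server signs a specific method, path, and expiry time with a secret the client never sees, and the storage service checks that signature before accepting the upload. The sketch below illustrates that principle with a plain HMAC. It is not the AWS Signature Version 4 scheme that S3 actually uses, and the host `storage.example.com`, the secret, and both function names are assumptions made for illustration. In practice an AWS SDK generates the URL for you.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical; never shipped to clients


def _signature(method: str, path: str, expires_at: int) -> str:
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_upload_url(bucket: str, key: str, ttl_seconds: int = 900,
                    now: Optional[int] = None) -> str:
    """App server: issue a short-lived URL the client can PUT to directly."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    path = f"/{bucket}/{key}"
    query = urlencode({"expires": expires_at,
                       "sig": _signature("PUT", path, expires_at)})
    return f"https://storage.example.com{path}?{query}"


def verify_upload_url(method: str, url: str, now: Optional[int] = None) -> bool:
    """Storage service: accept the request only if unexpired and untampered."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(method, parsed.path, expires_at)
    return hmac.compare_digest(sig, expected)
```

The signature covers the method, the object path, and the expiry, so a client cannot reuse the URL for a different key, a different verb, or after the time limit. The file bytes never pass through the app server.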
How S3 Works Internally
S3 Durability: 99.999999999% (11 nines) per year, i.e. roughly one object lost every 10,000 years for every 10 million objects stored
- Data split into chunks using erasure coding (like RAID 6 — can lose multiple chunks and still reconstruct)
- Stored across ≥ 3 Availability Zones in a region
- Automatic bit-rot detection and self-healing
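The reconstruction idea behind erasure coding can be shown with the simplest possible scheme: k data chunks plus one XOR parity chunk. This is deliberately weaker than what the text describes (it survives only one lost chunk, like RAID 5, whereas RAID 6 style codes such as Reed-Solomon survive several), and it makes no claim about S3's actual encoding. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def reconstruct(chunks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (data or parity) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    if missing:
        survivors = [c for c in chunks if c is not None]
        chunks = list(chunks)
        # XOR of all surviving chunks equals the missing one.
        chunks[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity chunk and strip the zero padding.
    return b"".join(chunks[:-1])[:original_len]
```

With k = 4 the storage overhead is 25%, compared with 200% for keeping three full replicas. That trade-off is the reason large object stores tend to prefer erasure coding over plain replication.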
Storage for Common Interview Systems
Design YouTube — Video Storage Pipeline
Design Dropbox — File Sync
Dropbox deduplication insight:
File: cat.pdf (100MB) split into 4MB blocks
Block hashes: [sha256_1, sha256_2, sha256_3...]
If sha256_3 already exists in S3 → skip uploading that block
Same block used across different files (e.g., attached PDFs) → stored once
Result: 50%+ storage savings on average
Storage Tiers — Cost vs. Access Speed
Lifecycle policy (automatic tiering):
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
→ Standard (hot) for 30 days → Standard-IA for the next 60 → Glacier from day 90 → deleted after 1 year
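To make the timeline concrete, the rule above can be read as a function from object age to storage class. This is a sketch of the policy's intent only: the function name is hypothetical, and real lifecycle transitions run asynchronously, so an object may change class somewhat after the threshold day.

```python
from typing import Optional

# (minimum age in days, storage class), mirroring the rule above.
TRANSITIONS = [(90, "GLACIER"), (30, "STANDARD_IA"), (0, "STANDARD")]
EXPIRATION_DAYS = 365


def storage_class_for_age(age_days: int) -> Optional[str]:
    """Return the class an object of this age sits in, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None  # the object has been deleted by the expiration rule
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            return storage_class
    return None  # unreachable: the 0-day entry always matches
```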
Interview Cheat Sheet
One-Line Summaries
Block Storage: Raw disk blocks — fastest I/O, single VM, databases (EBS)
File Storage: Shared file system — multi-client, NFS (EFS)
Object Storage: HTTP key-value blobs — unlimited, cheap, global (S3)
Multipart Upload: Split large files, upload parts in parallel, retry failed parts
Pre-signed URL: Short-lived S3 URL for direct client upload — no server proxy
Erasure coding: Like RAID — store redundant chunks, survive node failures
Deduplication: Store unique blocks by hash — Dropbox saves 50%+ storage
Storage tiers: Hot (ms) → Warm (IA) → Cold (Glacier) — tiered by access freq
CDN + S3: S3 as origin, CloudFront at edge — global low-latency delivery
The Interview Phrase
"For storing user-uploaded videos, I'd use S3 as the object store
with a two-bucket approach: raw-videos for originals and
transcoded-videos for processed output. Uploads go directly from
the client to S3 via a pre-signed URL — this avoids proxying
gigabytes of video through my app servers. For large files, I'd
use S3 multipart upload with 100MB parts so chunks can be uploaded
in parallel and failed parts retried individually. After upload,
a VideoUploaded event is published to Kafka and fans out to transcoding
workers that produce 1080p/720p/480p variants. All variants are
served through CloudFront so viewers get the nearest edge node.
Raw originals are moved to S3 Glacier after 90 days via a
lifecycle policy to cut storage costs."
Red Flags vs. Green Flags
| 🔴 Red Flag | 🟢 Green Flag |
|---|---|
| Proxy uploads through app server | Use pre-signed URLs for direct client-to-S3 upload |
| Single PUT for 10GB video file | Multipart upload — parallel + resumable |
| Store everything in S3 Standard | Lifecycle policies to tier cold data to Glacier |
| Use block storage for media files | Object storage (S3) for unstructured blobs |
| Use object storage for a running DB | Block storage (EBS) for low-latency DB I/O |
| No deduplication for user file sync | Content-addressable storage (hash-keyed blocks) |
IMPORTANT
Never proxy large file uploads through your app servers. It wastes CPU, memory, and bandwidth. Always generate a pre-signed URL and let the client upload directly to S3. Your server only processes the metadata.
TIP
For "Design Dropbox" questions, the key insight is block-level deduplication: split files into 4MB chunks, hash each chunk, only store unique chunks. This is what makes Dropbox storage-efficient and sync fast — only changed blocks are re-uploaded.
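The chunk, hash, and skip loop described in the tip can be sketched as a small content-addressable store. This is an illustrative in-memory version, not Dropbox's implementation: `BlockStore`, `put_file`, and `get_file` are hypothetical names, and a real system would keep blocks in object storage and manifests in a metadata database.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the text


class BlockStore:
    """Content-addressable store: each unique block is kept once, keyed by hash."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.uploaded = 0  # number of blocks actually transferred

    def put_file(self, data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
        """Store a file and return its manifest (ordered list of block hashes)."""
        manifest = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # skip blocks the store already has
                self.blocks[digest] = block
                self.uploaded += 1
            manifest.append(digest)
        return manifest

    def get_file(self, manifest: List[str]) -> bytes:
        """Reassemble a file from its manifest."""
        return b"".join(self.blocks[d] for d in manifest)
```

A file is then just a manifest of hashes. Saving a new version whose middle block changed uploads one block instead of the whole file. One caveat worth raising in an interview: with fixed-size blocks, inserting bytes near the start of a file shifts every later block boundary, which is why some systems use content-defined chunking instead.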
