
Storage Systems — Core Concepts

Interview Relevance: High — Any system that stores media, files, or large data blobs requires you to choose the right storage type. "Design YouTube", "Design Dropbox", "Design S3" all depend on this.


The Three Storage Types


Comparison Table

| Feature | Block Storage | File Storage | Object Storage |
| --- | --- | --- | --- |
| Access method | Raw I/O (OS/kernel) | File system (NFS/SMB) | HTTP REST API |
| Hierarchy | None (raw blocks) | Directories & files | Flat (bucket + key) |
| Mutability | ✅ In-place update | ✅ In-place update | ❌ Replace whole object |
| Performance | ✅ Lowest latency | Medium | Higher latency (network) |
| Scalability | Limited (attached to VM) | Medium | ✅ Unlimited (exabytes) |
| Cost | Highest ($/GB) | Medium | ✅ Cheapest ($/GB) |
| Sharing | ❌ Single VM only | ✅ Multi-client mount | ✅ Global HTTP access |
| Durability | ✅ RAID replication | ✅ RAID replication | ✅ 11 nines (S3) |
| Best for | DBs, VM boot disk | Shared config, legacy | Media, backups, data lake |
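
The "Flat (bucket + key)" row is worth a quick illustration: object stores have no real directories, and folder-like listings are emulated by filtering keys on a prefix and a delimiter. The sketch below models a bucket as a plain dictionary; `list_prefix` and the sample keys are hypothetical names for illustration, not an S3 API.

```python
from typing import Dict, List, Tuple


def list_prefix(bucket: Dict[str, bytes], prefix: str,
                delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (keys, common_prefixes) directly under `prefix`.

    The bucket is a flat key -> bytes mapping; "folders" exist only as
    shared key prefixes that end in the delimiter.
    """
    keys: List[str] = []
    common = set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper key: report only its first "folder" level.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common)


# Hypothetical bucket contents for illustration.
bucket = {
    "videos/raw/a.mp4": b"...",
    "videos/raw/b.mp4": b"...",
    "videos/720p/a.mp4": b"...",
    "readme.txt": b"...",
}
```

Calling `list_prefix(bucket, "videos/")` returns no direct keys and two "folders", even though nothing but flat keys is stored.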

Object Storage Deep Dive (S3)

Object storage is the most discussed in interviews — it underpins YouTube, Netflix, Dropbox, Instagram, and more.

Core Concepts

Uploading Large Files — Multipart Upload

Multipart upload benefits:

  • ✅ Upload parts in parallel → much faster for large files
  • ✅ Retry only failed parts (not the whole file)
  • ✅ Required for files > 5GB (S3 limit on single PUT)
  • ✅ Can pause and resume
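
The benefits above can be sketched without touching a real bucket. This is a minimal illustration, not an AWS client: `plan_parts` and `upload_all` are hypothetical helpers, and `upload_part` is a caller-supplied function standing in for the real per-part upload call. The 5 MiB minimum part size and the 10,000-part cap reflect S3's documented multipart limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

MIN_PART = 5 * 1024 ** 2   # S3 minimum part size (every part except the last)
MAX_PARTS = 10_000         # S3 maximum number of parts per upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)


def plan_parts(file_size: int, part_size: int) -> List[Part]:
    """Split a file of `file_size` bytes into numbered byte ranges."""
    if part_size < MIN_PART:
        raise ValueError("part_size is below the 5 MiB minimum")
    count = -(-file_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [(i + 1, i * part_size, min(part_size, file_size - i * part_size))
            for i in range(count)]


def upload_all(parts: List[Part],
               upload_part: Callable[[int, int, int], str],
               max_attempts: int = 3,
               workers: int = 4) -> Dict[int, str]:
    """Upload parts in parallel and retry only the parts that failed."""
    etags: Dict[int, str] = {}
    pending = list(parts)

    def attempt(part: Part):
        try:
            return part, upload_part(*part), None
        except Exception as exc:  # keep going; this part is retried later
            return part, None, exc

    for _ in range(max_attempts):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(attempt, pending))
        pending = []
        for part, etag, err in results:
            if err is None:
                etags[part[0]] = etag
            else:
                pending.append(part)
    if pending:
        raise RuntimeError(f"parts failed after retries: {[p[0] for p in pending]}")
    return etags
```

Because the completed part numbers and ETags are kept in a dictionary, persisting that dictionary is one plausible way to get pause and resume: a restarted client would only plan the parts that are not yet recorded.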

Pre-signed URLs — Secure Direct Upload


How S3 Works Internally

S3 Durability: 99.999999999% (11 nines)

  • Data split into chunks using erasure coding (like RAID 6 — can lose multiple chunks and still reconstruct)
  • Stored across ≥ 3 Availability Zones in a region
  • Automatic bit-rot detection and self-healing
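
To make the erasure-coding idea concrete, here is a deliberately simplified sketch using a single XOR parity chunk. This is closer to RAID 5 than to what S3 actually runs: it survives the loss of only one chunk, whereas production systems use stronger codes (such as Reed-Solomon) that tolerate multiple losses. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def reconstruct(chunks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (data or parity) is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    repaired = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk and the zero padding.
    return b"".join(repaired[:-1])[:original_len]
```

For example, encoding a payload with `k=4` yields five chunks that could live on five different nodes; setting any one of them to `None` before calling `reconstruct` still returns the original bytes.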

Storage for Common Interview Systems

Design YouTube — Video Storage Pipeline

Design Dropbox — File Sync

Dropbox deduplication insight:

File: cat.pdf (100MB) split into 4MB blocks
Block hashes: [sha256_1, sha256_2, sha256_3...]

If sha256_3 already exists in S3 → skip uploading that block
Same block used across different files (e.g., attached PDFs) → stored once
Result: 50%+ storage savings on average
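
The deduplication idea above can be sketched as a small content-addressed store. This is an in-memory illustration under stated assumptions: `BlockStore` is a hypothetical class, a dictionary stands in for S3, and the real Dropbox implementation is not described in this guide.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, matching the example above


class BlockStore:
    """Content-addressed block store: each unique block is kept once."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: Dict[str, bytes] = {}          # sha256 hex -> block bytes
        self.manifests: Dict[str, List[str]] = {}   # file name -> ordered hashes

    def put_file(self, name: str, data: bytes) -> int:
        """Store a file and return how many new blocks had to be uploaded."""
        new_blocks = 0
        hashes: List[str] = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # unseen content: upload it
                self.blocks[digest] = block
                new_blocks += 1
            hashes.append(digest)
        self.manifests[name] = hashes
        return new_blocks

    def get_file(self, name: str) -> bytes:
        """Reassemble a file from its manifest of block hashes."""
        return b"".join(self.blocks[h] for h in self.manifests[name])
```

A file is then just a manifest (an ordered list of hashes), so two files that share blocks cost extra storage only for the blocks that differ.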

Storage Tiers — Cost vs. Access Speed

Lifecycle policy (automatic tiering):

```json
{
  "Rules": [
    {
      "ID": "tier-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

→ S3 Standard (hot) for the first 30 days → Standard-IA (cheaper) for the next 60 → Glacier (archived) from day 90 → deleted after 1 year
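
To check the timeline above, the policy's thresholds can be evaluated for an object of a given age. This is a simplified model for reasoning about the rules, not how S3 executes them: real lifecycle actions run asynchronously, so exact timing may lag. `storage_class_at` is a hypothetical helper.

```python
from typing import Optional

# Thresholds copied from the lifecycle policy above: (age in days, storage class).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the storage class for an object of this age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for days, storage_class in TRANSITIONS:
        if age_days >= days:
            current = storage_class  # later thresholds override earlier ones
    return current
```

An object that is 45 days old maps to `STANDARD_IA`, one that is 200 days old maps to `GLACIER`, and anything at 365 days or older is treated as deleted.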


Interview Cheat Sheet

One-Line Summaries

Block Storage:    Raw disk blocks — fastest I/O, single VM, databases (EBS)
File Storage:     Shared file system — multi-client, NFS (EFS)
Object Storage:   HTTP key-value blobs — unlimited, cheap, global (S3)
Multipart Upload: Split large files, upload parts in parallel, retry failed parts
Pre-signed URL:   Short-lived S3 URL for direct client upload — no server proxy
Erasure coding:   Like RAID — store redundant chunks, survive node failures
Deduplication:    Store unique blocks by hash — Dropbox saves 50%+ storage
Storage tiers:    Hot (ms) → Warm (IA) → Cold (Glacier) — tiered by access freq
CDN + S3:         S3 as origin, CloudFront at edge — global low-latency delivery

The Interview Phrase

"For storing user-uploaded videos, I'd use S3 as the object store
 with a two-bucket approach: raw-videos for originals and
 transcoded-videos for processed output. Uploads go directly from
 the client to S3 via a pre-signed URL — this avoids proxying
 gigabytes of video through my app servers. For large files, I'd
 use S3 multipart upload with 100MB parts so chunks can be uploaded
 in parallel and failed parts retried individually. After upload,
 a VideoUploaded event is published to Kafka and consumed by transcoding
 workers that produce 1080p/720p/480p variants. All variants are
 served through CloudFront so viewers get the nearest edge node.
 Raw originals are moved to S3 Glacier after 90 days via a
 lifecycle policy to cut storage costs."

Red Flags vs. Green Flags

| 🔴 Red Flag | 🟢 Green Flag |
| --- | --- |
| Proxy uploads through app server | Use pre-signed URLs for direct client-to-S3 upload |
| Single PUT for 10GB video file | Multipart upload — parallel + resumable |
| Store everything in S3 Standard | Lifecycle policies to tier cold data to Glacier |
| Use block storage for media files | Object storage (S3) for unstructured blobs |
| Use object storage for a running DB | Block storage (EBS) for low-latency DB I/O |
| No deduplication for user file sync | Content-addressable storage (hash-keyed blocks) |

IMPORTANT

Never proxy large file uploads through your app servers. It wastes CPU, memory, and bandwidth. Always generate a pre-signed URL and let the client upload directly to S3. Your server only processes the metadata.
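
The mechanism behind a pre-signed URL can be sketched with an HMAC over the method, path, and expiry time. This is a simplified stand-in and not AWS Signature Version 4: the secret, the `storage.example.com` host, and the `presign`/`verify` functions are all hypothetical. In practice the AWS SDK generates the URL and S3 performs the verification.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical signing key, never sent to clients


def _sign(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, expires_in: int,
            now: Optional[float] = None) -> str:
    """App server: issue a short-lived URL for one method and one object path."""
    issued = time.time() if now is None else now
    expires = int(issued) + expires_in
    query = urlencode({"expires": expires, "signature": _sign(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage service: accept the request only if unexpired and untampered."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _sign(method, parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

The app server only signs a few bytes of metadata; the video bytes travel from the client straight to the storage service, which can validate the request without calling back to the app server.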

TIP

For "Design Dropbox" questions, the key insight is block-level deduplication: split files into 4MB chunks, hash each chunk, only store unique chunks. This is what makes Dropbox storage-efficient and sync fast — only changed blocks are re-uploaded.
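
The "only changed blocks are re-uploaded" step can be sketched by comparing per-block hashes of two versions of a file. This assumes fixed-size blocks compared by position, which is the scheme described here; note that with fixed-size blocks an insertion near the start of a file shifts every later block, so all of them appear changed. `block_hashes` and `changed_blocks` are hypothetical helpers.

```python
import hashlib
from typing import List


def block_hashes(data: bytes, block_size: int) -> List[str]:
    """SHA-256 hex digest of each fixed-size block, in order."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = 4 * 1024 * 1024) -> List[int]:
    """Indices of blocks in `new` that differ from `old` or did not exist before."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [i for i, digest in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != digest]
```

A sync client would upload only the returned block indices and then send the new list of hashes as the file's updated manifest.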

Released under the ISC License.