Back-of-the-Envelope Estimation (Step 2 Deep Dive)

Time Budget: 5 minutes — Optional but powerful. Demonstrates engineering maturity.

Back-of-the-envelope estimation is the art of making fast, structured approximations to understand the scale a system needs to handle. You don't need exact numbers — you need the right order of magnitude.


Why It Matters

Estimation bridges Requirements Clarification and High-Level Design. It answers the question:

"Is this a system that handles 100 requests/sec or 100,000 requests/sec? The architecture looks completely different."

Interviewers use this phase to assess:

  • Do you know your powers of two and latency numbers?
  • Can you reason about scale without a calculator?
  • Does your design match the scale you estimated?

The Estimation Flow


The Two Foundational Tables

Before estimating anything, internalize these two tables. They are your building blocks.

Powers of Two (Data Sizes)

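| Power | Approx. Value | Name | Example |
|-------|---------------|------|---------|
| 2^10 | 1 Thousand | 1 KB | A short text message |
| 2^20 | 1 Million | 1 MB | A compressed photo |
| 2^30 | 1 Billion | 1 GB | A standard movie file |
| 2^40 | 1 Trillion | 1 TB | A large database |
| 2^50 | 1 Quadrillion | 1 PB | A data warehouse |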
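
As a quick way to check the table, the sketch below converts a raw byte count into the largest binary unit that fits. The helper name `human_size` is hypothetical and exists only for this illustration.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_size(num_bytes: int) -> str:
    """Express a byte count in the largest binary unit that keeps the value >= 1.

    Each unit step is a factor of 2^10 = 1024, matching the table rows.
    """
    value = float(num_bytes)
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.1f} {UNITS[unit]}"


if __name__ == "__main__":
    for power in (10, 20, 30, 40, 50):
        print(f"2^{power} = {2 ** power:,} bytes = {human_size(2 ** power)}")
```

In an interview you would do this in your head: every 10 in the exponent moves you one unit up (KB, MB, GB, TB, PB).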

Latency Numbers Every Engineer Should Know

| Operation | Latency | Rule of Thumb |
|-----------|---------|---------------|
| L1 cache hit | ~0.5 ns | Fastest possible |
| RAM access | ~100 ns | 200× slower than L1 |
| SSD random read | ~150 µs | ~1,500× slower than RAM |
| Network round trip (same DC) | ~500 µs | Avoid unnecessary network hops |
| HDD seek | ~10 ms | Avoid disk seeks at scale |
| Network round trip (cross-continent) | ~150 ms | User-perceived latency |

TIP

Memorize these three: RAM (100ns) → SSD (150µs) → Network same DC (500µs). They anchor every latency argument you'll make.
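
To see how these anchors relate to each other, here is a minimal sketch that stores the approximate latencies from the table in nanoseconds and computes ratios between them. The dictionary and the `slowdown` helper are illustrative names, not part of any library.

```python
# Approximate latencies from the table, converted to nanoseconds.
LATENCY_NS = {
    "L1 cache hit": 0.5,
    "RAM access": 100,
    "SSD random read": 150_000,
    "Network round trip (same DC)": 500_000,
    "HDD seek": 10_000_000,
    "Network round trip (cross-continent)": 150_000_000,
}


def slowdown(slower: str, faster: str) -> float:
    """Return how many times slower one operation is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


if __name__ == "__main__":
    print(slowdown("RAM access", "L1 cache hit"))
    print(slowdown("SSD random read", "RAM access"))
    print(slowdown("Network round trip (same DC)", "RAM access"))
```

The SSD-to-RAM ratio works out to 1,500×, so "about a thousand times slower" is the right order of magnitude. One same-DC round trip costs roughly 5,000 RAM accesses, which is the argument for avoiding extra network hops.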


The 4 Estimation Pillars

Every estimation exercise covers these four areas: throughput, storage, bandwidth, and memory/cache.


Worked Example: URL Shortener

Using the requirements from Step 1:

  • 100 million URLs shortened per day (writes)
  • 10 billion redirects per day (reads)

Pillar 1 — Throughput

text
WRITES (URL Shortening):
  100 million / day
  = 100,000,000 / 86,400 seconds
  ≈ 1,160 writes/sec
  Round up → ~1,200 writes/sec

READS (Redirects):
  10 billion / day
  = 10,000,000,000 / 86,400 seconds
  ≈ 115,740 reads/sec
  Round up → ~116,000 reads/sec

READ-TO-WRITE RATIO:
  116,000 / 1,200 ≈ 100:1  → Extremely Read-Heavy ✅

Design signal: A 100:1 read-to-write ratio means we need aggressive caching (Redis/Memcached) and potentially a CDN to absorb redirect traffic. Write infrastructure can be modest.
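
The throughput arithmetic above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the worked example (100 million writes and 10 billion reads per day, traffic spread evenly across the day); `per_second` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


# Daily volumes taken from the Step 1 requirements.
writes_per_sec = per_second(100_000_000)
reads_per_sec = per_second(10_000_000_000)
read_write_ratio = reads_per_sec / writes_per_sec

if __name__ == "__main__":
    print(f"writes/sec ~ {writes_per_sec:,.0f}")
    print(f"reads/sec  ~ {reads_per_sec:,.0f}")
    print(f"read:write ~ {read_write_ratio:.0f}:1")
```

These are averages. Real traffic has peaks, so it is common to multiply by a factor of 2 to 3 when sizing for peak load.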


Pillar 2 — Storage

text
RECORD SIZE (one shortened URL entry):
  short_url_hash:  7 bytes  (e.g., "abc1234")
  long_url:       200 bytes (average URL length)
  created_at:       8 bytes (timestamp)
  expiry_at:        8 bytes (timestamp)
  user_id:          8 bytes (optional FK)
  ─────────────────────────────────────
  Total:          231 bytes → budget ~500 bytes (with overhead)

DAILY NEW DATA:
  100M records/day × 500 bytes = 50 GB/day

5-YEAR TOTAL:
  50 GB/day × 365 × 5 = ~91 TB

10-YEAR TOTAL:
  50 GB/day × 365 × 10 = ~182 TB

Design signal: At ~182 TB over 10 years, the data will not fit comfortably on a single database node. We need database sharding or a distributed store such as Cassandra or DynamoDB.
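
The storage estimate can be checked the same way. The sketch below assumes the field sizes from the record breakdown, the rounded 500-byte record, and decimal units (1 TB = 10^12 bytes); `storage_tb` is a hypothetical helper.

```python
# Field sizes in bytes, taken from the record breakdown.
FIELD_BYTES = {
    "short_url_hash": 7,
    "long_url": 200,
    "created_at": 8,
    "expiry_at": 8,
    "user_id": 8,
}

raw_record_bytes = sum(FIELD_BYTES.values())  # payload only, no overhead
RECORD_BYTES = 500  # rounded-up budget that includes overhead


def storage_tb(records_per_day: int, record_bytes: int, years: int) -> float:
    """Total storage in decimal terabytes after the given number of years."""
    total_bytes = records_per_day * record_bytes * 365 * years
    return total_bytes / 1e12


if __name__ == "__main__":
    print(f"raw record: {raw_record_bytes} bytes")
    print(f"5 years:  {storage_tb(100_000_000, RECORD_BYTES, 5):.1f} TB")
    print(f"10 years: {storage_tb(100_000_000, RECORD_BYTES, 10):.1f} TB")
```

Rounding 231 bytes up to 500 roughly doubles the result, which is acceptable here: the goal is to learn whether the answer is gigabytes, terabytes, or petabytes.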


Pillar 3 — Bandwidth

text
WRITE BANDWIDTH (inbound):
  1,200 writes/sec × 500 bytes = 600 KB/sec ≈ 0.6 MB/sec
  → Negligible

READ BANDWIDTH (outbound redirect responses):
  116,000 reads/sec × 500 bytes = 58 MB/sec
  → Peak could be 2–3×: ~120–175 MB/sec ≈ 1–1.4 Gbps

  (A standard 10 Gbps NIC handles this on a single server,
   but we still need multiple servers for fault tolerance.)

Design signal: Outbound bandwidth is roughly 100× inbound and reaches the Gbps range at peak. A CDN can cache hot short URLs at the edge and absorb most of this traffic before it reaches our servers.
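
A short sketch of the bandwidth math, assuming the rounded request rates and the 500-byte payload used above. Both helper names are hypothetical. Note the factor of 8 when converting megabytes per second to gigabits per second.

```python
def bandwidth_mb_per_sec(requests_per_sec: float, bytes_per_request: float) -> float:
    """Average bandwidth in decimal megabytes per second."""
    return requests_per_sec * bytes_per_request / 1e6


def to_gbps(mb_per_sec: float) -> float:
    """Convert megabytes per second to gigabits per second (8 bits per byte)."""
    return mb_per_sec * 8 / 1000


if __name__ == "__main__":
    inbound = bandwidth_mb_per_sec(1_200, 500)
    outbound = bandwidth_mb_per_sec(116_000, 500)
    print(f"inbound:  {inbound:.1f} MB/sec")
    print(f"outbound: {outbound:.0f} MB/sec = {to_gbps(outbound):.2f} Gbps")
    print(f"peak (3x, rounded to 175 MB/sec): {to_gbps(175):.1f} Gbps")
```

Mixing up bytes and bits is one of the most common estimation slips, so it helps to state the unit out loud each time.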


Pillar 4 — Memory / Cache

text
HOT URLs (assume 20% of URLs drive 80% of traffic, the Pareto principle):
  ~100M active URLs (assumed) × 20% = 20 million hot URLs

CACHE SIZE REQUIRED:
  20 million URLs × 500 bytes = 10 GB of cache

CACHE SERVER:
  A single Redis server with 32–64 GB RAM can hold
  the entire working set → 1–2 Redis nodes are sufficient
  (with replication for HA)

Design signal: Our hot set fits entirely in memory. With an LRU eviction policy, the cache should serve at least the ~80% of reads that target hot URLs, and spare RAM can push the hit ratio higher.
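
The cache sizing can be sketched as follows. The 100M active URLs, 20% hot fraction, and 500-byte record come from the example above. The `usable_fraction` parameter (how much of a node's RAM is actually available for cached data) is an assumption added for illustration, as are both helper names.

```python
import math


def cache_gb(active_urls: int, hot_fraction: float, record_bytes: int) -> float:
    """Memory needed to hold the hot set, in decimal gigabytes."""
    return active_urls * hot_fraction * record_bytes / 1e9


def nodes_needed(required_gb: float, node_ram_gb: float,
                 usable_fraction: float = 0.7) -> int:
    """Number of cache nodes, before replication.

    usable_fraction is an assumed allowance for per-key overhead and
    headroom; it is not a measured figure.
    """
    return math.ceil(required_gb / (node_ram_gb * usable_fraction))


if __name__ == "__main__":
    hot_set = cache_gb(100_000_000, 0.20, 500)
    print(f"hot set: {hot_set:.0f} GB")
    print(f"nodes at 32 GB each: {nodes_needed(hot_set, 32)}")
```

Under these assumptions a single 32 GB node holds the hot set with room to spare. The second node is there for availability, not capacity.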


The Full Estimation Summary (What You Say to the Interviewer)

For the URL shortener, the spoken summary pulls together the four pillars:

  • Throughput: ~1,200 writes/sec and ~116,000 reads/sec, a 100:1 read-heavy workload.
  • Storage: ~500 bytes per record, ~50 GB/day, ~182 TB over 10 years, so we shard.
  • Bandwidth: ~58 MB/sec outbound on average, up to ~1.4 Gbps at peak, so a CDN helps.
  • Memory / Cache: a ~10 GB hot set, which fits in 1–2 Redis nodes with replication.


Common Estimation Shortcuts

These mental shortcuts keep your math fast and clean in the interview:

| Scenario | Shortcut |
|----------|----------|
| Seconds in a day | ~86,400 → round to 100,000 for easy math |
| Seconds in a month | ~2.6 million (30 days) → round to 2.5M |
| Seconds in a year | ~31.5 million → round to 30M |
| Average photo size | ~300 KB (compressed) |
| Average video (1 min, 720p) | ~50–100 MB |
| Average tweet / short post | ~300 bytes |
| Average webpage | ~1 MB |
| Pareto Rule (80/20) | 20% of content → 80% of reads (a reasonable default for cache sizing) |
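
It is worth knowing how much error the time shortcuts introduce. The sketch below compares the exact and rounded seconds-per-day figures for the 100 million writes/day example; `relative_error` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000 for a 30-day month
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000


def relative_error(exact: float, approx: float) -> float:
    """Signed relative error of an approximation (negative = underestimate)."""
    return (approx - exact) / exact


exact_qps = 100_000_000 / SECONDS_PER_DAY
shortcut_qps = 100_000_000 / 100_000  # day rounded up to 100,000 seconds

if __name__ == "__main__":
    error = relative_error(exact_qps, shortcut_qps)
    print(f"exact:    {exact_qps:,.0f} writes/sec")
    print(f"shortcut: {shortcut_qps:,.0f} writes/sec")
    print(f"error:    {error:.1%}")
```

Rounding a day up to 100,000 seconds underestimates the rate by about 14%. That stays within the same order of magnitude, which is all this step needs.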

Estimation Vocabulary (Say These Out Loud)

Using the right language shows engineering maturity:

| Instead of... | Say this... |
|---------------|-------------|
| "A lot of users" | "Assuming 500 million DAU..." |
| "Fast response" | "P99 latency under 100ms..." |
| "Big database" | "~182 TB over 10 years, so we'll need sharding..." |
| "Popular items" | "Assuming 80/20 rule, 20% of URLs account for 80% of traffic..." |
| "Maybe use cache" | "A 10 GB hot set fits in Redis, so cache-hit ratio should be 80% or higher..." |

Red Flags vs. Green Flags

| 🔴 Red Flag | 🟢 Green Flag |
|-------------|---------------|
| Skip estimation entirely | Do a 3-minute estimation even if the interviewer says "optional" |
| Exact, precise numbers (false confidence) | Round numbers, acknowledge they're approximations |
| Estimate, then ignore the results | Let estimates drive design decisions |
| Just calculate, don't interpret | Translate numbers into architecture signals |
| Forget to state assumptions | Explicitly say "I'm assuming X because..." |

Next Steps

With scale now quantified, move to Step 3: High-Level Design, where your estimates become the foundation for every architectural decision.

IMPORTANT

Always connect your estimates to design decisions. Don't just throw out numbers. Say "Because we have a 100:1 read-to-write ratio, I'll add a Redis cache in front of the database." That's what separates a senior candidate from a junior one.

WARNING

Don't spend more than 5 minutes here. If you find yourself going deep into precise math, snap back to round numbers and move on.

Released under the ISC License.