🏗️ System Design Mastery — Comprehensive Guide

Master system design from fundamentals to real-world architectures. A complete, structured, free, and open-source learning platform for software engineers to confidently design large-scale distributed systems used at companies like Google, Netflix, Uber, Amazon, Meta, and Microsoft.


📚 Project Overview

System Design Mastery is a comprehensive, free, and open-source learning platform designed to teach software engineers how to architect and design large-scale distributed systems. Whether you're preparing for technical interviews at FAANG companies, building production systems at your organization, or deepening your architectural knowledge, this resource provides a structured, progressive learning experience.

🎯 Mission

  • Democratize System Design Knowledge: Make advanced architectural concepts accessible to engineers at all levels
  • Provide Structured Learning: Offer a clear progression from fundamentals to advanced patterns
  • Enable Interview Success: Prepare engineers to confidently design systems in technical interviews
  • Reflect Real-World Practices: Showcase how industry leaders design and scale systems
  • Foster Community Learning: Build an open-source resource that grows through community contributions

✨ Key Features

  • 12 Progressive Learning Modules - From fundamentals to advanced patterns and academic research
  • 15+ Real-World Case Studies - In-depth analysis of Netflix, Uber, Twitter, WhatsApp, Instagram, YouTube, TikTok, and more
  • 100+ Interactive Diagrams - Mermaid visualizations for every concept and system
  • Interview Frameworks & Templates - Step-by-step approaches for system design interviews
  • Design Patterns & Best Practices - Proven architectural patterns used at scale
  • Practical Examples & Trade-offs - Actionable insights with real numbers and comparisons
  • Academic Research Path - Resources for PhD candidates and researchers in distributed systems
  • Always Updated - Community contributions and latest industry best practices
  • Production-Ready Content - Thoroughly researched and technically accurate

👥 Who Should Use This?

| Role | Learning Path | Time Estimate |
| --- | --- | --- |
| Beginners (0-2 yrs) | Levels 1-3 → 7 → 8 | 4-6 weeks |
| Backend Engineers (2-5 yrs) | Levels 3-6 → 7 → 9 | 3-4 weeks |
| Interview Candidates | Levels 1-2 (review) → 7-8 (focus) | 2-3 weeks |
| Tech Leads & Architects | All levels + focus on 6, 9-10 | Ongoing |
| PhD Researchers | All levels + Level 11 | Variable |

🚀 Quick Start

Visit https://souravsdm.vercel.app to start learning immediately. No installation required.

Local Development

bash
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run dev
# Open http://localhost:5173

Offline Access

bash
npm run build
# Content is in .vitepress/dist/
# Open index.html in your browser for offline reading

Learning Modules

This comprehensive guide is organized into 12 progressive modules, each building upon the previous to develop complete mastery of system design:

Level 1: Foundations 🔧

Master the core networking and system design concepts that underpin all distributed systems.

  • Topics: Networking (TCP/IP, HTTP), DNS, Load Balancing, Architecture Fundamentals, Design Patterns, Fault Tolerance, Clustering
  • Key Learning: Understand what happens when you type google.com, proxy architectures, server failure handling
  • Why: These fundamentals are non-negotiable—every system design decision builds on these principles

Level 2: Scalability 📈

Learn the principles of horizontal and vertical scaling, and understand the fundamental trade-offs in distributed systems.

  • Topics: Horizontal vs Vertical Scaling, CAP Theorem & Trade-offs, Consistency Models, Scaling from 1 to 1 Billion Users
  • Key Learning: When to scale, why consistency matters, managing growth
  • Why: Scalability is the core challenge of system design—learning these principles prevents architecture mistakes

Level 3: Databases 🗄️

Understand different database paradigms and how to choose the right database for your system.

  • Topics: SQL vs NoSQL Comparison, Database Sharding & Replication, Indexing Strategies, Database Management Best Practices
  • Key Learning: Sharding patterns, replication strategies, when to use which database
  • Why: Database design decisions have cascading effects on system scalability and reliability

Level 4: Caching

Master caching strategies to dramatically improve system performance.

  • Topics: Cache Strategies & Patterns, Redis Architecture, CDN Optimization, Cache Eviction Policies
  • Key Learning: When and what to cache, cache invalidation patterns, multi-tier caching
  • Why: Caching is often the highest-leverage optimization in distributed systems

Level 5: Messaging & Queues 📨

Build event-driven, decoupled, and resilient systems using messaging architectures.

  • Topics: Event-Driven Architecture, Kafka & RabbitMQ Basics, Dead Letter Queues, Pub/Sub Patterns
  • Key Learning: Asynchronous processing, event sourcing, message reliability
  • Why: Messaging enables systems to be loosely coupled, resilient, and scalable

Level 6: Microservices 🎯

Design and manage systems built from independent, scalable services.

  • Topics: Service Mesh Concepts, API Gateway Patterns, Circuit Breaker Pattern, Service-to-Service Communication
  • Key Learning: Service boundaries, communication patterns, failure isolation
  • Why: Microservices enable independent scaling and deployment but require careful architectural decisions

Level 7: Real-World Systems 🌍

Apply your knowledge to design and understand real systems built and scaled by industry leaders.

  • Systems Covered (15+ real-world designs):
    • Social/Communication: Twitter/X, Instagram, WhatsApp, TikTok
    • Transportation/Commerce: Uber, Amazon, Airbnb
    • Content: Netflix, YouTube, Google Search
    • Utilities: URL Shortener, Notification System (1B/day), GitHub, Candy Crush
  • Key Learning: Real trade-offs, complete system design, interview-ready case studies
  • Why: Real-world examples show how theory applies in practice at massive scale

Level 8: Case Studies & Interview Prep 🎤

Prepare for and excel in system design interviews with proven frameworks.

  • Topics: Interview Frameworks & Approaches, Common Mistakes & Pitfalls, System Design Templates, Discussion Strategies
  • Key Learning: How to think out loud, explain trade-offs, handle interviewer challenges
  • Why: Interview success requires not just knowledge but communication skills and frameworks

Level 9: Design Patterns 🏛️

Master reusable architectural patterns that solve common distributed system problems.

  • Topics: Architectural Patterns, Distributed Patterns, Performance Patterns, Reliability Patterns
  • Key Learning: When and how to apply proven patterns, avoiding anti-patterns
  • Why: Patterns are proven solutions—using them speeds up design and prevents mistakes

Level 10: Monitoring & Observability 📊

Build systems you can understand and debug at scale through comprehensive observability.

  • Topics: System Observability Fundamentals, Metrics & Logging, Distributed Tracing & Debugging, Performance Optimization, SLOs & SLIs
  • Key Learning: What to monitor, how to debug distributed systems, performance optimization
  • Why: You can't operate systems you can't observe—monitoring is not optional in production

Level 11: Academic Research 🔬

Explore the research foundation of distributed systems for those interested in PhD research and cutting-edge concepts.

  • Topics: Finding a Research Niche, Seminal Papers & Reading List, Academic Publication Flow, Research Methodologies
  • Key Learning: How to identify research problems, read academic papers, contribute to the field
  • Why: Academic research drives innovation in distributed systems

Level 12: Interview Preparation 🚀

Comprehensive interview preparation with mock scenarios, frameworks, and practice systems.

  • Topics: Complete Interview Frameworks, Multiple Practice Systems, Mock Interview Guides, Advanced Interview Strategies
  • Key Learning: Full end-to-end interview preparation, confidence building
  • Why: Dedicated interview prep module for focused preparation before interviews

⚡ Golden Rules for System Engineers

These are the non-negotiable principles that separate great system designers from average ones.

1. Always Start with Requirements, Never with Solutions

Before drawing a single box, ask: Who uses this? What must it do? What are the performance constraints? Jumping to solutions without understanding the problem leads to overengineering.

2. Numbers Don't Lie — Estimate Everything

Back every design decision with math. If you say "we need caching," prove it:

  • (DAU × requests per user per day) ÷ 86,400 seconds = average RPS
  • Object size × writes/day × retention = storage
  • Bandwidth = RPS × average payload size
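The three formulas above can be turned into a small calculator. This is a minimal sketch: the function name `estimate` and every workload number in the example are illustrative assumptions, not figures from this guide.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, requests_per_user_per_day, writes_per_day,
             object_bytes, retention_days, avg_payload_bytes):
    """Return (average RPS, stored bytes, bandwidth in bytes/second)."""
    avg_rps = dau * requests_per_user_per_day / SECONDS_PER_DAY
    storage_bytes = object_bytes * writes_per_day * retention_days
    bandwidth_bps = avg_rps * avg_payload_bytes
    return avg_rps, storage_bytes, bandwidth_bps


# Hypothetical workload: 10M DAU, 20 requests per user per day,
# 5M writes/day of 1 KB objects kept for one year, 2 KB average response.
rps, storage, bandwidth = estimate(10_000_000, 20, 5_000_000, 1_000, 365, 2_000)
print(f"average RPS : {rps:,.0f}")
print(f"storage     : {storage / 1e12:.2f} TB")
print(f"bandwidth   : {bandwidth / 1e6:.2f} MB/s")
```

These are averages; peak traffic is usually some multiple of the average, so size for the peak you expect rather than the mean.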

3. No Silver Bullets — Everything is a Trade-off

| Decision | You Gain | You Lose |
| --- | --- | --- |
| SQL → NoSQL | Horizontal scale | ACID, complex queries |
| Caching | Speed | Freshness (stale reads), consistency |
| Async queues | Decoupling, resilience | Immediate results, simplicity |
| Microservices | Independent deployments | Operational simplicity |
| Strong consistency | Correctness | Availability, low latency |

4. Design for Failure First

Assume every component will fail. Ask: What happens when the DB goes down? When a service crashes? When the network partitions? Good systems degrade gracefully, not catastrophically.
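One common way to degrade gracefully is a circuit breaker: after repeated failures, stop calling the broken dependency for a while and serve a fallback instead. The sketch below is a simplified, single-threaded illustration; the class name, thresholds, and injectable `clock` parameter are assumptions made for the example, not a production implementation.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors, then retry after a cool-down."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: skip the dependency entirely
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap a risky call such as a database read in `breaker.call(read_from_db, serve_cached_copy)`, where both function names are placeholders for your own code.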

5. Scale Incrementally — Don't Pre-Optimize

Start simple. A single server handles more than you think. Scale only when you have data showing you need to. Premature optimization is the root of all evil in distributed systems.

6. The Three Pillars of Production Systems

Every system you design must explicitly address:

  • 📊 Observability: Metrics, logging, distributed tracing
  • 🔒 Security: AuthN/AuthZ, encryption, rate limiting
  • 🔧 Operability: Deploy, rollback, runbooks


🗺️ Learning Roadmap

LEVEL 1 — Foundations
  ├── Networking basics, HTTP, DNS, Load Balancing
  ├── High-level architecture concepts
  ├── Design patterns fundamentals
  ├── What happens when you type google.com?
  ├── Proxy & Reverse Proxy
  ├── Server Crashes & Fault Tolerance
  └── Clustering Fundamentals

LEVEL 2 — Scalability
  ├── Horizontal vs Vertical scaling
  ├── CAP Theorem & Trade-offs
  ├── Consistency models
  └── Scaling from 1 to 1 billion users

LEVEL 3 — Databases
  ├── SQL vs NoSQL comparison
  ├── Database sharding & replication
  ├── Indexing strategies
  └── Database management best practices

LEVEL 4 — Caching
  ├── Cache strategies & patterns
  ├── Redis architecture
  ├── CDN optimization
  └── Cache eviction policies

LEVEL 5 — Messaging & Queues
  ├── Event-driven architecture
  ├── Kafka & RabbitMQ basics
  ├── Dead letter queues
  └── Pub/Sub patterns

LEVEL 6 — Microservices
  ├── Service mesh concepts
  ├── API Gateway patterns
  ├── Circuit breaker pattern
  └── Service-to-service communication

LEVEL 7 — Real-World Systems
  ├── URL Shortener (tinyurl.com)
  ├── Twitter Clone (X)
  ├── Netflix architecture
  ├── Uber system design
  ├── WhatsApp infrastructure
  ├── Notification System (1B/day)
  ├── Instagram Architecture
  ├── Amazon E-commerce Scale
  ├── YouTube Video Platform
  ├── Google Search Architecture
  ├── GitHub Repository Hosting
  ├── TikTok Recommendation Engine
  ├── Candy Crush Saga at 1 Billion Users
  └── Airbnb Availability & Booking

LEVEL 8 — Case Studies & Interview Prep
  ├── Interview frameworks & approach
  ├── Common mistakes & pitfalls
  ├── System design templates
  └── Discussion strategies

LEVEL 9 — Design Patterns
  ├── Architectural patterns
  ├── Distributed patterns
  ├── Performance patterns
  └── Reliability patterns

LEVEL 10 — Monitoring & Observability
  ├── System observability
  ├── Metrics & logging
  ├── Tracing & debugging
  └── Performance optimization

LEVEL 11 — Academic Research
  ├── Finding a Research Niche
  ├── Required Seminal Papers
  └── The Academic Publication Flow

LEVEL 12 — Interview Preparation
  ├── Complete Interview Frameworks
  ├── Multiple Practice Systems
  ├── Mock Interview Guides
  └── Advanced Interview Strategies

📂 Repository Structure

This is a VitePress-based documentation site with a modular structure. Each numbered directory represents a learning level.

sdm/
├── 01-foundations/                    # Core networking & design fundamentals
│   ├── README.md
│   ├── networking.md                 # TCP/IP, HTTP, DNS
│   ├── load-balancing-algorithms.md  # LB strategies
│   ├── what-happens-when-you-type-google.md
│   ├── fault-tolerance-crashes.md
│   └── design-patterns.md

├── 02-scalability/                    # Scalability principles & CAP theorem
│   ├── README.md
│   ├── scaling-1-to-1-billion.md     # Growth strategies
│   └── ...

├── 03-databases/                      # Database paradigms & strategies
│   ├── README.md
│   ├── database-sharding.md
│   └── ...

├── 04-caching/                        # Caching strategies & patterns
│   ├── README.md
│   ├── cdn-optimization.md
│   └── ...

├── 05-messaging/                      # Event-driven & message queues
│   ├── README.md
│   ├── dead-letter-queues.md
│   └── ...

├── 06-microservices/                  # Microservices architecture
│   ├── README.md
│   ├── grpc.md
│   ├── api-gateway.md
│   └── ...

├── 07-real-world-systems/             # 15+ real system case studies
│   ├── url-shortener/README.md       # URL shortening service
│   ├── twitter-clone/README.md       # Twitter/X architecture
│   ├── netflix/README.md             # Streaming platform
│   ├── uber/README.md                # Ride-sharing platform
│   ├── whatsapp/README.md            # Messaging system
│   ├── notification-system/README.md # 1B/day notification system
│   ├── instagram/README.md           # Photo sharing platform
│   ├── amazon/README.md              # E-commerce at scale
│   ├── youtube/README.md             # Video platform
│   ├── google-search/README.md       # Search engine
│   ├── github/README.md              # Repository hosting
│   ├── tiktok/README.md              # Recommendation engine
│   ├── candy-crush/README.md         # Gaming at 1B scale
│   └── airbnb/README.md              # Availability & booking

├── 08-case-studies/                   # Interview prep frameworks
│   ├── README.md
│   ├── interview-template.md         # Interview approach
│   ├── common-mistakes.md
│   └── frameworks.md

├── 09-design-patterns/                # Reusable architectural patterns
│   ├── README.md
│   ├── architectural-patterns.md
│   ├── distributed-patterns.md
│   └── ...

├── 10-monitoring/                     # Observability & production systems
│   ├── README.md
│   ├── metrics-logging.md
│   ├── distributed-tracing.md
│   └── ...

├── 11-academic-research/              # PhD research resources
│   ├── README.md
│   ├── research-methodology.md
│   └── seminal-papers.md

├── 12-interview-prep/                 # Dedicated interview preparation
│   ├── README.md
│   ├── complete-frameworks.md
│   ├── mock-interviews.md
│   └── ...

├── .vitepress/                        # VitePress configuration
│   ├── config.ts                     # Build & site config (sidebar, search, plugins)
│   ├── theme/index.ts                # Theme customization
│   ├── theme/custom.css              # Custom styling
│   └── dist/                         # Production build output

├── public/                            # Static assets
│   ├── logo.png
│   └── images/

├── .github/workflows/                 # CI/CD pipelines
│   └── vercel-deploy.yml             # Automated deployment

├── index.md                           # Home page (VitePress landing)
├── roadmap.md                         # Interactive learning roadmap
├── guidelife-flow.md                  # Learning flow guide
├── package.json                       # Dependencies & scripts
├── .prettierrc                        # Code formatting config
└── README.md                          # This file

Directory Naming Convention

All learning modules follow the pattern: NN-topic-name/ where:

  • NN = Two-digit level number (01, 02, ... 12)
  • topic-name = Kebab-cased topic title

This convention enables the VitePress config to automatically generate the navigation sidebar.
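To illustrate why the naming convention matters, here is a sketch of how NN-topic-name/ directories could be parsed into ordered sidebar labels. The actual logic lives in .vitepress/config.ts and is not shown in this README, so the function names and label format below are hypothetical, and Python is used only to demonstrate the idea.

```python
import re

# Two digits, a hyphen, then a kebab-cased topic name.
LEVEL_DIR = re.compile(r"^(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def parse_level_dir(name):
    """Return (level_number, title) for names like '03-databases', else None."""
    match = LEVEL_DIR.match(name.rstrip("/"))
    if match is None:
        return None
    number, slug = match.groups()
    return int(number), slug.replace("-", " ").title()


def sidebar_entries(dir_names):
    """Build sorted sidebar labels, ignoring directories that are not levels."""
    parsed = [p for p in map(parse_level_dir, dir_names) if p is not None]
    return [f"Level {number}: {title}" for number, title in sorted(parsed)]


print(sidebar_entries(["10-monitoring", "02-scalability", ".vitepress", "public"]))
```

Because the level number is zero-padded and comes first, a plain sort of the parsed entries already yields the learning order.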



🎓 Learning Paths by Role

Choose your learning path based on your role and goals:

Path 1️⃣ Beginner (0-2 years experience)

Goal: Build solid foundations and prepare for interviews
Duration: 4-6 weeks (2-3 hours/week)

1. Level 1: Foundations (complete all topics)
2. Level 2: Scalability (focus on CAP theorem)
3. Level 3: Databases (SQL vs NoSQL)
4. Level 7: Real-World Systems (start with URL Shortener)
5. Level 8: Case Studies (study frameworks)

Outcome: Understand core concepts, pass junior engineer interviews

Path 2️⃣ Intermediate Backend Engineer (2-5 years)

Goal: Deepen expertise and prepare for senior roles
Duration: 3-4 weeks (4-5 hours/week)

1. Level 3: Databases (deep dive into sharding)
2. Level 4: Caching (master cache patterns)
3. Level 5: Messaging (event-driven architecture)
4. Level 6: Microservices (service design)
5. Level 7: Real-World Systems (Netflix, Uber, Twitter)
6. Level 9: Design Patterns (proven solutions)

Outcome: Senior-level technical competency, architectural thinking

Path 3️⃣ Interview Preparation (All levels)

Goal: Excel in system design interviews
Duration: 2-3 weeks (5-6 hours/week)

1. Level 1: Foundations (quick review - 2-3 hours)
2. Level 2: Scalability (focus on CAP - 2-3 hours)
3. Level 7: Real-World Systems (practice 8-10 systems - 8-10 hours)
   ├── Start: URL Shortener
   ├── Medium: Twitter, Notification System
   └── Hard: Netflix, Uber, Instagram
4. Level 8: Case Studies (study interview frameworks - 2 hours)
5. Level 12: Interview Prep (mock interviews - 3-4 hours)

Outcome: Confident in interviews, handles edge cases, communicates clearly

Path 4️⃣ Tech Lead / Architect (5+ years)

Goal: Guide architectural decisions and mentor teams
Duration: Ongoing (2-3 hours/week)

1. Review all levels for context (skip if familiar)
2. Level 6: Microservices (system organization)
3. Level 9: Design Patterns (architectural patterns)
4. Level 10: Monitoring (production systems)
5. Periodically: Review Level 7 for new systems

Outcome: Architectural expertise, mentoring capability, informed technical decisions

Path 5️⃣ PhD Researcher (Systems & Distributed Computing)

Goal: Understand research landscape and contribute
Duration: Variable

1. All levels 1-6 (foundational knowledge)
2. Level 7: Real-World Systems (see what industry does)
3. Level 9: Design Patterns (know solutions)
4. Level 11: Academic Research (research methodology)
   ├── Find research niche
   ├── Read seminal papers
   └── Identify open problems
5. Deep dive into specialized areas

Outcome: Research-ready, knows problem space, can identify gaps


💡 Better Learning Tips

Do this:

  • Read the Level 1 and Level 2 fundamentals completely; don't skip networking or consistency models
  • Draw and design systems yourself, don't just read
  • Take notes and create your own design templates
  • Practice each real-world system multiple times (3-5 times)
  • Discuss designs with peers to deepen understanding
  • Reference GitHub implementations of patterns

Don't do this:

  • Memorize content—focus on understanding trade-offs
  • Skip "boring" foundational topics like networking
  • Give up on hard concepts like CAP Theorem—revisit them
  • Design systems alone—always discuss and get feedback
  • Assume one design works for all scenarios—context matters

🔑 The 8-Step System Design Framework

When asked to design any system in an interview or real project, follow this systematic approach:

Step-by-Step Breakdown

| Step | Focus | Key Questions |
| --- | --- | --- |
| 1. Clarify Requirements | Functional + Non-functional | Who are the users? What features needed? Latency, throughput, consistency? |
| 2. Estimate Scale | Users, RPS, Storage, Bandwidth | Daily/monthly active users? Concurrent connections? Data size? Growth rate? |
| 3. High-Level Design | System components & flow | What are the main services? How do they interact? Where are bottlenecks? |
| 4. Deep Dive | Database design, APIs, data flow | Schema design? Primary keys? Indexing? API endpoints? Data models? |
| 5. Scalability | Handle 10x, 100x load | How to scale databases? Cache layers? Load balancing? Replication? |
| 6. Reliability | Fault tolerance, redundancy | How to handle failures? Backup strategies? Monitoring? Alerting? |
| 7. Trade-offs | Analyze sacrifices | Why NoSQL over SQL? Cache vs accuracy? Strong vs eventual consistency? |
| 8. Final Review | Optimization & improvements | Any bottlenecks? Can we optimize further? What would we do next? |

Framework in Action — URL Shortener Example

| Step | What You Say in an Interview |
| --- | --- |
| 1. Clarify | "Is this read-heavy or write-heavy? Do we need analytics? Custom aliases?" |
| 2. Estimate | "100M DAUs and ~100K reads/s; a 100:1 read/write ratio gives ~1K writes/s; at ~1 KB per URL that is ~86 GB/day, roughly 31 TB/year." |
| 3. HLD | "Client → API Gateway → App Server → DB. Reads via Cache layer (Redis)." |
| 4. Deep Dive | "URL table: (id, short_code, original_url, user_id, created_at). Use Base62 encoding." |
| 5. Scalability | "Shard DB by short_code hash. Cache top 20% URLs (handles 80% traffic)." |
| 6. Reliability | "Multi-region replication, health checks, circuit breakers on downstream calls." |
| 7. Trade-offs | "Chose NoSQL for write scalability over ACID guarantees. Eventual consistency on analytics." |
| 8. Review | "Add rate limiting to prevent abuse. Geo-routing for lower latency." |
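The Base62 encoding mentioned in step 4 can be sketched as follows. It assumes the short code is derived from a numeric row id and that the alphabet is ordered digits, lowercase, uppercase; both are illustrative choices, not something the case study prescribes.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols (ordering is an assumption).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(number):
    """Convert a non-negative integer id into a short Base62 string."""
    if number < 0:
        raise ValueError("id must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(code):
    """Convert a Base62 string back into the integer id."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(base62_encode(125))        # "21", because 125 = 2 * 62 + 1
print(base62_decode("21"))       # 125
print(62 ** 7)                   # number of distinct 7-character codes
```

Seven characters give 62^7, about 3.5 trillion codes, which is why short fixed-length codes are usually enough for this kind of service.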

📋 System Design Cheat Sheet

Quick-reference for the most commonly tested concepts.

When to Use What — Database Cheat Sheet

| Use Case | Best Choice | Why |
| --- | --- | --- |
| User profiles, orders | PostgreSQL / MySQL | ACID, complex joins |
| Social graph, relationships | Neo4j | Graph traversal |
| Product catalog, content | MongoDB | Flexible schema |
| Session storage, leaderboards | Redis | In-memory, fast |
| Time-series (metrics, logs) | InfluxDB / Cassandra | Write-optimized |
| Search / autocomplete | Elasticsearch | Full-text search |
| Blob storage (images, video) | S3 / GCS | Object storage |

Caching Decision Tree

Is the data read frequently but written rarely?
  └─ YES → Cache it!
      ├── Data changes rarely → Long TTL (hours/days)
      ├── Data changes often → Short TTL + Cache-aside pattern
      └── Must be real-time → Skip cache OR use write-through
  └─ NO  → Don't cache. Complexity isn't worth it.
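The "TTL + cache-aside" branch of the tree can be sketched with an in-process dictionary standing in for Redis. The class name `TTLCache`, the `loader` callback, and the injectable `clock` are assumptions made for this example.

```python
import time


class TTLCache:
    """Cache-aside: check the cache first, fall back to the source on a miss."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit, still fresh
        value = loader(key)  # miss or expired: read the source of truth
        self._store[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)
```

A long TTL suits data that rarely changes; for data that changes often, use a short TTL and call `invalidate` on every write so readers see stale values for at most one TTL window.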

Load Balancing Algorithms at a Glance

| Algorithm | Best For | Avoid When |
| --- | --- | --- |
| Round Robin | Stateless services, equal capacity | Sessions, stateful services |
| Least Connections | Long-lived connections (WebSockets) | Short-lived HTTP requests |
| Consistent Hashing | Caching, DB sharding | Heterogeneous server capacities |
| IP Hash | Sticky sessions needed | Dynamic server fleets |
| Weighted Round Robin | Mixed-capacity servers | All servers are identical |
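Consistent hashing is the least intuitive row in the table, so here is a minimal sketch of a hash ring with virtual nodes. The class name `HashRing`, the choice of SHA-256, and 100 virtual nodes per server are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["cache-a", "cache-b", "cache-c"])
after = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The point of the technique is visible in `moved`: adding a fourth node relocates only the keys that now belong to the new node, whereas `hash(key) % node_count` would reshuffle most keys.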

CAP Theorem — Practical Cheat Sheet

| System Type | Prioritizes | Example Systems |
| --- | --- | --- |
| CP (Consistent + Partition Tolerant) | Correctness over availability | HBase, Zookeeper, etcd |
| AP (Available + Partition Tolerant) | Availability over consistency | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Only works without partitions | Single-node RDBMS (MySQL) |

Key Insight: In distributed systems, network partitions are inevitable, so the real choice is what to give up while a partition lasts: availability (CP) or consistency (AP).

Message Queue Selection Guide

| Need | Tool | Why |
| --- | --- | --- |
| High throughput event streaming | Apache Kafka | Persistent, replayable, partitioned |
| Task queues (jobs, emails) | RabbitMQ / SQS | Lightweight, flexible routing |
| Real-time pub/sub | Redis Pub/Sub | Ultra-low latency |
| Exactly-once semantics | Kafka + transactions | Strong delivery guarantees |

Dead Letter Queue (DLQ) - Explained Simply

A Dead Letter Queue (DLQ) is a pattern used in messaging systems such as AWS SQS, RabbitMQ, and Kafka. It is essentially a "dustbin" or "holding area" for messages that failed or errored, so that they are not lost and do not block the main system.

Why does a message end up in the DLQ?

  1. Max Retries Exceeded: the consumer still fails to process the message after a set number of attempts.
  2. Message Format Error: the message's data or payload format is invalid.
  3. Message Expiration (TTL): the message sat in the queue so long that it expired.

💡 Real-world example: in a food delivery app, sending a confirmation SMS keeps failing because the phone number is invalid, so the message is moved to the DLQ. The main queue is not blocked, and other customers still receive their SMS on time.
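The SMS scenario can be simulated with two in-memory queues. This is a sketch only: real brokers such as SQS or RabbitMQ handle redelivery and dead-lettering through configuration, and the `MAX_RETRIES` value, message shape, and `send_sms` function here are hypothetical.

```python
from collections import deque

MAX_RETRIES = 3  # assumed retry budget before a message is dead-lettered


def drain(main_queue, dead_letter_queue, handler):
    """Process every message; park the ones that keep failing in the DLQ."""
    processed = []
    while main_queue:
        message = main_queue.popleft()
        try:
            handler(message["body"])
            processed.append(message["id"])
        except Exception as exc:
            message["attempts"] += 1
            if message["attempts"] >= MAX_RETRIES:
                message["error"] = str(exc)
                dead_letter_queue.append(message)  # keep it for inspection
            else:
                main_queue.append(message)  # retry later, behind other messages
    return processed


def send_sms(phone):
    """Stand-in for an SMS gateway call that rejects invalid numbers."""
    if not phone.isdigit():
        raise ValueError(f"invalid phone number: {phone}")


main = deque([
    {"id": 1, "body": "01700000001", "attempts": 0},
    {"id": 2, "body": "not-a-number", "attempts": 0},
    {"id": 3, "body": "01700000003", "attempts": 0},
])
dlq = deque()
delivered = drain(main, dlq, send_sms)
print("delivered:", delivered)
print("dead-lettered:", [m["id"] for m in dlq])
```

Message 2 is retried behind the others and finally parked, so messages 1 and 3 are delivered without waiting on it.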


🧠 How to Think Like a System Architect

Mental models that transform how you approach any system design problem.

Mental Model 1: The Bottleneck First Principle

Every system has one primary bottleneck at any given scale. Find it, fix it, then find the next one.

  • At 1K RPS → Application server is the bottleneck
  • At 10K RPS → Database becomes the bottleneck → Add read replicas, cache
  • At 100K RPS → Cache becomes the bottleneck → Shard horizontally, add a CDN
  • At 1M RPS → Network/IO is the bottleneck → Edge computing, geo-distribution

Mental Model 2: The 80/20 Rule of Traffic

In almost every system:

  • 20% of data accounts for 80% of traffic → Cache that 20%
  • 20% of users generate 80% of writes → Rate-limit them
  • 20% of endpoints get 80% of requests → Optimize those paths first
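The payoff of caching the hot 20% can be estimated with a weighted average. The latencies and traffic figures below are assumed for illustration; measure your own system before relying on them.

```python
def effective_latency_ms(hit_ratio, cache_ms, backend_ms):
    """Average read latency when a fraction of reads is served from cache."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * backend_ms


def backend_rps(total_rps, hit_ratio):
    """Requests per second that still reach the backend after caching."""
    return total_rps * (1 - hit_ratio)


# Assumed numbers: 80% hit ratio, 1 ms cache read, 20 ms database read,
# 10,000 total reads per second.
latency = effective_latency_ms(0.8, cache_ms=1.0, backend_ms=20.0)
load = backend_rps(10_000, 0.8)
print(f"average latency: {latency:.1f} ms")
print(f"database load:   {load:.0f} RPS")
```

With these assumptions the average read drops from 20 ms to about 4.8 ms and the database sees roughly a fifth of the traffic.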

Mental Model 3: Synchronous vs. Asynchronous Thinking

Ask: "Does the user need to wait for this to complete?"
  └─ YES (payment confirmation, auth) → Synchronous, strong consistency
  └─ NO  (email notification, log processing) → Asynchronous, eventual consistency

Pushing work to async queues is one of the highest-leverage optimizations in distributed systems.

Mental Model 4: Data Locality

The closer data is to compute, the faster the system:

L1 Cache → L2 Cache → RAM → SSD → Local Network → HDD → Cross-DC → Cross-Region
  ~1ns       ~4ns    ~100ns ~100µs   ~0.5ms      ~10ms   ~10ms        ~100ms

Design so that hot data lives in the fastest tier for your latency SLA.

Mental Model 5: Failure Modes Taxonomy

Classify every failure your system can experience:

| Failure Type | Example | Defense |
| --- | --- | --- |
| Transient | Network blip | Retry with exponential backoff |
| Permanent | Disk failure | Replication, failover |
| Correlated | DC power outage | Multi-region deployment |
| Cascading | Slow DB → timeouts → overload | Circuit breakers, bulkheads |
| Byzantine | Corrupted data | Checksums, end-to-end validation |
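The defense for transient failures, retry with exponential backoff, can be sketched as below. Random "full jitter" is added so that many clients do not retry in lockstep; the parameter names, defaults, and the choice to retry only on `ConnectionError` are assumptions for this example.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call func, retrying on ConnectionError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # full jitter: wait a random time up to the cap


def flaky_factory(failures):
    """Hypothetical dependency that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def call():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("network blip")
        return "ok"

    return call


# Record the waits instead of sleeping, and pin the jitter to its maximum.
delays = []
result = retry_with_backoff(flaky_factory(2), sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

Only retry operations that are safe to repeat; a non-idempotent write needs an idempotency key or similar guard before it is wrapped this way.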

Getting Started & Development

Prerequisites

  • Node.js >= 20.x
  • npm >= 9.x or yarn
  • Git (for contributions)

Installation & Local Setup

  1. Clone the repository

    bash
    git clone https://github.com/SOURAV-ROY/sdm.git
    cd sdm
  2. Install dependencies

    bash
    npm install
  3. Start development server

    bash
    npm run dev
    # Open http://localhost:5173 in your browser
  4. Edit content

    • Changes to Markdown files automatically reload in the browser
    • Start with any NN-title/README.md file

Development Workflow

bash
# 1. Create a feature branch
git checkout -b feature/your-feature

# 2. Make changes to markdown files
# Changes auto-reload at http://localhost:5173

# 3. Format your code
npm run format

# 4. Commit and push
git add .
git commit -m "feat: description of changes"
git push origin feature/your-feature

# 5. Create a pull request on GitHub

Build Commands

bash
# Development (with auto-reload)
npm run dev

# Check formatting
npm run format:check

# Format all files
npm run format

# Build for production
npm run build

# Preview production build locally
npm run preview

📖 How to Use This Repository

Learning Paths by Role

For Beginners (0-2 years experience)

  1. Start with Level 1: Foundations to learn core concepts
  2. Read through "What happens when you type google.com?"
  3. Study Level 2: Scalability and CAP Theorem
  4. Then explore Level 3: Databases
  5. Build confidence with real-world systems in Level 7

Estimated time: 4-6 weeks (2-3 hours/week)

For Intermediate Engineers (2-5 years)

  1. Skip Level 1-2, start with Level 3: Databases
  2. Deep dive into Level 4: Caching patterns
  3. Master Level 5: Messaging & Queues for event-driven systems
  4. Study Level 6: Microservices architecture
  5. Practice with Level 7: Real-World Systems case studies

Estimated time: 3-4 weeks (4-5 hours/week)

For Interview Preparation

  1. Quick review of Level 1-2 (skip if familiar)
  2. Focus on Level 7: Real-World Systems (practice 5-10 systems)
  3. Study Level 8: Case Studies framework and common mistakes
  4. Do mock interview practice
  5. Review Level 9: Design Patterns for advanced concepts

Estimated time: 2-3 weeks (5-6 hours/week)

For Tech Leads & Architects

  1. Review all levels for context
  2. Focus on Level 6: Microservices for system organization
  3. Master Level 9: Design Patterns for architectural decisions
  4. Study Level 10: Monitoring for production systems
  5. Use content for team knowledge sharing

Estimated time: Ongoing (2-3 hours/week for depth)


🛠️ Technical Stack & Architecture

Technology Stack

| Category | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Static Site Generator | VitePress | 1.6.4 | Fast, modern documentation site builder |
| Frontend Framework | Vue | 3.5.30 | Reactive UI components & interactivity |
| Language | TypeScript | 5.9.3 | Type-safe configuration & development |
| Diagrams | Mermaid | 10.9.5 | Flow charts, sequence diagrams, architecture |
| Mermaid Plugin | vitepress-plugin-mermaid | 2.0.17 | Seamless Mermaid integration in VitePress |
| Code Formatter | Prettier | 3.8.1 | Consistent code & markdown formatting |
| Node Runtime | Node.js | >= 18.x | JavaScript runtime |
| Build Tool | Vite | (bundled) | Ultra-fast build tool |

System Architecture

┌─────────────────────────────────────────────────────────┐
│                 Browser / End User                      │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│      VitePress SPA + Mermaid Diagrams + Search          │
│         (Built to .vitepress/dist/)                     │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│              CDN + Edge Caching                         │
│              (Vercel Deployment)                        │
└────────────────────┬────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────┐
│          GitHub Repository (Source Control)             │
│    (CI/CD Pipelines, Version History, Issues)           │
└─────────────────────────────────────────────────────────┘

How It Works

  1. Content: Markdown files in NN-title/ directories
  2. Configuration: .vitepress/config.ts automatically generates sidebar from directory structure
  3. Build: npm run build processes Markdown → HTML using VitePress
  4. Deployment: GitHub Actions triggers Vercel deployment on push to sourav branch
  5. Serving: Vercel CDN serves built HTML globally

🚀 Build & Deployment

Available npm Scripts

bash
# Development
npm run dev              # Start VitePress dev server (http://localhost:5173)
npm run build            # Production build (Vercel / CI)
npm run build:local      # Format with Prettier, then build
npm run preview          # Preview production build locally

# Code Quality
npm run format           # Format all markdown, JS, JSON with Prettier
npm run format:check     # Check if files are formatted correctly

# CI/CD (Automatic)
# Vercel deploys on push to 'sourav' branch via GitHub Actions

Local Development Workflow

bash
# 1. Clone and set up
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install

# 2. Create a new branch for your changes
git checkout -b feature/your-feature

# 3. Start dev server (auto-reloads on file changes)
npm run dev

# 4. Edit markdown files in any level directory
# Changes appear instantly in browser

# 5. Format and commit
npm run format
git add .
git commit -m "feat: add new content"
git push origin feature/your-feature

# 6. Create a Pull Request on GitHub

Deployment Platforms

Vercel Deployment (Current)

  • Branch: sourav (default)
  • Build Command: npm run build
  • Output Directory: .vitepress/dist
  • Automatic: Deploys on every push to sourav branch
  • Live URL: https://souravsdm.vercel.app

GitHub Pages (Alternative)

bash
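# Manual deployment
npm run build
# git subtree can only push tracked files, so .vitepress/dist must be
# committed on the current branch first (check that .gitignore does not exclude it)
git subtree push --prefix .vitepress/dist origin gh-pages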
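
GitHub Pages serves project sites from a sub-path (https://<user>.github.io/<repo>/), so VitePress needs its `base` option set to that path, otherwise asset URLs will not resolve. A minimal sketch, assuming the repository name `sdm`; the project's real .vitepress/config.ts has other options not shown here, so treat this as a fragment to merge rather than a replacement:

```typescript
// .vitepress/config.ts (sketch: merge `base` into the existing config)
import { defineConfig } from "vitepress";

export default defineConfig({
  // Assumption: the site is published at https://<user>.github.io/sdm/
  base: "/sdm/",
});
```

The Vercel deployment serves from the domain root, so it would keep the default `base` of `/`.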

Local Deployment (Testing)

bash
npm run preview  # Serves .vitepress/dist locally
# Open http://localhost:4173

📝 Contributing

We welcome contributions! Whether it's fixing typos, adding content, or improving diagrams, every contribution helps the community.

How to Contribute

  1. Fork the repository, then clone your fork

    bash
    git clone https://github.com/YOUR_USERNAME/sdm.git
    cd sdm
  2. Create a feature branch

    bash
    git checkout -b feature/add-new-topic
  3. Make your changes

    • Add/edit Markdown files in the appropriate level directories
    • Follow the existing Markdown structure and formatting
    • Add Mermaid diagrams where helpful
  4. Format your code

    bash
    npm run format
  5. Commit, push, and open a pull request

    bash
    git add .
    git commit -m "feat: add comprehensive XYZ guide"
    git push origin feature/add-new-topic

Content Guidelines

Good contributions:

  • Clear, concise explanations with examples
  • Mermaid diagrams for visual concepts
  • References to real-world implementations
  • Proper Markdown formatting
  • No grammatical errors

Avoid:

  • Overly long paragraphs without structure
  • Promotional content or spam
  • Untested information
  • Broken links or outdated references

Issues & Discussions

  • Found a bug? → Open an issue with details
  • Want to suggest content? → Open a discussion
  • Need clarification? → Ask in issues

📊 Project Statistics & Coverage

| Metric | Value |
| --- | --- |
| Total Learning Modules | 12 |
| Real-World Systems | 15+ |
| Mermaid Diagrams | 100+ |
| Markdown Content Files | 50+ |
| Total Content Lines | 5000+ |
| Interview Frameworks | 3+ |
| Design Patterns Covered | 20+ |
| Companies Featured | 15+ (Google, Netflix, Uber, Amazon, Meta, Microsoft, etc.) |
| Fully Open Source | Yes ✓ |
| License | ISC |
| Last Updated | 2026 |

❓ FAQ - Frequently Asked Questions

Q: Which level should I start with?

A: Start with Level 1: Foundations unless you already understand networking and distributed systems fundamentals. Every subsequent level builds on these concepts. If you're already familiar with networking and basic architecture, you can start with Level 2: Scalability.

Q: Is this completely free?

A: Yes, completely free and open-source under the ISC license. No hidden costs, premium tiers, or paywalls. All content is available at https://souravsdm.vercel.app.

Q: How long does it take to complete?

A: It depends on your background and learning pace:

  • Beginners: 4-6 weeks (2-3 hours/week)
  • Intermediate: 3-4 weeks (4-5 hours/week)
  • Interview Prep Only: 2-3 weeks (5-6 hours/week)
  • Complete Mastery: 2-3 months (regular study)
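
The ranges above imply the following total study time. A small sketch that multiplies the weeks by the hours per week for each track (the `Track` type and `totalHours` helper are illustrative names, not part of the project):

```typescript
// Study tracks copied from the list above. "Complete Mastery" is omitted
// because it gives no hours-per-week figure.
interface Track {
  name: string;
  weeks: [number, number]; // [min, max]
  hoursPerWeek: [number, number]; // [min, max]
}

const tracks: Track[] = [
  { name: "Beginners", weeks: [4, 6], hoursPerWeek: [2, 3] },
  { name: "Intermediate", weeks: [3, 4], hoursPerWeek: [4, 5] },
  { name: "Interview Prep Only", weeks: [2, 3], hoursPerWeek: [5, 6] },
];

// Total range = (min weeks * min hours) to (max weeks * max hours).
function totalHours(track: Track): [number, number] {
  return [
    track.weeks[0] * track.hoursPerWeek[0],
    track.weeks[1] * track.hoursPerWeek[1],
  ];
}

for (const track of tracks) {
  const [low, high] = totalHours(track);
  console.log(`${track.name}: ${low}-${high} hours`);
}
// Beginners: 8-18 hours
// Intermediate: 12-20 hours
// Interview Prep Only: 10-18 hours
```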

Q: Can I use this for interview preparation?

A: Absolutely! Level 8 (Case Studies) and Level 12 (Interview Prep) are specifically designed for interview preparation. Combine them with repeated practice of the Level 7 (Real-World Systems) designs to build confidence.

Q: Are there implementation examples or code?

A: This repository focuses on architecture and design theory, not implementations. For code examples, we recommend checking the "Real-World Architectures" and "Technology Documentation" sections in the Resources for links to open-source implementations and documentation.

Q: Can I download and use this offline?

A: Yes! Run these commands:

bash
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run build
npm run preview  # Serves .vitepress/dist at http://localhost:4173 (opening index.html directly via file:// usually fails to load assets)

Q: How frequently is the content updated?

A: Content is updated regularly based on community feedback, new technologies, and best practices. Major updates are released quarterly. Check the GitHub Releases page for version history.

Q: Is there a video version?

A: Not currently, but Markdown content can be converted to PDF for offline reading using various tools (pandoc, printing to PDF, etc.).

Q: Can I translate this to another language?

A: Absolutely! Translations are welcome. Please open an issue to discuss translating to your language. We can coordinate with the community.

Q: Which systems should I focus on for interviews?

A: Start with:

  1. Easy: URL Shortener, Notification System
  2. Medium: Twitter, Instagram
  3. Hard: Netflix, Uber, YouTube

Practice each 2-3 times before moving to the next difficulty level. By interview time, you should be comfortable with 8-10 systems.

Q: What if I don't understand a concept?

A: Try these strategies:

  1. Re-read the section (understanding takes multiple passes)
  2. Draw diagrams and visualize the concept
  3. Search for additional resources in our Resources section
  4. Open an issue asking for clarification
  5. Check related levels for foundational context

🔗 Resources & References

Learning Resources & Communities

Technology Documentation

Real-World Architectures & Engineering Blogs

Essential Books

  • "Designing Data-Intensive Applications" by Martin Kleppmann - Must-read for distributed systems
  • "The Art of Scalability" by Martin Abbott & Michael Fisher - Practical scalability patterns
  • "Building Microservices" by Sam Newman - Microservices architecture
  • "Site Reliability Engineering" by Google - Production systems & monitoring
  • "Release It!" by Michael Nygard - Production-ready systems

Academic & Research


📄 License & Author

License

This project is licensed under the ISC License. You can freely use, modify, and distribute this content for any purpose, personal or commercial, provided the copyright notice and permission notice are kept in all copies.

ISC License (ISC)

Copyright (c) 2024-2026 Sourav Roy

Permission to use, copy, modify, and/or distribute this software for any purpose with or
without fee is hereby granted, provided that the above copyright notice and this permission
notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.

Author & Contributors

Created & Maintained by: Sourav Roy

Community: This project thrives through contributions from the developer community. Check the Contributors page for everyone who has helped.


🙏 Show Your Support

If this resource has helped you learn system design, prepare for interviews, or deepen your architectural knowledge:

  • ⭐ Star this repository - Helps others discover this resource
  • 🐦 Share on social media - "I just learned system design with System Design Mastery"
  • 💬 Open issues/discussions - Share feedback, suggestions, or topics you'd like covered
  • 🤝 Contribute - Fix typos, improve explanations, add diagrams, or suggest new content
  • 📢 Recommend to friends - Share with engineers preparing for interviews

🚀 Getting Started Today

Choose Your Path:

  1. 📚 Complete Beginner? → Start with Level 1
  2. 🎯 Interview Coming Up? → Go to Real-World Systems
  3. 🔧 Already Know Basics? → Jump to Databases
  4. 🎤 Want Interview Prep? → Check Interview Module

Or contribute:

bash
# Clone and start contributing
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run dev
# Visit http://localhost:5173 and start editing!

📞 Support & Feedback


🎯 Roadmap & Future Plans

Planned Additions (2026):

  • [ ] More real-world systems (Spotify, Discord, Reddit)
  • [ ] Video explanation supplements
  • [ ] Interactive system design simulator
  • [ ] Community-contributed case studies
  • [ ] Multilingual support (Spanish, Chinese, Hindi, etc.)
  • [ ] PDF download option
  • [ ] Mobile-optimized view
  • [ ] Advanced system design patterns

Help Wanted:

  • 📝 Content writers & editors
  • 🎨 Diagram & visualization contributors
  • 🌍 Translators
  • 🧪 Reviewers for technical accuracy
  • 💬 Interview experience sharers

Interested? Open an issue and let's collaborate!


💡 Key Insights & Philosophy

This repository is built on the belief that:

  1. System design knowledge should be free - High-quality learning shouldn't require expensive courses
  2. Learning should be structured - Not scattered across different sources
  3. Examples matter - Real system case studies teach faster than theory alone
  4. Practice builds confidence - Repeated practice with frameworks leads to interview success
  5. Community creates better resources - Open-source collaboration improves knowledge for everyone

🏆 Success Stories

This resource has helped hundreds of engineers:

  • 📈 Land interviews at FAANG companies
  • 🎓 Deepen their architectural understanding
  • 💼 Become technical leaders and architects
  • 🔬 Pursue research in distributed systems
  • 🚀 Build production systems at scale

Want to share your story? Open a discussion and inspire others!


📊 Quick Reference

By Experience Level:

| Level | Start Here | Focus Areas | Time |
| --- | --- | --- | --- |
| 0-2 yrs | Foundations | Levels 1-3, then 7-8 | 4-6 wks |
| 2-5 yrs | Databases | Levels 3-6, then 7, 9 | 3-4 wks |
| Interviews | Real Systems | Level 7 & 8 deeply | 2-3 wks |
| Architects | All Levels | 6, 9-10 focus | Ongoing |
| Researchers | All Levels | Then Level 11 | Variable |

By System Type:

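| Need | Level | Time | Priority |
| --- | --- | --- | --- |
| URL Shortener | 7 | 1.5 hrs | Medium |
| Social Media | 7 | 2 hrs | High |
| Video Platform | 7 | 2.5 hrs | Medium |
| Notification System | 7 | 1.5 hrs | Medium |
| E-Commerce | 7 | 2.5 hrs | High |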

🌟 Made with ❤️ by SOURAV ROY

Last Updated: 2026
Version: 1.0.0+
License: ISC (Free & Open Source)

