🏗️ System Design Mastery — Comprehensive Guide
Master system design from fundamentals to real-world architectures. A complete, structured, free, and open-source learning platform for software engineers to confidently design large-scale distributed systems used at companies like Google, Netflix, Uber, Amazon, Meta, and Microsoft.
📑 Table of Contents
- Project Overview
- Quick Start
- Learning Modules
- Golden Rules for System Engineers
- Learning Roadmap
- Repository Structure
- The 8-Step System Design Framework
- System Design Cheat Sheet
- How to Think Like a System Architect
- Learning Paths by Role
- Technical Stack
- Getting Started & Development
- Build & Deployment
- Contributing
- FAQ
- Resources & References
- License & Author
📚 Project Overview
System Design Mastery is a comprehensive, free, and open-source learning platform designed to teach software engineers how to architect and design large-scale distributed systems. Whether you're preparing for technical interviews at FAANG companies, building production systems at your organization, or deepening your architectural knowledge, this resource provides a structured, progressive learning experience.
🎯 Mission
- Democratize System Design Knowledge: Make advanced architectural concepts accessible to engineers at all levels
- Provide Structured Learning: Offer a clear progression from fundamentals to advanced patterns
- Enable Interview Success: Prepare engineers to confidently design systems in technical interviews
- Reflect Real-World Practices: Showcase how industry leaders design and scale systems
- Foster Community Learning: Build an open-source resource that grows through community contributions
✨ Key Features
- ✅ 12 Progressive Learning Modules - From fundamentals to advanced patterns and academic research
- ✅ 15+ Real-World Case Studies - In-depth analysis of Netflix, Uber, Twitter, WhatsApp, Instagram, YouTube, TikTok, and more
- ✅ 100+ Interactive Diagrams - Mermaid visualizations for every concept and system
- ✅ Interview Frameworks & Templates - Step-by-step approaches for system design interviews
- ✅ Design Patterns & Best Practices - Proven architectural patterns used at scale
- ✅ Practical Examples & Trade-offs - Actionable insights with real numbers and comparisons
- ✅ Academic Research Path - Resources for PhD candidates and researchers in distributed systems
- ✅ Always Updated - Community contributions and latest industry best practices
- ✅ Production-Ready Content - Thoroughly researched and technically accurate
👥 Who Should Use This?
| Role | Learning Path | Time Estimate |
|---|---|---|
| Beginners (0-2 yrs) | Levels 1-3 → 7 → 8 | 4-6 weeks |
| Backend Engineers (2-5 yrs) | Levels 3-6 → 7 → 9 | 3-4 weeks |
| Interview Candidates | Levels 1-2 (review) → 7-8 (focus) | 2-3 weeks |
| Tech Leads & Architects | All levels + focus on 6, 9-10 | Ongoing |
| PhD Researchers | All levels + Level 11 | Variable |
🚀 Quick Start
Online (Recommended)
Visit https://souravsdm.vercel.app to start learning immediately. No installation required.
Local Development
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run dev
# Open http://localhost:5173
Offline Access
npm run build
# Content is in .vitepress/dist/
# Open index.html in your browser for offline reading
Learning Modules
This comprehensive guide is organized into 12 progressive modules, each building upon the previous to develop complete mastery of system design:
Level 1: Foundations 🔧
Master the core networking and system design concepts that underpin all distributed systems.
- Topics: Networking (TCP/IP, HTTP), DNS, Load Balancing, Architecture Fundamentals, Design Patterns, Fault Tolerance, Clustering
- Key Learning: Understand what happens when you type google.com, proxy architectures, server failure handling
- Why: These fundamentals are non-negotiable—every system design decision builds on these principles
Level 2: Scalability 📈
Learn the principles of horizontal and vertical scaling, and understand the fundamental trade-offs in distributed systems.
- Topics: Horizontal vs Vertical Scaling, CAP Theorem & Trade-offs, Consistency Models, Scaling from 1 to 1 Billion Users
- Key Learning: When to scale, why consistency matters, managing growth
- Why: Scalability is the core challenge of system design—learning these principles prevents architecture mistakes
Level 3: Databases 🗄️
Understand different database paradigms and how to choose the right database for your system.
- Topics: SQL vs NoSQL Comparison, Database Sharding & Replication, Indexing Strategies, Database Management Best Practices
- Key Learning: Sharding patterns, replication strategies, when to use which database
- Why: Database design decisions have cascading effects on system scalability and reliability
Level 4: Caching ⚡
Master caching strategies to dramatically improve system performance.
- Topics: Cache Strategies & Patterns, Redis Architecture, CDN Optimization, Cache Eviction Policies
- Key Learning: When and what to cache, cache invalidation patterns, multi-tier caching
- Why: Caching is often the highest-leverage optimization in distributed systems
Level 5: Messaging & Queues 📨
Build event-driven, decoupled, and resilient systems using messaging architectures.
- Topics: Event-Driven Architecture, Kafka & RabbitMQ Basics, Dead Letter Queues, Pub/Sub Patterns
- Key Learning: Asynchronous processing, event sourcing, message reliability
- Why: Messaging enables systems to be loosely coupled, resilient, and scalable
Level 6: Microservices 🎯
Design and manage systems built from independent, scalable services.
- Topics: Service Mesh Concepts, API Gateway Patterns, Circuit Breaker Pattern, Service-to-Service Communication
- Key Learning: Service boundaries, communication patterns, failure isolation
- Why: Microservices enable independent scaling and deployment but require careful architectural decisions
Level 7: Real-World Systems 🌍
Apply your knowledge to design and understand real systems built and scaled by industry leaders.
- Systems Covered (15+ real-world designs):
  - Social/Communication: Twitter/X, Instagram, WhatsApp, TikTok
  - Transportation/Commerce: Uber, Amazon, Airbnb
  - Content: Netflix, YouTube, Google Search
  - Utilities: URL Shortener, Notification System (1B/day), GitHub, Candy Crush
- Key Learning: Real trade-offs, complete system design, interview-ready case studies
- Why: Real-world examples show how theory applies in practice at massive scale
Level 8: Case Studies & Interview Prep 🎤
Prepare for and excel in system design interviews with proven frameworks.
- Topics: Interview Frameworks & Approaches, Common Mistakes & Pitfalls, System Design Templates, Discussion Strategies
- Key Learning: How to think out loud, explain trade-offs, handle interviewer challenges
- Why: Interview success requires not just knowledge but communication skills and frameworks
Level 9: Design Patterns 🏛️
Master reusable architectural patterns that solve common distributed system problems.
- Topics: Architectural Patterns, Distributed Patterns, Performance Patterns, Reliability Patterns
- Key Learning: When and how to apply proven patterns, avoiding anti-patterns
- Why: Patterns are proven solutions—using them speeds up design and prevents mistakes
Level 10: Monitoring & Observability 📊
Build systems you can understand and debug at scale through comprehensive observability.
- Topics: System Observability Fundamentals, Metrics & Logging, Distributed Tracing & Debugging, Performance Optimization, SLOs & SLIs
- Key Learning: What to monitor, how to debug distributed systems, performance optimization
- Why: You can't operate systems you can't observe—monitoring is not optional in production
Level 11: Academic Research 🔬
Explore the research foundation of distributed systems for those interested in PhD research and cutting-edge concepts.
- Topics: Finding a Research Niche, Seminal Papers & Reading List, Academic Publication Flow, Research Methodologies
- Key Learning: How to identify research problems, read academic papers, contribute to the field
- Why: Academic research drives innovation in distributed systems
Level 12: Interview Preparation 🚀
Comprehensive interview preparation with mock scenarios, frameworks, and practice systems.
- Topics: Complete Interview Frameworks, Multiple Practice Systems, Mock Interview Guides, Advanced Interview Strategies
- Key Learning: Full end-to-end interview preparation, confidence building
- Why: Dedicated interview prep module for focused preparation before interviews
⚡ Golden Rules for System Engineers
These are the non-negotiable principles that separate great system designers from average ones.
1. Always Start with Requirements, Never with Solutions
Before drawing a single box, ask: Who uses this? What must it do? What are the performance constraints? Jumping to solutions without understanding the problem leads to overengineering.
2. Numbers Don't Lie — Estimate Everything
Back every design decision with math. If you say "we need caching," prove it:
- DAU × requests per user per day ÷ 86,400 = average RPS
- Object size × writes/day × retention = storage
- Bandwidth = RPS × average payload size
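The three estimates above can be sketched as small helper functions. This is a minimal illustration, not a sizing tool: the function names and the sample inputs (10M DAU, 20 requests per user, 2 KB payloads) are assumptions chosen for the example, and the result is an average rate, so real designs usually add headroom for peak traffic.

```python
SECONDS_PER_DAY = 86_400


def average_rps(dau: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, spread evenly over a full day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(object_size_bytes: int, writes_per_day: float, retention_days: int) -> float:
    """Raw storage needed to retain every written object (no replication)."""
    return object_size_bytes * writes_per_day * retention_days


def bandwidth_bytes_per_second(rps: float, avg_payload_bytes: int) -> float:
    """Bandwidth implied by a request rate and an average payload size."""
    return rps * avg_payload_bytes


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    rps = average_rps(dau=10_000_000, requests_per_user_per_day=20)
    print(f"average RPS: {rps:,.0f}")
    print(f"bandwidth:   {bandwidth_bytes_per_second(rps, 2_000) / 1e6:,.1f} MB/s")
    print(f"storage:     {storage_bytes(1_000, 1_000_000, 365) / 1e9:,.0f} GB/year")
```

Writing the units into the function names makes it harder to mix up per-day and per-second figures, which is the most common estimation slip.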
3. No Silver Bullets — Everything is a Trade-off
| Decision | You Gain | You Lose |
|---|---|---|
| SQL → NoSQL | Horizontal scale | ACID, complex queries |
| Caching | Speed | Freshness (risk of stale reads) |
| Async queues | Decoupling, resilience | Immediate results, simplicity |
| Microservices | Independent deployments | Operational overhead |
| Strong consistency | Correctness | Availability, latency |
4. Design for Failure First
Assume every component will fail. Ask: What happens when the DB goes down? When a service crashes? When the network partitions? Good systems degrade gracefully, not catastrophically.
5. Scale Incrementally — Don't Pre-Optimize
Start simple. A single server handles more than you think. Scale only when you have data showing you need to. Premature optimization is the root of all evil in distributed systems.
6. The Three Pillars of Production Systems
Every system you design must explicitly address:
- 📊 Observability: Metrics, logging, distributed tracing
- 🔒 Security: AuthN/AuthZ, encryption, rate limiting
- 🔧 Operability: Deploy, rollback, runbooks
🗺️ Learning Roadmap
LEVEL 1 — Foundations
├── Networking basics, HTTP, DNS, Load Balancing
├── High-level architecture concepts
├── Design patterns fundamentals
├── What happens when you type google.com?
├── Proxy & Reverse Proxy
├── Server Crashes & Fault Tolerance
└── Clustering Fundamentals
LEVEL 2 — Scalability
├── Horizontal vs Vertical scaling
├── CAP Theorem & Trade-offs
├── Consistency models
└── Scaling from 1 to 1 billion users
LEVEL 3 — Databases
├── SQL vs NoSQL comparison
├── Database sharding & replication
├── Indexing strategies
└── Database management best practices
LEVEL 4 — Caching
├── Cache strategies & patterns
├── Redis architecture
├── CDN optimization
└── Cache eviction policies
LEVEL 5 — Messaging & Queues
├── Event-driven architecture
├── Kafka & RabbitMQ basics
├── Dead letter queues
└── Pub/Sub patterns
LEVEL 6 — Microservices
├── Service mesh concepts
├── API Gateway patterns
├── Circuit breaker pattern
└── Service-to-service communication
LEVEL 7 — Real-World Systems
├── URL Shortener (tinyurl.com)
├── Twitter Clone (X)
├── Netflix architecture
├── Uber system design
├── WhatsApp infrastructure
├── Notification System (1B/day)
├── Instagram Architecture
├── Amazon E-commerce Scale
├── YouTube Video Platform
├── Google Search Architecture
├── GitHub Repository Hosting
├── TikTok Recommendation Engine
├── Candy Crush Saga at 1 Billion Users
└── Airbnb Availability & Booking
LEVEL 8 — Case Studies & Interview Prep
├── Interview frameworks & approach
├── Common mistakes & pitfalls
├── System design templates
└── Discussion strategies
LEVEL 9 — Design Patterns
├── Architectural patterns
├── Distributed patterns
├── Performance patterns
└── Reliability patterns
LEVEL 10 — Monitoring & Observability
├── System observability
├── Metrics & logging
├── Tracing & debugging
└── Performance optimization
LEVEL 11 — Academic Research
├── Finding a Research Niche
├── Required Seminal Papers
└── The Academic Publication Flow
LEVEL 12 — Interview Preparation
├── Complete Interview Frameworks
├── Multiple Practice Systems
├── Mock Interview Guides
└── Advanced Interview Strategies
📂 Repository Structure
This is a VitePress-based documentation site with a modular structure. Each numbered directory represents a learning level.
sdm/
├── 01-foundations/ # Core networking & design fundamentals
│ ├── README.md
│ ├── networking.md # TCP/IP, HTTP, DNS
│ ├── load-balancing-algorithms.md # LB strategies
│ ├── what-happens-when-you-type-google.md
│ ├── fault-tolerance-crashes.md
│ └── design-patterns.md
│
├── 02-scalability/ # Scalability principles & CAP theorem
│ ├── README.md
│ ├── scaling-1-to-1-billion.md # Growth strategies
│ └── ...
│
├── 03-databases/ # Database paradigms & strategies
│ ├── README.md
│ ├── database-sharding.md
│ └── ...
│
├── 04-caching/ # Caching strategies & patterns
│ ├── README.md
│ ├── cdn-optimization.md
│ └── ...
│
├── 05-messaging/ # Event-driven & message queues
│ ├── README.md
│ ├── dead-letter-queues.md
│ └── ...
│
├── 06-microservices/ # Microservices architecture
│ ├── README.md
│ ├── grpc.md
│ ├── api-gateway.md
│ └── ...
│
├── 07-real-world-systems/ # 15+ real system case studies
│ ├── url-shortener/README.md # URL shortening service
│ ├── twitter-clone/README.md # Twitter/X architecture
│ ├── netflix/README.md # Streaming platform
│ ├── uber/README.md # Ride-sharing platform
│ ├── whatsapp/README.md # Messaging system
│ ├── notification-system/README.md # 1B/day notification system
│ ├── instagram/README.md # Photo sharing platform
│ ├── amazon/README.md # E-commerce at scale
│ ├── youtube/README.md # Video platform
│ ├── google-search/README.md # Search engine
│ ├── github/README.md # Repository hosting
│ ├── tiktok/README.md # Recommendation engine
│ ├── candy-crush/README.md # Gaming at 1B scale
│ └── airbnb/README.md # Availability & booking
│
├── 08-case-studies/ # Interview prep frameworks
│ ├── README.md
│ ├── interview-template.md # Interview approach
│ ├── common-mistakes.md
│ └── frameworks.md
│
├── 09-design-patterns/ # Reusable architectural patterns
│ ├── README.md
│ ├── architectural-patterns.md
│ ├── distributed-patterns.md
│ └── ...
│
├── 10-monitoring/ # Observability & production systems
│ ├── README.md
│ ├── metrics-logging.md
│ ├── distributed-tracing.md
│ └── ...
│
├── 11-academic-research/ # PhD research resources
│ ├── README.md
│ ├── research-methodology.md
│ └── seminal-papers.md
│
├── 12-interview-prep/ # Dedicated interview preparation
│ ├── README.md
│ ├── complete-frameworks.md
│ ├── mock-interviews.md
│ └── ...
│
├── .vitepress/ # VitePress configuration
│ ├── config.ts # Build & site config (sidebar, search, plugins)
│ ├── theme/index.ts # Theme customization
│ ├── theme/custom.css # Custom styling
│ └── dist/ # Production build output
│
├── public/ # Static assets
│ ├── logo.png
│ └── images/
│
├── .github/workflows/ # CI/CD pipelines
│ └── vercel-deploy.yml # Automated deployment
│
├── index.md # Home page (VitePress landing)
├── roadmap.md # Interactive learning roadmap
├── guidelife-flow.md # Learning flow guide
├── package.json # Dependencies & scripts
├── .prettierrc # Code formatting config
└── README.md # This file
Directory Naming Convention
All learning modules follow the pattern: NN-topic-name/ where:
- `NN` = Two-digit level number (01, 02, ... 12)
- `topic-name` = Kebab-cased topic title
This convention enables the VitePress config to automatically generate the navigation sidebar.
🎓 Learning Paths by Role
Choose your learning path based on your role and goals:
Path 1️⃣ Beginner (0-2 years experience)
Goal: Build solid foundations and prepare for interviews
Duration: 4-6 weeks (2-3 hours/week)
1. Level 1: Foundations (complete all topics)
2. Level 2: Scalability (focus on CAP theorem)
3. Level 3: Databases (SQL vs NoSQL)
4. Level 7: Real-World Systems (start with URL Shortener)
5. Level 8: Case Studies (study frameworks)
Outcome: Understand core concepts, pass junior engineer interviews
Path 2️⃣ Intermediate Backend Engineer (2-5 years)
Goal: Deepen expertise and prepare for senior roles
Duration: 3-4 weeks (4-5 hours/week)
1. Level 3: Databases (deep dive into sharding)
2. Level 4: Caching (master cache patterns)
3. Level 5: Messaging (event-driven architecture)
4. Level 6: Microservices (service design)
5. Level 7: Real-World Systems (Netflix, Uber, Twitter)
6. Level 9: Design Patterns (proven solutions)
Outcome: Senior-level technical competency, architectural thinking
Path 3️⃣ Interview Preparation (All levels)
Goal: Excel in system design interviews
Duration: 2-3 weeks (5-6 hours/week)
1. Level 1: Foundations (quick review - 2-3 hours)
2. Level 2: Scalability (focus on CAP - 2-3 hours)
3. Level 7: Real-World Systems (practice 8-10 systems - 8-10 hours)
   ├── Start: URL Shortener
   ├── Medium: Twitter, Notification System
   └── Hard: Netflix, Uber, Instagram
4. Level 8: Case Studies (study interview frameworks - 2 hours)
5. Level 12: Interview Prep (mock interviews - 3-4 hours)
Outcome: Confident in interviews, handles edge cases, communicates clearly
Path 4️⃣ Tech Lead / Architect (5+ years)
Goal: Guide architectural decisions and mentor teams
Duration: Ongoing (2-3 hours/week)
1. Review all levels for context (skip if familiar)
2. Level 6: Microservices (system organization)
3. Level 9: Design Patterns (architectural patterns)
4. Level 10: Monitoring (production systems)
5. Periodically: Review Level 7 for new systems
Outcome: Architectural expertise, mentoring capability, informed technical decisions
Path 5️⃣ PhD Researcher (Systems & Distributed Computing)
Goal: Understand research landscape and contribute
Duration: Variable
1. All levels 1-6 (foundational knowledge)
2. Level 7: Real-World Systems (see what industry does)
3. Level 9: Design Patterns (know solutions)
4. Level 11: Academic Research (research methodology)
   ├── Find research niche
   ├── Read seminal papers
   └── Identify open problems
5. Deep dive into specialized areas
Outcome: Research-ready, knows problem space, can identify gaps
💡 Better Learning Tips
✅ Do this:
- Read Level 1 fundamentals completely—don't skip networking or consistency models
- Draw and design systems yourself, don't just read
- Take notes and create your own design templates
- Practice each real-world system multiple times (3-5 times)
- Discuss designs with peers to deepen understanding
- Reference GitHub implementations of patterns
❌ Don't do this:
- Memorize content—focus on understanding trade-offs
- Skip "boring" foundational topics like networking
- Give up on hard concepts like CAP Theorem—revisit them
- Design systems alone—always discuss and get feedback
- Assume one design works for all scenarios—context matters
🔑 The 8-Step System Design Framework
When asked to design any system in an interview or real project, follow this systematic approach:
Step-by-Step Breakdown
| Step | Focus | Key Questions |
|---|---|---|
| 1. Clarify Requirements | Functional + Non-functional | Who are the users? What features needed? Latency, throughput, consistency? |
| 2. Estimate Scale | Users, RPS, Storage, Bandwidth | Daily/monthly active users? Concurrent connections? Data size? Growth rate? |
| 3. High-Level Design | System components & flow | What are the main services? How do they interact? Where are bottlenecks? |
| 4. Deep Dive | Database design, APIs, data flow | Schema design? Primary keys? Indexing? API endpoints? Data models? |
| 5. Scalability | Handle 10x, 100x load | How to scale databases? Cache layers? Load balancing? Replication? |
| 6. Reliability | Fault tolerance, redundancy | How to handle failures? Backup strategies? Monitoring? Alerting? |
| 7. Trade-offs | Analyze sacrifices | Why NoSQL over SQL? Cache vs accuracy? Strong vs eventual consistency? |
| 8. Final Review | Optimization & improvements | Any bottlenecks? Can we optimize further? What would we do next? |
Framework in Action — URL Shortener Example
| Step | What You Say in an Interview |
|---|---|
| 1. Clarify | "Is this read-heavy or write-heavy? Do we need analytics? Custom aliases?" |
| 2. Estimate | "100M DAUs at ~86 reads each per day ≈ 100K reads/s. At 100:1 read/write that is ~1K writes/s, and at ~1KB per URL about 86GB/day, or roughly 31TB/year." |
| 3. HLD | "Client → API Gateway → App Server → DB. Reads via Cache layer (Redis)." |
| 4. Deep Dive | "URL table: (id, short_code, original_url, user_id, created_at). Use Base62 encoding." |
| 5. Scalability | "Shard DB by short_code hash. Cache top 20% URLs (handles 80% traffic)." |
| 6. Reliability | "Multi-region replication, health checks, circuit breakers on downstream calls." |
| 7. Trade-offs | "Chose NoSQL for write scalability over ACID guarantees. Eventual consistency on analytics." |
| 8. Review | "Add rate limiting to prevent abuse. Geo-routing for lower latency." |
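Step 4 mentions Base62 encoding for the short code. A minimal sketch of that idea follows, assuming the short code is derived from a numeric row id; the alphabet order (digits, then lowercase, then uppercase) and the function names are choices made for this example, not something the guide prescribes.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(number: int) -> str:
    """Turn a non-negative integer id into a compact Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode: recover the integer id from a short code."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    print(base62_encode(125))         # 21
    print(base62_decode("21"))        # 125
    print(62 ** 7)                    # 3521614606208 distinct 7-character codes
```

Seven characters already cover about 3.5 trillion ids, which is why short codes stay short even at large scale.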
📋 System Design Cheat Sheet
Quick-reference for the most commonly tested concepts.
When to Use What — Database Cheat Sheet
| Use Case | Best Choice | Why |
|---|---|---|
| User profiles, orders | PostgreSQL / MySQL | ACID, complex joins |
| Social graph, relationships | Neo4j | Graph traversal |
| Product catalog, content | MongoDB | Flexible schema |
| Session storage, leaderboards | Redis | In-memory, fast |
| Time-series (metrics, logs) | InfluxDB / Cassandra | Write-optimized |
| Search / autocomplete | Elasticsearch | Full-text search |
| Blob storage (images, video) | S3 / GCS | Object storage |
Caching Decision Tree
Is the data read frequently but written rarely?
└─ YES → Cache it!
├── Data changes rarely → Long TTL (hours/days)
├── Data changes often → Short TTL + Cache-aside pattern
└── Must be real-time → Skip cache OR use write-through
└─ NO → Don't cache. Complexity isn't worth it.
Load Balancing Algorithms at a Glance
| Algorithm | Best For | Avoid When |
|---|---|---|
| Round Robin | Stateless services, equal capacity | Sessions, stateful services |
| Least Connections | Long-lived connections (WebSockets) | Short-lived HTTP requests |
| Consistent Hashing | Caching, DB sharding | Heterogeneous server capacities |
| IP Hash | Sticky sessions needed | Dynamic server fleets |
| Weighted Round Robin | Mixed-capacity servers | All servers are identical |
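To make the consistent hashing row concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name, the SHA-256 hash, the `vnodes=100` default, and the `cache-a`/`cache-b`/`cache-c` node names are all assumptions for illustration; production systems typically add replication and weighting on top of this.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that removing a node only moves that node's keys."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per server, smooths the distribution
        self._ring = []        # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.get_node(k) for k in keys}
    ring.remove_node("cache-b")
    moved = sum(1 for k in keys if ring.get_node(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing one node")
```

With plain `hash(key) % N`, removing a server would remap most keys; on the ring, only keys that lived on the removed node are reassigned.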
CAP Theorem — Practical Cheat Sheet
| System Type | Prioritizes | Example Systems |
|---|---|---|
| CP (Consistent + Partition Tolerant) | Correctness over availability | HBase, Zookeeper, etcd |
| AP (Available + Partition Tolerant) | Availability over consistency | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Only works without partitions | Single-node RDBMS (MySQL) |
Key Insight: In distributed systems, network partitions are inevitable, so the real choice is how the system behaves during a partition: CP or AP.
Message Queue Selection Guide
| Need | Tool | Why |
|---|---|---|
| High throughput event streaming | Apache Kafka | Persistent, replayable, partitioned |
| Task queues (jobs, emails) | RabbitMQ / SQS | Lightweight, flexible routing |
| Real-time pub/sub | Redis Pub/Sub | Ultra-low latency |
| Exactly-once semantics | Kafka + transactions | Strong delivery guarantees |
Dead Letter Queue (DLQ) in Plain Terms
A Dead Letter Queue (DLQ) is a pattern used in messaging systems such as AWS SQS, RabbitMQ, and Kafka. It is a holding area for messages that failed or errored during processing, so that those messages are not lost and do not block the main system.
Why do messages go to the DLQ?
1. Max Retries Exceeded: The consumer still fails to process the message after a set number of attempts.
2. Message Format Error: The message data or payload is malformed.
3. Message Expiration (TTL): The message sat in the queue so long that it expired.
💡 Real-world example: In a food delivery app, a confirmation SMS keeps failing because the phone number is invalid, so the message is moved to the DLQ. The main queue stays unblocked and other customers still receive their SMS.
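The retry-then-DLQ flow described above can be sketched with an in-memory queue. This is an illustration of the pattern only: `process_with_dlq`, `send_sms`, and the retry budget of 3 are hypothetical, and a real broker (SQS, RabbitMQ, Kafka) handles redelivery and dead-lettering through its own configuration.

```python
from collections import deque

MAX_RETRIES = 3  # assumed retry budget for this sketch


def process_with_dlq(messages, handler, max_retries=MAX_RETRIES):
    """Drain the main queue; park messages that keep failing in a DLQ."""
    main_queue = deque((message, 0) for message in messages)  # (message, attempts)
    processed, dead_letters = [], []
    while main_queue:
        message, attempts = main_queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception as exc:
            attempts += 1
            if attempts >= max_retries:
                # Retry budget exhausted: keep the message and the reason for inspection.
                dead_letters.append(
                    {"message": message, "error": str(exc), "attempts": attempts}
                )
            else:
                # Requeue at the back so other messages are not blocked.
                main_queue.append((message, attempts))
    return processed, dead_letters


def send_sms(message):
    """Hypothetical handler: rejects phone numbers that are not all digits."""
    if not message["phone"].isdigit():
        raise ValueError(f"invalid phone number: {message['phone']}")


if __name__ == "__main__":
    orders = [{"phone": "01700000000"}, {"phone": "not-a-number"}, {"phone": "01800000000"}]
    done, dlq = process_with_dlq(orders, send_sms)
    print(len(done), "delivered;", len(dlq), "in DLQ")
```

Storing the error alongside the message is what makes a DLQ useful later: someone can inspect, fix, and replay the failed messages.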
🧠 How to Think Like a System Architect
Mental models that transform how you approach any system design problem.
Mental Model 1: The Bottleneck First Principle
Every system has one primary bottleneck at any given scale. Find it, fix it, then find the next one.
- At 1K RPS → Application server is the bottleneck
- At 10K RPS → Database becomes the bottleneck → Add read replicas, cache
- At 100K RPS → Cache becomes the bottleneck → Horizontal shard, CDN
- At 1M RPS → Network/IO is the bottleneck → Edge computing, geo-distribution
Mental Model 2: The 80/20 Rule of Traffic
In almost every system:
- 20% of data accounts for 80% of traffic → Cache that 20%
- 20% of users generate 80% of writes → Rate-limit them
- 20% of endpoints get 80% of requests → Optimize those paths first
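"Cache that 20%" usually means a bounded cache that keeps recently used entries and evicts the rest. A minimal sketch of a least-recently-used (LRU) cache follows; the class name, the `loader` callback, and the hit/miss counters are assumptions for illustration rather than any specific library's API.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss."""
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)                # e.g. a database read
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for key in ["a", "b", "a", "c", "b"]:
        cache.get(key, loader=str.upper)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=1 misses=4
```

Tracking the hit ratio is the practical check on the 80/20 assumption: if it stays low, the hot set is larger than the cache or the traffic is not as skewed as expected.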
Mental Model 3: Synchronous vs. Asynchronous Thinking
Ask: "Does the user need to wait for this to complete?"
└─ YES (payment confirmation, auth) → Synchronous, strong consistency
└─ NO (email notification, log processing) → Asynchronous, eventual consistency
Pushing work to async queues is one of the highest-leverage optimizations in distributed systems.
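The sync/async split can be sketched with a standard-library queue and a background worker thread. Everything named here (`handle_signup`, `start_worker`, the welcome-email job) is hypothetical; in production the in-process queue would typically be replaced by a broker so that jobs survive restarts.

```python
import queue
import threading


def start_worker(jobs: queue.Queue, results: list) -> threading.Thread:
    """Background consumer: handles queued jobs until it sees the None sentinel."""

    def run():
        while True:
            job = jobs.get()
            if job is None:          # sentinel: stop the worker
                jobs.task_done()
                break
            results.append(f"sent email to {job}")  # stand-in for slow work
            jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def handle_signup(email: str, jobs: queue.Queue) -> str:
    """Synchronous path: respond right away; the email is sent later."""
    jobs.put(email)
    return "201 Created"


if __name__ == "__main__":
    jobs, results = queue.Queue(), []
    worker = start_worker(jobs, results)
    print(handle_signup("a@example.com", jobs))
    jobs.put(None)   # ask the worker to stop
    jobs.join()      # wait until every queued job has been handled
    print(results)
```

The user-facing call returns after a single `put`, so its latency no longer depends on how slow the email provider is.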
Mental Model 4: Data Locality
The closer data is to compute, the faster the system:
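| Tier | Typical access latency (order of magnitude) |
|---|---|
| L1 cache | ~1 ns |
| L2 cache | ~4 ns |
| RAM | ~100 ns |
| SSD (random read) | ~100 µs |
| Same-datacenter network round trip | ~0.5 ms |
| HDD (seek) | ~10 ms |
| Cross-DC round trip | ~10 ms |
| Cross-region round trip | ~100 ms |
Design so that hot data lives in the fastest tier for your latency SLA.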
Mental Model 5: Failure Modes Taxonomy
Classify every failure your system can experience:
| Failure Type | Example | Defense |
|---|---|---|
| Transient | Network blip | Retry with exponential backoff |
| Permanent | Disk failure | Replication, failover |
| Correlated | DC power outage | Multi-region deployment |
| Cascading | Slow DB → timeouts → overload | Circuit breakers, bulkheads |
| Byzantine | Corrupted data | Checksums, idempotency |
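The defense for transient failures, retry with exponential backoff, can be sketched as below. The function name, the default delays, and the choice of `ConnectionError` as the retryable error are assumptions for this example; random jitter is added so that many clients do not retry in lockstep.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call operation(); on ConnectionError wait longer each time, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential growth (0.1s, 0.2s, 0.4s, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random amount up to the ceiling.
            sleep(random.uniform(0, ceiling))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_call():
        """Hypothetical dependency that fails twice, then recovers."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("network blip")
        return "ok"

    print(retry_with_backoff(flaky_call), "after", attempts["count"], "attempts")
```

Only errors believed to be transient are retried here; retrying permanent failures just adds load, which is where the circuit breakers from the table come in.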
Getting Started & Development
Prerequisites
- Node.js >= 20.x
- npm >= 9.x or yarn
- Git (for contributions)
Installation & Local Setup
Clone the repository
1. Clone the repository
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
2. Install dependencies
npm install
3. Start development server
npm run dev
# Open http://localhost:5173 in your browser
4. Edit content
- Changes to Markdown files automatically reload in the browser
- Start with any `NN-title/README.md` file
Development Workflow
# 1. Create a feature branch
git checkout -b feature/your-feature
# 2. Make changes to markdown files
# Changes auto-reload at http://localhost:5173
# 3. Format your code
npm run format
# 4. Commit and push
git add .
git commit -m "feat: description of changes"
git push origin feature/your-feature
# 5. Create a pull request on GitHub
Build Commands
# Development (with auto-reload)
npm run dev
# Check formatting
npm run format:check
# Format all files
npm run format
# Build for production
npm run build
# Preview production build locally
npm run preview
📖 How to Use This Repository
Learning Paths by Role
For Beginners (0-2 years experience)
- Start with Level 1: Foundations to learn core concepts
- Read through "What happens when you type google.com?"
- Study Level 2: Scalability and CAP Theorem
- Then explore Level 3: Databases
- Build confidence with real-world systems in Level 7
Estimated time: 4-6 weeks (2-3 hours/week)
For Intermediate Engineers (2-5 years)
- Skip Level 1-2, start with Level 3: Databases
- Deep dive into Level 4: Caching patterns
- Master Level 5: Messaging & Queues for event-driven systems
- Study Level 6: Microservices architecture
- Practice with Level 7: Real-World Systems case studies
Estimated time: 3-4 weeks (4-5 hours/week)
For Interview Preparation
- Quick review of Level 1-2 (skip if familiar)
- Focus on Level 7: Real-World Systems (practice 5-10 systems)
- Study Level 8: Case Studies framework and common mistakes
- Do mock interview practice
- Review Level 9: Design Patterns for advanced concepts
Estimated time: 2-3 weeks (5-6 hours/week)
For Tech Leads & Architects
- Review all levels for context
- Focus on Level 6: Microservices for system organization
- Master Level 9: Design Patterns for architectural decisions
- Study Level 10: Monitoring for production systems
- Use content for team knowledge sharing
Estimated time: Ongoing (2-3 hours/week for depth)
🛠️ Technical Stack & Architecture
Technology Stack
| Category | Technology | Version | Purpose |
|---|---|---|---|
| Static Site Generator | VitePress | 1.6.4 | Fast, modern documentation site builder |
| Frontend Framework | Vue | 3.5.30 | Reactive UI components & interactivity |
| Language | TypeScript | 5.9.3 | Type-safe configuration & development |
| Diagrams | Mermaid | 10.9.5 | Flow charts, sequence diagrams, architecture |
| Mermaid Plugin | vitepress-plugin-mermaid | 2.0.17 | Seamless Mermaid integration in VitePress |
| Code Formatter | Prettier | 3.8.1 | Consistent code & markdown formatting |
| Node Runtime | Node.js | >= 18.x | JavaScript runtime |
| Build Tool | Vite | (bundled) | Ultra-fast build tool |
System Architecture
┌─────────────────────────────────────────────────────────┐
│ Browser / End User │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ VitePress SPA + Mermaid Diagrams + Search │
│ (Built to .vitepress/dist/) │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ CDN + Edge Caching │
│ (Vercel Deployment) │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ GitHub Repository (Source Control) │
│ (CI/CD Pipelines, Version History, Issues) │
└─────────────────────────────────────────────────────────┘
How It Works
- Content: Markdown files in `NN-title/` directories
- Configuration: `.vitepress/config.ts` automatically generates sidebar from directory structure
- Build: `npm run build` processes Markdown → HTML using VitePress
- Deployment: GitHub Actions triggers Vercel deployment on push to `sourav` branch
- Serving: Vercel CDN serves built HTML globally
🚀 Build & Deployment
Available npm Scripts
# Development
npm run dev # Start VitePress dev server (http://localhost:5173)
npm run build # Production build (Vercel / CI)
npm run build:local # Format with Prettier, then build
npm run preview # Preview production build locally
# Code Quality
npm run format # Format all markdown, JS, JSON with Prettier
npm run format:check # Check if files are formatted correctly
# CI/CD (Automatic)
# Vercel deploys on push to 'sourav' branch via GitHub Actions
Local Development Workflow
# 1. Clone and set up
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
# 2. Create a new branch for your changes
git checkout -b feature/your-feature
# 3. Start dev server (auto-reloads on file changes)
npm run dev
# 4. Edit markdown files in any level directory
# Changes appear instantly in browser
# 5. Format and commit
npm run format
git add .
git commit -m "feat: add new content"
git push origin feature/your-feature
# 6. Create a Pull Request on GitHub
Deployment Platforms
Vercel Deployment (Current)
- Branch: sourav (default)
- Build Command: npm run build
- Output Directory: .vitepress/dist
- Automatic: Deploys on every push to the sourav branch
- Live URL: https://souravsdm.vercel.app
GitHub Pages (Alternative)
# Manual deployment
npm run build
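# Push the committed .vitepress/dist directory to the gh-pages branch
# (git subtree only pushes committed files, so commit the build output first)
git subtree push --prefix .vitepress/dist origin gh-pages
Local Deployment (Testing)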
npm run preview # Serves .vitepress/dist locally
# Open http://localhost:4173
📝 Contributing
We welcome contributions! Whether it's fixing typos, adding content, or improving diagrams, every contribution helps the community.
How to Contribute
1. Fork the repository
   git clone https://github.com/YOUR_USERNAME/sdm.git
   cd sdm
2. Create a feature branch
   git checkout -b feature/add-new-topic
3. Make your changes
   - Add/edit Markdown files in the appropriate level directories
   - Follow the existing Markdown structure and formatting
   - Add Mermaid diagrams where helpful
4. Format your code
   npm run format
5. Push and create Pull Request
   git add .
   git commit -m "feat: add comprehensive XYZ guide"
   git push origin feature/add-new-topic
Content Guidelines
✅ Good contributions:
- Clear, concise explanations with examples
- Mermaid diagrams for visual concepts
- References to real-world implementations
- Proper Markdown formatting
- No grammatical errors
❌ Avoid:
- Overly long paragraphs without structure
- Promotional content or spam
- Untested information
- Broken links or outdated references
Issues & Discussions
- Found a bug? → Open an issue with details
- Want to suggest content? → Open a discussion
- Need clarification? → Ask in issues
❓ FAQ
Q: Which level should I start with?
A: Start with Level 1: Foundations unless you already understand networking and distributed systems. Everyone needs foundational knowledge to understand higher levels.
Q: Is this free?
A: Yes, completely free and open-source under the ISC license. No hidden costs or premium content.
Q: How long will it take to complete?
A: Depends on your background. Beginners: 4-6 weeks (2-3 hours/week). Experienced: 2-3 weeks (4-5 hours/week).
Q: Can I use this for interview prep?
A: Absolutely! Level 8 (Case Studies) is specifically designed for interview preparation. Practice the real-world systems multiple times.
Q: Are there real code examples?
A: No. This guide focuses on theory and high-level design. For implementations, see the GitHub repositories linked in the Resources section.
Q: Can I use the content offline?
A: Yes! Clone the repository, run npm install and npm run build, then run npm run preview to serve .vitepress/dist locally. Opening index.html directly from disk may not load the site's assets.
Q: How often is this updated?
A: Content is updated regularly based on community feedback and new technologies. The latest updates are on the default sourav branch.
Q: Is there a video version?
A: Not currently, but the Markdown content is compatible with PDF generators if you need offline reading.
Q: Can I translate this?
A: Contributions with translations are welcome! Open an issue to discuss translation to other languages.
🔗 Resources & References
Related Learning Resources
- System Design Primer - Detailed reference guide
- Grokking System Design Interview - Interactive course
- High Scalability Blog - Case studies and architecture blogs
- NGINX Blog - Load balancing and web architecture
Technology Documentation
- Redis Documentation - Caching & data structures
- Kafka Documentation - Message streaming
- PostgreSQL Docs - Relational databases
- MongoDB Manual - Document databases
Real-World Architectures
- AWS Architecture - Cloud design patterns
- Google Cloud Solutions - Large-scale systems
- LinkedIn Engineering Blog - Production systems
- Netflix Tech Blog - Streaming architecture
Books Worth Reading
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "The Art of Scalability" by Martin Abbott & Michael Fisher
- "Building Microservices" by Sam Newman
- "Site Reliability Engineering" by Google
📊 Project Statistics & Coverage
| Metric | Value |
|---|---|
| Total Learning Modules | 12 |
| Real-World Systems | 15+ |
| Mermaid Diagrams | 100+ |
| Markdown Content Files | 50+ |
| Total Content Lines | 5000+ |
| Interview Frameworks | 3+ |
| Design Patterns Covered | 20+ |
| Companies Featured | 15+ (Google, Netflix, Uber, Amazon, Meta, Microsoft, etc.) |
| Fully Open Source | Yes ✓ |
| License | ISC |
| Last Updated | 2026 |
❓ FAQ - Frequently Asked Questions
Q: Which level should I start with?
A: Start with Level 1: Foundations unless you already understand networking and distributed systems fundamentals. Every subsequent level builds on these concepts. If you're already familiar with networking and basic architecture, you can start with Level 2: Scalability.
Q: Is this completely free?
A: Yes, completely free and open-source under the ISC license. No hidden costs, premium tiers, or paywalls. All content is available at https://souravsdm.vercel.app.
Q: How long does it take to complete?
A: It depends on your background and learning pace:
- Beginners: 4-6 weeks (2-3 hours/week)
- Intermediate: 3-4 weeks (4-5 hours/week)
- Interview Prep Only: 2-3 weeks (5-6 hours/week)
- Complete Mastery: 2-3 months (regular study)
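For planning purposes, the week and hours-per-week ranges above imply a total effort range for each path. The sketch below only multiplies out those ranges; the `StudyPlan` type and `estimateHours` helper are illustrative names, and "Complete Mastery" is left out because no weekly hours are given for it.

```typescript
// Illustrative arithmetic only: total hours = weeks x hours per week,
// computed for the low and high end of each range.
interface StudyPlan {
  name: string;
  weeks: [number, number]; // [min, max] weeks
  hoursPerWeek: [number, number]; // [min, max] hours per week
}

function estimateHours(plan: StudyPlan): [number, number] {
  const [minWeeks, maxWeeks] = plan.weeks;
  const [minHours, maxHours] = plan.hoursPerWeek;
  return [minWeeks * minHours, maxWeeks * maxHours];
}

const plans: StudyPlan[] = [
  { name: "Beginners", weeks: [4, 6], hoursPerWeek: [2, 3] },
  { name: "Intermediate", weeks: [3, 4], hoursPerWeek: [4, 5] },
  { name: "Interview Prep Only", weeks: [2, 3], hoursPerWeek: [5, 6] },
];

for (const plan of plans) {
  const [low, high] = estimateHours(plan);
  console.log(plan.name + ": " + low + "-" + high + " hours total");
}
// Beginners: 8-18 hours total
// Intermediate: 12-20 hours total
// Interview Prep Only: 10-18 hours total
```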
Q: Can I use this for interview preparation?
A: Absolutely! Level 8 (Case Studies) and Level 12 (Interview Prep) are specifically designed for interview preparation. Combine them with repeated practice of the Level 7 (Real-World Systems) designs to build confidence.
Q: Are there implementation examples or code?
A: This repository focuses on architecture and design theory, not implementations. For code examples, we recommend checking the "Real-World Architectures" and "Technology Documentation" sections in the Resources for links to open-source implementations and documentation.
Q: Can I download and use this offline?
A: Yes! Run these commands:
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run build
npm run preview
# Open http://localhost:4173 in your browser
Q: How frequently is the content updated?
A: Content is updated regularly based on community feedback, new technologies, and best practices. Major updates are released quarterly. Check the GitHub Releases page for version history.
Q: Is there a video version?
A: Not currently, but Markdown content can be converted to PDF for offline reading using various tools (pandoc, printing to PDF, etc.).
Q: Can I translate this to another language?
A: Absolutely! Translations are welcome. Please open an issue to discuss translating to your language. We can coordinate with the community.
Q: Which systems should I focus on for interviews?
A: Start with:
- Easy: URL Shortener, Notification System
- Medium: Twitter, Instagram
- Hard: Netflix, Uber, YouTube
Practice each 2-3 times before moving to the next difficulty level. By interview time, you should be comfortable with 8-10 systems.
Q: What if I don't understand a concept?
A: Try these strategies:
- Re-read the section (understanding takes multiple passes)
- Draw diagrams and visualize the concept
- Search for additional resources in our Resources section
- Open an issue asking for clarification
- Check related levels for foundational context
🔗 Resources & References
Learning Resources & Communities
- System Design Primer - Comprehensive reference
- Grokking System Design Interview - Interactive course
- High Scalability Blog - Real architecture case studies
- Donne Martin's Repo - System design resources
Technology Documentation
- Redis Documentation - Caching & data structures
- Apache Kafka - Message streaming
- PostgreSQL - Relational databases
- MongoDB - Document databases
- Elasticsearch - Search & analytics
Real-World Architectures & Engineering Blogs
- AWS Architecture Center
- Google Cloud Solutions
- LinkedIn Engineering Blog
- Netflix Tech Blog
- Uber Engineering
- Twitter Engineering
Essential Books
- "Designing Data-Intensive Applications" by Martin Kleppmann - Must-read for distributed systems
- "The Art of Scalability" by Martin Abbott & Michael Fisher - Practical scalability patterns
- "Building Microservices" by Sam Newman - Microservices architecture
- "Site Reliability Engineering" by Google - Production systems & monitoring
- "Release It!" by Michael Nygard - Production-ready systems
Academic & Research
- Distributed Systems Research Papers - Google Scholar
- USENIX Proceedings - Conference papers
- ArXiv CS.DC - Distributed Computing papers
📄 License & Author
License
This project is licensed under the ISC License. You can freely use, modify, and distribute this content for any purpose, personal or commercial, provided the copyright and permission notices appear in all copies.
ISC License (ISC)
Copyright (c) 2024-2026 Sourav Roy
Permission to use, copy, modify, and/or distribute this software for any purpose with or
without fee is hereby granted, provided that the above copyright notice and this permission
notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
Author & Contributors
Created & Maintained by: Sourav Roy
Community: This project thrives through contributions from the developer community. Check the Contributors page for everyone who has helped.
🙏 Show Your Support
If this resource has helped you learn system design, prepare for interviews, or deepen your architectural knowledge:
- ⭐ Star this repository - Helps others discover this resource
- 🐦 Share on social media - "I just learned system design with System Design Mastery"
- 💬 Open issues/discussions - Share feedback, suggestions, or topics you'd like covered
- 🤝 Contribute - Fix typos, improve explanations, add diagrams, or suggest new content
- 📢 Recommend to friends - Share with engineers preparing for interviews
🚀 Getting Started Today
Choose Your Path:
- 📚 Complete Beginner? → Start with Level 1
- 🎯 Interview Coming Up? → Go to Real-World Systems
- 🔧 Already Know Basics? → Jump to Databases
- 🎤 Want Interview Prep? → Check Interview Module
Or contribute:
# Clone and start contributing
git clone https://github.com/SOURAV-ROY/sdm.git
cd sdm
npm install
npm run dev
# Visit http://localhost:5173 and start editing!
📞 Support & Feedback
- 🐛 Report Bugs: GitHub Issues
- 💬 Ask Questions: GitHub Discussions
- 🔗 Submit Changes: GitHub Pull Requests
- 🎓 Learning Help: Discussion Board
🎯 Roadmap & Future Plans
Planned Additions (2026):
- [ ] More real-world systems (Spotify, Discord, Reddit)
- [ ] Video explanation supplements
- [ ] Interactive system design simulator
- [ ] Community-contributed case studies
- [ ] Multilingual support (Spanish, Chinese, Hindi, etc.)
- [ ] PDF download option
- [ ] Mobile-optimized view
- [ ] Advanced system design patterns
Help Wanted:
- 📝 Content writers & editors
- 🎨 Diagram & visualization contributors
- 🌍 Translators
- 🧪 Reviewers for technical accuracy
- 💬 Interview experience sharers
Interested? Open an issue and let's collaborate!
💡 Key Insights & Philosophy
This repository is built on the belief that:
- System design knowledge should be free - High-quality learning shouldn't require expensive courses
- Learning should be structured - Not scattered across different sources
- Examples matter - Real system case studies teach faster than theory alone
- Practice builds confidence - Repeated practice with frameworks leads to interview success
- Community creates better resources - Open-source collaboration improves knowledge for everyone
🏆 Success Stories
This resource has helped hundreds of engineers:
- 📈 Land interviews at FAANG companies
- 🎓 Deepen their architectural understanding
- 💼 Become technical leaders and architects
- 🔬 Pursue research in distributed systems
- 🚀 Build production systems at scale
Want to share your story? Open a discussion and inspire others!
📊 Quick Reference
By Experience Level:
| Level | Start Here | Focus Areas | Time |
|---|---|---|---|
| 0-2 yrs | Foundations | Levels 1-3, then 7-8 | 4-6 wks |
| 2-5 yrs | Databases | Levels 3-6, then 7, 9 | 3-4 wks |
| Interviews | Real Systems | Level 7 & 8 deeply | 2-3 wks |
| Architects | All Levels | 6, 9-10 focus | Ongoing |
| Researchers | All Levels | Then Level 11 | Variable |
By System Type:
| Need | Level | Time | Priority |
|---|---|---|---|
| URL Shortener | 7 | 1.5 hrs | Medium |
| Social Media | 7 | 2 hrs | High |
| Video Platform | 7 | 2.5 hrs | Medium |
| Notification System | 7 | 1.5 hrs | Medium |
| E-Commerce | 7 | 2.5 hrs | High |
🌟 Made with ❤️ by SOURAV ROY
Last Updated: 2026
Version: 1.0.0+
License: ISC (Free & Open Source)
