Databases — Core Concepts
Interview Relevance: Critical — Choosing the right database is the most frequent decision point in any system design interview.
The database is where your system's "state" lives. In an interview, you must justify your choice based on data structure, consistency needs, and scale.
SQL vs. NoSQL: The Fundamental Choice
1. Relational Databases (SQL)
Best for: Structured data, complex joins, and strong consistency (financial systems, ERPs).
ACID Properties
The 4 pillars of SQL reliability:
- Atomicity: All-or-nothing transactions.
- Consistency: Data must follow all schema rules/constraints.
- Isolation: Concurrent transactions don't interfere.
- Durability: Once committed, data stays saved even during power failure.
Indexing (The "Cheat Code" for Speed)
Indexes make reads faster but writes slower (because the index must be updated).
- B-Tree Index: The default. Balanced tree structure. Great for equality and range queries (
id = 5orage > 20). - Hash Index: Map-like. Extremely fast for exact matches (
id = 5) but useless for ranges.
Normalization vs. Denormalization
- Normalization (3NF): Minimizing redundancy by splitting data into multiple tables.
- ✅ Saves space, prevents data anomalies.
- ❌ Requires complex
JOINs which are slow at scale.
- Denormalization: Intentionally adding redundant data to a single table.
- ✅ Much faster reads (no joins).
- ❌ Stale data risks, more storage space.
2. Non-Relational Databases (NoSQL)
Best for: High-velocity data, massive scale, and flexible schemas.
The 4 Types of NoSQL
| Type | Example | Best For |
|---|---|---|
| Key-Value | Redis, DynamoDB | Sessions, shopping carts, simple lookups |
| Document | MongoDB, CouchDB | Catalogs, content management (JSON-like) |
| Column-Family | Cassandra, HBase | Time-series, analytics, massive write-heavy logs |
| Graph | Neo4j | Social networks, fraud detection, recommendation engines |
BASE Properties (The NoSQL Counterpart to ACID)
- Basically Available: The system guarantees availability.
- Soft-state: State may change over time without input (due to eventual consistency).
- Eventual consistency: Data will be consistent eventually across all nodes.
The CAP Theorem
You can only pick two in a distributed system under network partition.
Note: In distributed systems, P is mandatory. You are always choosing between CP (Consistency) or AP (Availability).
3. Database Scaling
When a single server isn't enough.
Replication (Read Scaling)
Creating copies of the database to handle more read traffic.
- Master-Slave: Master handles writes; Slaves handle reads.
- ✅ Scale reads linearly.
- ❌ Master is still a single point of failure for writes.
- Multi-Master: Any node can handle writes.
- ✅ High availability for writes.
- ❌ Conflict resolution is extremely complex.
Partitioning / Sharding (Write Scaling)
Splitting a large dataset into smaller chunks ("shards") across different servers.
- Vertical Sharding: Moving specific columns/tables to different servers (e.g., User table on Server A, Order table on Server B).
- Horizontal Sharding (The Real "Sharding"): Moving rows of the same table to different servers based on a Sharding Key.
Worked Example: URL Shortener DB Choice
Scenario 1: Small Scale (100 req/sec)
Choice: PostgreSQL (SQL)
- Why: ACID compliance ensures 100% reliability. Simple to set up. Relations between Users and URLs are easy to manage.
Scenario 2: FAANG Scale (100K+ req/sec)
Choice: Cassandra (NoSQL Column-Family)
- Why:
- Scale: Cassandra is designed for massive writes.
- No Joins: URL shorteners are mostly key-value lookups (
short_link->long_url). - Availability: We prefer AP (Availability) over strict consistency. If a user gets an "eventually consistent" old URL for 1 second, it's fine.
Interview Cheat Sheet
Decision Framework
- Relationship Complexity: If it's a social network with deep connections -> Graph.
- Consistency Needs: If it's a bank or payment system -> SQL.
- Write Volume: If it's IoT sensors or metrics logging -> Cassandra/NoSQL.
- Data Flexibility: If the schema changes every week -> Document (MongoDB).
The Interview Phrase
"Given the massive write volume and simple key-value access pattern of the URL shortener, I'll choose Cassandra. Since it's an AP system, it offers high availability even across data centers, and we can scale writes horizontally by adding more nodes without the complex conflict resolution of multi-master SQL."
Red Flags vs. Green Flags
| 🔴 Red Flag | 🟢 Green Flag |
|---|---|
| "I'll use MySQL because I know it." | "I'll use MySQL because we need ACID transactions for payments." |
| Suggesting Joins in a NoSQL DB. | Denormalizing data in NoSQL to avoid Joins. |
| Forgetting to mention the Sharding Key. | Picking a high-cardinality Sharding Key to avoid hot shards. |
| Claiming NoSQL is "just faster." | Explaining NoSQL speed in terms of limited Joins and BASE consistency. |
IMPORTANT
Sharding is a last resort. Always try Replication or Caching first. If you jump straight to sharding in an interview, you might look like you're over-engineering.
TIP
When talking about Sharding, always mention Hot Shards (e.g., Justin Bieber's user ID causing 90% of traffic to hit one shard) and how you'd mitigate it (consistent hashing).
