Skip to content

Databases — Core Concepts

Interview Relevance: Critical — Choosing the right database is the most frequent decision point in any system design interview.

The database is where your system's "state" lives. In an interview, you must justify your choice based on data structure, consistency needs, and scale.


SQL vs. NoSQL: The Fundamental Choice


1. Relational Databases (SQL)

Best for: Structured data, complex joins, and strong consistency (financial systems, ERPs).

ACID Properties

The 4 pillars of SQL reliability:

  • Atomicity: All-or-nothing transactions.
  • Consistency: Data must follow all schema rules/constraints.
  • Isolation: Concurrent transactions don't interfere.
  • Durability: Once committed, data stays saved even during power failure.

Indexing (The "Cheat Code" for Speed)

Indexes make reads faster but writes slower (because the index must be updated).

  • B-Tree Index: The default. Balanced tree structure. Great for equality and range queries (id = 5 or age > 20).
  • Hash Index: Map-like. Extremely fast for exact matches (id = 5) but useless for ranges.

Normalization vs. Denormalization

  • Normalization (3NF): Minimizing redundancy by splitting data into multiple tables.
    • ✅ Saves space, prevents data anomalies.
    • ❌ Requires complex JOINs which are slow at scale.
  • Denormalization: Intentionally adding redundant data to a single table.
    • ✅ Much faster reads (no joins).
    • ❌ Stale data risks, more storage space.

2. Non-Relational Databases (NoSQL)

Best for: High-velocity data, massive scale, and flexible schemas.

The 4 Types of NoSQL

TypeExampleBest For
Key-ValueRedis, DynamoDBSessions, shopping carts, simple lookups
DocumentMongoDB, CouchDBCatalogs, content management (JSON-like)
Column-FamilyCassandra, HBaseTime-series, analytics, massive write-heavy logs
GraphNeo4jSocial networks, fraud detection, recommendation engines

BASE Properties (The NoSQL Counterpart to ACID)

  • Basically Available: The system guarantees availability.
  • Soft-state: State may change over time without input (due to eventual consistency).
  • Eventual consistency: Data will be consistent eventually across all nodes.

The CAP Theorem

You can only pick two in a distributed system under network partition.

Note: In distributed systems, P is mandatory. You are always choosing between CP (Consistency) or AP (Availability).


3. Database Scaling

When a single server isn't enough.

Replication (Read Scaling)

Creating copies of the database to handle more read traffic.

  • Master-Slave: Master handles writes; Slaves handle reads.
    • ✅ Scale reads linearly.
    • ❌ Master is still a single point of failure for writes.
  • Multi-Master: Any node can handle writes.
    • ✅ High availability for writes.
    • ❌ Conflict resolution is extremely complex.

Partitioning / Sharding (Write Scaling)

Splitting a large dataset into smaller chunks ("shards") across different servers.

  • Vertical Sharding: Moving specific columns/tables to different servers (e.g., User table on Server A, Order table on Server B).
  • Horizontal Sharding (The Real "Sharding"): Moving rows of the same table to different servers based on a Sharding Key.

Worked Example: URL Shortener DB Choice

Scenario 1: Small Scale (100 req/sec)

Choice: PostgreSQL (SQL)

  • Why: ACID compliance ensures 100% reliability. Simple to set up. Relations between Users and URLs are easy to manage.

Scenario 2: FAANG Scale (100K+ req/sec)

Choice: Cassandra (NoSQL Column-Family)

  • Why:
    1. Scale: Cassandra is designed for massive writes.
    2. No Joins: URL shorteners are mostly key-value lookups (short_link -> long_url).
    3. Availability: We prefer AP (Availability) over strict consistency. If a user gets an "eventually consistent" old URL for 1 second, it's fine.

Interview Cheat Sheet

Decision Framework

  1. Relationship Complexity: If it's a social network with deep connections -> Graph.
  2. Consistency Needs: If it's a bank or payment system -> SQL.
  3. Write Volume: If it's IoT sensors or metrics logging -> Cassandra/NoSQL.
  4. Data Flexibility: If the schema changes every week -> Document (MongoDB).

The Interview Phrase

"Given the massive write volume and simple key-value access pattern of the URL shortener, I'll choose Cassandra. Since it's an AP system, it offers high availability even across data centers, and we can scale writes horizontally by adding more nodes without the complex conflict resolution of multi-master SQL."


Red Flags vs. Green Flags

🔴 Red Flag🟢 Green Flag
"I'll use MySQL because I know it.""I'll use MySQL because we need ACID transactions for payments."
Suggesting Joins in a NoSQL DB.Denormalizing data in NoSQL to avoid Joins.
Forgetting to mention the Sharding Key.Picking a high-cardinality Sharding Key to avoid hot shards.
Claiming NoSQL is "just faster."Explaining NoSQL speed in terms of limited Joins and BASE consistency.

IMPORTANT

Sharding is a last resort. Always try Replication or Caching first. If you jump straight to sharding in an interview, you might look like you're over-engineering.

TIP

When talking about Sharding, always mention Hot Shards (e.g., Justin Bieber's user ID causing 90% of traffic to hit one shard) and how you'd mitigate it (consistent hashing).

Released under the ISC License.