🗄️ Level 3 — Databases

The right database choice can make or break your system.

3.1 SQL vs NoSQL

Comparison at a Glance

Feature	SQL (Relational)	NoSQL (Non-Relational)
Structure	Tables (Rows/Columns)	Documents, Key-Value, Graph
Schema	Fixed / Predefined	Flexible / Dynamic
Scaling	Vertical (mostly)	Horizontal (natively)
Transactions	ACID (Strong)	BASE (Soft)

💻 JS Example: Query Styles

Compare how you fetch a user in different worlds:

javascript

// SQL (e.g., using pg or mysql2)
const [rows] = await db.query("SELECT * FROM users WHERE email = ?", [email]);

// NoSQL (e.g., using Mongoose/MongoDB)
const user = await User.findOne({ email: email });

3.2 Database Replication

Replication ensures you have multiple copies of data for high availability and read scalability.

Primary-Replica (Master-Slave)

3.3 Database Sharding (Partitioning)

Sharding splits a large dataset across multiple database instances horizontally.

Horizontal Split

3.4 Indexes

Indexes speed up reads but slow down writes. Think of them as the index at the back of a textbook.

B-Tree Visualization (Conceptual)

3.5 ACID Transactions

ACID properties ensure database reliability.

💻 JS Example: ACID Transaction logic

Conceptual logic for a bank transfer to demonstrate Atomicity.

javascript

async function transferMoney(amount, fromId, toId) {
  const session = await db.startSession();
  try {
    session.startTransaction();

    // 1. Deduct from sender
    await Accounts.update(fromId, { $inc: { balance: -amount } }, { session });

    // 2. Add to receiver
    await Accounts.update(toId, { $inc: { balance: amount } }, { session });

    // Commit both or none
    await session.commitTransaction();
  } catch (error) {
    // Rollback if anything fails
    await session.abortTransaction();
    throw error;
  } finally {
    session.endSession();
  }
}

3.6 Choosing the Right Database

Decision Flowchart

✅ Checklist Before Moving On

[ ] I know when to pick SQL vs NoSQL
[ ] I can explain primary-replica replication
[ ] I understand sharding vs indexing
[ ] I can implement basic transaction logic
[ ] I can explain what a distributed database is and the CAP theorem
[ ] I understand what a Vector Database is and when to use one
[ ] I can explain what a vector embedding is and how it captures meaning
[ ] I understand ANN search and how HNSW enables fast similarity lookup
[ ] I understand the RAG pattern and why it solves LLM hallucinations
[ ] I can choose between Cosine, Euclidean, and Dot Product metrics
[ ] I understand when to use FAISS vs a managed Vector Database
[ ] I can explain how LLMs work (Transformer, attention, KV Cache)
[ ] I understand what an AI Agent is and how the ReAct loop works
[ ] I know the difference between RAG, Fine-tuning, and Agents

3.7 Distributed Databases

How to scale databases horizontally across multiple nodes while managing consistency, availability, and partitions.

📄 Read the full Distributed Databases guide →

Covers: Core concepts, CAP Theorem, Architecture, Replication vs Partitioning, Two-Phase Commit (2PC) and examples like DynamoDB, Cassandra, and Spanner.

3.8 Vector Databases

Store and search meaning, not just values. Essential for AI-powered apps.

📄 Read the full Vector Database guide →

Covers: embeddings, ANN search, HNSW index, RAG pattern, Pinecone / Qdrant / ChromaDB, semantic search examples.

3.9 Vector Embeddings

How raw data (text, images, audio) is converted into numbers that capture meaning.

📄 Read the full Vector Embeddings guide →

Covers: one-hot vs dense embeddings, cosine / euclidean / dot-product similarity, OpenAI API, local models, chunking strategy, multimodal CLIP embeddings.

3.10 ANN & HNSW Index

How vector databases search billions of points in milliseconds.

📄 Read the full ANN & HNSW Index guide →

Covers: why brute-force KNN fails at scale, how HNSW layers work, IVF clustering, M / ef_construction / ef_search tuning, Product Quantization, memory estimation, and benchmark data.

3.11 RAG Pattern

Retrieval-Augmented Generation: Giving LLMs a reliable memory and ending hallucinations.

📄 Read the full RAG Pattern guide →

Covers: RAG architecture, query vs ingestion phases, code examples, hybrid search, reranking, and when to use RAG vs Fine-Tuning.

3.12 Similarity Metrics

Choosing the right "ruler" to measure distance in vector space.

📄 Read the full Similarity Metrics guide →

Covers: Euclidean (L2), Cosine Similarity, Dot Product, Math intuition, visualization, and the impact of normalization.

3.13 FAISS Vector Library

The high-performance C++ engine behind billion-scale semantic search.

📄 Read the full FAISS guide →

Covers: Why it's a library and not a database server, Index types (IVF, PQ), GPU acceleration, and Python implementation examples.

3.14 LLMs & AI Agents

How Large Language Models work under the hood, and how to build autonomous AI Agents that reason, plan, and act.

📄 Read the full LLMs & AI Agents guide →

Covers: Transformer architecture, self-attention math, KV Cache, prompt engineering (CoT, ReAct, ToT), AI Agent anatomy, memory systems, tool use / function calling, multi-agent topologies, production LLM system design, and key failure modes.

3.15 Data Storage Patterns

The complete playbook for how data is structured, replicated, and served across modern systems.

📄 Read the full Data Storage Patterns guide →

Covers: Normalization vs Denormalization, Key-Value / Document / Wide-Column / Graph stores, CQRS, Event Sourcing, Data Lake vs Data Warehouse, Replication patterns (Primary-Replica, Multi-Primary, Quorum), Caching patterns (Cache-Aside, Write-Through, Write-Behind), Polyglot Persistence, Lambda vs Kappa Architecture, and a full pattern selection decision guide.

➡️ Next: Level 4 — Caching

🗄️ Level 3 — Databases ​

3.1 SQL vs NoSQL ​

Comparison at a Glance ​

💻 JS Example: Query Styles ​

3.2 Database Replication ​

Primary-Replica (Master-Slave) ​

3.3 Database Sharding (Partitioning) ​

Horizontal Split ​

3.4 Indexes ​

B-Tree Visualization (Conceptual) ​

3.5 ACID Transactions ​

💻 JS Example: ACID Transaction logic ​

3.6 Choosing the Right Database ​

Decision Flowchart ​

✅ Checklist Before Moving On ​

3.7 Distributed Databases ​

3.8 Vector Databases ​

3.9 Vector Embeddings ​

3.10 ANN & HNSW Index ​

3.11 RAG Pattern ​

3.12 Similarity Metrics ​

3.13 FAISS Vector Library ​

3.14 LLMs & AI Agents ​

3.15 Data Storage Patterns ​