🗄️ Level 3 — Databases
The right database choice can make or break your system.
3.1 SQL vs NoSQL
Comparison at a Glance
| Feature | SQL (Relational) | NoSQL (Non-Relational) |
|---|---|---|
| Structure | Tables (Rows/Columns) | Documents, Key-Value, Graph |
| Schema | Fixed / Predefined | Flexible / Dynamic |
| Scaling | Vertical (mostly) | Horizontal (natively) |
| Transactions | ACID (Strong) | BASE (Soft) |
💻 JS Example: Query Styles
Compare how you fetch a user in different worlds:
// SQL (e.g., using pg or mysql2)
const [rows] = await db.query("SELECT * FROM users WHERE email = ?", [email]);
// NoSQL (e.g., using Mongoose/MongoDB)
const user = await User.findOne({ email: email });3.2 Database Replication
Replication ensures you have multiple copies of data for high availability and read scalability.
Primary-Replica (Master-Slave)
3.3 Database Sharding (Partitioning)
Sharding splits a large dataset across multiple database instances horizontally.
Horizontal Split
3.4 Indexes
Indexes speed up reads but slow down writes. Think of them as the index at the back of a textbook.
B-Tree Visualization (Conceptual)
3.5 ACID Transactions
ACID properties ensure database reliability.
💻 JS Example: ACID Transaction logic
Conceptual logic for a bank transfer to demonstrate Atomicity.
async function transferMoney(amount, fromId, toId) {
const session = await db.startSession();
try {
session.startTransaction();
// 1. Deduct from sender
await Accounts.update(fromId, { $inc: { balance: -amount } }, { session });
// 2. Add to receiver
await Accounts.update(toId, { $inc: { balance: amount } }, { session });
// Commit both or none
await session.commitTransaction();
} catch (error) {
// Rollback if anything fails
await session.abortTransaction();
throw error;
} finally {
session.endSession();
}
}3.6 Choosing the Right Database
Decision Flowchart
✅ Checklist Before Moving On
- [ ] I know when to pick SQL vs NoSQL
- [ ] I can explain primary-replica replication
- [ ] I understand sharding vs indexing
- [ ] I can implement basic transaction logic
- [ ] I can explain what a distributed database is and the CAP theorem
- [ ] I understand what a Vector Database is and when to use one
- [ ] I can explain what a vector embedding is and how it captures meaning
- [ ] I understand ANN search and how HNSW enables fast similarity lookup
- [ ] I understand the RAG pattern and why it solves LLM hallucinations
- [ ] I can choose between Cosine, Euclidean, and Dot Product metrics
- [ ] I understand when to use FAISS vs a managed Vector Database
- [ ] I can explain how LLMs work (Transformer, attention, KV Cache)
- [ ] I understand what an AI Agent is and how the ReAct loop works
- [ ] I know the difference between RAG, Fine-tuning, and Agents
3.7 Distributed Databases
How to scale databases horizontally across multiple nodes while managing consistency, availability, and partitions.
📄 Read the full Distributed Databases guide →
Covers: Core concepts, CAP Theorem, Architecture, Replication vs Partitioning, Two-Phase Commit (2PC) and examples like DynamoDB, Cassandra, and Spanner.
3.8 Vector Databases
Store and search meaning, not just values. Essential for AI-powered apps.
📄 Read the full Vector Database guide →
Covers: embeddings, ANN search, HNSW index, RAG pattern, Pinecone / Qdrant / ChromaDB, semantic search examples.
3.9 Vector Embeddings
How raw data (text, images, audio) is converted into numbers that capture meaning.
📄 Read the full Vector Embeddings guide →
Covers: one-hot vs dense embeddings, cosine / euclidean / dot-product similarity, OpenAI API, local models, chunking strategy, multimodal CLIP embeddings.
3.10 ANN & HNSW Index
How vector databases search billions of points in milliseconds.
📄 Read the full ANN & HNSW Index guide →
Covers: why brute-force KNN fails at scale, how HNSW layers work, IVF clustering, M / ef_construction / ef_search tuning, Product Quantization, memory estimation, and benchmark data.
3.11 RAG Pattern
Retrieval-Augmented Generation: Giving LLMs a reliable memory and ending hallucinations.
📄 Read the full RAG Pattern guide →
Covers: RAG architecture, query vs ingestion phases, code examples, hybrid search, reranking, and when to use RAG vs Fine-Tuning.
3.12 Similarity Metrics
Choosing the right "ruler" to measure distance in vector space.
📄 Read the full Similarity Metrics guide →
Covers: Euclidean (L2), Cosine Similarity, Dot Product, Math intuition, visualization, and the impact of normalization.
3.13 FAISS Vector Library
The high-performance C++ engine behind billion-scale semantic search.
Covers: Why it's a library and not a database server, Index types (IVF, PQ), GPU acceleration, and Python implementation examples.
3.14 LLMs & AI Agents
How Large Language Models work under the hood, and how to build autonomous AI Agents that reason, plan, and act.
📄 Read the full LLMs & AI Agents guide →
Covers: Transformer architecture, self-attention math, KV Cache, prompt engineering (CoT, ReAct, ToT), AI Agent anatomy, memory systems, tool use / function calling, multi-agent topologies, production LLM system design, and key failure modes.
3.15 Data Storage Patterns
The complete playbook for how data is structured, replicated, and served across modern systems.
📄 Read the full Data Storage Patterns guide →
Covers: Normalization vs Denormalization, Key-Value / Document / Wide-Column / Graph stores, CQRS, Event Sourcing, Data Lake vs Data Warehouse, Replication patterns (Primary-Replica, Multi-Primary, Quorum), Caching patterns (Cache-Aside, Write-Through, Write-Behind), Polyglot Persistence, Lambda vs Kappa Architecture, and a full pattern selection decision guide.
➡️ Next: Level 4 — Caching
