Skip to content

🗄️ Level 3 — Databases

The right database choice can make or break your system.


3.1 SQL vs NoSQL

Comparison at a Glance

FeatureSQL (Relational)NoSQL (Non-Relational)
StructureTables (Rows/Columns)Documents, Key-Value, Graph
SchemaFixed / PredefinedFlexible / Dynamic
ScalingVertical (mostly)Horizontal (natively)
TransactionsACID (Strong)BASE (Soft)

💻 JS Example: Query Styles

Compare how you fetch a user in different worlds:

javascript
// SQL (e.g., using pg or mysql2)
const [rows] = await db.query("SELECT * FROM users WHERE email = ?", [email]);

// NoSQL (e.g., using Mongoose/MongoDB)
const user = await User.findOne({ email: email });

3.2 Database Replication

Replication ensures you have multiple copies of data for high availability and read scalability.

Primary-Replica (Master-Slave)


3.3 Database Sharding (Partitioning)

Sharding splits a large dataset across multiple database instances horizontally.

Horizontal Split


3.4 Indexes

Indexes speed up reads but slow down writes. Think of them as the index at the back of a textbook.

B-Tree Visualization (Conceptual)


3.5 ACID Transactions

ACID properties ensure database reliability.

💻 JS Example: ACID Transaction logic

Conceptual logic for a bank transfer to demonstrate Atomicity.

javascript
async function transferMoney(amount, fromId, toId) {
  const session = await db.startSession();
  try {
    session.startTransaction();

    // 1. Deduct from sender
    await Accounts.update(fromId, { $inc: { balance: -amount } }, { session });

    // 2. Add to receiver
    await Accounts.update(toId, { $inc: { balance: amount } }, { session });

    // Commit both or none
    await session.commitTransaction();
  } catch (error) {
    // Rollback if anything fails
    await session.abortTransaction();
    throw error;
  } finally {
    session.endSession();
  }
}

3.6 Choosing the Right Database

Decision Flowchart


✅ Checklist Before Moving On

  • [ ] I know when to pick SQL vs NoSQL
  • [ ] I can explain primary-replica replication
  • [ ] I understand sharding vs indexing
  • [ ] I can implement basic transaction logic
  • [ ] I can explain what a distributed database is and the CAP theorem
  • [ ] I understand what a Vector Database is and when to use one
  • [ ] I can explain what a vector embedding is and how it captures meaning
  • [ ] I understand ANN search and how HNSW enables fast similarity lookup
  • [ ] I understand the RAG pattern and why it solves LLM hallucinations
  • [ ] I can choose between Cosine, Euclidean, and Dot Product metrics
  • [ ] I understand when to use FAISS vs a managed Vector Database
  • [ ] I can explain how LLMs work (Transformer, attention, KV Cache)
  • [ ] I understand what an AI Agent is and how the ReAct loop works
  • [ ] I know the difference between RAG, Fine-tuning, and Agents

3.7 Distributed Databases

How to scale databases horizontally across multiple nodes while managing consistency, availability, and partitions.

📄 Read the full Distributed Databases guide →

Covers: Core concepts, CAP Theorem, Architecture, Replication vs Partitioning, Two-Phase Commit (2PC) and examples like DynamoDB, Cassandra, and Spanner.


3.8 Vector Databases

Store and search meaning, not just values. Essential for AI-powered apps.

📄 Read the full Vector Database guide →

Covers: embeddings, ANN search, HNSW index, RAG pattern, Pinecone / Qdrant / ChromaDB, semantic search examples.


3.9 Vector Embeddings

How raw data (text, images, audio) is converted into numbers that capture meaning.

📄 Read the full Vector Embeddings guide →

Covers: one-hot vs dense embeddings, cosine / euclidean / dot-product similarity, OpenAI API, local models, chunking strategy, multimodal CLIP embeddings.


3.10 ANN & HNSW Index

How vector databases search billions of points in milliseconds.

📄 Read the full ANN & HNSW Index guide →

Covers: why brute-force KNN fails at scale, how HNSW layers work, IVF clustering, M / ef_construction / ef_search tuning, Product Quantization, memory estimation, and benchmark data.


3.11 RAG Pattern

Retrieval-Augmented Generation: Giving LLMs a reliable memory and ending hallucinations.

📄 Read the full RAG Pattern guide →

Covers: RAG architecture, query vs ingestion phases, code examples, hybrid search, reranking, and when to use RAG vs Fine-Tuning.


3.12 Similarity Metrics

Choosing the right "ruler" to measure distance in vector space.

📄 Read the full Similarity Metrics guide →

Covers: Euclidean (L2), Cosine Similarity, Dot Product, Math intuition, visualization, and the impact of normalization.


3.13 FAISS Vector Library

The high-performance C++ engine behind billion-scale semantic search.

📄 Read the full FAISS guide →

Covers: Why it's a library and not a database server, Index types (IVF, PQ), GPU acceleration, and Python implementation examples.


3.14 LLMs & AI Agents

How Large Language Models work under the hood, and how to build autonomous AI Agents that reason, plan, and act.

📄 Read the full LLMs & AI Agents guide →

Covers: Transformer architecture, self-attention math, KV Cache, prompt engineering (CoT, ReAct, ToT), AI Agent anatomy, memory systems, tool use / function calling, multi-agent topologies, production LLM system design, and key failure modes.


3.15 Data Storage Patterns

The complete playbook for how data is structured, replicated, and served across modern systems.

📄 Read the full Data Storage Patterns guide →

Covers: Normalization vs Denormalization, Key-Value / Document / Wide-Column / Graph stores, CQRS, Event Sourcing, Data Lake vs Data Warehouse, Replication patterns (Primary-Replica, Multi-Primary, Quorum), Caching patterns (Cache-Aside, Write-Through, Write-Behind), Polyglot Persistence, Lambda vs Kappa Architecture, and a full pattern selection decision guide.


➡️ Next: Level 4 — Caching

Released under the ISC License.