🔍 ANN & HNSW Index — Complete Guide

Finding the nearest neighbor in a billion-point space — in milliseconds.

NOTE

Prerequisite: This guide builds on vector embeddings. Read Vector Embeddings → first if you haven't.

The Problem — Why We Need Special Indexes

Imagine you have 10 million product vectors stored in a database. A user searches for a product. The naive approach:

text

For every stored vector (10,000,000):
  1. Compute distance to query vector
  2. Track the smallest distance

Return the closest one.

With 1536 dimensions per vector, that's 15 billion floating-point operations per query. At ~1μs per op — that's 4+ hours per search. Completely unusable.

The solution: Don't check every vector. Use a smart index that navigates directly to the answer.

What is ANN — Approximate Nearest Neighbor?

ANN (Approximate Nearest Neighbor) is a family of algorithms that find vectors close to the nearest neighbor — sacrificing a small amount of accuracy for orders-of-magnitude speedup.

text

KNN  (Exact):   finds the 100% perfect closest vector  — slow
ANN  (Approx):  finds a ~98% close vector              — milliseconds

The trade-off is almost always worth it. In real-world semantic search, a 98%-accurate result is perceptually identical to 100% accurate.

The Core Insight

Instead of scanning everything, ANN algorithms pre-build a data structure during indexing that allows skipping most of the data during query time.

ANN Algorithm Family

Algorithm	Full Name	Approach	Best For
HNSW	Hierarchical Navigable Small World	Layered graph	General purpose, high accuracy
IVF	Inverted File Index	Cluster + search	Large datasets, memory-constrained
LSH	Locality Sensitive Hashing	Hash collisions	Very fast, lower accuracy
Annoy	Approximate Nearest Neighbors Oh Yeah	Random projection trees	Read-heavy, Spotify
FLAT	Flat (brute force)	Scan everything	Tiny datasets, 100% accuracy
ScaNN	Scalable Nearest Neighbors	Quantization + reorder	Google-scale, cutting edge

Deep Dive — HNSW (The Gold Standard)

HNSW (Hierarchical Navigable Small World) is the most widely used ANN index. It powers Qdrant, Weaviate, Milvus, pgvector, and many others.

The Intuition — Six Degrees of Separation

The algorithm is inspired by the "small world" network theory — the idea that any two people on Earth are connected by ~6 acquaintances. A graph where everyone has a few long-range connections + many local connections allows reaching any node very quickly.

HNSW applies this to vector space:

How HNSW Works — Step by Step

Phase 1: Building the Index

Phase 2: Searching the Index

Key HNSW Parameters

Parameter	Default	Effect of Increasing	Effect of Decreasing
`M`	16	Better recall, more memory	Faster build, less memory
`ef_construction`	200	Better index quality, slower build	Faster build, lower quality
`ef_search`	50	Better recall, slower query	Faster query, lower recall

Deep Dive — IVF (Inverted File Index)

IVF clusters vectors into Voronoi cells (groups of nearby vectors), then at query time only searches the closest clusters.

IVF Key Parameters

Parameter	Meaning	Rule of Thumb
`nlists`	Number of clusters (Voronoi cells)	`sqrt(total_vectors)`
`nprobe`	How many clusters to search at query time	5–20% of nlists

text

nprobe = 1    → Very fast, lower recall (~80%)
nprobe = 10   → Balanced (~95% recall)
nprobe = nlists → Full scan (= exact KNN)

IVF vs HNSW vs FLAT — When to Use Which

Side-by-Side Comparison

Dimension	FLAT	IVF	HNSW
Recall accuracy	100%	~90–98%	~95–99%
Query speed	🐢 Slow	⚡ Fast	⚡⚡ Fastest
Build time	Instant	Medium	Slow
Memory usage	Low	Medium	High
Supports updates	✅ Easy	⚠️ Rebuild	✅ Good
Best dataset size	< 100K	1M–100M	100K–50M
Used by	Test/dev	FAISS, Milvus	Qdrant, Weaviate

Recall vs Speed Trade-off Curve

💻 Code Examples

HNSW with Qdrant (JavaScript)

javascript

// npm install @qdrant/js-client-rest
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

// ── Create collection with HNSW config ─────────────────
await client.createCollection("products", {
  vectors: {
    size: 1536, // must match embedding model output
    distance: "Cosine", // or "Euclid", "Dot"
  },
  hnsw_config: {
    m: 16, // connections per node (default: 16)
    ef_construct: 200, // build-time candidate list (default: 100)
    full_scan_threshold: 10_000, // switch to brute-force below this
  },
});

// ── Insert vectors ──────────────────────────────────────
await client.upsert("products", {
  points: [
    {
      id: 1,
      vector: [0.21, 0.87, 0.43 /* ...1536 floats */],
      payload: { name: "Waterproof Hiking Boot", price: 129 },
    },
    {
      id: 2,
      vector: [0.22, 0.85, 0.44 /* ...1536 floats */],
      payload: { name: "Trail Running Shoe", price: 99 },
    },
  ],
});

// ── Search using ANN (HNSW) ─────────────────────────────
const results = await client.search("products", {
  vector: [0.2, 0.88, 0.42 /* ...query vector */],
  limit: 5,
  params: {
    hnsw_ef: 128, // runtime ef — higher = better recall, slower
  },
  with_payload: true,
});

results.forEach((r) => {
  console.log(`${r.payload.name} — score: ${r.score.toFixed(4)}`);
});
// Trail Running Shoe   — score: 0.9991
// Waterproof Hiking Boot — score: 0.9876

HNSW with pgvector (SQL + Node.js)

javascript

// npm install pg
import pg from "pg";
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// ── Enable extension & create table ────────────────────
await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);

await pool.query(`
  CREATE TABLE IF NOT EXISTS documents (
    id     SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
  )
`);

// ── Create HNSW index ───────────────────────────────────
await pool.query(`
  CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200)
`);

// ── Insert a document with its embedding ────────────────
const vec = await embed("Our 30-day return policy applies to all items.");

await pool.query(`INSERT INTO documents (content, embedding) VALUES ($1, $2)`, [
  "Our 30-day return policy applies to all items.",
  JSON.stringify(vec),
]);

// ── ANN search — finds semantically closest rows ────────
const queryVec = await embed("How long can I return something?");

const { rows } = await pool.query(
  `SELECT content,
          1 - (embedding <=> $1::vector) AS similarity
   FROM documents
   ORDER BY embedding <=> $1::vector
   LIMIT 5`,
  [JSON.stringify(queryVec)]
);

rows.forEach((r) => console.log(r.content, "→", r.similarity.toFixed(4)));
// "Our 30-day return policy..." → 0.9712

IVF with FAISS (Python — for reference)

TIP

Want to see how FAISS fits into production architectures and how it uses GPU acceleration? Read the full FAISS Vector Library guide →

python

# pip install faiss-cpu numpy
import faiss
import numpy as np

d = 128          # vector dimensions
n = 1_000_000    # number of vectors

# ── Generate random vectors ─────────────────────────────
vectors = np.random.random((n, d)).astype("float32")
faiss.normalize_L2(vectors)  # normalize for cosine similarity

# ── Build IVF + Product Quantization index ──────────────
nlist = 1000      # number of Voronoi clusters
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(vectors)   # clustering phase (done once)
index.add(vectors)     # insert all vectors

# ── Query ───────────────────────────────────────────────
index.nprobe = 10      # search 10 clusters (tune for recall vs speed)
query = np.random.random((1, d)).astype("float32")
faiss.normalize_L2(query)

distances, indices = index.search(query, k=5)
print("Top 5 nearest IDs:", indices[0])
print("Distances:", distances[0])

🔬 Real-World Example — Tuning HNSW for Production

Scenario

You're building a semantic search for a legal document archive. 5 million documents, 1536-dim embeddings. Requirements: < 50ms query latency and > 98% recall.

javascript

// Production-tuned HNSW config for Qdrant
const collectionConfig = {
  vectors: { size: 1536, distance: "Cosine" },
  hnsw_config: {
    m: 16, // ✅ balanced memory vs recall
    ef_construct: 200, // ✅ high-quality index build
    on_disk: false, // keep in RAM for speed
  },
  optimizers_config: {
    memmap_threshold: 20_000, // switch to mmap above this segment size
  },
};

// ── Runtime search params (tune per query type) ─────────
const searchConfig = {
  params: {
    hnsw_ef: 128, // ✅ > default 50 for better recall
    exact: false, // use ANN (set true for FLAT / exact)
  },
  limit: 10,
};

🧮 HNSW Memory Estimation

text

Memory (bytes) ≈  n × (d × 4 + M × 2 × 8)

Where:
  n  = number of vectors
  d  = vector dimensions
  M  = connections per node

Example: 1M vectors, 1536 dims, M=16
  = 1,000,000 × (1536 × 4 + 16 × 2 × 8)
  = 1,000,000 × (6144 + 256)
  = 1,000,000 × 6400
  = 6.4 GB RAM

Scale	Vectors	Dims	M	Est. RAM
Small	100K	384	16	~230 MB
Medium	1M	1536	16	~6.4 GB
Large	10M	1536	16	~64 GB
X-Large	100M+	1536	16	> 640 GB → use IVF+PQ or DiskANN

Product Quantization (PQ) — Compressing Vectors

When memory is tight, Product Quantization compresses each vector from 32-bit floats to a much smaller code.

Typical IVF + PQ setup:

python

# FAISS: IVF with Product Quantization
m_pq = 8       # number of sub-vectors (d must be divisible by m)
nbits = 8      # bits per sub-vector code (256 centroids)

index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, nbits)
# Result: 768x memory reduction, ~5% recall drop

📊 ANN Benchmarks Reference

Performance measured on the standard ANN Benchmarks dataset (glove-100-angular):

Algorithm	Recall@10	QPS	Memory
HNSW	99.3%	15,000	High
IVF (nprobe=50)	97.1%	8,500	Medium
Annoy (n=1000)	94.5%	3,200	Low
LSH	88.3%	12,000	Low
FLAT	100%	850	Low

Source: ann-benchmarks.com — the gold standard for ANN evaluation.

✅ Checklist Before Moving On

[ ] I understand why brute-force KNN is unusable at scale
[ ] I can explain what ANN means and the accuracy trade-off
[ ] I understand how HNSW layers enable fast graph traversal
[ ] I know what M, ef_construction, and ef_search control
[ ] I understand when to use FLAT vs IVF vs HNSW
[ ] I can estimate HNSW memory usage for a given dataset
[ ] I know what Product Quantization does and when to use it

📚 Further Reading

HNSW Paper (Malkov & Yashunin, 2018) — Original research paper
ANN Benchmarks — Performance comparison of all algorithms
Qdrant HNSW Docs — Production HNSW config guide
pgvector HNSW Guide — SQL + HNSW integration
FAISS Wiki — Facebook's production ANN library

➡️ Next: RAG Pattern → — then Vector Databases →

🔍 ANN & HNSW Index — Complete Guide ​

The Problem — Why We Need Special Indexes ​

What is ANN — Approximate Nearest Neighbor? ​

The Core Insight ​

ANN Algorithm Family ​

Deep Dive — HNSW (The Gold Standard) ​

The Intuition — Six Degrees of Separation ​

How HNSW Works — Step by Step ​

Phase 1: Building the Index ​

Phase 2: Searching the Index ​

Key HNSW Parameters ​

Deep Dive — IVF (Inverted File Index) ​

IVF Key Parameters ​

IVF vs HNSW vs FLAT — When to Use Which ​

Side-by-Side Comparison ​

Recall vs Speed Trade-off Curve ​

💻 Code Examples ​

HNSW with Qdrant (JavaScript) ​

HNSW with pgvector (SQL + Node.js) ​

IVF with FAISS (Python — for reference) ​

🔬 Real-World Example — Tuning HNSW for Production ​

Scenario ​

🧮 HNSW Memory Estimation ​

Product Quantization (PQ) — Compressing Vectors ​

📊 ANN Benchmarks Reference ​

✅ Checklist Before Moving On ​

📚 Further Reading ​

🔍 ANN & HNSW Index — Complete Guide

The Problem — Why We Need Special Indexes

What is ANN — Approximate Nearest Neighbor?

The Core Insight

ANN Algorithm Family

Deep Dive — HNSW (The Gold Standard)

The Intuition — Six Degrees of Separation

How HNSW Works — Step by Step

Phase 1: Building the Index

Phase 2: Searching the Index

Key HNSW Parameters

Deep Dive — IVF (Inverted File Index)

IVF Key Parameters

IVF vs HNSW vs FLAT — When to Use Which

Side-by-Side Comparison

Recall vs Speed Trade-off Curve

💻 Code Examples

HNSW with Qdrant (JavaScript)

HNSW with pgvector (SQL + Node.js)

IVF with FAISS (Python — for reference)

🔬 Real-World Example — Tuning HNSW for Production

Scenario

🧮 HNSW Memory Estimation

Product Quantization (PQ) — Compressing Vectors

📊 ANN Benchmarks Reference

✅ Checklist Before Moving On

📚 Further Reading