๐ ANN & HNSW Index โ Complete Guide โ
Finding the nearest neighbor in a billion-point space โ in milliseconds.
NOTE
Prerequisite: This guide builds on vector embeddings. Read Vector Embeddings โ first if you haven't.
The Problem โ Why We Need Special Indexes โ
Imagine you have 10 million product vectors stored in a database. A user searches for a product. The naive approach:
For every stored vector (10,000,000):
1. Compute distance to query vector
2. Track the smallest distance
Return the closest one.With 1536 dimensions per vector, that's 15 billion floating-point operations per query. At ~1ฮผs per op โ that's 4+ hours per search. Completely unusable.
The solution: Don't check every vector. Use a smart index that navigates directly to the answer.
What is ANN โ Approximate Nearest Neighbor? โ
ANN (Approximate Nearest Neighbor) is a family of algorithms that find vectors close to the nearest neighbor โ sacrificing a small amount of accuracy for orders-of-magnitude speedup.
KNN (Exact): finds the 100% perfect closest vector โ slow
ANN (Approx): finds a ~98% close vector โ millisecondsThe trade-off is almost always worth it. In real-world semantic search, a 98%-accurate result is perceptually identical to 100% accurate.
The Core Insight โ
Instead of scanning everything, ANN algorithms pre-build a data structure during indexing that allows skipping most of the data during query time.
ANN Algorithm Family โ
| Algorithm | Full Name | Approach | Best For |
|---|---|---|---|
| HNSW | Hierarchical Navigable Small World | Layered graph | General purpose, high accuracy |
| IVF | Inverted File Index | Cluster + search | Large datasets, memory-constrained |
| LSH | Locality Sensitive Hashing | Hash collisions | Very fast, lower accuracy |
| Annoy | Approximate Nearest Neighbors Oh Yeah | Random projection trees | Read-heavy, Spotify |
| FLAT | Flat (brute force) | Scan everything | Tiny datasets, 100% accuracy |
| ScaNN | Scalable Nearest Neighbors | Quantization + reorder | Google-scale, cutting edge |
Deep Dive โ HNSW (The Gold Standard) โ
HNSW (Hierarchical Navigable Small World) is the most widely used ANN index. It powers Qdrant, Weaviate, Milvus, pgvector, and many others.
The Intuition โ Six Degrees of Separation โ
The algorithm is inspired by the "small world" network theory โ the idea that any two people on Earth are connected by ~6 acquaintances. A graph where everyone has a few long-range connections + many local connections allows reaching any node very quickly.
HNSW applies this to vector space:
How HNSW Works โ Step by Step โ
Phase 1: Building the Index โ
Phase 2: Searching the Index โ
Key HNSW Parameters โ
| Parameter | Default | Effect of Increasing | Effect of Decreasing |
|---|---|---|---|
M | 16 | Better recall, more memory | Faster build, less memory |
ef_construction | 200 | Better index quality, slower build | Faster build, lower quality |
ef_search | 50 | Better recall, slower query | Faster query, lower recall |
Deep Dive โ IVF (Inverted File Index) โ
IVF clusters vectors into Voronoi cells (groups of nearby vectors), then at query time only searches the closest clusters.
IVF Key Parameters โ
| Parameter | Meaning | Rule of Thumb |
|---|---|---|
nlists | Number of clusters (Voronoi cells) | sqrt(total_vectors) |
nprobe | How many clusters to search at query time | 5โ20% of nlists |
nprobe = 1 โ Very fast, lower recall (~80%)
nprobe = 10 โ Balanced (~95% recall)
nprobe = nlists โ Full scan (= exact KNN)IVF vs HNSW vs FLAT โ When to Use Which โ
Side-by-Side Comparison โ
| Dimension | FLAT | IVF | HNSW |
|---|---|---|---|
| Recall accuracy | 100% | ~90โ98% | ~95โ99% |
| Query speed | ๐ข Slow | โก Fast | โกโก Fastest |
| Build time | Instant | Medium | Slow |
| Memory usage | Low | Medium | High |
| Supports updates | โ Easy | โ ๏ธ Rebuild | โ Good |
| Best dataset size | < 100K | 1Mโ100M | 100Kโ50M |
| Used by | Test/dev | FAISS, Milvus | Qdrant, Weaviate |
Recall vs Speed Trade-off Curve โ
๐ป Code Examples โ
HNSW with Qdrant (JavaScript) โ
// npm install @qdrant/js-client-rest
import { QdrantClient } from "@qdrant/js-client-rest";
const client = new QdrantClient({ host: "localhost", port: 6333 });
// โโ Create collection with HNSW config โโโโโโโโโโโโโโโโโ
await client.createCollection("products", {
vectors: {
size: 1536, // must match embedding model output
distance: "Cosine", // or "Euclid", "Dot"
},
hnsw_config: {
m: 16, // connections per node (default: 16)
ef_construct: 200, // build-time candidate list (default: 100)
full_scan_threshold: 10_000, // switch to brute-force below this
},
});
// โโ Insert vectors โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
await client.upsert("products", {
points: [
{
id: 1,
vector: [0.21, 0.87, 0.43 /* ...1536 floats */],
payload: { name: "Waterproof Hiking Boot", price: 129 },
},
{
id: 2,
vector: [0.22, 0.85, 0.44 /* ...1536 floats */],
payload: { name: "Trail Running Shoe", price: 99 },
},
],
});
// โโ Search using ANN (HNSW) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
const results = await client.search("products", {
vector: [0.2, 0.88, 0.42 /* ...query vector */],
limit: 5,
params: {
hnsw_ef: 128, // runtime ef โ higher = better recall, slower
},
with_payload: true,
});
results.forEach((r) => {
console.log(`${r.payload.name} โ score: ${r.score.toFixed(4)}`);
});
// Trail Running Shoe โ score: 0.9991
// Waterproof Hiking Boot โ score: 0.9876HNSW with pgvector (SQL + Node.js) โ
// npm install pg
import pg from "pg";
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
// โโ Enable extension & create table โโโโโโโโโโโโโโโโโโโโ
await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);
await pool.query(`
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
)
`);
// โโ Create HNSW index โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
await pool.query(`
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200)
`);
// โโ Insert a document with its embedding โโโโโโโโโโโโโโโโ
const vec = await embed("Our 30-day return policy applies to all items.");
await pool.query(`INSERT INTO documents (content, embedding) VALUES ($1, $2)`, [
"Our 30-day return policy applies to all items.",
JSON.stringify(vec),
]);
// โโ ANN search โ finds semantically closest rows โโโโโโโโ
const queryVec = await embed("How long can I return something?");
const { rows } = await pool.query(
`SELECT content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5`,
[JSON.stringify(queryVec)]
);
rows.forEach((r) => console.log(r.content, "โ", r.similarity.toFixed(4)));
// "Our 30-day return policy..." โ 0.9712IVF with FAISS (Python โ for reference) โ
TIP
Want to see how FAISS fits into production architectures and how it uses GPU acceleration? Read the full FAISS Vector Library guide โ
# pip install faiss-cpu numpy
import faiss
import numpy as np
d = 128 # vector dimensions
n = 1_000_000 # number of vectors
# โโ Generate random vectors โโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
vectors = np.random.random((n, d)).astype("float32")
faiss.normalize_L2(vectors) # normalize for cosine similarity
# โโ Build IVF + Product Quantization index โโโโโโโโโโโโโโ
nlist = 1000 # number of Voronoi clusters
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(vectors) # clustering phase (done once)
index.add(vectors) # insert all vectors
# โโ Query โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
index.nprobe = 10 # search 10 clusters (tune for recall vs speed)
query = np.random.random((1, d)).astype("float32")
faiss.normalize_L2(query)
distances, indices = index.search(query, k=5)
print("Top 5 nearest IDs:", indices[0])
print("Distances:", distances[0])๐ฌ Real-World Example โ Tuning HNSW for Production โ
Scenario โ
You're building a semantic search for a legal document archive. 5 million documents, 1536-dim embeddings. Requirements: < 50ms query latency and > 98% recall.
// Production-tuned HNSW config for Qdrant
const collectionConfig = {
vectors: { size: 1536, distance: "Cosine" },
hnsw_config: {
m: 16, // โ
balanced memory vs recall
ef_construct: 200, // โ
high-quality index build
on_disk: false, // keep in RAM for speed
},
optimizers_config: {
memmap_threshold: 20_000, // switch to mmap above this segment size
},
};
// โโ Runtime search params (tune per query type) โโโโโโโโโ
const searchConfig = {
params: {
hnsw_ef: 128, // โ
> default 50 for better recall
exact: false, // use ANN (set true for FLAT / exact)
},
limit: 10,
};๐งฎ HNSW Memory Estimation โ
Memory (bytes) โ n ร (d ร 4 + M ร 2 ร 8)
Where:
n = number of vectors
d = vector dimensions
M = connections per node
Example: 1M vectors, 1536 dims, M=16
= 1,000,000 ร (1536 ร 4 + 16 ร 2 ร 8)
= 1,000,000 ร (6144 + 256)
= 1,000,000 ร 6400
= 6.4 GB RAM| Scale | Vectors | Dims | M | Est. RAM |
|---|---|---|---|---|
| Small | 100K | 384 | 16 | ~230 MB |
| Medium | 1M | 1536 | 16 | ~6.4 GB |
| Large | 10M | 1536 | 16 | ~64 GB |
| X-Large | 100M+ | 1536 | 16 | > 640 GB โ use IVF+PQ or DiskANN |
Product Quantization (PQ) โ Compressing Vectors โ
When memory is tight, Product Quantization compresses each vector from 32-bit floats to a much smaller code.
Typical IVF + PQ setup:
# FAISS: IVF with Product Quantization
m_pq = 8 # number of sub-vectors (d must be divisible by m)
nbits = 8 # bits per sub-vector code (256 centroids)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, nbits)
# Result: 768x memory reduction, ~5% recall drop๐ ANN Benchmarks Reference โ
Performance measured on the standard ANN Benchmarks dataset (glove-100-angular):
| Algorithm | Recall@10 | QPS | Memory |
|---|---|---|---|
| HNSW | 99.3% | 15,000 | High |
| IVF (nprobe=50) | 97.1% | 8,500 | Medium |
| Annoy (n=1000) | 94.5% | 3,200 | Low |
| LSH | 88.3% | 12,000 | Low |
| FLAT | 100% | 850 | Low |
Source: ann-benchmarks.com โ the gold standard for ANN evaluation.
โ Checklist Before Moving On โ
- [ ] I understand why brute-force KNN is unusable at scale
- [ ] I can explain what ANN means and the accuracy trade-off
- [ ] I understand how HNSW layers enable fast graph traversal
- [ ] I know what M, ef_construction, and ef_search control
- [ ] I understand when to use FLAT vs IVF vs HNSW
- [ ] I can estimate HNSW memory usage for a given dataset
- [ ] I know what Product Quantization does and when to use it
๐ Further Reading โ
- HNSW Paper (Malkov & Yashunin, 2018) โ Original research paper
- ANN Benchmarks โ Performance comparison of all algorithms
- Qdrant HNSW Docs โ Production HNSW config guide
- pgvector HNSW Guide โ SQL + HNSW integration
- FAISS Wiki โ Facebook's production ANN library
โก๏ธ Next: RAG Pattern โ โ then Vector Databases โ
