Skip to content

๐Ÿ” ANN & HNSW Index โ€” Complete Guide โ€‹

Finding the nearest neighbor in a billion-point space โ€” in milliseconds.

NOTE

Prerequisite: This guide builds on vector embeddings. Read Vector Embeddings โ†’ first if you haven't.


The Problem โ€” Why We Need Special Indexes โ€‹

Imagine you have 10 million product vectors stored in a database. A user searches for a product. The naive approach:

text
For every stored vector (10,000,000):
  1. Compute distance to query vector
  2. Track the smallest distance

Return the closest one.

With 1536 dimensions per vector, that's 15 billion floating-point operations per query. At ~1ฮผs per op โ€” that's 4+ hours per search. Completely unusable.

The solution: Don't check every vector. Use a smart index that navigates directly to the answer.


What is ANN โ€” Approximate Nearest Neighbor? โ€‹

ANN (Approximate Nearest Neighbor) is a family of algorithms that find vectors close to the nearest neighbor โ€” sacrificing a small amount of accuracy for orders-of-magnitude speedup.

text
KNN  (Exact):   finds the 100% perfect closest vector  โ€” slow
ANN  (Approx):  finds a ~98% close vector              โ€” milliseconds

The trade-off is almost always worth it. In real-world semantic search, a 98%-accurate result is perceptually identical to 100% accurate.

The Core Insight โ€‹

Instead of scanning everything, ANN algorithms pre-build a data structure during indexing that allows skipping most of the data during query time.


ANN Algorithm Family โ€‹

AlgorithmFull NameApproachBest For
HNSWHierarchical Navigable Small WorldLayered graphGeneral purpose, high accuracy
IVFInverted File IndexCluster + searchLarge datasets, memory-constrained
LSHLocality Sensitive HashingHash collisionsVery fast, lower accuracy
AnnoyApproximate Nearest Neighbors Oh YeahRandom projection treesRead-heavy, Spotify
FLATFlat (brute force)Scan everythingTiny datasets, 100% accuracy
ScaNNScalable Nearest NeighborsQuantization + reorderGoogle-scale, cutting edge

Deep Dive โ€” HNSW (The Gold Standard) โ€‹

HNSW (Hierarchical Navigable Small World) is the most widely used ANN index. It powers Qdrant, Weaviate, Milvus, pgvector, and many others.

The Intuition โ€” Six Degrees of Separation โ€‹

The algorithm is inspired by the "small world" network theory โ€” the idea that any two people on Earth are connected by ~6 acquaintances. A graph where everyone has a few long-range connections + many local connections allows reaching any node very quickly.

HNSW applies this to vector space:

How HNSW Works โ€” Step by Step โ€‹

Phase 1: Building the Index โ€‹

Phase 2: Searching the Index โ€‹

Key HNSW Parameters โ€‹

ParameterDefaultEffect of IncreasingEffect of Decreasing
M16Better recall, more memoryFaster build, less memory
ef_construction200Better index quality, slower buildFaster build, lower quality
ef_search50Better recall, slower queryFaster query, lower recall

Deep Dive โ€” IVF (Inverted File Index) โ€‹

IVF clusters vectors into Voronoi cells (groups of nearby vectors), then at query time only searches the closest clusters.

IVF Key Parameters โ€‹

ParameterMeaningRule of Thumb
nlistsNumber of clusters (Voronoi cells)sqrt(total_vectors)
nprobeHow many clusters to search at query time5โ€“20% of nlists
text
nprobe = 1    โ†’ Very fast, lower recall (~80%)
nprobe = 10   โ†’ Balanced (~95% recall)
nprobe = nlists โ†’ Full scan (= exact KNN)

IVF vs HNSW vs FLAT โ€” When to Use Which โ€‹

Side-by-Side Comparison โ€‹

DimensionFLATIVFHNSW
Recall accuracy100%~90โ€“98%~95โ€“99%
Query speed๐Ÿข Slowโšก Fastโšกโšก Fastest
Build timeInstantMediumSlow
Memory usageLowMediumHigh
Supports updatesโœ… Easyโš ๏ธ Rebuildโœ… Good
Best dataset size< 100K1Mโ€“100M100Kโ€“50M
Used byTest/devFAISS, MilvusQdrant, Weaviate

Recall vs Speed Trade-off Curve โ€‹


๐Ÿ’ป Code Examples โ€‹

HNSW with Qdrant (JavaScript) โ€‹

javascript
// npm install @qdrant/js-client-rest
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ host: "localhost", port: 6333 });

// โ”€โ”€ Create collection with HNSW config โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
await client.createCollection("products", {
  vectors: {
    size: 1536, // must match embedding model output
    distance: "Cosine", // or "Euclid", "Dot"
  },
  hnsw_config: {
    m: 16, // connections per node (default: 16)
    ef_construct: 200, // build-time candidate list (default: 100)
    full_scan_threshold: 10_000, // switch to brute-force below this
  },
});

// โ”€โ”€ Insert vectors โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
await client.upsert("products", {
  points: [
    {
      id: 1,
      vector: [0.21, 0.87, 0.43 /* ...1536 floats */],
      payload: { name: "Waterproof Hiking Boot", price: 129 },
    },
    {
      id: 2,
      vector: [0.22, 0.85, 0.44 /* ...1536 floats */],
      payload: { name: "Trail Running Shoe", price: 99 },
    },
  ],
});

// โ”€โ”€ Search using ANN (HNSW) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
const results = await client.search("products", {
  vector: [0.2, 0.88, 0.42 /* ...query vector */],
  limit: 5,
  params: {
    hnsw_ef: 128, // runtime ef โ€” higher = better recall, slower
  },
  with_payload: true,
});

results.forEach((r) => {
  console.log(`${r.payload.name} โ€” score: ${r.score.toFixed(4)}`);
});
// Trail Running Shoe   โ€” score: 0.9991
// Waterproof Hiking Boot โ€” score: 0.9876

HNSW with pgvector (SQL + Node.js) โ€‹

javascript
// npm install pg
import pg from "pg";
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// โ”€โ”€ Enable extension & create table โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);

await pool.query(`
  CREATE TABLE IF NOT EXISTS documents (
    id     SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
  )
`);

// โ”€โ”€ Create HNSW index โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
await pool.query(`
  CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200)
`);

// โ”€โ”€ Insert a document with its embedding โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
const vec = await embed("Our 30-day return policy applies to all items.");

await pool.query(`INSERT INTO documents (content, embedding) VALUES ($1, $2)`, [
  "Our 30-day return policy applies to all items.",
  JSON.stringify(vec),
]);

// โ”€โ”€ ANN search โ€” finds semantically closest rows โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
const queryVec = await embed("How long can I return something?");

const { rows } = await pool.query(
  `SELECT content,
          1 - (embedding <=> $1::vector) AS similarity
   FROM documents
   ORDER BY embedding <=> $1::vector
   LIMIT 5`,
  [JSON.stringify(queryVec)]
);

rows.forEach((r) => console.log(r.content, "โ†’", r.similarity.toFixed(4)));
// "Our 30-day return policy..." โ†’ 0.9712

IVF with FAISS (Python โ€” for reference) โ€‹

TIP

Want to see how FAISS fits into production architectures and how it uses GPU acceleration? Read the full FAISS Vector Library guide โ†’

python
# pip install faiss-cpu numpy
import faiss
import numpy as np

d = 128          # vector dimensions
n = 1_000_000    # number of vectors

# โ”€โ”€ Generate random vectors โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
vectors = np.random.random((n, d)).astype("float32")
faiss.normalize_L2(vectors)  # normalize for cosine similarity

# โ”€โ”€ Build IVF + Product Quantization index โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
nlist = 1000      # number of Voronoi clusters
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(vectors)   # clustering phase (done once)
index.add(vectors)     # insert all vectors

# โ”€โ”€ Query โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
index.nprobe = 10      # search 10 clusters (tune for recall vs speed)
query = np.random.random((1, d)).astype("float32")
faiss.normalize_L2(query)

distances, indices = index.search(query, k=5)
print("Top 5 nearest IDs:", indices[0])
print("Distances:", distances[0])

๐Ÿ”ฌ Real-World Example โ€” Tuning HNSW for Production โ€‹

Scenario โ€‹

You're building a semantic search for a legal document archive. 5 million documents, 1536-dim embeddings. Requirements: < 50ms query latency and > 98% recall.

javascript
// Production-tuned HNSW config for Qdrant
const collectionConfig = {
  vectors: { size: 1536, distance: "Cosine" },
  hnsw_config: {
    m: 16, // โœ… balanced memory vs recall
    ef_construct: 200, // โœ… high-quality index build
    on_disk: false, // keep in RAM for speed
  },
  optimizers_config: {
    memmap_threshold: 20_000, // switch to mmap above this segment size
  },
};

// โ”€โ”€ Runtime search params (tune per query type) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
const searchConfig = {
  params: {
    hnsw_ef: 128, // โœ… > default 50 for better recall
    exact: false, // use ANN (set true for FLAT / exact)
  },
  limit: 10,
};

๐Ÿงฎ HNSW Memory Estimation โ€‹

text
Memory (bytes) โ‰ˆ  n ร— (d ร— 4 + M ร— 2 ร— 8)

Where:
  n  = number of vectors
  d  = vector dimensions
  M  = connections per node

Example: 1M vectors, 1536 dims, M=16
  = 1,000,000 ร— (1536 ร— 4 + 16 ร— 2 ร— 8)
  = 1,000,000 ร— (6144 + 256)
  = 1,000,000 ร— 6400
  = 6.4 GB RAM
ScaleVectorsDimsMEst. RAM
Small100K38416~230 MB
Medium1M153616~6.4 GB
Large10M153616~64 GB
X-Large100M+153616> 640 GB โ†’ use IVF+PQ or DiskANN

Product Quantization (PQ) โ€” Compressing Vectors โ€‹

When memory is tight, Product Quantization compresses each vector from 32-bit floats to a much smaller code.

Typical IVF + PQ setup:

python
# FAISS: IVF with Product Quantization
m_pq = 8       # number of sub-vectors (d must be divisible by m)
nbits = 8      # bits per sub-vector code (256 centroids)

index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, nbits)
# Result: 768x memory reduction, ~5% recall drop

๐Ÿ“Š ANN Benchmarks Reference โ€‹

Performance measured on the standard ANN Benchmarks dataset (glove-100-angular):

AlgorithmRecall@10QPSMemory
HNSW99.3%15,000High
IVF (nprobe=50)97.1%8,500Medium
Annoy (n=1000)94.5%3,200Low
LSH88.3%12,000Low
FLAT100%850Low

Source: ann-benchmarks.com โ€” the gold standard for ANN evaluation.


โœ… Checklist Before Moving On โ€‹

  • [ ] I understand why brute-force KNN is unusable at scale
  • [ ] I can explain what ANN means and the accuracy trade-off
  • [ ] I understand how HNSW layers enable fast graph traversal
  • [ ] I know what M, ef_construction, and ef_search control
  • [ ] I understand when to use FLAT vs IVF vs HNSW
  • [ ] I can estimate HNSW memory usage for a given dataset
  • [ ] I know what Product Quantization does and when to use it

๐Ÿ“š Further Reading โ€‹


โžก๏ธ Next: RAG Pattern โ†’ โ€” then Vector Databases โ†’

Released under the ISC License.