⚡ FAISS (Facebook AI Similarity Search)
The engine behind billion-scale semantic search.
NOTE
Prerequisite: This guide assumes you understand ANN & HNSW Indexes → and Similarity Metrics →.
What is FAISS?
FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta (Facebook) AI. It performs efficient similarity search and clustering over dense vectors, so you can quickly find the embeddings closest to a query vector.
⚠️ The Golden Rule: FAISS is a Library, NOT a Database Server
When people say "Vector Database," they usually mean client-server applications like Pinecone, Qdrant, or Milvus.
FAISS is not a database. It is a low-level C++ library (with a Python wrapper) that runs inside your application's process and keeps its indexes in that process's memory.
- It does not have a REST API.
- It does not handle network requests.
- It does not manage users, authentication, or high-availability replication.
- It just does the math (distance computation, clustering, and quantization), using heavily optimized CPU and GPU code.
🆚 When to Use FAISS vs. Managed Vector DBs
| Feature | FAISS (Library) | Qdrant / Pinecone (Databases) |
|---|---|---|
| Architecture | In-memory library | Client-Server API |
| Language | Python / C++ | Any (via HTTP/gRPC) |
| GPU Support | ✅ Exceptional | ❌ Usually CPU-only |
| Metadata Filtering | ❌ Difficult / Manual | ✅ Built-in & Fast |
| Setup Time | Instant (pip install) | Minutes/Hours (Deploy server) |
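The "Difficult / Manual" entry for metadata filtering usually means you maintain the metadata yourself and filter around the index. Below is a minimal sketch of one common approach, post-filtering an over-fetched result list, in plain Python. The `metadata` dict, the `filtered_search` helper, and the hard-coded `ranked_ids` are all hypothetical stand-ins for illustration; they are not part of the FAISS API.

```python
# Hypothetical side table: vector ID -> metadata, kept outside the index.
metadata = {
    0: {"lang": "en"},
    1: {"lang": "de"},
    2: {"lang": "en"},
    3: {"lang": "fr"},
    4: {"lang": "en"},
}

def filtered_search(ranked_ids, predicate, k):
    """Post-filter: walk IDs already sorted by distance, keep the first k that pass."""
    hits = []
    for vec_id in ranked_ids:
        if predicate(metadata[vec_id]):
            hits.append(vec_id)
            if len(hits) == k:
                break
    return hits

# Pretend the index returned these IDs for an over-fetched query
# (for example, the top 5 when only the top 2 are needed).
ranked_ids = [1, 3, 0, 4, 2]
english_hits = filtered_search(ranked_ids, lambda meta: meta["lang"] == "en", k=2)
print(english_hits)
```

The weakness of this approach is that a selective filter can leave fewer than `k` results, in which case you have to over-fetch more and retry.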
🏗️ Core Concepts of FAISS
FAISS is built around the concept of an Index. You create an index, add your vectors to it, and then query the index.
The 3 Main Index Types
FAISS provides dozens of index types, but most use cases start with one of these three:
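IndexFlatL2 (Exact Search)
- How it works: Brute-force calculation of Euclidean distance against every vector.
- Pros: 100% accurate (Recall = 1). No training required.
- Cons: Search time grows linearly with the number of vectors, so it becomes slow for large collections (roughly beyond 100k vectors).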
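To make the brute-force idea concrete, here is a minimal sketch in plain Python. It is not FAISS's implementation (which is vectorized C++); `flat_l2_search` is a hypothetical helper that only mirrors the logic. Like IndexFlatL2, it ranks by squared L2 distance.

```python
def flat_l2_search(database, query, k):
    """Exact k-NN: score every vector by squared L2 distance, then sort."""
    scored = []
    for vec_id, vec in enumerate(database):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, vec_id))
    scored.sort()  # ascending distance; ties broken by vector ID
    return scored[:k]

database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
results = flat_l2_search(database, query=[1.0, 0.2], k=2)
nearest_ids = [vec_id for _, vec_id in results]
print(nearest_ids)
```

Every query touches every stored vector, which is why the cost scales with collection size.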
IndexIVFFlat (Fast / Approximate)
- How it works: Uses an Inverted File (IVF). It clusters your vectors into "Voronoi cells" (using K-Means). At query time, it only checks the vectors inside the closest clusters.
- Pros: Much faster than a flat scan, because each query visits only a few of the cells.
- Cons: Requires a "training" phase to build the clusters. Slight drop in accuracy, because a true neighbor sitting in an unvisited cell is missed.
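The IVF idea (train centroids, bucket vectors into cells, probe only the nearest cells) can be sketched in plain Python. This is an illustrative toy under simplifying assumptions (2-D random data, a few Lloyd iterations, hypothetical helper names), not how FAISS implements it. It also shows why probing every cell gives back the exact result.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, vec, n=1):
    """Indices of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], vec))
    return order[:n]

def train_kmeans(vectors, nlist, iters=10, seed=0):
    """Training phase: learn nlist centroids with a few Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(centroids, v)[0]].append(v)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def build_inverted_lists(vectors, centroids):
    """Add phase: store each vector ID in the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, v in enumerate(vectors):
        lists[nearest(centroids, v)[0]].append(vec_id)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Search phase: scan only the nprobe cells closest to the query."""
    candidates = [i for c in nearest(centroids, query, nprobe) for i in lists[c]]
    candidates.sort(key=lambda i: (sq_dist(vectors[i], query), i))
    return candidates[:k]

rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(200)]
centroids = train_kmeans(vectors, nlist=8)
lists = build_inverted_lists(vectors, centroids)
query = [0.5, 0.5]

approx = ivf_search(vectors, centroids, lists, query, k=3, nprobe=2)  # 2 of 8 cells
exact = ivf_search(vectors, centroids, lists, query, k=3, nprobe=8)   # all cells
print(approx, exact)
```

The centroids must exist before any vector can be assigned to a cell, which is why the training step has to come first.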
IndexIVFPQ (Fast + Low Memory)
- How it works: Combines IVF with Product Quantization (PQ). It compresses the vectors (e.g., from 1536 float32 values, or 6,144 bytes, down to just 8 bytes) to fit massive datasets in RAM.
- Pros: Can fit 1 billion vectors in standard server RAM.
- Cons: Noticeable drop in accuracy due to compression.
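The memory claim is easy to check with back-of-the-envelope arithmetic. This sketch assumes float32 input, 8 sub-quantizers, and 8 bits per code (matching the 8-byte figure above). It counts only the vector payload and ignores per-vector IDs and other index overhead.

```python
d = 1536             # dimensions per vector
bytes_per_float = 4  # float32
m = 8                # PQ sub-quantizers (assumed)
nbits = 8            # bits per sub-quantizer code (assumed)
n_vectors = 1_000_000_000

raw_bytes = d * bytes_per_float  # uncompressed size of one vector
code_bytes = m * nbits // 8      # compressed PQ code size of one vector
compression_ratio = raw_bytes // code_bytes

raw_total_gb = raw_bytes * n_vectors / 1e9
code_total_gb = code_bytes * n_vectors / 1e9

print(f"Per vector: {raw_bytes} bytes raw vs {code_bytes} bytes as a PQ code")
print(f"Compression ratio: {compression_ratio}x")
print(f"1B vectors: {raw_total_gb:.0f} GB raw vs {code_total_gb:.0f} GB compressed")
```

Under these assumptions a billion raw vectors would need about 6 TB, while the PQ codes need about 8 GB.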
🔄 The FAISS Pipeline
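If you use an index that learns from your data, such as IVF or PQ, you must train it on representative vectors before adding data. The pipeline is: create the index, train it, add vectors, then search.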
💻 Code Example (Python)
FAISS is written in C++, but its Python bindings work directly with NumPy arrays, so Python is the most common way to use it.
# Install CPU or GPU version
pip install faiss-cpu
# pip install faiss-gpu
Example: IVF + PQ (Fast & Compressed)
import faiss
import numpy as np
# 1. Setup dimensions and data
d = 128 # Vector dimension (e.g., 1536 for OpenAI)
n_data = 100000 # Number of database vectors
n_queries = 5 # Number of search queries
# Generate random mock data
np.random.seed(42)
database_vectors = np.random.random((n_data, d)).astype('float32')
query_vectors = np.random.random((n_queries, d)).astype('float32')
# 2. Define the Index parameters
nlist = 100 # Number of clusters (Voronoi cells)
m = 8 # Number of PQ sub-quantizers; d must be divisible by m
nbits = 8 # Bits per sub-quantizer code, so each vector compresses to m * nbits / 8 = 8 bytes
# 3. Create the Index
quantizer = faiss.IndexFlatL2(d) # Coarse quantizer: assigns each vector to its nearest cluster centroid
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
# 4. Train the Index (crucial step for IVF and PQ)
print(f"Is index trained? {index.is_trained}") # False
index.train(database_vectors)
print(f"Is index trained? {index.is_trained}") # True
# 5. Add vectors to the index
index.add(database_vectors)
print(f"Total vectors in index: {index.ntotal}")
# 6. Perform a Search
index.nprobe = 10 # How many clusters to search? (Higher = more accurate, slower)
k = 3 # Return top 3 results
# Search!
distances, indices = index.search(query_vectors, k)
print("\nSearch Results for Query 0:")
print(f"Matched Vector IDs: {indices[0]}")
print(f"Distances: {distances[0]}")
🏎️ GPU Acceleration
FAISS's superpower is its GPU implementation. If you have an NVIDIA GPU and the GPU build of FAISS installed, you can move your index to VRAM, where distance calculations run massively in parallel.
# Create a standard CPU index
cpu_index = faiss.IndexFlatL2(d)
# Move it to the GPU
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
# Add and search just like before; the speedup depends on hardware, index type, and batch size
gpu_index.add(database_vectors)
distances, indices = gpu_index.search(query_vectors, k)
✅ Checklist Before Moving On
- [ ] I understand that FAISS is an in-memory library, not a standalone database server.
- [ ] I know why I would choose FAISS (speed, GPU, offline batching) over Pinecone (REST API, metadata).
- [ ] I can explain why an IVF index requires a training phase.
- [ ] I understand how nprobe controls the trade-off between search speed and accuracy.
➡️ Next: Level 4 — Caching
