ann (approximate nearest neighbor) indexes

  ·  2 min read

inspiration - im working on vector, hybrid search for data catalogs and got curious about the different index structures used at scale.

firstly, what is ann index? approximate nearest neighbor index, a data structure that store your vectors in a way that lets you avoid comparing against all of them. so, i was deep diving into ann, and i’ve learnt a few interesting indexes, based on scale of the data points. there are 2 most popular indexes - graph and cluster based.

|---------------------------|---------------------------|
|                           |                           |
|    x---------x            |    0         1            |
|    |          \           |                           |
|    x---x-------x--x       |    0   0       1  1       |
|     \ /        |   \      |                           |
|      x         x----x     |      0         1    1     |
|     / \       /           |                           |
|    x   \     /            |    0                      |
|         \   /             |                           |
|          \ /              |                           |
|           x---x           |           2   2           |
|           |               |                           |
|           x               |           2               |
|                           |                           |
|                           |                           |
|    Graph-based index      |  Clustering-based index   |
|---------------------------|---------------------------|

graph based indexes

data points are stored as nodes in a graph, with edges linking nearby points. at query time, the search starts from an entry point and navigates through neighbors, moving closer to the query vector until the nearest candidates are found.

  • hnsw (hierarchical navigable small world) -> fully in-memory, ultra-low latency
  • diskann (disk based ann index) -> stores only a small navigation graph in RAM, all vectors are stored on ssd (disk).

cluster based indexes

data points are grouped into clusters (or partitions), each holding a subset of vectors. at query time, you first find the nearest cluster(s) to the query, then search within them.

  • ivf (inverted file) -> coarse quantization, common in faiss
  • pq (product quantization) -> compresses vectors into codes to reduce memory usage.

choosing an ann index depends on trade-offs between accuracy, recall, latency, and scale.

| index   | accuracy   | recall       | latency     | typical scale         |
|---------|------------|--------------|-------------|-----------------------|
| hnsw    | high       | high         | low         | millions (fits in RAM) |
| diskann | high       | high         | low–medium  | billions (RAM + SSD)   |
| spann   | high       | high         | medium      | billions–trillions     |
| ivf     | medium     | medium–high  | low–medium  | millions–billions      |
| pq      | medium–low | medium–high  | low–medium  | billions (compressed)  |

turbopuffer on the other hand uses cluster based indexes, along with SPFresh - which allows them to incrementally in-place update the index with new, updated, and deleted vectors without rebuilding the index from scratch.