indexes

July 18, 2025 · 2 min read

an index is a data structure that improves the speed of data retrieval operations by providing quick access to specific information without having to search through every piece of data. (Eg, a books index - look for the word and then navigate to that page)

forward [ O(n) ]
- we have multiple documents (pages), and we extract words from each document and store it in a data structure
  - doc 1 — “book”, “pen”, “student”
  - doc 2 — “pen”
  - doc 3 — “student”
- when doing a search, it needs a full scan across the docs to retrieve which docs has the word.
- bring a pen -> do a full text scan on each doc -> Doc 1, 2
reverse/inverted [ O(1) ]
- we maintain a table which maps in reverse “words” -> “docs”
- now we can just lookup for the word, and then retrieve the docs that has this
  - “pen” — doc 1, 2
  - “student” — doc 1, 3
  - “book” — doc 1
vector
- a special data structure which organises high dimensional vectors for efficient similarity search
  - HNSW - hierarchical navigable small world
  - Annoy - approximate nearest neighbours
  - IVF - inverted file index
  - LSH - locality sensitive hashing
b-tree index
- default index type in most of the databases (postgres, mysql, oracle)
- search, insert, delete: O(log n)
- range query: O(log n + k) where k = number of results

here are only a few, there are many more… but just rewinding on the basics.