indexes

  ·  2 min read

an index is a data structure that improves the speed of data retrieval operations by providing quick access to specific information without having to search through every piece of data. (Eg, a books index - look for the word and then navigate to that page)

  • forward [ O(n) ]

    • we have multiple documents (pages), and we extract words from each document and store it in a data structure
      • doc 1 — “book”, “pen”, “student”
      • doc 2 — “pen”
      • doc 3 — “student”
    • when doing a search, it needs a full scan across the docs to retrieve which docs has the word.
    • bring a pen -> do a full text scan on each doc -> Doc 1, 2
  • reverse/inverted [ O(1) ]

    • we maintain a table which maps in reverse “words” -> “docs”
    • now we can just lookup for the word, and then retrieve the docs that has this
      • “pen” — doc 1, 2
      • “student” — doc 1, 3
      • “book” — doc 1
  • vector

    • a special data structure which organises high dimensional vectors for efficient similarity search
      • HNSW - hierarchical navigable small world
      • Annoy - approximate nearest neighbours
      • IVF - inverted file index
      • LSH - locality sensitive hashing
  • b-tree index

    • default index type in most of the databases (postgres, mysql, oracle)
    • search, insert, delete: O(log n)
    • range query: O(log n + k) where k = number of results

here are only a few, there are many more… but just rewinding on the basics.