bf-tree

  ·  1 min read

Bf-tree

decouple cache pages from disk pages, it no longer has to mirror disk 1:1

lets understand a little deeper:

  1. problem with B-tree, data lives in fixed size pages (4kb), buffer pool caches whole pages in RAM.
  2. to update a record -> read full page -> modify a few bytes -> write back 4kb
  3. but, cache doesnt require to mirror 1:1 disk so, in this paper they’ve introduced - “mini pages”

mini pages - a variable length in-mem fragments, so you dont need to hold the full 4kb page in buffer.

mini pages

so instead of:

disk page ↔ identical in-memory page

we have:

disk page + 0/1 mini page (hot subset + unflushed changes) and buffer pool decides how big mini page is.

they have also introduced, circular buffer in RAM

  1. fixed total size (N bytes)
  2. tail pointer - where new mini page allocations are appended
  3. head pointer - where evicts starts
  4. when full, it evicts mini pages near head pointer and updates back to disk to free space.

so Bf-tree is basically trying to dominate the “triangle” of read, write, range-scan, it hits a sweet spot where both B-tree and LSM are compromises.

bf-tree