Let’s understand a little bit more about how these btrees work! First,
some theory.

Normally when I think about tracking tree accesses, I think about it
in terms of “how many times you need to jump to a new node”. So in a
binary tree with depth 10, you might need to jump up to 10 times.

One of the most important things in database optimization is disk I/O.
Reading more data than you absolutely need to read is really
expensive, because seeking to a new location in a file takes a long
time. It takes way less CPU time to search through your data than
it does to read the data into memory (see:
Computers are fast!,
Latency Numbers Every Programmer Should Know).

So for a simple database scan, reading more data than you need is a
huge problem

So far we know that:

Our database is organized into 1k pages

We want to read as little of the database as possible to write a
query, and

My filesystem has a “block size” of 4k, which means that it’s
impossible to read less than 4k at a time.

We can conclude from this that we want to read as few pages as
possible. btrees are organized so that each node has lots of
children, to keep the depth small, and so that we won’t have to read
too many pages to find a row.

My 100,000 row SQLite database has a btree with depth 3, so to fetch a
node I only need to read 3 pages. If I’d used a binary tree I would
have needed to do log(100000) / log(2) = 16 seeks! That’s more than
five times as many. So these btrees seem like a pretty good idea.

The index and table btrees

So far we’ve been talking like there’s only one btree. This isn’t
actually true at all! My database has one table, and two btrees.

Each table has a btree, made up of interior and leaf nodes. Leaf nodes
contain all the data, and interior nodes point to other child nodes.

Every index for that table also has its own btree, where you can look
up which row id a column value corresponds to. This is why maintaining
lots of indexes is slow – every time you insert a row you need to
update as many trees as you have indexes.

Let’s dive into a query a bit more deeply. It turns out SQLite does a
binary search inside every page of every btree to find out what node
to go to next. I’ve printed out the indexes it tries while doing the
binary search.

Wow, that was a lot of work. There were actually two separate searches
here, in two different btrees.

Look up the rowid associated with the number 1000 in the
index btree – 99001.
(some rowid docs).

Look up 99001 in the table btree.

It’s really neat to see it doing the binary search for 99001 inside
each page! I couldn’t quite figure out how to get it to print out the
comparisons it was doing when looking up 1000 in the index btree,
because it does some weird function pointer magic to do comparisons.

Kamal is parsing SQLite databases
with the Python construct module next to me right now and it is
AMAZING. There may be future installments. Who knows!