Tuesday, September 25, 2012

In this post I would like to talk a little bit about page buffering. We remember that the previous posts had subjects like File System Abstration where the focus was on reading and writing blocks from or to the disk. So what is a page? A page is just a block of bytes, so it is quite similar to the block which was explained previously. But it's not only a new name for the same thing, but it differs
regarding it's use case. A block is something which is addressed
physically on disc and which accessed in an abstract way by the DBMS,
whereby a page is a block of bytes which is kept inside the main memory.
So a page is managed within the page buffer. For this purpose it has
some additional information attached

So the purpose of the buffer is to keep pages
available within the main memory. The main memory access is a lot
faster than the hard disc access. A buffer can be cold or hot. A hot
buffer contains almost every typically used page. So the target is to
reach a cache hit ratio which is more than 90%. The cache hit ratio is
the ratio between cache hits and cache misses. A cache hit means that a
required page was available in the buffer. The opposite is a cache miss,
which means that the page was gathered from disc.

You may ask why we explicitely implement a buffer by not using the
operating system one. The answer is that the characteristics of an OS
buffer are not everytime sufficient. Not every OS buffer knows e.g. of
the pin-operation. This operation allows to tell the buffer that a
specific page should not moved out of the buffer. Let's assume a nested
loop join. Which means two nested loops those are accessing pages. A column-value
of a row of a page within the buffer should be compared with other
values of other pages. It's easy to see that we need to keep the outer
page inside the buffer whereby comparing with the other ones.

A buffer has a maximum size. So it can only keep a specified number
of pages. If the buffer is full, which means that every buffer frame is
used, it then is required to replace an existing page with the requested
one. So strongly related to the buffer is the scheduler. A scheduler
decides which page should become replaced. Schedulers can be very simple
or more complex. Here some possible scheduling algorithms:

F(irst) (I)n F(irst) O(ut): Replaces just the oldest page

L(east) F(requently) U(sed): Replaces the page which was less often referenced

D(ynamic) G(erneralized) CLOCK: Goes clockwise through the frames by
counting the local references and by having also one global counter. If
the global counter reaches a limit, then a normalization happens.

jCoreDB does at first implemented LFU. Afterwards we will implement
DGCLOCK. LFU has the disadvantage that it only takes the number of
references into account but not the age of the page within the
buffer. So the most often used pages are kept inside the buffer. We can
easily see that this is not fair enough for just added pages
those have also a higher probability to be used the next time. So we also
should take the age into account to get a better scheduler. DGCLOCK
does this by combining a reference counter with an age counter.

Besides the scheduling strategy, a buffer also realizes a write strategy. The following strategies are available:

Write Through: Just write every change (Update, Insert) from the buffer directly to the underlying file system.

Write Behind: Perform the change at first in the buffer. If a page
has another state within the buffer than on disk it is marked as
'dirty'. One or multiple writers are used to afterwards write the dirty
pages to the underlying file system.

jCoreDB does at first implement Write Through, just because the
implementation effort is less than for Write Behind. Later we will
implement Write Behind. Write Behind is necessary to realize a multi
threaded buffer. It will also become important related to transaction management.

Currently the following classes were implemented in order to realize the Page Buffer requirements:

IPageBuffer: Defines a wrapper for the file system functions by providing additional buffering

IScheduler: Defines which operations a scheduler should support

Page: A page extends a block. A page has a state within the buffer (pinned, dirty,...) and a type (data, index, overflow)

SchedulerFactory: Returns a scheduler implementation based on the configuration

LFUScheduler: The scheduler which realizes the LFU algorithm

PageIndex: A hash index which maps the block id to the buffer frame index