* 24-year-old M.Sc. student at the [[http://www.idi.ntnu.no/english/|Department of Computer and Information Science]] at the [[http://www.ntnu.no/indexe.php|Norwegian University of Science and Technology]], in the fourth year of a five-year program.

Implementation of an LRU cache manager for Apache Derby

This is a [[http://code.google.com/summerofcode.html|Google Summer of Code]] project proposal. The goal of the project is to implement, test and integrate a cache manager for [[http://db.apache.org/derby/|Apache Derby]] that uses the Least Recently Used (LRU) principle when evicting data blocks from the cache.

Name

Anders Morken

Email

Project Title

derby-cache-manager

Synopsis

Derby is a full-featured relational database management system implemented in Java. Currently its cache manager uses a clock-based algorithm when evicting pages from the cache. The primary goal of this project is to build a cache manager using an LRU eviction policy which can be dropped into Derby in place of its existing clock cache manager. [[http://wiki.apache.org/db-derby/LRUCache|This]] page has more information from the mentor.

Benefits

Possible performance improvements - the clock algorithm is more common in operating systems' virtual memory subsystems because of its low per-access overhead. Most database systems, however, use LRU-based cache managers because they make more intelligent eviction choices at a cost that is reasonable given the already high cost of a page access. A successful LRU cache implementation that correctly handles sequential scans might outperform the clock algorithm.

Demonstration of Derby's modular architecture - Derby is designed and implemented as a set of modules with defined interfaces. An internal API can have multiple implementations, and Derby's Monitor subsystem can choose between them at Derby boot time.

Configurability - some workloads may be suited to an LRU cache, others to a clock cache. Derby's choice of cache manager could be made configurable if this is desired.

Plan of approach

As Derby already contains a known-good clock cache manager, work on the LRU cache manager will start from that code. Clock's bookkeeping and eviction logic will be ripped out and replaced with the logic needed for an LRU cache.

Specifically, instead of a "recently used" flag which is cleared by a periodic walk through the cache, the cache items will maintain a linked list of themselves in "order of last use".
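As a minimal sketch of this idea (the class name and sizes are illustrative, not Derby's actual cache API), Java's own LinkedHashMap in access order maintains exactly such an "order of last use" linked list and can evict the eldest, i.e. least recently used, entry:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: a LinkedHashMap constructed with
// accessOrder = true keeps its entries linked in order of last use,
// so the eldest entry is always the LRU eviction victim.
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCacheSketch(int capacity) {
        // accessOrder = true: each get()/put() moves the entry to the
        // most-recently-used end of the internal linked list
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded
        return size() > capacity;
    }
}
```

Derby's cache manager has to handle concurrency, dirty pages and item lifetimes on top of this, so the real implementation will keep Clock's existing item bookkeeping and only swap the eviction structure.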

Initially, the implementation will be rather naive. No attempts at identifying special access patterns (such as sequential scans) will be made. As Derby's cache manager is used by more than just the page cache, access pattern detection might or might not be desirable.

A look at other database systems' buffer management algorithms could yield useful insight into what direction Derby should go.

Code in Derby that performs sequential scans will be identified and investigated, and a decision on how such scans should be handled will be made with assistance from the mentor and other project participants. Possibilities include bypassing the cache entirely on misses, or extending the cache API to allow the cache user to specify a lower or higher priority for different types of data. Making the cache manager itself identify sequential scans is probably difficult and error-prone, while Derby code outside the cache manager already has perfect knowledge of them.
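To illustrate the caller-supplied priority option, here is a sketch (class, method and hint names are hypothetical, not Derby's actual API) of an LRU list where scan-hinted pages enter at the cold end, so a sequential scan cannot flush the working set:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a priority-hinted LRU list: the head of the
// deque is the next eviction victim (cold end), the tail is the
// most recently used position (hot end).
public class HintedLruList<K> {
    public enum Hint { SCAN, NORMAL }

    private final Deque<K> lru = new ArrayDeque<>();
    private final int capacity;

    public HintedLruList(int capacity) {
        this.capacity = capacity;
    }

    public void touch(K key, Hint hint) {
        lru.remove(key);         // O(n) here; a real cache links nodes directly
        if (hint == Hint.SCAN) {
            lru.addFirst(key);   // cold end: first in line for eviction
        } else {
            lru.addLast(key);    // hot end: last in line for eviction
        }
        if (lru.size() > capacity) {
            lru.pollFirst();     // evict from the cold end
        }
    }

    public boolean contains(K key) {
        return lru.contains(key);
    }
}
```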

Instead of making a home-made extension to LRU with priorities, a buffer management algorithm which is resistant to scanning behavior, such as [[http://www.informatik.uni-trier.de/~ley/db/conf/vldb/vldb94-439.html|"2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm"]], could be investigated. It doesn't seem to be overly complex or expensive, but it is specifically designed for buffer management, and may not be the right choice for other caches such as the statement cache.
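For reference, the core of the 2Q idea can be sketched in a few lines (queue sizes and tuning are simplified relative to the paper, and the class is illustrative only): first-time pages pass through a small FIFO (A1in), so a sequential scan only cycles that FIFO; pages re-referenced after leaving it, tracked by a "ghost" key queue (A1out), are promoted into the main LRU (Am).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;

// Simplified 2Q sketch: A1in holds newly seen pages (FIFO), A1out
// remembers only the keys recently evicted from A1in, and Am is the
// main LRU holding pages that were referenced more than once.
public class TwoQSketch<K> {
    private final Deque<K> a1in = new ArrayDeque<>();          // FIFO of new pages
    private final Deque<K> a1out = new ArrayDeque<>();         // ghost keys only
    private final LinkedHashSet<K> am = new LinkedHashSet<>(); // main LRU
    private final int inCap, outCap, amCap;

    public TwoQSketch(int inCap, int outCap, int amCap) {
        this.inCap = inCap;
        this.outCap = outCap;
        this.amCap = amCap;
    }

    public void access(K key) {
        if (am.remove(key)) {
            am.add(key);                 // hit in Am: move to the MRU end
        } else if (a1out.remove(key)) {
            am.add(key);                 // seen before: promote into Am
            if (am.size() > amCap) {
                K victim = am.iterator().next(); // LRU end of Am
                am.remove(victim);
            }
        } else if (!a1in.contains(key)) {
            a1in.addLast(key);           // brand new page: enter the FIFO
            if (a1in.size() > inCap) {
                a1out.addLast(a1in.pollFirst()); // remember the evicted key
                if (a1out.size() > outCap) {
                    a1out.pollFirst();
                }
            }
        }
        // A hit while still in A1in causes no movement, as in the paper.
    }

    public boolean cachedInAm(K key) {
        return am.contains(key);
    }
}
```

The scan resistance falls out of the structure: a one-pass scan never re-references a page after it leaves A1in, so nothing from the scan is ever promoted into Am.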

Schedule

In light of the constraints detailed below, this is a tentative schedule.

May 23rd: Tentative project start, if accepted by Apache and the Summer of Code program.

June 5th: Initial delivery of a preliminary patch with a simple LRU-ified version of the existing cache manager, for review. The derbyall test suite should pass with this cache. This deadline is probably achievable if work starts ahead of the official SoC start.

June 26th: Delivery of an improved LRU cache patch, closer to production quality. It should be able to replace the existing cache manager and pass the derbyall test suite with flying colours.