Google Summer of Code 2006 – Project Proposal

Subject ID

derby-cache-manager http://wiki.apache.org/db-derby/LRUCache

Title

Implement an LRU-based cache manager for Derby

Author

Karthik Thyagaraja

Mentor

Øystein Grøvlen

Name

Email

IM

karthik_vnit (yahoo)

Mentor

Øystein Grøvlen (oystein.grovlen at sun dot com)

Motivation

A good page replacement algorithm is very important for a Database Cache Manager.It decreases the response time & increases the speed with which results are delivered.The LRU based cache manager will try to replace the currently used Clock based Cache Manager in Derby.The clock algorithm has sub-optimal performance in many cases.The new cache manager aims to give a good performance(almost optimal) under most of the situations.Having a good buffer manger will lower the number of Disk I/O's & thus lead to improvement in the overall system performance.This would be a valuable addition to the Apache Derby Database.

Project Description

An important part of any database system is the buffer/cache manager.It is the interface between main memory and disk.The database buffer is divided into equal sizedpages so that data can be exchanged between the disk and the main memory.Typically the page size & the buffer size are parameters in the catalog that remain fixed once the database starts running.To change this parameter the catalog has to be updated and the database has to be restarted.Since a physical access to a page on disk is more expensive than an access to a page in the cache, the aim of the database buffer manager is to minimize the number of Disk I/O's incurred. When a page is requested from the buffer the following things have to be done:

  1. Search if the page requested already exists in the buffer
  2. If it exists, then service the request from this page.
  3. If it doesn't exist(This is called a page fault),then fetch it from the disk & add it into the buffer.If the buffer is completely filled then some of the pages have to be replaced in order to accommodate the new request.The decision of:Which page to be replaced? is taken by the page replacement algorithm.

So page replacement algorithm plays a crucial role in the performance of the cache manager.An ideal replacement algorithm has least number of page-faults.The various types of page replacement algorithms are: NRU(Not-Recently-Used), FIFO(First-in-First-Out), Clock Algorithm, LRU(Least Recently Used). Currently Derby uses Clock algorithm for its Cache Manger.Clock Algorithms are more suitable for handling virtual memory in operating systems than for database buffers because in a database cache the amount of work associated with each page access is much larger.Clock algorithms don't perform optimally in many situations.LRU(Least recently algorithm) policy has been proven to be a better algorithm for page replacement.It has been widely used & studied and is amenable to statistical analysis.It has been proved that LRU can never result in more than N-times more page faults than theoretically optimal page replacement algorithm ( N is proportional to the number of pages in the buffer).

Naive LRU Replacement Algorithm: It works on the principle of locality.In this approach each page is assigned a time stamp or some other time based ID.On a page fault the page which hasn't been accessed for the longest time is replaced.Some drawbacks of this naive method is that :It is unable to differentiate between pages that have relatively frequent reference and pages that have very infrequent reference.Once a page has been read into the buffer it stays for a long time in the buffer even if it isn't referenced anymore. LRU-K algorithm: It is an enhancement to the LRU algorithm.It takes into account the access frequency & history of each page in order to make better decisions while replacing a page. Advantages of this policy are:

  1. It better distinguishes between page sets with different levels of reference frequency
  2. It adapts well to the changing access patterns
  3. It is simple and doesn't need lot of bookkeeping
  4. "It detects locality of reference within query executions,across multiple queries in the same transaction, and also locality across multiple transactions in a multiuser environment."(Elizabeth et.al.)

This technique is discussed in detail in this paper: http://citeseer.ist.psu.edu/16869.html

Issues to be considered while designing:

In this project i aim to:

Some implementation details

The project will be implemented in Java. The cache manager can be implemented as a doubly-linked list of memory pages.We will have two pools:An Used and an Unused pool of pages.This way it is easy to service any request for a page.

An AVL Tree can be used to store the details of each page like it's timestamp,reference to the page etc.An AVL tree is more suitable because it has lookup, insertion, and deletion operations of O(log n) in both the average and worst cases.

Project Timeline and Deliverables

-May 23-25

-May 25-27

-May 29

-May 30-June 30

-June 30-July 10

-July 10-July20

-July 20 onwards

-Documentation -August 19th

Bio/About Me

I am a second year Masters student(MS in Computer Science and Engineering) at Univeristy of Florida(UFL),Gainesville,USA.I had taken a Database Implementation course(Grad Level Course) in which i learnt in detail the workings of a typical relational database system.I have also designed and implemented a single user relational DBMS.I single handedly coded about 11K lines of C++ code for the DBMS.The components of the DBMS are Buffer Manager,Sorted & B-tree file organizations,Query Compiler,Query Optimizer & Query Execution engine.It supports Selection,Projection,Join,Multiply,Remove Duplicates and Aggregate Functions.

Some significant highlights of the project were -

I have about 3 years of good Java experience.I have coded most of my academic projects in Java.Some big projects that i have built using Java are:

I am very excited about this project.Having already built a single user DBMS and having implemented a buffer manager that makes use of LRU algorithm, i am very confident that i will be able to do an excellent job on this project.A chance to contribute to the Open Source community makes me feel elated.

I am just taking 3 research credits this semester.So i will be free most of this summer.

References