
Google Summer of Code 2006 : Proposal

Project Title

Implement an LRU-based Cache Manager for Derby

Student

Maryam Moazeni

Email

mmoazeni@gmail.com

Abstract

Caching in database systems is used to reduce database accesses. Most database systems employ an LRU-based algorithm to provide high-performance data caching. Apache Derby, however, uses a clock algorithm, which appears to be far from optimal. This project aims to implement an LRU-based Cache Manager for Apache Derby.

Project overview

There are various projects and products that use or support Apache Derby, so any step toward optimizing Derby is a great help to the community, and cache management could be a big part of that optimization. The LRU-based algorithm used by many database systems works by replacing the page in memory that has not been referenced for the longest time. By the principle of locality, this should be the page least likely to be referenced in the near future. The difficulty with this approach lies in its implementation.

One approach would be to tag each page with the time of its last reference; this would have to be done at every memory reference. Another approach would be to maintain a stack of page references. Both approaches should be studied and compared for their potential overhead and cost.
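As a point of reference for the design work above, the LRU policy itself can be expressed very compactly in Java using LinkedHashMap's access-order mode, which keeps the least recently used entry at the head of its internal list. The sketch below is illustrative only; the class name, capacity handling, and page types are assumptions and not part of Derby's actual cache manager API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of an LRU page cache. LinkedHashMap with accessOrder=true
// moves an entry to the tail on each get(), so the eldest (head) entry is
// always the least recently used one. Names here are hypothetical.
public class PageCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public PageCache(int capacity) {
        super(16, 0.75f, true); // accessOrder=true enables LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded.
        return size() > capacity;
    }

    public static void main(String[] args) {
        PageCache<Integer, String> cache = new PageCache<>(2);
        cache.put(1, "page1");
        cache.put(2, "page2");
        cache.get(1);            // touch page 1, so page 2 becomes LRU
        cache.put(3, "page3");   // evicts page 2
        System.out.println(cache.containsKey(2)); // false
        System.out.println(cache.containsKey(1)); // true
    }
}
```

A production cache manager would of course also need page pinning, dirty-page tracking, and concurrency control, which this sketch deliberately omits.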

Project plan

This project can be broken up into seven phases:

Phase 1: Initial planning and design

During this phase I hope to master the concepts of cache management. A review of the implementation of Derby's current cache manager would be helpful as well. This should lead me to a decision on how to implement the LRU-based algorithm.

Deliverable(s) – Initial Prototype LRU-Based Cache Manager

Estimated completion – June 15th

Phase 2: Initial Implementation

Based on the approaches introduced in phase one, the algorithm would be implemented. A prototype and documentation for the mid-term evaluation would also be prepared during this phase.

Deliverable(s) - Prototype and Documentation for mid-term evaluation

Estimated completion – June 30th

Phase 3: Secondary Implementation Based on Comments

Based on feedback from the mentors, the Derby developers, and users, the required changes will be made.

Deliverable(s) – Secondary Prototype of LRU-Based Cache Manager

Estimated completion - July 20th

Phase 4: Implementation of Unit Tests

Unit Tests will be implemented in this phase.

Deliverable(s) – Unit Tests for the LRU-Based Cache Manager

Estimated completion – July 31st

Phase 5: Study the Performance of the Algorithm

In this phase, I will study the new cache manager under various loads and compare its behavior to the old clock-based cache manager.

Deliverable(s) – Performance Study of the LRU-Based Cache Manager

Estimated completion – August 5th
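To make the comparison in this phase concrete, the clock (second-chance) policy that Derby's current cache manager is based on can be sketched as follows. The frame layout, class name, and `access` method are hypothetical simplifications for illustration, not Derby's implementation:

```java
import java.util.Arrays;

// Minimal sketch of the clock (second-chance) replacement policy.
// Each frame has a reference bit; on a miss, the "hand" sweeps forward,
// clearing set bits, and replaces the first frame whose bit is clear.
public class ClockBuffer {
    private final int[] pages;       // page id held in each frame (-1 = empty)
    private final boolean[] refBit;  // reference bit per frame
    private int hand = 0;            // the clock hand

    public ClockBuffer(int frames) {
        pages = new int[frames];
        refBit = new boolean[frames];
        Arrays.fill(pages, -1);
    }

    // Returns the frame that now holds the page, evicting one if needed.
    public int access(int pageId) {
        for (int i = 0; i < pages.length; i++) {
            if (pages[i] == pageId) {      // hit: give a second chance
                refBit[i] = true;
                return i;
            }
        }
        // Miss: sweep until a frame with a clear reference bit is found.
        while (refBit[hand]) {
            refBit[hand] = false;
            hand = (hand + 1) % pages.length;
        }
        int victim = hand;
        pages[victim] = pageId;
        refBit[victim] = true;
        hand = (hand + 1) % pages.length;
        return victim;
    }
}
```

Driving both this sketch and an LRU cache with the same reference trace, and counting misses, is one simple way to quantify the difference under various loads.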

Phase 6: Feature Addition

Based on feedback from the mentors, the Derby developers, and users, new features will be added. How to assign different priorities to different types of pages will also be considered in this phase.

Deliverable(s) - Near-Production-Quality Code

Estimated completion – August 15th

Phase 7: Final Code Touch Ups and Documentation

During this phase the code would be integrated. Documentation would also be completed for the final evaluation.

Deliverable(s) - Final Product and Documentation

Estimated completion – August 21st

Biography

I am a graduate student in the Department of Computer Science at the University of California, Los Angeles (UCLA). I recently worked with one of the Apache products that uses Apache Derby, and I realized how easy it is to work with Derby compared with other databases, and how rich its feature set is. Therefore, I decided to become a contributor to the Derby project. Derby interests me because it is written in Java, so it can be embedded within a Java application and deployed anywhere as a standalone package. Memory management, and cache management in particular, is an interesting topic; it has a tremendous impact on the performance of any product handling large amounts of data. Completing this project would therefore give Apache Derby a very good edge, and that is why I chose it.
