Google Summer of Code 2006 : Proposal
Implement an LRU-based Cache Manager for Derby
Caching in database systems is used to reduce the number of database accesses. Most database systems employ an LRU-based algorithm to provide high-performance data caching. However, Apache Derby uses a clock algorithm, which appears to be far from optimal. This project aims to implement an LRU-based Cache Manager for Apache Derby.
There are various projects and products that use or support Apache Derby. Therefore, any step toward optimizing Derby is a great help to the community, and cache management is a promising place to take such a step. The LRU-based algorithm used by many database systems replaces the page in memory that has not been referenced for the longest time. By the principle of locality, this should be the page least likely to be referenced in the near future. The problem with this approach is the difficulty of implementation.
One approach would be to tag each page with the time of its last reference; this would have to be done at each memory reference. Another approach would be to maintain a stack of page references. The two proposed approaches should be studied and compared for potential overhead and cost.
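To make the LRU policy concrete, here is a minimal sketch in Java (the language Derby is written in) using the standard library's LinkedHashMap in access-order mode. The class and names below are illustrative only and are not Derby's actual cache API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU page cache sketch. LinkedHashMap with accessOrder = true keeps
// entries ordered from least to most recently used, so the eldest entry is
// always the LRU victim. Names (LruPageCache, capacity) are illustrative.
public class LruPageCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruPageCache(int capacity) {
        // accessOrder = true: get() and put() both count as a "reference"
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruPageCache<Integer, String> cache = new LruPageCache<>(3);
        cache.put(1, "page1");
        cache.put(2, "page2");
        cache.put(3, "page3");
        cache.get(1);          // touch page 1: it becomes most recently used
        cache.put(4, "page4"); // evicts page 2, now the least recently used
        System.out.println(cache.containsKey(2)); // false
        System.out.println(cache.containsKey(1)); // true
    }
}
```

This corresponds to the "page reference stack" approach: the linked list inside LinkedHashMap plays the role of the stack, updated in O(1) at each reference rather than by timestamp comparison.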
This project can be broken up into seven phases:
Phase 1: Initial planning and design
During this phase I will study the concepts of cache management. A review of the implementation of Derby's current cache manager would be helpful as well. This should lead me to a decision on how to implement the LRU-based algorithm.
Deliverable(s) – Initial Prototype LRU-Based Cache Manager Estimated completion – June 15th
Phase 2: Initial Implementation
Based on the approach chosen in Phase 1, the algorithm will be implemented. A prototype and documentation for the mid-term evaluation will also be prepared during this phase.
Deliverable(s) - Prototype and Documentation for mid-term evaluation Estimated completion – June 30th
Phase 3: Secondary Implementation Based on Comments
Based on feedback from the mentors, Derby developers, and users, the required changes will be made.
Deliverable(s) – Secondary Prototype of LRU-Based Cache Manager Estimated completion - July 20th
Phase 4: Implementation of Unit Tests
Unit Tests will be implemented in this phase.
Deliverable(s) – Unit Tests for the LRU-Based Cache Manager Estimated completion – July 31st
Phase 5: Study the Performance of the Algorithm
In this phase, I will study the new cache manager under various loads and compare its behavior to the old clock-based cache manager.
Deliverable(s) – Performance Study of the LRU-Based vs. Clock-Based Cache Managers Estimated completion – August 5th
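The kind of comparison planned for this phase can be sketched as a small simulation that replays the same page-reference string against both an LRU policy and a clock (second-chance) policy and counts hits. The reference string and frame count below are arbitrary illustrations, not Derby workloads:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;

// Replay one reference string under two replacement policies and count hits.
public class CachePolicyComparison {
    // LRU hit count, using an access-ordered LinkedHashMap as the recency list
    static int lruHits(int[] refs, int frames) {
        LinkedHashMap<Integer, Boolean> cache =
            new LinkedHashMap<>(16, 0.75f, true);
        int hits = 0;
        for (int page : refs) {
            if (cache.containsKey(page)) {
                cache.get(page); // touch to update recency
                hits++;
            } else {
                if (cache.size() == frames) {
                    // head of iteration order is the least recently used page
                    cache.remove(cache.keySet().iterator().next());
                }
                cache.put(page, Boolean.TRUE);
            }
        }
        return hits;
    }

    // Clock (second-chance) hit count: frames with reference bits and a hand
    static int clockHits(int[] refs, int frames) {
        int[] page = new int[frames];
        boolean[] refBit = new boolean[frames];
        Arrays.fill(page, -1);
        int hand = 0, hits = 0;
        for (int p : refs) {
            boolean hit = false;
            for (int i = 0; i < frames; i++) {
                if (page[i] == p) { refBit[i] = true; hits++; hit = true; break; }
            }
            if (hit) continue;
            // advance the hand, clearing reference bits, until a frame
            // without a second chance (or an empty frame) is found
            while (page[hand] != -1 && refBit[hand]) {
                refBit[hand] = false;
                hand = (hand + 1) % frames;
            }
            page[hand] = p;
            refBit[hand] = true;
            hand = (hand + 1) % frames;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 1, 4, 1, 5, 2, 1, 2, 3, 4, 5, 1, 2};
        System.out.println("LRU hits:   " + lruHits(refs, 3));
        System.out.println("Clock hits: " + clockHits(refs, 3));
    }
}
```

The real study would of course use Derby's own cache managers and workloads; this harness only illustrates how hit ratios of the two policies can be compared on identical input.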
Phase 6: Feature Addition
Based on feedback from the mentors, Derby developers, and users, required new features will be added. Assigning different priorities to different types of pages will be considered in this phase.
Deliverable(s) - Almost Production Quality Code Estimated completion – August 15th
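One way the priority idea from Phase 6 could work, sketched as a hypothetical structure (none of these names are Derby's): keep one LRU queue per priority level and evict the least recently used page from the lowest non-empty level, so high-priority pages (e.g. index pages) outlive low-priority ones:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical priority-aware LRU cache: one insertion-ordered set of page
// ids per priority level (0 = lowest). Eviction scans levels from lowest to
// highest and removes that level's least recently used page.
public class PriorityLruCache {
    private final int capacity;
    private final Map<Integer, Integer> pagePriority = new HashMap<>();
    private final TreeMap<Integer, LinkedHashSet<Integer>> levels = new TreeMap<>();

    public PriorityLruCache(int capacity) { this.capacity = capacity; }

    // Record a reference to a page at the given priority level
    public void touch(int page, int priority) {
        Integer old = pagePriority.put(page, priority);
        if (old != null) levels.get(old).remove(page);
        LinkedHashSet<Integer> level =
            levels.computeIfAbsent(priority, k -> new LinkedHashSet<>());
        level.remove(page); // re-insert to move to most-recently-used position
        level.add(page);
        if (pagePriority.size() > capacity) evict();
    }

    private void evict() {
        // levels is sorted ascending, so the lowest priorities come first
        for (LinkedHashSet<Integer> level : levels.values()) {
            Iterator<Integer> it = level.iterator();
            if (it.hasNext()) {
                int victim = it.next(); // LRU page within this level
                it.remove();
                pagePriority.remove(victim);
                return;
            }
        }
    }

    public boolean contains(int page) { return pagePriority.containsKey(page); }
}
```

For example, with capacity 2, touching pages 2 and 3 at priority 1 after page 1 at priority 0 evicts page 1 even though it is not the globally least recently used page.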
Phase 7: Final Code Touch Ups and Documentation
During this phase the code will be integrated. Documentation will also be completed for the final evaluation.
Deliverable(s) - Final Product and Documentation Estimated completion – August 21st
I am a graduate student in the Department of Computer Science at the University of California, Los Angeles (UCLA). I recently worked with one of the Apache products that uses Apache Derby. I realized how easy it is to work with Derby compared to other databases, and how rich its feature set is. Therefore, I decided to become a contributor to the Derby project. Derby interests me because it is written in Java, so one can embed it within a Java application and deploy it anywhere as a standalone package. Memory management, and especially cache management, is an interesting topic; it has a tremendous impact on the performance of any product that handles large amounts of data. Completing this project would give Apache Derby a very good edge, and that is why I chose it.