LRU-based Cache Manager
Derby currently uses a clock algorithm to determine which disk pages to replace in its page cache. We have indications that this algorithm is far from optimal for some types of load. For example, we have observed heavy disk activity on tables that ideally should be cached entirely in memory. The clock algorithm is ideal for handling virtual memory in an OS, since it adds very little overhead per memory access. For a database system, however, more advanced algorithms may be used, since the amount of work associated with each page access is much larger. Many database systems use variants of an LRU-based algorithm, where the least recently used pages are picked as candidates for replacement.
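To make the idea concrete, here is a minimal sketch of an LRU cache in Java, built on `LinkedHashMap` in access order. The generic `LruCache` class and its capacity parameter are illustrative only; a real Derby cache manager would use its own key and page types and handle latching, dirty pages, and concurrency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: a LinkedHashMap kept in access order evicts
// the least recently used entry once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the hot end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently used entry when full
    }
}
```

With capacity 2, putting a third entry evicts whichever of the first two was touched least recently, which is exactly the replacement behavior the project would study.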
This project will involve:
- Implementation of an LRU-based cache manager
- Implementation of unit tests for the new cache manager
- A study of the new cache manager under various loads, comparing its behavior to the old clock-based cache manager
If time allows, consider how to assign different priorities to different types of pages (e.g., higher priority for index pages, lower priority for pages accessed by a sequential scan[1]).
[1] With a pure LRU strategy, a single sequential table scan may replace all pages used by concurrent transactions. In many cases, these pages will not be accessed again for a long time. Hence, this approach is not optimal.
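One common way to address this is to insert scan-fetched pages at the cold (eviction) end of the LRU order, so a single large scan cannot flush the whole cache. The sketch below illustrates the idea; the class and the `fromSequentialScan` flag are hypothetical, not part of Derby, and the O(n) deque removal is acceptable only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Scan-resistant LRU sketch: normal pages enter at the hot end, pages from
// a sequential scan enter at the cold end and are therefore evicted first.
class ScanResistantLru<K, V> {
    private final int capacity;
    private final Deque<K> order = new ArrayDeque<>(); // head = next eviction victim
    private final Map<K, V> pages = new HashMap<>();

    ScanResistantLru(int capacity) { this.capacity = capacity; }

    V get(K key) {
        V v = pages.get(key);
        if (v != null) {            // promote on hit: move to the hot end
            order.remove(key);      // O(n); fine for a sketch, not for production
            order.addLast(key);
        }
        return v;
    }

    void put(K key, V value, boolean fromSequentialScan) {
        if (pages.containsKey(key)) {
            order.remove(key);
        } else if (pages.size() >= capacity) {
            pages.remove(order.pollFirst()); // evict the coldest page
        }
        if (fromSequentialScan) {
            order.addFirst(key);  // cold end: first candidate for eviction
        } else {
            order.addLast(key);   // hot end: normal LRU insertion
        }
        pages.put(key, value);
    }
}
```

A scan page that is never touched again is the next eviction victim, while pages used by concurrent transactions keep their place near the hot end.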