Google Summer of Code 2010 - Project Proposal
Project |
Derby-4587- Add tools for improved analysis and understanding of query plans and execution statistics |
Student Name |
Nirmal Fernando |
IM |
nirmal070125 (Google Talk) |
Abstract
Apache Derby, an Apache DB subproject, is an open source relational database implemented entirely in Java and available under the Apache License, Version 2.0, and is a very interesting piece of software. Derby is based on Java, JDBC and SQL. Quite frequently, users of Derby have troubles in comprehending
- how their query is being translated into a query plan by the optimizer
- what is the execution-time, resource usage at the various steps of a query plan
The objective of this project is to provide visual displays to help people understand the way their query is being run.
Description
Derby is known as a system which is easy to install, deploy and maintain in production environments. Latest official release of Derby is 10.5.3.0, and was released in year 2009. Apache Derby (here after mention as Derby) has an active developers’ community behind the wall, who are dedicating, hours of their time to improve/ fix bugs that are raised by the users. The Derby developers’ community is really helpful and they are easy to work with. Users, of an open source project like Derby are the most important group that will take the software an extra mile from what it is at a particular instance. That’s why the developers’ pay a great attention for the issues or improvements that are raised by users of Derby. This project addresses such an improvement that users of Derby are expecting. Quite frequently, they have troubles in comprehending the steps followed by the optimizer when it translates a query into a query plan, and details such as time to execute, resource utilization at each step of the query plan. There are low-level features in Derby which capture this information and record it, such as logQueryPlan, and the XPLAIN tables, so this project will lead to a development of a new tool which can process the query plan and execution statistics information and present it in a more comprehensible fashion i.e. a graphical interpretation of the query plan. After went through some possible ways of implementing this tool which were suggested by my mentor for this project Mr. Bryan Pendleton, we came across the following idea which will display the query plan in a web browser.
Figure1: Process View of the tool http://img682.imageshack.us/img682/9691/derbya.png
- XSLT Style Sheet:
First I do a research on Derby to recognize all the possible instances that can be occur in a query such as name of the table, sort, hash-join etc. Next I planned to write the template, as it iteratively looks for the instance occurred at that particular step and displays the image + details (execution time, resource usage), for all the steps that were followed when executing the query. Here, image and details can be obtained through the XML file emitted by the query plan.
- Raw XML Document:
The XML document should contain a predefined set of tags for all possible instances of a query as mentioned above and each tag (Child) should contain sub-children which will describe the image + details. And this will create at the run time in order of the execution procedure.
- Next I will link Raw XML document and the XSL style sheet. This step is fairly easy; I just need to provide the reference to the XSL style sheet inside the XML document.
- After that XSLT compliant browser will transform XML document generated by query plan into a XHTML page, which will show the graphical query plan + details.
- We planned to approach the complete graphical query explainer, incrementally, starting from very basic queries.
Things Done So Far
- Checked out Derby trunk from the SVN
- Setup my development and testing environment
- Built Derby from the source and got familiar with the folder structure, functionalities etc.
Searched and went through the explain query plans of other well-known open source database management systems such as PostgreSQL-PgAdmin tool, MySQL. PgAdmin provides a good graphical explain of a query and support for many functionalities than MySQL. So, checked the possibility of using PgAdmin tool with Derby, and found that PgAdmin is not allowing, using other database management systems.
- Currently I am assigned myself to the issue Derby-4406-Wrong order when using ORDER BY on non-deterministic function [2], and hope to fix it soon.
Deliverables
- Test cases required to verify the functionality of the graphical query explain tool.
- Code of the tool that emits the data in XML tagged format.
- Code of the tool that format XML data into visual information in a browser.
- XSLT style sheet which templates the graphical output.
- Documentation of new tools and screen-shots of the graphical output for sample queries.
Development Schedule
- 10th May 2010 – Complete reading the documentation and analysing the existing Derby code base while paying higher attention on the relevant areas to the project.
- 15th May 2010 – Finalize the process view after getting the comments from the developers’ community and from my mentor and finalize the way that a user might view the graphical explainer.
- 23rd May 2010 – Recognize all the possible instances that can be occur in a query and complete developing test cases required to test the tool.
- 6th of July 2010 – Build a tool (a small Java program using standard JDBC) which can read the query execution data for a SQL statement from the Derby XPLAIN tables and emit the data in XML tagged format. This involves specifying the XML schema for the data, writing the program to produce the data, and building tests for it.
- 12th of July 2010 – Submit workings for mid-term evaluations.
- 31st of July 2010 – Build a tool to format the XML-formatted data into visual information in a browser, using an XSLT style sheet. This involves conceptualizing the visual display, designing and writing the XSLT style sheet, and building tests for it.
- 9th August 2010 – Complete testing new tools and create documentation for those and update Derby documentation.
- 16th August 2010 – Submit workings for final evaluations.
About Me
I am Nirmal Fernando, 3rd year undergraduate at Department of Computer Science and Engineering at University of Moratuwa, Sri Lanka. I am very competent at Java programming language, OOP, relational database schemas, SQL, XML, XSL and data structures and algorithms. I had gone through a course module in Database Management Systems, few months’ back that will help me to understand the underlying concepts fairly easily. I had created an extension for OpenOffice.org as my level-3-semester-I project [3].
Community Interaction
Apache Derby developers’ community is the main community behind this project and I highly appreciated comments/ ideas of the expert developers of Derby and consider those as a great opportunity to learn and contribute more and more to the improvement of Derby. I already got subscribed to the derby-dev mailing list and getting the continuous feed backs and ideas regarding the project as well as the other issues of Derby.
Resources/References
[1] Project Idea on Apache JIRA: https://issues.apache.org/jira/browse/DERBY-4587
[2] Derby-4406-Involvements: https://issues.apache.org/jira/browse/DERBY-4406
[3] Open Office.org extension: http://extensions.services.openoffice.org/en/project/MatrixManipulator
Thank you !!