Tiago Espinha - Google Summer of Code Projects

Project

DERBY-728 - Unable to create databases whose name containing Chinese characters through the client driver


Student

Tiago A. R. Espinha

Mentor

Kathey Marsden

E-mail

tiago@espinhas.net (personal e-mail)

tiago.derby@yahoo.co.uk (Derby matters)

IM

tiago@espinhas.net (Google Talk/MSN) | etiago (IRC)

Google Summer of Code 2010

Abstract

Apache Derby uses the open-standard Distributed Relational Database Architecture (DRDA) as the protocol that carries SQL requests and results between its network client driver and the Network Server. Derby's DRDA implementation is currently limited to ASCII characters.

The community has asked for support for Japanese and Chinese characters. My task will be to refactor and improve the code so that Derby's DRDA implementation supports these characters.

Application

Apache Derby is an RDBMS developed entirely in Java. Thanks to its small footprint, its flexibility and its ability to be deployed in a multitude of environments, Derby is one of the best open source alternatives currently available. The fact that it is the foundation for Sun's Java DB clearly demonstrates this. As a passionate software developer with a strong educational background in database administration, I was drawn to Derby.

Last year I participated in and successfully completed a GSoC project with the Apache Software Foundation. It was an incredible and enriching experience that allowed me to connect with the community and gain a better grasp of how it functions. My project consisted of creating unit tests (converting existing tests to the JUnit standard) and helping with the ongoing bug fixing. From a technical point of view, this experience taught me a lot.

These past contributions to Derby are still available at [1].

I believe that my previous experience with Derby will help me succeed again this year. My mentor, Kathey Marsden, offered to mentor DERBY-728 [2] and suggested that I apply for this project: adding support for Chinese characters through the client driver [2][3].

Currently, when the client driver is used, requests are sent to the server over DRDA. In Derby's DRDA implementation, these requests are encoded using EBCDIC [4], an 8-bit encoding that limits the number of representable characters to 256. This is sufficient for US-ASCII characters, all of which EBCDIC can represent (although at different byte values), but it cannot encode the thousands of Chinese and Japanese characters. For these, we need a broader encoding such as UTF-8. Since backwards compatibility is always a concern, we must ensure not only that the new character encoding is put in place, but also that the older encodings remain supported.
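The limitation can be illustrated with a short Java sketch. It is not Derby code: it assumes the JVM's `IBM500` charset (an EBCDIC code page) as a stand-in for Derby's EBCDIC handling and falls back to ISO-8859-1, another single-byte encoding, if the JVM does not ship the EBCDIC charsets. The class and method names are hypothetical. It also shows why UTF-8 strings in DRDA must be measured in bytes rather than characters, the issue tracked in DERBY-4009 [3].

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLimits {
    // Assumption: IBM500 (EBCDIC) stands in for Derby's EBCDIC encoding;
    // ISO-8859-1 is used instead if the JVM lacks the EBCDIC charsets.
    static Charset singleByteCharset() {
        return Charset.isSupported("IBM500")
                ? Charset.forName("IBM500")
                : StandardCharsets.ISO_8859_1;
    }

    // True if every character of s can be represented in the given charset.
    static boolean canEncode(Charset cs, String s) {
        return cs.newEncoder().canEncode(s);
    }

    // Number of bytes s occupies on the wire when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        Charset ebcdic = singleByteCharset();
        String asciiName = "mydb";
        String chineseName = "\u6570\u636e\u5e93"; // three Chinese characters

        System.out.println(ebcdic.name() + " encodes ASCII name: "
                + canEncode(ebcdic, asciiName));
        System.out.println(ebcdic.name() + " encodes Chinese name: "
                + canEncode(ebcdic, chineseName));
        System.out.println("UTF-8 encodes Chinese name: "
                + canEncode(StandardCharsets.UTF_8, chineseName));

        // Under UTF-8, character length and byte length no longer match.
        System.out.println("chars=" + chineseName.length()
                + " utf8Bytes=" + utf8Length(chineseName));
    }
}
```

Each of the three characters takes three bytes in UTF-8, so any length field computed from the character count would be wrong.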

An Architecture Change Request, ACR7007 [5], is currently undergoing fast-track review by The Open Group to make this change part of the DRDA specification. The ACR proposes that the Application Requester (AR) and the Application Server (AS) agree on an encoding during the EXCSAT (Exchange Server Attributes) stage. The agreed encoding can be either the default EBCDIC or UTF-8, which covers the added range of characters. Commands following ACCSEC (Access Security) then use the agreed encoding, while ACCSEC itself is still sent in the usual EBCDIC encoding.
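The negotiation described above can be sketched as a simple rule for choosing the encoding of each command in the connection flow. This is a hypothetical model under stated assumptions, not the ACR7007 wire format or Derby's implementation: the class, the `negotiate` helper and its boolean inputs are invented for illustration, and `IBM500` (falling back to ISO-8859-1) stands in for EBCDIC. It also shows the backwards-compatibility case: when the server does not support UTF-8, every command stays in EBCDIC.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CcsidNegotiationSketch {
    // Commands exchanged while a DRDA connection is being set up, in order.
    enum Command { EXCSAT, ACCSEC, SECCHK, ACCRDB }

    // Assumption: IBM500 stands in for EBCDIC; ISO-8859-1 if it is unavailable.
    static final Charset EBCDIC = Charset.isSupported("IBM500")
            ? Charset.forName("IBM500") : StandardCharsets.ISO_8859_1;

    // Hypothetical: UTF-8 is agreed during EXCSAT only if both sides support it.
    static boolean negotiate(boolean requesterWantsUtf8, boolean serverSupportsUtf8) {
        return requesterWantsUtf8 && serverSupportsUtf8;
    }

    // EXCSAT and ACCSEC are always sent in EBCDIC; later commands use the
    // agreed encoding, which is still EBCDIC when talking to an older server.
    static Charset encodingFor(Command cmd, boolean utf8Agreed) {
        if (cmd == Command.EXCSAT || cmd == Command.ACCSEC) {
            return EBCDIC;
        }
        return utf8Agreed ? StandardCharsets.UTF_8 : EBCDIC;
    }

    public static void main(String[] args) {
        boolean[] serverSupport = { true, false };
        for (boolean supports : serverSupport) {
            boolean agreed = negotiate(true, supports);
            System.out.println("server supports UTF-8: " + supports);
            for (Command cmd : Command.values()) {
                System.out.println("  " + cmd + " -> " + encodingFor(cmd, agreed).name());
            }
        }
    }
}
```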

In the meantime, I have set up my build environment and taken on a smaller task [6] that will help me build up to the main one. According to my mentor, this project would ideally be undertaken by someone with previous experience contributing to Derby, and I meet that requirement. If my project runs ahead of schedule and I finish early, I will continue last year's project by helping with other open issues.

I am still passionate about developing software, and as a recent graduate I value all the experience I can get. This program provides students with that experience, and I am thrilled to be a part of it again.

I am currently preparing to write the dissertation for my Master's degree in Advanced Software Engineering, and I have no other time-consuming commitments for the duration of the GSoC program. I am eager to take on this project and I look forward to working once more with the Apache Derby community.

Deliverables

Schedule

Section I - Problem Analysis

Section II - Implementation

Section III - Testing and deployment

References

[1] http://bit.ly/TiagoASF

[2] DERBY-728 ("Unable to create databases whose name containing Chinese characters through the client driver")

[3] DERBY-4009 (“Accommodate length delimited DRDA strings where character length does not equal byte length”)

[4] EBCDIC Table

[5] ACR7007

[6] DERBY-4584 (“Unable to connect to network server if client thread name has Japanese characters”)

Google Summer of Code 2009

DerbyTestAndFix

Student: Tiago Espinha

Mentor: Kathey Marsden


DERBY-3656

ERROR XJ073 "The data in this BLOB or CLOB is no longer available." should include the possibility that the LOB has been freed

Done

DERBY-3839

Convert "org.apache.derbyTesting.functionTests.tests.store.holdCursorJDBC30.sql" to JUnit.

Done

DERBY-3842

Convert "org.apache.derbyTesting.functionTests.tests.store.holdCursorExternalSortJDBC30.sql" to JUnit.

Done

DERBY-4051

The javadoc for SpaceTable refers to an alias that doesn't seem to work

Done

DERBY-4090

Provide the ability to run tests concurrently on the same machine

In progress

DERBY-4223

Provide the ability to use properties with ij.runScript()

Done

DERBY-4217

Make the default port for the suites.All run configurable with a system property

Done

DERBY-3834

Convert derbynet/runtimeinfo to JUnit

Awaiting commit

DERBY-4192

OFFSET and FETCH FIRST documentation improvement

Done

Side-work for the community:


All my JIRA issues