MySQL to Apache Derby Migration Tool

Ramin Moazeni

Version 1.2

Introduction

When you work with databases, you will often need to copy data from one database to another. The Apache Derby database currently is lacking a migration tool that that enables you to quickly migrate popular databases and data to Apache Derby database. Using a Graphical User Interface (GUI), the Derby migration tool will assist in successfully migrating databases. The use of Derby migration tool will assist in migration of databases including schema objects such as tables, indexes, views, and stored procedures to Apache Derby database. The advantage of using the migration tool will be Cost Saving, improved productivity, and reduced risks.

I propose to implement the migration tool that will support migration of tables, views, indexes, and stored procedures. The steps required to migrate from MySQL to Apache Derby generally involves:

This paper serves as a proposal for implementing the MySQL to Derby migration tool. The implementation language for this project is Java since it can be run on different platforms.

The Migration Process

The MySQL to Apache Derby Migration tool is a framework that enables migration and conversion of data from MySQL database to Apache Derby database. The system consists of several components responsible for capturing source database schema, database object mapping, datatype mapping and capturing source data to migrate a MySQL database to Apache Derby.

This following diagram describes processes used by the system to implement its capabilities. It explains how the system interacts with its actors (i.e users and databases) and the specific steps and interactions will the users have with the system.

http://www-scf.usc.edu/~moazeni/GSoC/mysql_7.jpg

The above processes, tasks and interactions are summarized as the following use cases:

The Migration tool will be designed to migrate the following objects:

The Migration tool will not handle migration of the following objects:

High Level Design

The system will be implemented in Java language using Java classes and interfaces. The following class diagram exhibits the relationship among classes and interfaces of the MySQL to Derby Migration tool:

http://www-scf.usc.edu/~moazeni/GSoC/mysql_2.gif

Approach 1: Using DatabaseMetaData

Schema Migration

A schema is a collection of components and database objects under the control of a given database user. The task for schema migration is to capture the schema and its components and database objects and then proceed with migration.

To migrate a database you must obtain the catalog information about the database as well as list of the tables in the database and a list of all columns within each table. That information lets you create the SQL statements necessary to migrate the data. To do this, Java Database Connectivity (JDBC) can be used. Most databases provide a system table that will give you a list of the tables. However because the name of the system table varies from one database to another, the most compatible way to list and capture information about database objects is to use the Java class DatabaseMetaData. This class provides information about the database, such as the names of the tables and column information, names of views, procedure information, etc.

The DatabaseMetaData class has many properties and methods. For example, for obtaining the table information, the getTables() method can be used to return the list of tables in ResultSet format. After obtaining the list of tables, a list of the columns in that table should be obtained. Doing that lets you create a proper CREATE TABLE statement, as well as INSERT and SELECT statements to match the current table. The process of obtaining a list of columns is through the use of getColumns() method. After capturing the database catalog information (tables and columns), the required SQL statements can be generated. The same process can be used for capturing and recreating views, indexes, and procedures except that a different method of DatabaseMetaData may be used to capture the information.

Any data type mappings needs to be done when re-creating the SQL statements as per table 1. For example the following SQL statement from MySQL:

Considering the syntax of creating a view, we can see that having a query is necessary to create a view. Therefore, in addition to data types that needs to be mapped, join statements also need to be considered when creating views. The problem is that some MySQL join syntax are not supported by Apache Derby. The following are examples of join operations and their compatible syntax in Apache Derby:

CROSS JOIN

Derby

SELECT * FROM tab1 CROSS JOIN tab2

MySQL

SELECT * FROM tab1, tab2

INNER JOIN

Derby

SELECT * FROM tab1 INNER JOIN tab2 USING(id)

MySQL

SELECT * FROM tab1 INNER JOIN tab2 ON tab1.id=tab2.id

LEFT OUTER JOIN

Derby

SELECT * FROM tab1 LEFT OUTER JOIN tab2 USING (id)

MySQL

SELECT * FROM tab1 LEFT OUTER JOIN tab2 ON tab1.id = tab2.id

RIGHT OUTER JOIN

Derby

SELECT * FROM tab1 RIGHT OUTER JOIN tab2 USING (id)

MySQL

SELECT * FROM tab1 RIGHT OUTER JOIN tab2 ON tab1.id = tab2.id

In addition to join syntax, the reference manuals of both MySQL and Derby needs to be consulted for proper syntax during migration to Apache Derby.

Data Migration

In this section, the process of data migration from MySQL to Apache Derby database will be discussed. However, before doing this, there is a need to find out the differences between the MySQL and Apache Derby structure. The table below shows the compatibility between built in data types. Some data types have a one to one correspondence while others require more consideration before migrating. The tables include information on the following:

Numeric Types:

http://www-scf.usc.edu/~moazeni/GSoC/numeric.jpg

Date and Time Types:

http://www-scf.usc.edu/~moazeni/GSoC/datetime.jpg

String Types:

http://www-scf.usc.edu/~moazeni/GSoC/string.jpg

Table 1: Data type Mapping from MySQL to Apache Derby

To manipulate and migrate the data, Apache Derby supports INSERT, UPDATE and DELETE statements with similar syntax and behavior as MySQL. These statements can be used programmatically to migrate the data

This step is usually done after schema migration most importantly when tables are migrated. To do that, both SELECT and INSERT statements must be created. The SELECT statement will read the table data from the source database. The INSERT statement will write the data to the target database.

The task involves retrieving columns information from source database and re-creates the SELECT and INSERT statements. The SELECT statement can then be executed and a list of records will be returned. Using the returned list of records, the INSERT statement can be created by inserting the individual column data and then executing the INSERT statement. Any data type mappings needs to be done when re-creating the INSERT statement as per the above tables.

Approach 2: Using DdlUtils

Another approach would be to use DdlUtils. DdlUtils is a component for working with Database Definition (DDL) files. DDL files are XML files that contain the definition of a database schema, e.g. tables and columns. These files can also be fed into DdlUtils in order to create the corresponding database. An example of such a DDL file is:

{{{ <?xml version="1.0"?> <!DOCTYPE database SYSTEM "http://db.apache.org/torque/dtd/database.dtd">

</database> }}}

The DdlUtils tool handles datatype mappings using JDBC datatypes rather than SQL datatypes which are database dependent. Looking at http://db.apache.org/ddlutils/database-support.html, we can see the datatype mapping between JDBC types and SQL types. For example, a TIINYINT datatype from MySQL, will be mapped to SMALLINT in Derby.

Please note that the DdlUtils doesn’t support views, constraints (CHECK constraint), triggers, stored procedures, etc. A possible solution would be to execute database-specific commands or to read source database system tables to capture required information. As an example, MySQL provide the following SHOW commands to capture the exact string that was used to create the original object:

SHOW CREATE TABLE

SHOW CREATE VIEW

SHOW DATABASES

SHOW TRIGGERS

SHOW CREATE PROCEDURE

SHOW CREATE FUNCTION

Recommended Approach

I suggest that the DatabaseMetaData approach is used for implemntation of the MySQL to Derby Migration tool for migration of view, constraints, and stored procedures and use the DdlUtils component for migration of database tables.

Performance

In order to improve performance, the batch update facility can be used to allow multiple update operations to be submitted to a data source for processing at once. The Statement object can keep track of a list of commands that can be submitted together for execution. The Statement.addBatch method can be used to add commands to a statement’s batch by providing it with the SQL command that needs to be added.

Submitting multiple updates together, instead of single updates, can greatly improve performance.

Internationalization

Initial Prototypes

The migration tool will provide both graphical user interface as well as command line interface. I propose the following way of performing activities using the Graphical User Interface that the system provides:

Welcome Screen:

http://www-scf.usc.edu/~moazeni/GSoC/mysql_1.jpg

Selecting Source and Target Database:

http://www-scf.usc.edu/~moazeni/GSoC/mysql_2.jpg

Source Database Schema Selection

http://www-scf.usc.edu/~moazeni/GSoC/mysql_3.jpg

Select Objects to be Migrated:

http://www-scf.usc.edu/~moazeni/GSoC/mysql_4.jpg

Object Mapping Errors

http://www-scf.usc.edu/~moazeni/GSoC/mysql_5.jpg

Migration Complete

http://www-scf.usc.edu/~moazeni/GSoC/mysql_8.jpg

MysqlDerbyMigration/DesignDocument (last edited 2009-09-20 22:11:52 by localhost)