Derby Lite

Overview

On this wiki page we can brainstorm how to add a simpler API to Derby's store. The following email thread raised this topic: http://www.nabble.com/simpler-api-to-the-Derby-store-td18137499.html#a18137499

A simple store API could be useful for applications which just need to put and get data by key value. These would be applications which don't need complex queries or SQL. However, these applications might need transactions, multi-user concurrency, and recoverability. Without the overhead of the SQL interpreter, it seems that such a kernel could potentially:

Similar Functionality

The following other products provide similar functionality:

| Product | Description | License |
| --- | --- | --- |
| JDBM | "JDBM is a transactional persistence engine for Java. It aims to be for Java what GDBM is for other languages (C/C++, Python, Perl, etc.): a fast, simple persistence engine. You can use it to store a mix of objects and BLOBs, and all updates are done in a transactionally safe manner. JDBM also provides scalable data structures, such as HTree and B+Tree, to support persistence of large object collections." | Currently being incubated under the Apache license. |
| BerkeleyDB-JE | "Oracle Berkeley DB Java Edition is a open source, embeddable, transactional storage engine written entirely in Java." | The BDB-license appears to be viral. |
| Perst | "A Simple, Fast, Convenient Object Oriented Java Database and .NET Database" | Dual licensed: either a viral GPL or a for-fee commercial license. |

API

It would be great if we could add a lean package to Derby which exposed a small set of concepts and then mapped between those concepts and the richer API in org.apache.derby.iapi.store.access. For starters, the lean API package could model these concepts:

| Concept | Description |
| --- | --- |
| Database | This would be a collection of BTrees operated by a connected user. Maybe we would also need to model the concepts of session and Database factory, but it would be interesting to see how far we could get with just a simple Database container. |
| Types | We might be able to do without the concept of a type system. We could, for instance, decide that we will only support opaque byte arrays as the types of keys and values. Other, similar products seem to operate this way. However, it seems a shame not to expose Derby's ability to handle complicated, multi-column keys, each component of which can be sorted ascending or descending independently of the others. It ought to be cheap to maintain this differentiating feature. A first release wouldn't necessarily have to expose the full complexity of the types we support. We could, for instance, get away with just supporting two types, which correspond to VARBINARY and LONGVARBINARY. The first could be used in keys and values. The second could only be used in values. Later on, we could add other types like INT, BIGINT, DOUBLE, DECIMAL, etc. |
| Tuple | This would be a wrapper around keys and values. Again, we could probably omit this concept if we just wanted to support byte arrays as the only type for keys and values. If we wanted more complicated keys and values, then a Tuple would be a nice abstraction. A Tuple would be a sequence of typed values. |
| BTree | This would be a persistent btree. It would hold keys in intermediate nodes. It would hold values in leaf nodes. You would get a BTree from a Database. |
| Transaction | This would be used to commit and roll back work. You would get a Transaction from a Database and you would use it in operations on BTrees. |
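
To make the concepts above concrete, here is a minimal sketch of what the lean API might look like if we started with opaque byte arrays only. Every name in it (Database, BTree, Transaction, startTransaction, openBTree, and the InMemory* classes) is hypothetical and does not exist in Derby. The in-memory classes are only a stand-in so the interfaces can be exercised; a real implementation would map onto org.apache.derby.iapi.store.access instead. The sketch assumes Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lean API. None of these types exist in Derby; the names are
// illustrative only.
interface Transaction {
    void commit();
    void rollback();
}

interface BTree {
    void put(Transaction tx, byte[] key, byte[] value);
    byte[] get(Transaction tx, byte[] key); // returns null if the key is absent
}

interface Database {
    Transaction startTransaction();
    BTree openBTree(String name);
}

// In-memory stand-in, used only to exercise the interfaces above.
final class InMemoryTransaction implements Transaction {
    // Writes are buffered here and applied to the trees on commit.
    final List<Runnable> pending = new ArrayList<>();

    @Override public void commit() {
        pending.forEach(Runnable::run);
        pending.clear();
    }

    @Override public void rollback() {
        pending.clear();
    }
}

final class InMemoryBTree implements BTree {
    // Assumption: keys are ordered lexicographically as unsigned bytes.
    private static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;
    private final TreeMap<byte[], byte[]> rows = new TreeMap<>(KEY_ORDER);

    @Override public void put(Transaction tx, byte[] key, byte[] value) {
        byte[] k = key.clone();
        byte[] v = value.clone();
        ((InMemoryTransaction) tx).pending.add(() -> rows.put(k, v));
    }

    // Simplification: reads see committed data only, not the caller's own
    // pending writes.
    @Override public byte[] get(Transaction tx, byte[] key) {
        byte[] v = rows.get(key);
        return v == null ? null : v.clone();
    }
}

final class InMemoryDatabase implements Database {
    private final Map<String, BTree> trees = new HashMap<>();

    @Override public Transaction startTransaction() {
        return new InMemoryTransaction();
    }

    @Override public BTree openBTree(String name) {
        return trees.computeIfAbsent(name, n -> new InMemoryBTree());
    }
}
```

A caller would get a BTree and a Transaction from the Database, pass the Transaction to each put and get, and then call commit or rollback. The stand-in has no locking or recovery, so it says nothing about how those would be layered on the real store.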

The following concepts turn up in other, similar products but we may not need them:

| Concept | Description |
| --- | --- |
| Comparator | This is a scheme for ordering opaque byte arrays. This helps compensate for the fact that other products don't let you split the key value into independently orderable columns. There is not much need for this if we support multi-column keys. In addition, it seems that we ought to be able to realize some performance benefits by not having to serialize and deserialize keys for use by Comparators. For truly opaque byte arrays, the lexicographic ordering of Derby VARBINARY may be good enough. |
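
The following sketch illustrates why a user-supplied Comparator may be unnecessary when keys are split into columns. Each column is a byte array compared lexicographically as unsigned bytes, and each column carries its own ascending or descending flag. The KeyOrder class is hypothetical, and the unsigned byte-by-byte comparison is an assumption: Derby's VARBINARY ordering may differ in detail, for example in how values of different lengths are treated. It assumes Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical ordering for multi-column keys. Each key is an array of
// columns, and each column is an opaque byte array.
final class KeyOrder implements Comparator<byte[][]> {
    // ascending[i] is true if column i sorts ascending, false if descending.
    private final boolean[] ascending;

    KeyOrder(boolean... ascending) {
        this.ascending = ascending.clone();
    }

    @Override
    public int compare(byte[][] a, byte[][] b) {
        for (int i = 0; i < ascending.length; i++) {
            // Unsigned lexicographic comparison of column i, reduced to -1, 0 or 1.
            int c = Integer.signum(Arrays.compareUnsigned(a[i], b[i]));
            if (c != 0) {
                return ascending[i] ? c : -c;
            }
        }
        return 0; // all columns equal
    }
}
```

For example, new KeyOrder(true, false) orders keys by the first column ascending and breaks ties by the second column descending, with no deserialization of the key bytes.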

As extra credit, in a later release, or in an add-on package, we could add the following:

| Additional Work | Description |
| --- | --- |
| Serializer | We could add machinery to help users de/serialize Java objects from/to opaque byte arrays and extract key Tuples from Java objects. |
| Qualifier | We could add some simple abstraction on top of the current Qualifier interface so that users could push simple qualifications into the store. |
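
As an illustration of the Serializer idea, here is a minimal sketch that round-trips a Java object through an opaque byte array using standard Java serialization. The JavaSerializer class and its method names are hypothetical; an actual add-on could use a different encoding and would also need a way to extract key Tuples, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper which converts Java objects to and from the opaque
// byte arrays that the store would hold as values.
final class JavaSerializer {
    private JavaSerializer() {}

    // Serialize an object to a byte array using java.io serialization.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild an object from bytes previously produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Values produced this way are opaque to the store, so they are suitable as values but are a poor fit for keys, whose byte order would not reflect any meaningful ordering of the objects.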

Issues

Here we track some issues which we will have to address.

| Issue | Description |
| --- | --- |
| Payload | For performance reasons, we probably want to just store values in the BTree leaf pages. Currently, the BTree pages don't store row contents, but instead store a RowLocation pointer into a neighboring heap conglomerate. |
| Locks | Eliminating the heap means that we will need to re-phrase locks to operate against BTree nodes. Currently, row-level locks are bound to heap records, not to BTree nodes. |
| Upgrade | We won't need most of the catalogs which support the SQL interpreter. We need to make sure that upgrade will work after we prune out the catalogs. |

Sample Jar File Sizes

The 10.5 derby.jar weighs in at around 2500K. By pruning out packages which I thought weren't needed by the Store, I created the following two jarballs:

| Jarball | Description | Size |
| --- | --- | --- |
| derbyMin.jar | This contains only the Store and io packages. | 478K |
| derbyLean.jar | This contains only the Store plus some other machinery which I thought that the Store probably needed. | 844K |

It would probably be tricky to build a Derby Lite which is smaller than derbyLean.jar. Shrinking Derby Lite below the size of derbyMin.jar seems like a very big project.

The following tables show the sizes of what's in derbyLean.jar and what's excluded. These are the sizes of directory subtrees after unjarring the jar file on disk. For reference, if you unjar the 10.5 derby.jar, the whole directory tree weighs in at 8900K.

Contents of derbyLean.jar

| Percent of total | Size | Subtree |
| --- | --- | --- |
| 24% | 768K | org.apache.derby.impl.store.raw |
| 16% | 524K | org.apache.derby.impl.store.access |
| 13% | 424K | org.apache.derby.iapi.types |
| 6% | 188K | org.apache.derby.iapi.store.access |
| 5% | 152K | org.apache.derby.iapi.services.io |
| 5% | 144K | org.apache.derby.iapi.store.raw |
| 4% | 124K | org.apache.derby.impl.services.monitor |
| 3% | 104K | org.apache.derby.impl.services.locks |
| 3% | 88K | org.apache.derby.loc (3/5 of the localizations were removed) |
| 2% | 72K | org.apache.derby.iapi.util |
| 2% | 64K | org.apache.derby.impl.services.cache |
| 2% | 56K | org.apache.derby.impl.io |
| 1% | 40K | org.apache.derby.iapi.error |
| 1% | 36K | org.apache.derby.impl.db |
| 1% | 36K | org.apache.derby.iapi.services.context |
| 1% | 28K | org.apache.derby.iapi.services.locks |
| 1% | 28K | org.apache.derby.iapi.services.cache |
| 1% | 24K | org.apache.derby.impl.services.jce |
| 1% | 24K | org.apache.derby.iapi.services.property |
| 1% | 24K | org.apache.derby.iapi.services.monitor |
| 1% | 20K | org.apache.derby.iapi.tools |
| 1% | 16K | org.apache.derby.impl.services.stream |
| 1% | 16K | org.apache.derby.impl.services.daemon |
| 1% | 16K | org.apache.derby.iapi.services.io |
| 1% | 16K | org.apache.derby.iapi.services.info |
| 1% | 16K | org.apache.derby.iapi.services.i18n |
| 0% | 12K | org.apache.derby.impl.services.uuid |
| 0% | 12K | org.apache.derby.iapi.services.stream |
| 0% | 12K | org.apache.derby.iapi.services.diag |
| 0% | 12K | org.apache.derby.iapi.services.daemon |
| 0% | 12K | org.apache.derby.iapi.services.crypto |
| 0% | 4K | org.apache.derby.tools |
| 0% | 4K | org.apache.derby.shared |
| 0% | 4K | org.apache.derby.osgi |
| 0% | 4K | org.apache.derby.impl.services.timer |
| 0% | 4K | org.apache.derby.impl.services.jmxnone |
| 0% | 4K | org.apache.derby.iapi.services.uuid |
| 0% | 4K | org.apache.derby.iapi.services.timer |
| 0% | 4K | org.apache.derby.iapi.services.memory |
| 0% | 4K | org.apache.derby.iapi.services.jmx |
| 0% | 4K | org.apache.derby.iapi.services.info |
| 0% | 4K | org.apache.derby.database |

Removed from derby.jar to make derbyLean.jar

| Size | Subtree |
| --- | --- |
| 3600K | org.apache.derby.impl.sql |
| 540K | org.apache.derby.iapi.sql |
| 512K | org.apache.derby.impl.jdbc |
| 220K | org.apache.derby.loc (mostly removed, see above) |
| 136K | org.apache.derby.jdbc |
| 136K | org.apache.derby.iapi.jdbc |
| 132K | org.apache.derby.catalog |
| 100K | org.apache.derby.impl.store.replication |
| 84K | org.apache.derby.iapi.services.classfile |
| 72K | org.apache.derby.impl.services.bytecode |
| 64K | org.apache.derby.impl.services.reflect |
| 48K | org.apache.derby.vti |
| 48K | org.apache.derby.impl.tools |
| 44K | org.apache.derby.iapi.services.loader |
| 44K | org.apache.derby.diag |
| 36K | org.apache.derby.iapi.db |
| 24K | org.apache.derby.impl.services.jmx |
| 20K | org.apache.derby.security |
| 16K | org.apache.derby.iapi.store.replication |
| 16K | org.apache.derby.mbeans |
| 16K | org.apache.derby.iapi.services.compiler |
| 12K | org.apache.derby.iapi.security |
| 8K | org.apache.derby.authentication |

DerbyLite (last edited 2009-09-20 22:11:12 by localhost)