Derby Lite
Overview
On this wiki page we can brainstorm how to add a simpler API to Derby's store. The following email thread raised this topic: http://www.nabble.com/simpler-api-to-the-Derby-store-td18137499.html#a18137499
A simple store API could be useful for applications which just need to put and get data by key value. These would be applications which don't need complex queries or SQL. However, these applications might need transactions, multi-user concurrency, and recoverability. Without the overhead of the SQL interpreter, it seems that such a kernel could potentially:
- Consume fewer run-time resources, occupy a smaller jarball footprint, and so be a better fit for resource-constrained devices.
- Be faster.
- Be simpler to use.
Similar Functionality
The following other products provide similar functionality:
Product | Description | License |
JDBM | "JDBM is a transactional persistence engine for Java. It aims to be for Java what GDBM is for other languages (C/C++, Python, Perl, etc.): a fast, simple persistence engine. You can use it to store a mix of objects and BLOBs, and all updates are done in a transactionally safe manner. JDBM also provides scalable data structures, such as HTree and B+Tree, to support persistence of large object collections." | Currently being incubated under the Apache license. |
Oracle Berkeley DB Java Edition | "Oracle Berkeley DB Java Edition is a open source, embeddable, transactional storage engine written entirely in Java." | The BDB-license appears to be viral. |
 | "A Simple, Fast, Convenient Object Oriented Java Database and .NET Database" | Dual licensed: either a viral GPL or a for-fee commercial license. |
API
It would be great if we could add a lean package to Derby which exposed a small set of concepts and then mapped between those concepts and the richer API in org.apache.derby.iapi.store.access. For starters, the lean API package could model these concepts:
Concept | Description |
Database | This would be a collection of BTrees operated by a connected user. Maybe we would also need to model the concepts of session and Database factory, but it would be interesting to see how far we could get with just a simple Database container. |
Types | We might be able to do without the concept of a type system. We could, for instance, decide that we will only support opaque byte arrays as the types of keys and values. Other, similar products seem to operate this way. However, it seems a shame not to expose Derby's ability to handle complicated, multi-column keys, each component of which can be sorted ascending or descending independently of the others. It ought to be cheap to maintain this differentiating feature. A first release wouldn't necessarily have to expose the full complexity of the types we support. We could, for instance, get away with just supporting two types, which correspond to VARBINARY and LONGVARBINARY. The first could be used in keys and values. The second could only be used in values. Later on, we could add other types like INT, BIGINT, DOUBLE, DECIMAL, etc. |
Tuple | This would be a wrapper around keys and values. Again, we could probably omit this concept if we just wanted to support byte arrays as the only type for keys and values. If we wanted more complicated keys and values, then a Tuple would be a nice abstraction. A Tuple would be a sequence of typed values. |
BTree | This would be a persistent btree. It would hold keys in intermediate nodes. It would hold values in leaf nodes. You would get a BTree from a Database. |
Transaction | This would be used to commit and roll back work. You would get a Transaction from a Database and you would use it in operations on BTrees. |
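To make these concepts a little more concrete, here is a minimal sketch of the shape such a lean package might take. Everything in it is an assumption made for illustration: the names LeanApiSketch, Database, BTree, Transaction, openBTree and begin are hypothetical, keys and values are opaque byte arrays, and an in-memory TreeMap stands in for the store. It does not call org.apache.derby.iapi.store.access, and it offers no isolation or durability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the lean API; none of these types exist in Derby. */
class LeanApiSketch {

    /** Orders opaque byte-array keys lexicographically, bytes treated as unsigned. */
    static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;

    /** Collects undo actions so that work can be rolled back. */
    static final class Transaction {
        private final List<Runnable> undoLog = new ArrayList<>();

        void commit() {
            undoLog.clear(); // changes were applied eagerly, so just forget the undo actions
        }

        void rollback() {
            // Undo in reverse order so that earlier values are restored last.
            for (int i = undoLog.size() - 1; i >= 0; i--) {
                undoLog.get(i).run();
            }
            undoLog.clear();
        }
    }

    /** Stand-in for a persistent btree: a sorted in-memory map. */
    static final class BTree {
        private final TreeMap<byte[], byte[]> entries = new TreeMap<>(KEY_ORDER);

        void put(Transaction tx, byte[] key, byte[] value) {
            byte[] k = key.clone();
            byte[] previous = entries.put(k, value.clone());
            tx.undoLog.add(() -> {
                if (previous == null) {
                    entries.remove(k);
                } else {
                    entries.put(k, previous);
                }
            });
        }

        byte[] get(Transaction tx, byte[] key) {
            byte[] value = entries.get(key);
            return value == null ? null : value.clone();
        }
    }

    /** A named collection of BTrees; also hands out Transactions. */
    static final class Database {
        private final Map<String, BTree> trees = new HashMap<>();

        BTree openBTree(String name) {
            return trees.computeIfAbsent(name, n -> new BTree());
        }

        Transaction begin() {
            return new Transaction();
        }
    }
}
```

With this shape, an application would obtain a BTree and a Transaction from the Database, pass the Transaction to each put and get, and finish with commit or rollback. A real implementation would map these calls onto the existing store interfaces rather than onto a TreeMap.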
The following concepts turn up in other, similar products but we may not need them:
Concept | Description |
Comparator | This is a scheme for ordering opaque byte arrays. This helps compensate for the fact that other products don't let you split the key value into independently orderable columns. There is not much need for this if we support multi-column keys. In addition, it seems that we ought to be able to realize some performance benefits by not having to serialize and deserialize keys for use by Comparators. For truly opaque byte arrays, the lexicographic ordering of Derby VARBINARY may be good enough. |
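As an illustration of why multi-column keys reduce the need for user-supplied Comparators, the sketch below orders keys column by column, with each column independently ascending or descending. The class and method names are hypothetical, each column is modelled as a byte array, and plain unsigned lexicographic comparison is used as a stand-in for VARBINARY ordering; Derby's actual comparison rules may differ in details such as padding.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: ordering multi-column keys without a user Comparator. */
class MultiColumnKeySketch {

    /**
     * Builds an ordering over keys whose columns are byte arrays.
     * descending[i] == true sorts column i in descending order.
     * Keys are assumed to have at least descending.length columns.
     */
    static Comparator<byte[][]> keyOrder(boolean... descending) {
        return (left, right) -> {
            for (int i = 0; i < descending.length; i++) {
                // Swap the operands to reverse the order of a descending column.
                int c = descending[i]
                        ? Arrays.compareUnsigned(right[i], left[i])
                        : Arrays.compareUnsigned(left[i], right[i]);
                if (c != 0) {
                    return c; // the first differing column decides
                }
            }
            return 0;
        };
    }
}
```

For example, keyOrder(false, true) sorts by the first column ascending and breaks ties on the second column descending, which a single opaque byte array could only express through a custom Comparator.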
As extra credit, in a later release, or in an add-on package, we could add the following:
Additional Work | Description |
Serializer | We could add machinery to help users de/serialize Java objects from/to opaque byte arrays and extract key Tuples from Java objects. |
Qualifier | We could add some simple abstraction on top of the current Qualifier interface so that users could push simple qualifications into the store. |
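The Serializer idea could start as something very small. The following sketch uses standard Java serialization to turn an object into an opaque byte array and back; the class name SerializerSketch and its two methods are hypothetical, and a real add-on package might well prefer a more compact or more version-tolerant encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper: Java objects to and from opaque byte arrays. */
class SerializerSketch {

    /** Serializes a value into a byte array suitable for storing as a value. */
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reconstructs the object that toBytes() wrote. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored object not found", e);
        }
    }
}
```

Extracting key Tuples from objects would need an additional, user-supplied mapping from an object to its key columns, which this sketch leaves out.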
Issues
Here we track some issues which we will have to address.
Issue | Description |
Payload | For performance reasons, we probably want to just store values in the BTree leaf pages. Currently, the BTree pages don't store row contents, but instead store a RowLocation pointer into a neighboring heap conglomerate. |
Locks | Eliminating the heap means that we will need to re-phrase locks to operate against BTree nodes. Currently, row-level locks are bound to heap records, not to BTree nodes. |
Upgrade | We won't need most of the catalogs which support the SQL interpreter. We need to make sure that upgrade will work after we prune out the catalogs. |
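The Payload issue can be pictured with a toy model. In the sketch below, which is purely illustrative and does not use any Derby classes, IndexPlusHeap mimics the current arrangement (the index holds a row location and the row lives in a separate heap), while ValuesInLeaves mimics the proposed arrangement (the value sits next to the key). The class names and the use of a long as a row location are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the two payload layouts; not Derby code. */
class PayloadLayoutSketch {

    /** Current layout: the index maps key to row location, the heap maps location to row. */
    static final class IndexPlusHeap {
        private final TreeMap<String, Long> index = new TreeMap<>();
        private final Map<Long, String> heap = new HashMap<>();
        private long nextLocation = 0;

        void put(String key, String value) {
            Long location = index.get(key);
            if (location == null) {
                location = nextLocation++;
                index.put(key, location);
            }
            heap.put(location, value);
        }

        String get(String key) {
            Long location = index.get(key);                       // first hop: the index
            return location == null ? null : heap.get(location);  // second hop: the heap
        }
    }

    /** Proposed layout: the value is stored alongside the key. */
    static final class ValuesInLeaves {
        private final TreeMap<String, String> leaves = new TreeMap<>();

        void put(String key, String value) {
            leaves.put(key, value);
        }

        String get(String key) {
            return leaves.get(key); // a single lookup
        }
    }
}
```

The second layout saves a hop on every read, but it also removes the heap record that row-level locks are attached to today, which is where the Locks issue comes from.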
Sample Jar File Sizes
The 10.5 derby.jar weighs in at around 2500K. By pruning out packages which I thought weren't needed by the Store, I created the following two jarballs:
Jarball | Description | Size |
derbyMin.jar | This contains only the Store and io packages. | 478K |
derbyLean.jar | This contains only the Store plus some other machinery which I thought the Store probably needed. | 844K |
It would probably be tricky to build a Derby Lite that is smaller than derbyLean.jar. Shrinking Derby Lite below the size of derbyMin.jar seems like a very big project.
The following tables show the sizes of what's in derbyLean.jar and what's excluded. These are the sizes of directory subtrees after unjarring the jar file on disk. For reference, if you unjar the 10.5 derby.jar, the whole directory tree weighs in at 8900K.
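For anyone who wants to produce a similar breakdown without unjarring, the following sketch sums the uncompressed entry sizes in a jar per package subtree. It is an approximation of the method described above, not a reproduction of it: sizes measured on disk are rounded up to file-system blocks, so the numbers will not match exactly. The class name, the method names and the maxDepth parameter are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: per-subtree sizes of the entries in a jar file. */
class JarSubtreeSizes {

    /** Maps each non-directory entry name to its uncompressed size in bytes. */
    static Map<String, Long> readEntrySizes(String jarPath) {
        Map<String, Long> sizes = new HashMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    // getSize() returns -1 when the size is unknown; count that as 0.
                    sizes.put(entry.getName(), Math.max(entry.getSize(), 0L));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    /** Package of an entry, cut off after maxDepth components, in dotted form. */
    static String subtreeOf(String entryName, int maxDepth) {
        int slash = entryName.lastIndexOf('/');
        if (slash < 0) {
            return ""; // entry in the root of the jar
        }
        String[] parts = entryName.substring(0, slash).split("/");
        int depth = Math.min(parts.length, maxDepth);
        return String.join(".", Arrays.copyOf(parts, depth));
    }

    /** Sums entry sizes per subtree, sorted by subtree name. */
    static Map<String, Long> aggregate(Map<String, Long> entrySizes, int maxDepth) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : entrySizes.entrySet()) {
            totals.merge(subtreeOf(e.getKey(), maxDepth), e.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Calling aggregate(readEntrySizes("derby.jar"), 6) would, for instance, fold everything under org/apache/derby/impl/store/raw into one line; the jar path here is only an example.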
Contents of derbyLean.jar
Percent of total | Size | Subtree |
24% | 768K | org.apache.derby.impl.store.raw |
16% | 524K | org.apache.derby.impl.store.access |
13% | 424K | org.apache.derby.iapi.types |
6% | 188K | org.apache.derby.iapi.store.access |
5% | 152K | org.apache.derby.iapi.services.io |
5% | 144K | org.apache.derby.iapi.store.raw |
4% | 124K | org.apache.derby.impl.services.monitor |
3% | 104K | org.apache.derby.impl.services.locks |
3% | 88K | org.apache.derby.loc (3/5 of the localizations were removed) |
2% | 72K | org.apache.derby.iapi.util |
2% | 64K | org.apache.derby.impl.services.cache |
2% | 56K | org.apache.derby.impl.io |
1% | 40K | org.apache.derby.iapi.error |
1% | 36K | org.apache.derby.impl.db |
1% | 36K | org.apache.derby.iapi.services.context |
1% | 28K | org.apache.derby.iapi.services.locks |
1% | 28K | org.apache.derby.iapi.services.cache |
1% | 24K | org.apache.derby.impl.services.jce |
1% | 24K | org.apache.derby.iapi.services.property |
1% | 24K | org.apache.derby.iapi.services.monitor |
1% | 20K | org.apache.derby.iapi.tools |
1% | 16K | org.apache.derby.impl.services.stream |
1% | 16K | org.apache.derby.impl.services.daemon |
1% | 16K | org.apache.derby.iapi.services.io |
1% | 16K | org.apache.derby.iapi.services.info |
1% | 16K | org.apache.derby.iapi.services.i18n |
0% | 12K | org.apache.derby.impl.services.uuid |
0% | 12K | org.apache.derby.iapi.services.stream |
0% | 12K | org.apache.derby.iapi.services.diag |
0% | 12K | org.apache.derby.iapi.services.daemon |
0% | 12K | org.apache.derby.iapi.services.crypto |
0% | 4K | org.apache.derby.tools |
0% | 4K | org.apache.derby.shared |
0% | 4K | org.apache.derby.osgi |
0% | 4K | org.apache.derby.impl.services.timer |
0% | 4K | org.apache.derby.impl.services.jmxnone |
0% | 4K | org.apache.derby.iapi.services.uuid |
0% | 4K | org.apache.derby.iapi.services.timer |
0% | 4K | org.apache.derby.iapi.services.memory |
0% | 4K | org.apache.derby.iapi.services.jmx |
0% | 4K | org.apache.derby.iapi.services.info |
0% | 4K | org.apache.derby.database |
Removed from derby.jar to make derbyLean.jar
Size | Subtree |
3600K | org.apache.derby.impl.sql |
540K | org.apache.derby.iapi.sql |
512K | org.apache.derby.impl.jdbc |
220K | org.apache.derby.loc (mostly removed, see above) |
136K | org.apache.derby.jdbc |
136K | org.apache.derby.iapi.jdbc |
132K | org.apache.derby.catalog |
100K | org.apache.derby.impl.store.replication |
84K | org.apache.derby.iapi.services.classfile |
72K | org.apache.derby.impl.services.bytecode |
64K | org.apache.derby.impl.services.reflect |
48K | org.apache.derby.vti |
48K | org.apache.derby.impl.tools |
44K | org.apache.derby.iapi.services.loader |
44K | org.apache.derby.diag |
36K | org.apache.derby.iapi.db |
24K | org.apache.derby.impl.services.jmx |
20K | org.apache.derby.security |
16K | org.apache.derby.iapi.store.replication |
16K | org.apache.derby.mbeans |
16K | org.apache.derby.iapi.services.compiler |
12K | org.apache.derby.iapi.security |
8K | org.apache.derby.authentication |