Summer of Code Proposal - Support for full text search in Cayenne

Subject ID

cayenne-search http://wiki.apache.org/general/SummerOfCode2006#cayenne-search

Subject

Support for full text search in Cayenne

Author

Tomas Jucius

Author

I am a final-year undergraduate student in the Bachelor of Computer Science (Programming Technologies) programme at Kaunas University of Technology (KTU), Lithuania. I work as a programmer on Java web application development using the Tapestry and Wicket frameworks (also JSF and JSP), and on desktop application development. At the moment I am developing a content-collaboration-community system using Tapestry and Cayenne. I have been using Cayenne for about 2 years. I also know other ORM and data persistence frameworks well: I have used Hibernate and Torque.

I am interested in this project because the company I work for uses Cayenne in several projects. I follow every new Cayenne release and its new features. I think that full text search using the Lucene search engine would be a great feature for the Cayenne ORM.

Project

Cayenne is a powerful, full-featured Java Object Relational Mapping framework. The proposal is to implement full text search support using the Lucene search engine.

Nowadays almost every application, and web sites especially, is expected to offer a Google-like search function. Complex multi-criteria search screens are often perceived by users as too complex, and are in fact rarely used. Users want to be able to just type a word and have the computer do the rest. For this sort of rich full-text search functionality Lucene is a perfect choice. Lucene is a powerful full-text search API written in Java. You can use Lucene to provide consistent full-text indexing across database objects and documents.

Deliverables

A plugin for the Compass framework that provides support for indexing a datasource through OSEM and the Cayenne Object Relational Mapping framework, and enables full text search functionality on the application domain model.

A sample Cayenne application demonstrating how the Cayenne and Compass frameworks work together. This application will show how searching tasks are done using the Compass framework and in standard Cayenne style, making the advantages provided by the plugin visible.

Support for full text search in Cayenne using the Lucene search engine. Full text search support would make it easy to implement Google-style search in applications. Lucene will provide consistent full-text indexing across database objects. In-memory object evaluation would become more sophisticated. Some of the advantages offered by Lucene are listed in the Design / Approach section.

Benefits for the Apache community

The Cayenne framework will support Lucene full text search through the Compass plugin, along with the features that follow from it. The framework will become more sophisticated.

Design / Approach

I have done some research and came to the conclusion that implementing a plugin for the Compass framework would be a much more elegant approach than building a facade over the full text search capabilities of existing databases.

The main reasons for this conclusion are:

Facade approach:

1) Not all databases have full text search capabilities, so implementing the feature as a facade over existing capabilities would make the Cayenne framework inconsistent across databases. One great quality of the present framework is its compatibility with all supported databases.

2) It is not an elegant approach.

3) Taking care of all the differences between databases may be time consuming. Future changes in individual databases may also make the code error-prone.
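The inconsistency described in the facade points can be sketched in a few lines. This is only an illustration under stated assumptions: `FullTextFacade` and `predicate` are hypothetical names, not part of Cayenne, and the three SQL fragments are the native full-text predicates of MySQL, PostgreSQL and SQL Server as commonly documented. Any other database falls through to "unsupported", which is exactly the gap a facade cannot hide.

```java
import java.util.Optional;

// Hypothetical helper, not part of Cayenne: shows how a facade would have
// to branch per database. The column name is inserted verbatim for
// illustration only; real code would need proper identifier handling.
class FullTextFacade {

    // Returns the native full-text predicate for a database, if it has one.
    static Optional<String> predicate(String database, String column) {
        switch (database) {
            case "mysql":
                return Optional.of("MATCH(" + column + ") AGAINST (?)");
            case "postgresql":
                return Optional.of("to_tsvector(" + column + ") @@ plainto_tsquery(?)");
            case "sqlserver":
                return Optional.of("CONTAINS(" + column + ", ?)");
            default:
                // No native full-text predicate known for this database.
                return Optional.empty();
        }
    }
}
```

Every new database or syntax change adds another branch here, whereas an external index keeps the search behaviour the same for all of them.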

Compass plugin approach:

1) Compass allows the object domain model to be mapped declaratively to the underlying search engine, and synchronizes data changes between the index and different datasources. It also implements fast index operations and optimization.

2) It provides an abstraction on top of the low-level Lucene API.

3) OSEM: the ability to map Java objects to the search engine through simple XML mapping files.

4) The Hibernate ORM framework and the OJB tool have Compass plugins which provide support for indexing a datasource through OSEM and the named frameworks. So we can draw on the experience of developing the Hibernate plugin.

5) The Cayenne framework is a natural match for Compass OSEM features. A simple Compass::Gps device that indexes all the data having both OSEM and ORM mapping definitions can be easily developed, and if the ORM tool provides support for events, then mirroring can be supported as well.
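The two responsibilities named in point 5, indexing everything once and then mirroring later changes through events, can be sketched in plain Java. Everything here is a hypothetical stand-in: `ChangeListener`, `ObjectStore` and `IndexingDevice` are invented for illustration and are neither Compass nor Cayenne APIs, and the "index" is just a map rather than a Lucene index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; none of them are Compass or Cayenne APIs.

// Callback the stand-in ORM invokes after a change is committed.
interface ChangeListener {
    void saved(String id, String text);
    void deleted(String id);
}

// Stand-in for the ORM side: holds persistent objects and fires events.
class ObjectStore {
    private final Map<String, String> rows = new LinkedHashMap<>();
    private final List<ChangeListener> listeners = new ArrayList<>();

    void addListener(ChangeListener listener) { listeners.add(listener); }

    Map<String, String> all() { return rows; }

    void save(String id, String text) {
        rows.put(id, text);
        for (ChangeListener listener : listeners) listener.saved(id, text);
    }

    void delete(String id) {
        rows.remove(id);
        for (ChangeListener listener : listeners) listener.deleted(id);
    }
}

// Stand-in for the device: full indexing plus event-driven mirroring.
class IndexingDevice implements ChangeListener {
    // The "index": document id -> indexed text.
    final Map<String, String> index = new LinkedHashMap<>();
    private final ObjectStore store;

    IndexingDevice(ObjectStore store) { this.store = store; }

    // Batch operation: rebuild the index from everything in the store.
    void indexAll() {
        index.clear();
        index.putAll(store.all());
    }

    // Mirroring: subscribe so that later changes reach the index as they happen.
    void startMirroring() { store.addListener(this); }

    @Override public void saved(String id, String text) { index.put(id, text); }

    @Override public void deleted(String id) { index.remove(id); }
}
```

In this sketch a change made before `startMirroring()` is only picked up by the next `indexAll()`, which is why the event support mentioned above matters for keeping the index current.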

Integrating the Lucene search engine into the Cayenne framework. Some of the basic advantages offered by Lucene: ranked searching, fielded searching, date-range searching, and multiple-index searching with merged results.

The first step will be to decide which concrete functionality we expect from the Lucene integration.
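To make "fielded" and "ranked" searching concrete, here is a minimal sketch of the idea in plain Java. It is not Lucene's API: `TinyFieldedIndex` is a hypothetical class, and ranking by raw term count is a deliberate simplification of what a real search engine does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene's API.
class TinyFieldedIndex {
    // field name -> term -> (document id -> number of occurrences)
    private final Map<String, Map<String, Map<String, Integer>>> postings = new HashMap<>();

    // Tokenize the text (ASCII word characters only) and record each term
    // under the given field of the given document.
    void add(String docId, String field, String text) {
        Map<String, Map<String, Integer>> terms =
                postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.computeIfAbsent(token, t -> new HashMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    // Fielded, ranked lookup: documents whose field contains the term,
    // most occurrences first, ties broken by document id.
    List<String> search(String field, String term) {
        Map<String, Map<String, Integer>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        Map<String, Integer> hits =
                terms.getOrDefault(term.toLowerCase(), Collections.emptyMap());
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ids;
    }
}
```

A search is restricted to one field, so a term that appears only in a document's body does not match a query against its title.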

Second: designing that functionality

Third: Implementation

Fourth: .....

Timeline

Date        Task
May 23      Project commencement
May 24      Consideration of features to implement, design
June 4      Implementing basic search capabilities
June 20     Milestone 1 release
June 27     More sophisticated features
July 4      Going on holiday
July 10     Coming back
July 15     Milestone 2 release
August 2    Testing
August 12   Documentation
August 20   Final release
August 21   Project submission

Me and this project

I have used several ORM frameworks, and over the past few years I have found that the Cayenne ORM is best suited for my projects. There are many reasons: speed of development, documentation, a good mailing list and an active community in which I participate too. I am happy about Cayenne joining Apache. This step shows the quality of the product, since only high quality projects belong to the Apache Software Foundation. I believe I am very well suited for this project.