This page continues a discussion about how to integrate Lucene with Derby. Lucene is an Apache text search engine. The discussion began on the Derby user mailing list with the Full Text Indexing thread. JIRA enhancement request DERBY-472 tracks this discussion.

This page briefly describes Lucene's capabilities and then explores text-searching features and use cases which Derby might support. Please feel free to expand this list of features and use cases.

Lucene's Capabilities

Lucene provides a Java library for indexing and searching documents. Lucene ships with English, German, and Russian support, and you can find plugins for other languages, including Chinese, Japanese, and Korean. Plugins exist for the following document formats:

plain text

HTML

XML

OpenOffice

Word

Excel

PowerPoint

IMAP mail

RTF

PDF
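
To make "indexing and searching documents" concrete, here is a minimal sketch of the underlying idea, an inverted index that maps each term to the documents containing it. This is not Lucene's API: the class `TinyIndex` and its methods `add` and `search` are hypothetical names invented for illustration, and the tokenization (lower-casing and splitting on non-alphanumeric characters) is an assumption far simpler than what a real analyzer does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not Lucene): term -> ids of documents containing it. */
public class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexing: split the text into lower-cased terms and record the document id under each. */
    public void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Searching: return the ids of documents that contain the given single term. */
    public Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : new TreeSet<>(hits);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("doc1", "Derby is a relational database.");
        index.add("doc2", "Lucene is a text search engine.");
        index.add("doc3", "Integrating Lucene with Derby.");
        System.out.println(index.search("lucene")); // prints [doc2, doc3]
    }
}
```

A real text search engine adds much more on top of this idea, such as language-specific analysis, relevance ranking, and persistent index storage, which is why the integration questions on this page focus on where the documents and the index live.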

The following high-level concepts drive Lucene's design:

Features We Want

Integrating Lucene with Derby may involve some or all of the following features. We would probably phase in features over several releases.

Use Cases to Support

Use Case: Loose Coupling
Description: Store documents outside Derby, in a filesystem or on the web.
Example: Web advertising. Maintain a searchable web of content. When the user searches for content, return web pages as well as advertising JSPs bound to certain keywords.

Use Case: Moderate Coupling
Description: Store documents inside Derby but maintain text-search indexes outside Derby in a filesystem. This provides transactional versioning and an audit trail for documents that can be text-searched.
Example: Law office. Transactionally store legal documents and search for them later.

Use Case: Tight Coupling
Description: Transactionally store both documents and text-search indexes inside Derby.
Example: Online market. Search for an item immediately after its description is posted.

Issues

LuceneIntegration (last edited 2009-09-20 22:11:25 by localhost)