
ToddBealmear/SummerOfCode2006

Subject

lucene-gdata-server

Author

Todd Bealmear, Alamo, California/Tucson, Arizona

Bio

My name is Todd Bealmear – I’m a high school senior admitted to the University of Arizona for the fall 2006 semester. My intended major is Computer Science.

I’ve used several Apache products in the past, starting with HTTPD and, more recently, Tomcat and Lucene. We used Tomcat and Struts to rebuild my high school’s entire website, including core features such as a student/staff file manager and an online course documents system. Additionally, I used Lucene to power the site’s search engine (which, unfortunately, has yet to be deployed on the live site).

The lucene-gdata-server project combines several topics I’m interested in, primarily Java and search technologies. It pairs those two with the Gdata feed format, an interesting new tool for data manipulation and distribution. I’d like to be involved because the project gives me an outlet to exercise my interest in all three areas.

Project Overview

Google’s new Gdata format gives developers a new method for distributing and manipulating data: it combines features of RSS 2.0 and Atom with REST functionality, allowing data stored in a feed to be created, edited, and deleted, and it adds an authentication system. The goal of the project is to create a system through which Apache Lucene and Gdata can communicate effectively. While this can be done in various ways, the preferred method is to use a Servlet container (read: Tomcat) to run a WAR of Servlets that demonstrates the potential of pairing Lucene with a feed format built on the REST architecture (such as Gdata).

The project should include ways for users both to populate Gdata feeds with data from a Lucene index and to populate a Lucene index with data from Gdata feeds. Both Lucene and the Gdata API include methods for doing so, which should substantially cut the development time that would otherwise be spent writing code to populate or edit either the index or the feed. Additionally, the Gdata API includes methods to parse Gdata feeds, removing any need to work directly with external, third-party XML libraries such as Xalan or Xerces (though those would be difficult at best to use with Gdata, considering it doesn’t follow RPC standards).
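As a rough illustration of the feed-to-index direction, the sketch below pulls the fields out of each entry of an Atom-style feed so that they could be handed to an indexer. It is only a sketch under stated assumptions: it uses the XML parser bundled with the JDK as a stand-in for the Gdata API’s own parsing methods, and the class name FeedEntryExtractor, the method extractEntries, and the choice of the id, title, and content fields are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns feed entries into field maps ready for indexing.
class FeedEntryExtractor {
    // Namespace used by Atom elements.
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns one map per <entry>, keyed by field name (assumed: id, title, content).
    static List<Map<String, String>> extractEntries(String feedXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-based lookups
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));

        List<Map<String, String>> result = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            for (String name : new String[] {"id", "title", "content"}) {
                NodeList nodes = entry.getElementsByTagNameNS(ATOM_NS, name);
                if (nodes.getLength() > 0) {
                    fields.put(name, nodes.item(0).getTextContent().trim());
                }
            }
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<entry><id>entry-1</id><title>First post</title>"
                + "<content>Hello, feed.</content></entry>"
                + "</feed>";
        for (Map<String, String> fields : extractEntries(feed)) {
            System.out.println(fields);
        }
    }
}
```

In the real project, each field map would presumably become one Lucene document, with the entry id kept as a key so that later updates and deletes can find it.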

To take full advantage of Gdata’s REST features, it is necessary to add functionality for user authentication and CRUD (Create, Read, Update, Delete). Only authenticated users will be able to execute commands that query a Gdata feed. Unfortunately, Google’s recommended method for authentication on Web Applications, Proxy Authentication for Web Applications, is not yet available (though it is slated to be released shortly). Assuming Google releases the API, users will sign into the service using their Google Accounts. The API would then return a token for our application to interpret. Once a user is authenticated, they can query a Gdata feed. Any create, update, or delete query, while affecting the Gdata feed itself, will also automatically be reflected in the Lucene index; this ensures that the two never fall out of sync.
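The idea of keeping the feed and the index in step can be sketched as follows. This is a minimal, in-memory illustration only: two maps stand in for the Gdata feed and the Lucene index, a fixed set of strings stands in for tokens issued by Google’s authentication service, and every name here (SyncedFeedStore, requireAuth, inSync) is hypothetical rather than part of either API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every write goes to the feed and the index together.
class SyncedFeedStore {
    private final Set<String> validTokens;                          // stand-in for real auth tokens
    private final Map<String, String> feedEntries = new HashMap<>(); // stand-in for the Gdata feed
    private final Map<String, String> indexedDocs = new HashMap<>(); // stand-in for the Lucene index

    SyncedFeedStore(Set<String> validTokens) {
        this.validTokens = validTokens;
    }

    // Rejects any request that does not carry a known token.
    private void requireAuth(String token) {
        if (token == null || !validTokens.contains(token)) {
            throw new SecurityException("not authenticated");
        }
    }

    void create(String token, String id, String content) {
        requireAuth(token);
        if (feedEntries.containsKey(id)) {
            throw new IllegalStateException("entry already exists: " + id);
        }
        feedEntries.put(id, content);
        indexedDocs.put(id, content);
    }

    String read(String token, String id) {
        requireAuth(token);
        return feedEntries.get(id); // null when the entry does not exist
    }

    void update(String token, String id, String content) {
        requireAuth(token);
        if (!feedEntries.containsKey(id)) {
            throw new IllegalStateException("no such entry: " + id);
        }
        feedEntries.put(id, content);
        indexedDocs.put(id, content);
    }

    void delete(String token, String id) {
        requireAuth(token);
        feedEntries.remove(id);
        indexedDocs.remove(id);
    }

    // True when the feed and the index hold exactly the same entries.
    boolean inSync() {
        return feedEntries.equals(indexedDocs);
    }

    public static void main(String[] args) {
        SyncedFeedStore store = new SyncedFeedStore(Collections.singleton("demo-token"));
        store.create("demo-token", "entry-1", "first draft");
        store.update("demo-token", "entry-1", "second draft");
        System.out.println(store.read("demo-token", "entry-1"));
        System.out.println(store.inSync());
    }
}
```

Routing all writes through a single class is one way to guarantee that no code path can touch the feed without also touching the index; a real implementation would also need to handle a failure that occurs between the two writes.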

Finally, what use is the system without some way to present the data to the end-user? Since Gdata is XML-based, it should be fairly easy to implement a feature that transforms the XML into a presentable interface, for example with XSLT.
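One possible shape for that transformation is an XSLT stylesheet applied through the javax.xml.transform API that ships with the JDK. The sketch below is an assumption about how the presentation layer might look, not a settled design: the class name FeedToHtml, the method render, and the stylesheet itself (which simply lists entry titles) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: renders the titles of an Atom-style feed as a list.
class FeedToHtml {
    // Illustrative stylesheet; a real one would render far more of each entry.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:atom='http://www.w3.org/2005/Atom'"
            + " exclude-result-prefixes='atom'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
            + "<xsl:template match='/atom:feed'>"
            + "<ul><xsl:for-each select='atom:entry'>"
            + "<li><xsl:value-of select='atom:title'/></li>"
            + "</xsl:for-each></ul>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to the feed and returns the resulting markup.
    static String render(String feedXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(feedXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String feed = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<entry><title>First post</title></entry>"
                + "<entry><title>Second post</title></entry>"
                + "</feed>";
        System.out.println(render(feed));
    }
}
```

Keeping the stylesheet separate from the Servlet code would let the presentation change without touching the CRUD or indexing utilities.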

Deliverables

  1. Create utilities to implement CRUD functionality between Lucene and Gdata. This would include any authentication and query components that would need to be added to the implementation.
  2. Create utilities that allow Lucene to index any data from a Gdata document using a read query.
  3. Implement methods for XML Transformations (to provide a presentable interface for users to access Gdata…erm, data).
  4. Complete documentation (Javadocs).

Timeline

May 24, 2006 – SoC starts

June 25, 2006 – Gdata manipulation utilities complete

July 25, 2006 – Gdata reading/searching utilities complete

August 25, 2006 – Presentation interface complete

August 26-September 4, 2006 – Testing, debugging, and completed documentation

September 5, 2006 – SoC ends

Follow Up

Any questions about this proposal can (and should) be sent to me through any of the following channels:

Email: todd _at_ bouncyglue _dot_ com

AIM: dialtfortodd

Gtalk: todd _at_ bouncyglue _dot_ com

ToddBealmear/SummerOfCode2006 (last edited 2009-09-20 23:36:26 by localhost)