October 2007 Board reports (see ReportingSchedule).
These reports are due to the Incubator PMC by 10 October 2007
Project name - Apache CXF
Description - SOA enabling framework, web services toolkit
Date of entry - August 2006
Items to resolve before graduation:
- Diversity - Active committers are mostly IONA people. We did vote in two new independent committers and Dan Diephouse now works for an IONA competitor, so some progress is being made.
- Voted in Benson Margulies for excellent work in several areas of the code, particularly Aegis and the XFire migration.
- Voted in Glen Mazza for excellent work cleaning up various parts of the code, reviewing everyone's commits, updating docs, etc.
- Released 2.0.1-incubator and 2.0.2-incubator.
Started a discussion about graduation ([http://www.nabble.com/Graduating.....-tf4231363.html#a12038088 link]), but it turned toward how to increase diversity after Jim J. raised concerns about it.
The cxf-user mailing list traffic has tripled since June, and the Chinese-language list (http://groups.google.com/group/cxf-zh) now has over 60 members. Many new users are participating. However, cxf-dev traffic has remained constant except for a large spike (2.4x normal) in September.
- Released 2.0.1-incubator and 2.0.2-incubator - These were mainly bug-fix releases on the 2.0 line, though some minor new features were added.
Started discussing a roadmap for a 2.0.3-incubator release (bug fixes) as well as a 2.1 release (http://incubator.apache.org/cxf/roadmap.html).
Project name - Apache FtpServer
Description - Java based FtpServer
Date of entry - March, 2003
Progress since last report
* Only minor development effort since the last report due to committer time limitations, mostly focused on fixing bug reports and feature requests, which are coming in at a steady pace. More people seem to be using FtpServer now that activity on the project is steady. Voted in a new committer, Clint Foster. He is currently setting up his account and working through the practical details.
* Niclas Hedhman added as a much-needed mentor to the project.
* Continuous builds are in the works using the vmbuild1 server, with the aim of producing snapshots, which have been repeatedly requested.
Top three items to resolve
1. Growth of community - we still need more active members.
2. Getting snapshot builds published automatically by the build server (minor details left to finish).
3. Getting a stable release out; we need more active members to get a good review.
- Getting a build server in place so that we can generate snapshots, they have been requested by several people.
Description - Ivy is a dependencies management tool mostly used in combination with Apache Ant
Date of entry - October 23rd, 2006
Remaining item to resolve:
- Vote for graduation at general incubator list
- We have researched the Ivy trademark
- We have defined a plan for the 2.0 release
- We have seen new contributors on our dev mailing list (some Ant committers, and some others)
- Gilles Scokart has joined the PPMC
- We have voted to join Ant as a subproject
- The Ant TLP has voted to accept Ivy as a subproject
- Second release done
- Coding activity has been more limited, but our development efforts still focus on integration with Maven repositories, bug fixing, and tutorial enhancements.
RCF is a rich component set (Ajax-style) for JSF. The RCF wiki has been created and all committer accounts are now set up. We have started discussing how to manage the open-source effort, since we need to provide a migration path for current users. We are waiting for a current code drop to be imported into the repository.
Project name - Apache Lucene.Net
Description - Lucene.Net is a source-code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to C# and the .NET platform, using the Microsoft .NET Framework.
Date of entry - April, 2006
Progress since last report - We have seen good activity in the past three months on both the dev and user mailing lists. Unlike in the past, when I was taking care of all code fixes in preparation for a release, I have seen a considerable number of patches submitted by the community to resolve open issues in the Lucene.Net 2.1 work. This is very encouraging, and I would like to suggest that the community vote on adding one of the active patch submitters as a committer.
Code aspects - Code is very stable with the current "beta" version of 2.1, which we expect to release in the next few weeks.
Community aspects - We have a good following, and the past three months have shown a good commitment to submitting patches.
Top three items to resolve -
1. Growth of community - while this has been improving, we still need more active members, especially those who submit patches.
2. Release Apache Lucene.Net 2.1 and start working on 2.2 / 2.3.
3. Vote in a new committer.
The Apache Qpid Project provides open and interoperable implementations, in multiple languages, of the Advanced Message Queuing Protocol (AMQP) specification.
Date of entry to the Incubator : 2006-09
Top three items to resolve before graduation
We aborted the M2 release vote midway due to some key bugs. We are building new RCs and will re-vote. Once M2 is out we will seek graduation.
1. There don't seem to be any major issues currently, or items that need to be raised. Most notably, the Qpid community and users have had quite a lot of debate on code and practices in the last period. Some debates were quite intense, but all were very productive, furthering teamwork and community.
* Any legal, cross-project or personal issues that still need to be addressed?
From last report: The whole project has not gone through release review and the license files and notices need to be checked for all languages and components.
- Done - Legal review was done for the full code base prior to the M2 vote. The M2 vote was aborted due to a key bug and is being restarted on Oct 8th.
* Latest developments.
- Since entering incubation we have had one release of the Java code base (M1).
- We have migrated our Java build system from Ant to Maven.
- Development has been moving forward, with improvements in memory footprint management and the Java broker passing the JMS TCK.
- Addition of .NET client
- Successfully voted to give 7+ new committers access rights
- Successfully voted to give a new member contributor rights to cwiki.
- The creation of the Web site
- General progress on all code bases
- Created a Python test suite
- Added Ruby language support
- We have stabilized our M2 release, voted, aborted the vote, and will re-vote on Oct 8th
- Lots of code cleanup and new functionality
- Increase in user mail to the point of requesting and creating a user list
- Requests from other projects to integrate with us
- Building M3 / V1 if graduated.
* Plans and expectations for the next period?
During the next period, once M2 is out, I believe we will see whether we can graduate.
iPMC questions / comments:
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
There have been a number of positive developments within Tika during the last few months. Traffic on the Tika mailing list has increased significantly (typically 2 or 3 questions and 1 or 2 commits every day or every other day), and there have been many recent inquiries from external projects wanting to collaborate with Tika (including Aperture, PDFBox, and a developer of a JSON library currently hosted at Google Code). In addition, Tika's architecture has recently become a topic of discussion (as we'll see below).
We recently elected Keith Bennett as a new committer to Tika. Keith has been spearheading many of the new patches committed to Tika, as well as participating in discussions about the architecture, and future direction of the project.
Tika will be represented at the "Fast Feather" track at ApacheCon US by Jukka Zitting. The rest of the community is helping to create the content for the presentation. The abstract is listed below:
Tika is a new content analysis framework born from the desire to factor out commonality from the Apache Nutch search engine framework. Tika provides a MIME detection framework, an extensible parsing framework and a metadata environment for content analysis. Though Tika is still in its nascent stages, progress has recently taken shape and the project is nearing a stable 0.1 release. In this talk, we'll describe the core APIs of Tika and discuss its use in several distinct domains, including search engines, scientific data dissemination and an industrial setting.
There has been a flurry of JIRA issues and code activity (47 issues currently in JIRA, with 32 resolved issues, 14 closed issues, and 2 open major/minor issues in progress).
Tika's Parser interface (one of its key components) has just undergone a major overhaul led by Jukka Zitting, and Chris Mattmann has recently contributed a MimeType system (with help from fellow Apache Nutch committer Jerome Charron) to Tika. We also cleaned up and refactored large parts of the rest of the code (removing references to LiusLite and branding the project with the Tika name wherever possible) in preparation for an upcoming 0.1 release.
Chris Mattmann has led an effort to carve out the existing MimeType detection system in Apache Nutch and replace it with Tika's improved MimeType detection system. A patch is currently in JIRA, and barring objections, Nutch will rely on Tika for its MimeType detection.
Also active recently were committers Bertrand Delacretaz, Sami Siren and Rida Benjelloun, committing patches and improvements wherever needed.
Issues before graduation
No changes since our last report: the Tika project is still at an early stage of incubation. We need to continue bringing in the initial codebases and are targeting an initial incubating release (0.1) probably within the next month. We also need to work on growing the community and figuring out how to best interact with external parser projects.
TripleSoup is intended to provide an RDF store, tooling to work with that database, and a REST web interface for querying that database using SPARQL, implemented as an Apache HTTP Server module.
TripleSoup has voted itself into dormant mode. The main reason the project did not quite materialize seems to be that we don't have enough people with sufficient interest, need, and time to make this really take off.
UIMA is a component framework for the analysis of unstructured content such as text, audio and video. UIMA entered incubation on October 3, 2006.
Some recent activity:
- Version 2.2 was released 8/2007, our second incubator release. This one went without a hitch.
- We're currently discussing the next release, which will likely be mostly a bug fix release with only minor new features.
Items to complete before graduation:
- We still need to attract more new committers. We're trying to spark even more activity in the sandbox to get people to contribute.
- We have recently welcomed our first new committer, Jörn Kottman. Jörn contributed and continues to work on an Eclipse-based editor. He has also contributed to our build process and design discussions.
- There's a good amount of traffic on both the dev and user list.