Clerezza Proposal

This proposal is being discussed on the general at incubator.apache.org mailing list.

Abstract

Clerezza is an OSGi-based modular application and set of components (bundles) for building RESTful Semantic Web applications and services.

Proposal

Clerezza can be used as a platform providing all the compile-time and runtime requirements for building semantic applications, or as individual bundles within an OSGi framework, e.g. Apache Sling, Apache ServiceMix, or the Eclipse platform.

Clerezza provides:

The RDF abstraction layer can be used independently of other aspects of Clerezza. It allows applications to be written regardless of the backend used. In purpose it is similar to RDF2Go, but it provides a significantly more modular interface that allows, for example, the storage, querying, or serialization layer to be switched independently. Furthermore, it doesn't introduce concepts alien to the RDF model, such as blank node labels, but is at its core strictly limited to RDF semantics.

The JAX-RS implementation can also be used independently of any other components. It allows OSGi services to provide a RESTful interface to their methods. Because it is based on wymiwyg WRHAPI, it can run both on the default OSGi Web Service and on a Jetty instance listening on a different port.
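As an illustration of the backend independence described for the RDF abstraction layer, the following is a minimal sketch in plain Java. It is not Clerezza's API: the names `Triple`, `TripleStore`, `Serializer`, `InMemoryStore`, `LineSerializer` and `describe` are hypothetical and chosen only for this example, and all RDF terms are treated as IRIs for brevity. It only shows the general idea that application code written against small interfaces can have its storage or serialization layer swapped independently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class RdfAbstractionSketch {

    /** A single RDF statement; all three terms are treated as IRIs for brevity. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Hypothetical storage interface: application code depends only on this. */
    interface TripleStore {
        void add(Triple triple);
        /** Returns matching triples; a null argument acts as a wildcard. */
        List<Triple> match(String subject, String predicate, String object);
    }

    /** Hypothetical serialization interface, swappable independently of storage. */
    interface Serializer {
        String serialize(Collection<Triple> triples);
    }

    /** One possible backend: a plain in-memory list. */
    static final class InMemoryStore implements TripleStore {
        private final List<Triple> triples = new ArrayList<>();
        public void add(Triple triple) { triples.add(triple); }
        public List<Triple> match(String s, String p, String o) {
            List<Triple> result = new ArrayList<>();
            for (Triple t : triples) {
                if ((s == null || s.equals(t.subject))
                        && (p == null || p.equals(t.predicate))
                        && (o == null || o.equals(t.object))) {
                    result.add(t);
                }
            }
            return result;
        }
    }

    /** One possible serializer: one line per triple in an N-Triples-like form. */
    static final class LineSerializer implements Serializer {
        public String serialize(Collection<Triple> triples) {
            StringBuilder out = new StringBuilder();
            for (Triple t : triples) {
                out.append('<').append(t.subject).append("> <")
                   .append(t.predicate).append("> <")
                   .append(t.object).append("> .\n");
            }
            return out.toString();
        }
    }

    /** Application logic: unaware of which backend or serializer is in use. */
    static String describe(TripleStore store, Serializer serializer, String subject) {
        return serializer.serialize(store.match(subject, null, null));
    }

    public static void main(String[] args) {
        TripleStore store = new InMemoryStore();
        store.add(new Triple("http://example.org/a", "http://example.org/knows", "http://example.org/b"));
        store.add(new Triple("http://example.org/b", "http://example.org/knows", "http://example.org/c"));
        System.out.print(describe(store, new LineSerializer(), "http://example.org/a"));
    }
}
```

In this sketch, replacing `InMemoryStore` with an adaptor for a persistent store, or `LineSerializer` with another output format, would not require any change to `describe`.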

Background

Current web trends focus on information sharing, interoperability and collaboration. As a result, end-user behaviour has changed over the last years: end-users no longer only consume information, they also produce content anytime and anywhere, in contrast to non-interactive websites where users are limited to passively viewing the information provided to them. As end-users have become aware of the possibilities of the web, the requirements on web applications have increased. Examples of such applications are social-networking sites, wikis, blogs and mashups.

The REST paradigm and Semantic Web technologies support these trends and form the basis for the upcoming Web of Data (a.k.a. linked data, Web 3.0). They change the paradigms for developing complex Web applications. Clerezza allows developers to build applications that integrate seamlessly into the Semantic Web, providing all accessible resources in machine-understandable formats without imposing additional burdens on the developer. Additionally, thanks to the flexibility of the RDF model used as back-end, some tedious database-related tasks required for traditional Web application development are no longer needed.

Rationale

Most Web application frameworks are not designed to leverage the full power of HTTP, but often try to reproduce non-Web design patterns in the Web environment. In general, application frameworks are oriented towards relational or hierarchical data structures. While attempts to overcome this, such as Drupal, have become very popular, they are not based at their core on the stack of Semantic Web standards. Clerezza will prove that the flexibility of RDF doesn't result in increased complexity but, on the contrary, allows for fast prototyping and development.

Initial Goals

The initial goals for Clerezza are:

Current Status

The current codebase is developed and tested using Apache Felix. It has been developed intensively and reviewed at trialox since August 2008 using Scrum, with a development process emphasizing individual accountability and reviews. We have internally demonstrated that we can release code as scheduled. The platform's core functionality is available; however, the need for new features may arise, and performance and robustness could be improved. Documentation for the project is incomplete but is available with the individual artifacts, both in the generated Maven sites and in a version exposed by Clerezza at runtime. We also have a wiki at http://wiki.trialox.org with some information, mainly on the development process. We also use mailing lists for communication among developers and users.

Meritocracy

The core developers understand what it means to have a process based on meritocracy. We will make a continuous effort to build an environment that supports this, encouraging community members to contribute.

Community

Trialox has been developing the current codebase since August 2008. Trialox was founded in partnership with the University of Zurich and has benefited from previous research work at the Department of Informatics. Reto Bachmann-Gmür, who has been developing open source Semantic Web applications for many years, including working with the Jena team at HP Labs, has been part of the team from the beginning.

Trialox has contributed to the JAX-RS specification. Some of the code written by trialox is used by Open Source projects such as Paxle and Gradino.

Clerezza is used by globally active non-profit organisations such as the WWF. These organisations have strong developer networks including motivated volunteers, who will contribute to Clerezza.

Core Developers

People from Trialox, the University of Zurich, as well as partner companies of Trialox have contributed to the project. Currently, the following persons are core developers of Clerezza:

Alignment

We provide a launcher which runs Clerezza's bundles within Apache Felix. We also provide feedback via the mailing list about the usage of Apache Felix and its components, including the framework security. For building Clerezza's bundles we use Apache Maven and various plugins. We have also developed a plugin to help manage projects which contain ontologies, so that Java classes representing those ontologies can be pre-compiled. Other projects which are based on Web services and/or RDF can benefit from Clerezza or its specific bundles. We are open to collaborating with other Apache projects which can benefit from functionality provided by Clerezza. Clerezza has the advantage of being very modular and independent of application frameworks, and thus can easily be integrated with other Apache projects. UIMA and Tika come to mind, as they would help extract semantic information from various data types and formats. An alignment with the JAX-RS implementation in Apache CXF could not only help remove the dependencies on the CDDL-licensed code taken from Jersey but also help provide a fully framework-independent implementation with a larger group of developers and thus higher quality.

Known Risks

The current team of Clerezza core developers is small, but as Clerezza is an innovative project in the semantic "space", we are confident that it can attract new developers.

Clerezza was started as an Open Source project, providing a Mercurial repository for public access to the source code as well as a publicly accessible JIRA instance for issue tracking. Clerezza has been licensed under the Apache License version 2.0 since the beginning of the project. Some of the initial committers already have strong experience with Open Source software development. Others, while not totally inexperienced, are willing to learn.

The risk that Clerezza will become an orphaned product is considered small. Three main factors will prevent this from happening:

Documentation

A small set of further documentation is available under the following links:

Initial Source

Clerezza has been in development since mid 2008. Public access to the source is provided through http://scm.trialox.org/.

Source and Intellectual Property Submission Plan

The current codebase is owned by trialox, and will be donated together with its documentation. We will get the paperwork out of the way as soon as possible.

External Dependencies

Quite a few open source libraries are already used. They have Apache-compatible licenses, with one issue to solve around Jersey, which is CDDL-licensed.

The libraries, their sources and licenses are listed here:

Apache Felix, ASL:

OSGi Alliance, ASL:

Apache Maven, ASL:

Eclipse, ASL:

OPS4J, ASL:

WYMIWYG, ASL:

jQuery, MIT license:

Hewlett-Packard Development Company, BSD License:

OpenRDF.org, BSD license (for the optional sesame backend adaptor):

Mulgara.org, Open Software License ("OSL") v. 3.0 (for the optional mulgara backend adaptor):

XSite (http://xsite.codehaus.org/), BSD license:

Jersey (https://jersey.dev.java.net/), CDDL license

The current code builds on code licensed under the CDDL. According to http://apache.org/legal/resolved.html, we understand that we have to remove this code before making a release, or redistribute it in binary form only. The following files are affected.

Required Resources

Mailing lists:

Subversion:

Issue Tracking:

Initial Committers

These committers have either worked on the initial codebase (Reto, Immanuel, Tsuy, Hasan) or expressed an interest in extending the project:

Affiliations

Manuel Innerhofen, Tsuyoshi Ito and Reto Bachmann-Gmür work at trialox and might get paid to work on Clerezza.

Hasan Hasan from the University of Zurich is paid to work on a project that is based on Clerezza.

Michael Marth and Bertrand Delacretaz work for Day Software.

Sponsors

We have approached both the champion and an initial list of mentors, who have agreed to mentor this project.

Champion:

Mentors:

Sponsor: