Apache Stanbol incubation proposal

Status

Proposal has been accepted: http://s.apache.org/stanbol.vote - see you soon at http://incubator.apache.org/stanbol

Abstract

Apache Stanbol is a modular software stack and reusable set of components for semantic content management.

Proposal

Stanbol components are meant to be accessed over RESTful interfaces to provide semantic services for content management. The current code is written in Java and based on the OSGi modularization framework, but other server-side languages might be used as well.

Applications include extending existing content management systems with (internal or external) semantic services, and creating new types of content management systems with semantics at their core.

The architecture of the current (alpha-level) code consists of four layers:

Background

Stanbol comes out of the IKS project (Interactive Knowledge Stack, http://iks-project.eu/), a research project funded by the European Community (EC) which aims to create a semantic content management software stack.

One of the goals of IKS is for its software to survive the 4-year funding period of the EC, which ends in 2012.

Developing its code in the open at the Apache Software Foundation, and growing a community before IKS funding runs out, is the best way to ensure the sustainability of the Stanbol software.

For more background information, some articles and tutorials on FISE, which was the first usable IKS module, can be found in the “FISE links” section of http://wiki.iks-project.eu/index.php/FISE

Rationale

Content Management Systems (CMS) can benefit from semantic add-ons in a number of ways, including more intelligent linking, automatic or semi-automatic tagging of content, enhanced user interactions based on intelligent and dynamically adaptable user scenario modeling, etc.

However, many CMS vendors and developers are either unaware of semantic technologies or not skilled enough in them to use them effectively. Research in semantic technologies often happens in academic circles, which might not make their findings available in a way that’s easily consumable by today’s CMS vendors and developers.

Some big companies are using semantic technologies behind the scenes to provide powerful services, but that technology is usually not accessible to smaller vendors.

Stanbol aims to bridge these gaps by providing CMS vendors and developers with easy-to-integrate semantic components that add value to their offerings.

At the same time, more experimental advanced semantic applications will be built on the Stanbol stack, with the medium-term goal of enabling pure semantic-based content management and other applications.

Initial Goals

Current Status

Meritocracy

As IKS is an EC research project with funding, it does not formally operate as a meritocracy.

However, due to the open source way of working adopted by the consortium, an informal meritocracy has emerged within IKS.

We estimate that adapting to the ASF’s meritocratic way of working will be easy for the initial set of Stanbol committers, as the differences from the current way of working are not dramatic.

Community

The IKS project plan includes an important effort to build a community around the software that it produces. Several community workshops have already taken place, attended by more than 40 European CMS developers and vendors.

See http://wiki.iks-project.eu/index.php/Workshops for more info.

A community is emerging around IKS, and moving to the Apache project governance model should help grow it, not least by reassuring community members that the software will continue to be available and maintainable once the IKS EC funding runs out.

Core Developers

The IKS consortium consists of seven academic research groups and six “industrial partners”, companies active in the CMS space.

See http://iks-project.eu/team/team for the list.

The current IKS software has been written by a group of about a dozen developers from this consortium, with few external contributions until now. Members of the Clerezza community have contributed some key pieces, and ties between both communities are strong.

Alignment

As many Apache projects have something to do with content management, obvious synergies exist, which should allow us to grow the community from inside the ASF as well as from the outside.

Known Risks

Orphaned Products

The IKS code as it stands now might be orphaned when the EC funding of IKS runs out at the end of 2012.

That’s why we want to move to Apache now, to have a bit more than two years to make Stanbol independent of its EC funding.

Inexperience with Open Source

The IKS team includes a number of very experienced Open Source developers, along with people doing their first open source contributions.

Since the IKS consortium started writing code early this year, we have had ample opportunity to bring everybody up to speed on how open source works, and we’re confident that the initial committers will quickly adapt to the ASF’s way of working.

Homogeneous Developers

The current developers are spread amongst the IKS consortium partners, with no dominant company or organization.

Reliance on Salaried Developers

Until the end of 2012, the work of IKS consortium members is funded by the consortium, so there is a “common boss” problem, and we can assume that most or all of that work is salaried.

Moving software development to the ASF, and especially growing a community to include committers from outside the IKS consortium, should help reduce or eliminate this risk. Even IKS partners using the software in their products will help reduce the “common boss” problem, as both the IKS and the partner company will have a need for Stanbol software.

Relationships with Other Apache Products

The IKS software is written as a set of OSGi components and runs on Apache Felix, using the launcher from Apache Sling.

It also uses several key components from the Apache Clerezza incubating project, along with a number of other Apache libraries. Several Clerezza committers have been contributing to IKS workshops, without being part of the IKS consortium.

Clerezza in turn uses Jena, which is also joining the Apache Incubator.

Lucene/Solr will be used for indexing and search.

We also expect to use software from or collaborate with Mahout, Tika, Jackrabbit, UIMA and Chemistry.

An Excessive Fascination with the Apache Brand

The brand is not what makes the difference for the IKS team; the motivation is the opportunity to build and grow a community.

Documentation

Existing components are documented at http://wiki.iks-project.eu/ and http://code.google.com/p/iks-project/w/list but that information is still incomplete due to the alpha status of most of that software.

Initial Source

http://code.google.com/p/iks-project/

External Dependencies

Appendix A contains the list of Maven groupIds of dependencies of the various Stanbol modules.

Most of those are compatible with ASF requirements (http://apache.org/legal/resolved.html), but an extensive check is needed to remove or replace any incompatible ones.

Required Resources

Mailing Lists

Subversion Directory

Issue Tracking

Other Resources

We will probably request a wiki once the podling is set up, and access to a Hudson continuous build server.

Initial Committers and affiliations

The following people are members of the IKS consortium; see http://iks-project.eu/team/team for a description of their organizations:

The following initial committers are not members of the IKS consortium:

Sponsors

Champion

Nominated Mentors

Sponsoring Entity

Apache Incubator.

Appendix A: list of dependencies

Here's the list of Maven groupIds of the current Stanbol dependencies, omitting org.apache.* and commons-* groupIds but including transitive dependencies.

asm
com.aetrion.flickr
com.beetstra.jutf7
com.drewnoakes
com.googlecode.json-simple
com.hp.hpl.jena
com.ibm.icu
com.sun.jersey
com.sun.xml.bind
dom4j
edu.smu.tspell
eu.iksproject
hermit
info.aduna.commons
it.unimi.dsi.fastutil.chars
javax.activation
javax.mail
javax.servlet
javax.ws.rs
javax.xml
javax.xml.bind
javax.xml.stream
jetty
jtidy
junit
local.jrdf
log4j
mysql
net.fortuna.ical4j
net.fortuna.mstor
net.sf.jacob-project
net.sf.kxml
net.sourceforge
net.sourceforge.juniversalchardet
org.antlr
org.bibsonomy
org.bouncycastle
org.clojars.thnetos
org.codehaus.castor
org.codehaus.jackson
org.codehaus.jettison
org.codehaus.woodstox
org.freemarker
org.hsqldb
org.htmlparser
org.jaudiotagger
org.jdom
org.json
org.mockito
org.mortbay.jetty
org.nsdl.mptstore
org.openrdf.sesame
org.ops4j.base
org.ops4j.pax.exam
org.ops4j.pax.runner
org.osgi
org.samba.jcifs
org.scala-lang
org.semanticdesktop.aperture
org.semanticdesktop.nepomuk
org.semanticweb.owlapi
org.semweb4j
org.slf4j
org.textmining
org.wymiwyg
owl-link
owlapi
ronaldhttpclient
stax
trove
xerces
xmlpull

StanbolProposal (last edited 2010-11-16 14:06:25 by BertrandDelacretaz)