Apache Annotator (proposal)

Abstract

Annotation-enabling code for browsers, servers, and humans.

Proposal

The Annotator community seeks to build a foundational set of libraries under a liberal license providing the pieces necessary for developers to add annotation to their projects.

Background

Annotator.js was originally created by Open Knowledge (formerly The Open Knowledge Foundation) to provide annotation over works by Shakespeare. Since that time, Annotator has found its way into a wide range of browser-based annotation systems such as Hypothes.is, LacunaStories.com, and various academic, publishing, and scientific research projects.

Sadly, this increased usage has primarily happened in forks of the main code or through copyleft-licensed plugins whose licenses prevent their use by many community members.

However, the community remains interested in collaborating and in building a shared foundation for annotation--in browsers as well as on servers and in desktop/mobile applications.

Rationale

Annotation is often implemented in projects in ad hoc ways, with developers re-solving problems that are well known to the Annotator community. The Annotator community works to provide knowledge and code to help developers more quickly implement or improve annotation within their projects.

We believe bringing the Annotator community into the Apache Software Foundation will allow for wider recognition of the annotation problem space, help more developers find their way to solving this shared problem, provide increased cohesion for our own somewhat fractured community, and increase the use of commonly shared code within a wide range of projects.

Initial Goals

Current Status

Meritocracy

The project is in transition from a primarily BDFL-based model to one with a more diverse set of committers. There are 36 total known committers to Annotator. Three committers have done the bulk of the coding and decision making, and two of those committers act as project leadership.

However, the community is much larger and more diverse when the various forks and plugin authors are considered.

We intend to invite and include participants from a wide array of annotation problem spaces to collaborate in this new shared space.

Community

Community calls have been held every 3-6 months, with reports of each call's outcome posted to the mailing list and the annotatorjs.org website.

Most activity within the project happens on the mailing list. There is also a relatively inactive #annotator channel on irc.freenode.net. The website is primarily for promotion and includes promotion of community plugins and showcases projects using Annotator. Documentation is published on readthedocs.org and linked to from the website.

There are many Annotator and W3C Annotation Data Model related projects found on GitHub. Our objective would be to invite these communities to join this collaborative community with the hope of greater stability and community longevity.

Core Developers

The 3 primary committers to the project are Nick Stenning of The Hypothesis Project, Randall Leeds of Medal, and Aron Carroll of Dropbox, Inc. Nick Stenning is the original creator of Annotator. Randall Leeds is an Apache CouchDB committer. Aron has been a frequent contributor. All three have been members of The Hypothes.is Project in past years.

Other currently active community members include:

Other committers have contributed significant amounts of code, content, or issues and discussions, but are currently (in the last 3-6 months) less active on the project. However, at recent annotation-related conferences the scale of the plugin, fork, and ancillary project activity was shown to be much higher than what was apparent from activity on the main Annotator mailing list--in part due to community fracturing, which is something we hope to fix by joining the ASF.

A full list of Annotator contributors can be seen here: https://github.com/openannotation/annotator/graphs/contributors

Alignment

The Annotator community believes that the Apache Software Foundation promotes and enforces the sort of community that will best serve the future of the project. It is also believed that Annotator can serve the ASF by providing its tools to bring annotation into various Apache projects and eventually to the apache.org site, project documentation, and other tools within the ASF.

The priority is on increasing community involvement, defining--via the Apache Way--how we will code and collaborate going forward, and upon creating the best possible annotation solution born out of that collaboration.

Known Risks

Orphaned products

The majority of the core committers were formerly from The Hypothes.is Project, which used an earlier version of Annotator within its annotation web service and BSD-licensed h annotation software. However, Hypothesis and most other organizations and projects using Annotator have forked the main code base or created unique plugins which only exist within their projects and have not been contributed upstream.

The fracturing of the community and previous single-entity contribution has greatly inhibited collaboration and growth of the community. Concurrently, interest in and growth of annotation projects from a wide range of constituents has increased--though around a much wider array of code and projects. The hope is that the creation of a collaborative space built for discussion and sharing of these tools would provide the opportunity to reach a common core to be shared among the many diverse players.

As such, the Annotator project has begun the process of becoming an Apache project to establish a development and community process that encourages diversity and cross-organization collaboration.

Inexperience with Open Source

Annotator was established as an Open Source project in 2011, with its first release, v0.0.1, being made on January 1st of that year: https://github.com/openannotation/annotator/releases/tag/v0.0.1

The project has continued since that time as an open source project developed on GitHub. The community has grown in diversity since that time and was moved into a separate "openannotation" GitHub organization (from the original "okfn" GitHub organization) in 2014 in an effort to increase community involvement and diversity.

Each of the core committers has worked on and created open source software for themselves or various organizations for more than 5 years. Two of the contributors mentioned above also have more than 5 years of contributor experience at the ASF and are both now core committers to a top-level project (Apache CouchDB).

Homogeneous Developers

Active community members as well as plugin and compatible annotation storage system builders are from a diverse, though scattered, range of organizations and individually driven projects.

The Annotator community is seeking to combine its efforts into a core group of committers to more effectively encourage a shared foundation as well as continue the growth in diversity of the community.

Geographically, the Annotator community is widely distributed across Germany, Hungary, the East and West coasts of the US, and Australia.

Additionally, the annotation-related projects that may be considered as input for this project's code explorations vary widely in size, contributor diversity, and growth.

Reliance on Salaried Developers

In the past, contributors to the Annotator project were solely from The Hypothes.is Project and their activity was driven primarily by the needs of that project. However, the diversity of interested participants has greatly increased. There is an additional hope of creating an aggregated community from various projects (including Annotator, Hypothesis' h code, and various related libraries and plugins) as well as exploring the creation of new tools--not only for the browser--to further widen the interest and activity around annotation.

Relationships with Other Apache Projects

The Annotator community also provides an annotation storage system ("annotator-store") built upon ElasticSearch. There are compatible implementations of that API built on various storage systems (including Apache CouchDB), and the community would encourage the creation of other compatible storage systems built upon other Apache storage projects.

Additionally, Annotator is a JavaScript library which could serve any of the various CMS projects within Apache.

The roadmap for Annotator also includes compatibility with the Web Annotation Data Model which is a JSON-LD serialization of an RDF-based data model for annotation. The growing number of RDF-focused Apache projects could take advantage of and contribute to the creation of these features.
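As a rough illustration of what such compatibility involves, the sketch below builds a Web Annotation as a plain JSON-LD dictionary and resolves its TextQuoteSelector against a string. It is a minimal sketch under stated assumptions, not Annotator's API: the helper names (make_annotation, find_quote), the example.org IRIs, and the sample text are all hypothetical, and the lookup is a naive exact string match rather than the fuzzy anchoring a real client would need.

```python
import json

# JSON-LD context published by the W3C for the Web Annotation Data Model.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(anno_id, source, exact, prefix, suffix, comment):
    """Build a Web Annotation dict (hypothetical helper, for illustration)."""
    return {
        "@context": ANNO_CONTEXT,
        "id": anno_id,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": source,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


def find_quote(text, selector):
    """Return (start, end) offsets of the selected quote, or None.

    Naive assumption: prefix + exact + suffix must match the text verbatim.
    """
    prefix = selector.get("prefix", "")
    needle = prefix + selector["exact"] + selector.get("suffix", "")
    at = text.find(needle)
    if at == -1:
        return None
    start = at + len(prefix)
    return start, start + len(selector["exact"])


text = "To be, or not to be, that is the question."
anno = make_annotation(
    "http://example.org/anno/1",   # hypothetical annotation IRI
    "http://example.org/hamlet",   # hypothetical target document
    "not to be", "or ", ", that", "Famous line.",
)
span = find_quote(text, anno["target"]["selector"])
print(json.dumps(anno, indent=2))
print(span, text[span[0]:span[1]])
```

Because the annotation is ordinary JSON, the same document could be stored by any compatible backend or processed as RDF by tools that understand the JSON-LD context.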

The W3C Annotation Working Group is also creating multiple related deliverables around Web Annotation including a Linked Data Platform-based Protocol specification, a note about selector systems, and future notes for various serialization and integration opportunities for the Web Annotation Data Model. Apache Marmotta is one project within the ASF which has native support for LDP and may have an interest in collaborating around implementation of the Web Annotation Protocol.

Lastly, Apache UIMA can currently generate Open Annotation Data Model annotations as an output of its Natural Language Processing system. These annotations could be displayed via code written within this new Apache project--which could further leverage user interaction with those NLP-based annotations (such as confirmation, rejection, or modification of the annotations made by Apache UIMA's NLP process). There are other NLP projects within the ASF which could similarly benefit from these explorations and the code generated here.

An Excessive Fascination with the Apache Brand

The Annotator community acknowledges the value and recognition that the Apache brand would bring to the Annotator project. However, the primary interest is in the community building process and long-term stability that the Apache Software Foundation provides for its projects.

We do hope for increased recognition of and contribution to an array of annotation code projects built within this community. However, we primarily hope for community aggregation driven by building a core set of tools for our shared set of needs which are now scattered across various annotation endeavors.

Integrating those developers into this new community and adding them as contributors is seen as a much higher priority than increasing awareness through branding.

Documentation

Websites:

Documentation:

Mailing List:

Code:

Annotator plugin index:

Initial Source

The original Annotator code base was created by Nick Stenning while at the Open Knowledge Foundation. The code has been in development since before 2011 with the first public release (v0.0.1) happening on January 1st, 2011 on GitHub.

The example annotation storage system (which works with Annotator's stock Store plugin) had its first release on February 21, 2011 and was originally built for Apache CouchDB. The contributor list of annotator-store is similar, but the license is simply MIT (rather than MIT & GPL). The stated copyright is 2010-2012 Open Knowledge Foundation.

Additionally, there is a growing list of forks, plugins, and related tooling created by the community in various places--often embedded within larger projects. The Annotator Plugins index references some such possible inputs to this project's code. The W3C specifications are also being implemented, and the growing number of projects available around those specifications would also be considered as possible inputs. Most specifically, Randall Leeds (also a contributor to Annotator) has built a set of libraries focused on implementing the W3C selectors. These libraries could serve as an initial foundation for a core library for browsers or JavaScript-based server code.

Source and Intellectual Property Submission Plan

Our primary goal is to aggregate communities that center around annotation. We intend to focus our initial work on a JavaScript-based library built from Randall Leeds' dom-anchor-* libraries (single owner copyright; MIT licensed) and potentially reusing code from Annotator (mixed owner copyright; MIT & GPL dual-licensed).

The Annotator community has a stated copyright owner of "The Annotator Community." All contributions are believed to have been made "in kind" and the copyright owned by the various contributors. The three primary committers have stated a willingness to donate their contributions to the Apache Software Foundation and the minimal parts with copyright owned by others will likely be rewritten, though we also hope to engage these individuals to join the combined efforts being made at the ASF.

The annotator-store project is under a clearer, single BSD license. The copyright holder is stated to be the Open Knowledge Foundation with the years 2010-2012. It is likely that this code will only be used for reference or via library inclusion and not directly developed upon within the ASF.

An earlier process was undertaken to collect re-licensing permission from known contributors via the existing mailing list and GitHub issues--using a model similar to Twitter's when it relicensed Bootstrap. General agreement was reached, but no decisive actions were taken as many contributors of smaller amounts of code were no longer reachable.

We hope to engage the various plugin and fork authors, along with similar annotation projects, to pursue future work under a shared license and developed within The Apache Way. The contribution of specific code to this project or its future deliverables will be handled individually by the community over the course of the project.

One core goal of bringing the community to the ASF is to avoid this confused licensing situation in the future.

External Dependencies

Annotator depends on the following JavaScript modules from NPM:

annotator-store depends on the following Python modules:

MongoServer (a Web Annotation Platform implementation) is a single owner project currently licensed under the Apache License 2.0.

Randall Leeds' dom-anchor-* libraries are all licensed under the MIT license and include these dependencies:

Required Resources

Mailing Lists

Note: the Annotator community currently uses a single list hosted by Open Knowledge at: https://lists.okfn.org/mailman/listinfo/annotator-dev

Git Repository

Note: the Annotator community hosts its code on GitHub as part of the "openannotation" organization. Randall Leeds also uses GitHub for his dom-anchor-* libraries as does Rob Sanderson for his Web Annotation Protocol implementation. These are all potential code inputs to be considered for reuse or continuation by this community.

Issue Tracking

The Annotator community would prefer to continue using GitHub Issues if that is a possibility.

Other Resources

Initial Committers

Affiliations

Sponsors

Champion

Daniel Gruno aka `humbedooh`

Nominated Mentors

Sponsoring Entity

The Incubator

AnnotatorProposal (last edited 2016-07-19 20:21:35 by BrianMcCallister)