Gora is an ORM framework for column stores such as Apache HBase and Apache Cassandra with a specific focus on Hadoop.
Although there are various excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differs profoundly from data modeling in their relational cousins. Moreover, data-model-agnostic frameworks such as JDO are not sufficient for use cases where one needs the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use ORM framework with data-store-specific mappings and built-in Apache Hadoop support.
The overall goal for Gora is to become the standard data representation and persistence framework for big data. The roadmap of Gora can be grouped as follows.
ORM stands for Object-Relational Mapping. It is a technology that abstracts the persistence layer (mostly relational databases) so that plain domain-level objects can be used without the cumbersome effort of saving and loading data to and from the database. Gora differs from current solutions in that:
ORM frameworks are nothing new. But with the explosion of data, now generated in terabytes and even petabytes, NoSQL data stores are gaining ever-increasing popularity. Coupled with the limited support for the already-proven Apache Hadoop in current ORM frameworks, this created the need for a new project.
Gora is currently hosted at GitHub. However, Gora has ties to the ASF in many ways. As detailed in the proposal section, Gora will be a high-level client for many Apache projects and subprojects, including Hadoop (Common, HDFS, and MapReduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its life inside the Apache Nutch project, and Nutch trunk now uses Gora as a library. Even more, the initial set of committers are all ASF members. Therefore, we think that Apache will be an excellent home for Gora.
Initial goals for Gora can be summarized as:
Current commit rights belong to the initial list of committers, four of whom are also ASF members. All the developers have extensive experience with Apache projects. We honor the meritocracy policy of the ASF.
Gora’s community mostly overlaps with those of Nutch, Hadoop, HBase, Avro and Cassandra. We have a small community for now (5 initial committers, 18 people tracking the project at GitHub), but we have been piggybacking on the Nutch community for a while. If Gora is accepted into the Apache Incubator, we expect more traction. Moreover, with the increasing popularity of NoSQL databases, we expect more users.
Gora started from the initial code base written inside Apache Nutch by Doğacan Güney. Enis Söztutar then refactored and re-architected the project out of Nutch. Later, Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch to use the newly formed project, and Sertan Alkan joined after that. Doğacan and Julien are Nutch PMC members, and Andrzej is the Nutch PMC chair. Enis is an Apache Hadoop PMC member.
As discussed in the second paragraph of the Rationale section, all of the current developers are Apache people, and four of them are PMC members, which shows that we have some experience with the Apache way. Moreover, Gora is closely related to many Apache projects: Nutch, Hadoop, HBase, Cassandra, Avro, Pig, Hive and Lucene, to name a few. Gora started its life inside Nutch, and Nutch trunk now uses Gora to persist web crawl data to HBase, Cassandra and MySQL, which makes Gora a critical component of Nutch.
Most of the development depends on Enis and Doğacan for now. Both of them intend to continue Gora development. However, we also acknowledge that more core developers are needed for the project to be truly successful. The general strategy for acquiring more developers will be to acquire more users and to encourage users to be active in the community and develop patches. Moreover, the next release of Nutch, planned before the end of 2010, has extensive Gora support. We expect more interest from the Nutch community, and we will continue to post Gora announcements to the Hadoop, HBase and Cassandra mailing lists.
We believe that all of the developers have extensive open source experience. Four of the initial committers are Apache members. The codebase has also been open source since April 2010. We also have some documentation, wiki pages, an issue tracker and a dev mailing list.
We have a semi-distributed development environment where Doğacan, Enis and Sertan share the same office, but Andrzej and Julien are independent. With the aim of acquiring more developers, we expect more heterogeneous development.
Gora development has been supported by the ant.com search engine as contract work. It is expected that this contract will continue in the future. However, even without sponsors, we are committed to continuing Gora development, since we believe in the technology it brings and in its vital role in Nutch and in our other closed-source projects.
Gora will be tightly related to lots of Apache projects:
Gora is a natural fit for Apache due to its current committers and the projects that depend on it.
The initial source was developed as a patch to the Apache Nutch project. But the storage abstraction layer was orthogonal to the web crawler, and we decided to extract it into a separate project with much wider goals. Thus Gora, as a project, was born. The initial code was developed by Enis and Doğacan with ant.com’s sponsorship.
The code can be found at http://github.com/enis/gora/.
External dependencies excluding Apache projects are as follows:
We need a wiki at http://wiki.apache.org. Currently, we have a wiki at GitHub. Since there are not many pages there, we can manually move them to the wiki at wiki.apache.org.
Name | Email | Affiliation | Timezone
Enis Söztutar | enis [at] apache.org | Konneka | +3
Doğacan Güney | dogacan [at] apache.org | Konneka | +3
Sertan Alkan | sertanalkan [at] gmail.com | Konneka | +3
Julien Nioche | jnioche [at] apache.org | DigitalPebble | +1
Andrzej Bialecki | ab [at] apache.org | Sigram |
Andrew Hart | ahart [at] apache.org | NASA JPL | -8
Dave Woollard | woollard [at] apache.org | NASA JPL | -8
Henry Saputra | hsaputra [at] apache.org | Yahoo! | -8
All of the parties are affiliated with companies and organizations that are familiar with open source development. Most of the original Gora development was sponsored by ant.com; however, we expect that the amount of volunteer work will increase and that more developers will come on board.
Apache Incubator. Successful graduation can result in Gora becoming either a TLP or a subproject of Hadoop, since most of the community is projected to overlap.