
Working With Gora Snapshots

Apache Gora is released as source code only, due to changing user requirements and the fact that the code may need to be compiled and recompiled in an ad-hoc fashion. This, however, poses a bit of a problem for Nutch users who need to obtain the code.

Right now this page should act as a go-to for folks lost in the minefield that is getting Nutch 2.x running with stable Gora (trunk) SNAPSHOTs.

N.B. Over in Gora we are in the process of publishing stable SNAPSHOTs to the Apache repository, so please keep an eye on GORA-282. This will make it much easier to simply add SNAPSHOTs to your project build.

In the meantime, to use Gora SNAPSHOTs in your Nutch 2.x deployment you can follow the guide below. Admittedly it is both manual and a bit fiddly, but it will become easier as we put more time into getting things right.

  1. Check out the Gora source, e.g. svn checkout https://svn.apache.org/repos/asf/gora/trunk/
  2. Build the Gora source code by navigating to $GORA_HOME and running mvn install -DskipTests=true. This builds the artifacts you require, e.g. gora-core-0.4-SNAPSHOT and gora-cassandra-0.4-SNAPSHOT. You will find these in the respective module target directories.
  3. Ensure that the WebPage schema you are using is the one included in [0]. It should replace the one in $NUTCH_HOME/src/gora.
  4. This guide assumes that you have already configured Nutch for gora-cassandra, so no further Nutch configuration should be required.
  5. Copy the Gora artifacts to $NUTCH_HOME/build/lib. These should replace the existing 0.3 artifacts, which prevents classloading issues.
  6. Using the Nutch build.xml, invoke the generate-gora-src target. Some information on the Gora Compiler can be found at [1].
  7. Check that the NEW data beans have been generated.
  8. Invoke the ant job target. This *should* build your new job file, which you can use in Hadoop. Please check that the generated job file includes the 0.4-SNAPSHOT Gora artifacts.
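
The steps above could be scripted roughly as follows. This is only a sketch under several assumptions that are not part of this guide: the function names, the default values of GORA_HOME and NUTCH_HOME, the schema file name webpage.avsc, the <module>/target/<module>-<version>.jar layout, and the path of the job file (which you pass in yourself) are all illustrative. Only the svn, mvn and ant invocations are taken from the list itself.

```shell
#!/bin/sh
# Sketch of steps 1-8 above. Function names, default paths and jar file
# names are illustrative assumptions; adjust them to your own checkout.

GORA_HOME="${GORA_HOME:-$PWD/gora-trunk}"
NUTCH_HOME="${NUTCH_HOME:-$PWD/nutch}"
GORA_VERSION="${GORA_VERSION:-0.4-SNAPSHOT}"

# Steps 1-2: check out Gora trunk and build it without running the tests.
build_gora() {
  svn checkout https://svn.apache.org/repos/asf/gora/trunk/ "$GORA_HOME" &&
    (cd "$GORA_HOME" && mvn install -DskipTests=true)
}

# Step 3: replace the WebPage schema with the one attached at [0].
# Assumption: the schema file is src/gora/webpage.avsc.
update_schema() {
  curl -fL -o "$NUTCH_HOME/src/gora/webpage.avsc" \
    https://issues.apache.org/jira/secure/attachment/12559893/webpage.avsc
}

# Step 5: remove the 0.3 jars and copy in the freshly built ones.
# Assumption: jars are named <module>-<version>.jar under <module>/target.
install_gora_jars() {
  for module in gora-core gora-cassandra; do
    rm -f "$NUTCH_HOME/build/lib/$module"-0.3*.jar
    cp "$GORA_HOME/$module/target/$module-$GORA_VERSION.jar" \
      "$NUTCH_HOME/build/lib/" || return 1
  done
}

# Steps 6 and 8: regenerate the data beans, then build the job file.
rebuild_nutch() {
  (cd "$NUTCH_HOME" && ant generate-gora-src && ant job)
}

# Step 8 check: read file names on stdin and print every gora jar whose
# name does NOT end in -<version>.jar (version given as $1).
stale_gora_jars() {
  grep 'gora-[^/]*\.jar$' | grep -vF -- "-$1.jar"
}

# List the contents of the job file given as $1 and report stale gora jars.
check_job_file() {
  jar tf "$1" | stale_gora_jars "$GORA_VERSION"
}
```

Nothing runs until the functions are called, e.g. build_gora && update_schema && install_gora_jars && rebuild_nutch, followed by check_job_file with the path to the generated job file. If check_job_file prints nothing, no gora jars of another version were found in the job file.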

That should be it. If you have any issues with this guide, please write to user at nutch dot apache dot org.

[0] https://issues.apache.org/jira/secure/attachment/12559893/webpage.avsc
[1] http://gora.apache.org/current/compiler.html

WorkingWithGoraSnapshots (last edited 2014-01-23 11:51:58 by LewisJohnMcgibbney)