[0] https://issues.apache.org/jira/secure/attachment/12559893/webpage.avsc
[1] http://gora.apache.org/current/compiler.html

Working With Gora Snapshots

Apache Gora is released as source code only (as opposed to both binary and source) because user requirements change and the code may need to be compiled and recompiled in an ad-hoc fashion. This poses a bit of a problem for Nutch'ers who need to go and get the code, as they may need to work with Gora SNAPSHOTs.

For a full account of how to work with Gora SNAPSHOTs, please consult the Gora Quickstart Guide (http://gora.apache.org/current/quickstart.html).

If the above does not work for you, an alternative way to use Gora SNAPSHOTs in your Nutch 2.x deployment is described below. Admittedly it is both manual and a bit fiddly, but it will become easier as we put more time into getting things right.

  1. Check out the Gora source, e.g. svn checkout https://svn.apache.org/repos/asf/gora/trunk/

  2. Build the Gora source code by navigating to $GORA_HOME and running mvn install -DskipTests=true. This builds the artifacts you require, e.g. gora-core-0.4-SNAPSHOT and gora-cassandra-0.4-SNAPSHOT. You will find these in the target directories of the respective modules.

  3. Ensure that the WebPage schema you are using is the one included in [0]. It should replace the one in $NUTCH_HOME/src/gora.

  4. I am assuming that you've already got the Nutch configuration sorted for gora-cassandra, so no further Nutch config should be required.
  5. Copy the Gora artifacts to $NUTCH_HOME/build/lib. These should replace the existing 0.3 artifacts, which prevents classloading issues.
  6. Use the Nutch build.xml and invoke the target generate-gora-src. Some info on the Gora Compiler can be found in [1].
  7. Check that the NEW data beans have been generated.
  8. Invoke the ant job target. This *should* build your new job file, which you can use in Hadoop. Please check that the generated job file includes the 0.4-SNAPSHOT Gora artifacts.
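
The steps above can be roughly sketched as a shell script. This is only a sketch under assumptions: the default locations for GORA_HOME and NUTCH_HOME, the GORA_VERSION variable, and the helper names install_gora_jars, stale_gora_jars and build_nutch_with_gora_snapshot are made up for illustration and are not part of Nutch or Gora. It also assumes the built jars are named gora-<module>-<version>.jar and that the ant targets are invoked from $NUTCH_HOME. Step 3 (replacing the WebPage schema) and step 7 (checking the generated data beans) remain manual.

```shell
#!/bin/sh
# Assumed locations and version; override them in your environment.
GORA_HOME="${GORA_HOME:-$HOME/gora-trunk}"
NUTCH_HOME="${NUTCH_HOME:-$HOME/nutch-2.x}"
GORA_VERSION="${GORA_VERSION:-0.4-SNAPSHOT}"

# Step 5: copy every gora-*-<version>.jar found under a module target
# directory of <src> into <dest>.
install_gora_jars() {
    src="$1"; dest="$2"; version="$3"
    find "$src" -path '*/target/*' -name "gora-*-$version.jar" \
        -exec cp {} "$dest"/ \;
}

# Step 5 check: print the names of gora jars in <dir> that do NOT carry the
# expected version (for example leftover 0.3 artifacts). Prints nothing when
# the directory is clean.
stale_gora_jars() {
    dir="$1"; version="$2"
    for jar in "$dir"/gora-*.jar; do
        [ -e "$jar" ] || continue          # glob matched nothing
        case "$(basename "$jar")" in
            gora-*-"$version".jar) ;;      # expected version, keep quiet
            *) basename "$jar" ;;          # anything else is reported
        esac
    done
}

# Steps 1, 2, 5, 6 and 8 chained together. Not called automatically.
build_nutch_with_gora_snapshot() {
    svn checkout https://svn.apache.org/repos/asf/gora/trunk/ "$GORA_HOME" &&
    (cd "$GORA_HOME" && mvn install -DskipTests=true) &&
    install_gora_jars "$GORA_HOME" "$NUTCH_HOME/build/lib" "$GORA_VERSION" &&
    (cd "$NUTCH_HOME" && ant generate-gora-src && ant job)
}
```

After copying, running stale_gora_jars "$NUTCH_HOME/build/lib" "$GORA_VERSION" lists any Gora jars that still need removing by hand. The script reports stale jars rather than deleting them, so nothing in the Nutch lib directory disappears unexpectedly.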

That should be it. If you have any issues with this guide, please write to user at nutch dot apache dot org.

WorkingWithGoraSnapshots (last edited 2014-01-23 11:51:58 by LewisJohnMcgibbney)