Working With Gora Snapshots

Apache Gora is released as source code only (as opposed to both binary and source) due to changing user requirements and the fact that code may need to be compiled and recompiled in an ad-hoc fashion. This, however, poses a bit of a problem for Nutch'ers, who may need to go and get the code themselves in order to work with Gora SNAPSHOTs.

For an official account of how to work with Gora SNAPSHOTs in your projects, please consult the Gora Quickstart Guide (http://gora.apache.org/current/quickstart.html).

Since NUTCH-1696 (https://issues.apache.org/jira/browse/NUTCH-1696) was committed, it is a piece of cake to work with Gora SNAPSHOTs within Nutch 2.x. Simply follow these steps:

  1. Open $NUTCH_HOME/ivy/ivy.xml and look for the Gora dependencies.
  2. Change the dependency configuration from this:

<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>

to this:

<dependency org="org.apache.gora" name="gora-core" rev="0.4-SNAPSHOT" conf="*->default" changing="true"/>
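The same edit applies to any other Gora module declared in ivy.xml. As a minimal sketch, assuming your storage backend is Cassandra (gora-cassandra is only an example here; substitute whichever store module your ivy.xml actually declares), the updated entries might look like the fragment below, with every Gora module pinned to the same SNAPSHOT revision so that 0.3 and 0.4-SNAPSHOT classes do not end up mixed on the classpath:

```xml
<!-- Sketch only: assumes Cassandra is the Gora backend in use. -->
<!-- Replace gora-cassandra with the store module your ivy.xml declares. -->
<!-- Keep every Gora module on the same SNAPSHOT revision as gora-core. -->
<dependency org="org.apache.gora" name="gora-core" rev="0.4-SNAPSHOT" conf="*->default" changing="true"/>
<dependency org="org.apache.gora" name="gora-cassandra" rev="0.4-SNAPSHOT" conf="*->default" changing="true"/>
```

The changing="true" attribute tells Ivy that the artifacts behind a SNAPSHOT revision may be republished without the revision number changing, so Ivy can look for a newer copy rather than treating its cached one as final.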

You'll now be able to work with Gora SNAPSHOTs.

N.B. From time to time the Avro schemas may change and require the persistent classes to be recompiled. This is OK: simply run ant generate-gora-src from $NUTCH_HOME and watch the GoraCompiler (http://gora.apache.org/current/compiler.html) do its work.

See below for an additional schema resource.

WorkingWithGoraSnapshots (last edited 2014-01-23 11:51:58 by LewisJohnMcgibbney)