= Error Messages in Nutch 2.0 =

This page collects error messages you may encounter while using Nutch 2.0. Nutch 2.0 can be combined with many other software projects, and each combination brings its own potential for errors, so the page is likely to keep changing.

== Nutch 2.0 and Apache Cassandra ==

With Nutch running in distributed mode on Cloudera's CDH3 and Cassandra configured as the Gora storage mechanism, injecting a seed list into the crawldb fails with the following NoSuchMethodError:

{{{
Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
        at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
        at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
        at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
        at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
        at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
        at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
        at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
        at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
        at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
        at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
        ... 18 more
}}}
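The `(BZ)V` at the end of the first line is a JVM method descriptor: together with `<init>` it names a constructor taking a `byte` and a `boolean`. A NoSuchMethodError on it means that the `FieldValueMetaData` class actually loaded at runtime lacks a constructor that the Cassandra Thrift classes were compiled against. The following is a minimal diagnostic sketch, not part of Nutch, Gora or Cassandra; the class and method names are hypothetical, and it assumes you run it with the same classpath as the failing job. It reports where a class was loaded from and whether it has that constructor.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: reports which jar supplied a class and whether
 * the class declares the (byte, boolean) constructor named by "(BZ)V".
 */
public class ThriftConstructorCheck {

    /** Returns where the class was loaded from, or a placeholder if that is unknown. */
    public static String loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
        if (src == null || src.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return src.getLocation().toString();
    }

    /** True if the class declares a constructor (any visibility) with exactly these parameter types. */
    public static boolean hasConstructor(Class<?> cls, Class<?>... params) {
        try {
            cls.getDeclaredConstructor(params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        try {
            // Load the class without running its static initializers.
            Class<?> cls = Class.forName(name, false,
                    ThriftConstructorCheck.class.getClassLoader());
            System.out.println(name + " loaded from " + loadedFrom(cls));
            // "(BZ)V": B = byte, Z = boolean, V = void (constructors return void).
            System.out.println("has (byte, boolean) constructor: "
                    + hasConstructor(cls, byte.class, boolean.class));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the reported location is not the Thrift jar you expect, or the constructor check prints `false`, a different Thrift copy is winning on the classpath.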

Nutch's Ivy configuration has to be adjusted by hand to match the chosen Gora store. For Cassandra, the following dependencies were added to ivy/ivy.xml:

{{{
<dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
<dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
<dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
<dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
<dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
<dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
<dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
<dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"/>
}}}

In this case the cluster was running Cloudera CDH3, which ships a Hue plugins jar that bundles an older Thrift library. Removing that jar from the classpath resolved the remaining errors when running Nutch in distributed mode.
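Because a jar such as the Hue plugins jar can carry its own copy of Thrift without "thrift" appearing in its file name, one way to find conflicting copies is to check every classpath entry for the class file named in the error. The following is a hypothetical helper sketch, not part of any of the projects above. By default it scans the JVM's own `java.class.path`, or a classpath string passed as the second argument; in distributed mode the task classpath on the cluster nodes may differ from the client's, so pass the classpath the tasks actually use.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: lists every classpath entry (jar or directory) that
 * contains a given class file, to spot duplicate copies of a library.
 */
public class DuplicateClassFinder {

    /** Returns the classpath entries, in classpath order, that contain the class file. */
    public static List<String> findProviders(String classpath, String className) {
        // Class files are stored under slash-separated paths, e.g. org/apache/thrift/X.class.
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        // Wildcard entries such as lib/* are not expanded; they are skipped below.
        for (String path : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (path.isEmpty()) {
                continue;
            }
            File f = new File(path);
            if (f.isDirectory()) {
                if (new File(f, entry).isFile()) {
                    hits.add(path);
                }
            } else if (f.isFile()) {
                // Treat any other regular file as a jar/zip archive.
                try (JarFile jar = new JarFile(f)) {
                    if (jar.getJarEntry(entry) != null) {
                        hits.add(path);
                    }
                } catch (IOException e) {
                    System.err.println("Skipping unreadable entry " + path + ": " + e.getMessage());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        String classpath = args.length > 1 ? args[1]
                : System.getProperty("java.class.path");
        List<String> hits = findProviders(classpath, className);
        System.out.println(hits.size() + " classpath entries contain " + className);
        for (String hit : hits) {
            System.out.println("  " + hit);
        }
    }
}
```

More than one hit suggests conflicting Thrift copies. With a standard application class loader the entry that appears first on the classpath is the one used, which is why removing the older copy, rather than only adding the newer one, can be necessary.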

Correspondence on this error can be seen in context [[http://www.mail-archive.com/dev%40nutch.apache.org/msg03482.html|here]].

ErrorMessagesInNutch2 (last edited 2013-04-27 00:14:01 by LewisJohnMcgibbney)