Revision 8 as of 2007-03-29 12:55:53, editor: pannous
Revision 9 as of 2009-09-20 23:09:34, editor: localhost, comment: converted to 1.6 markup

=== Problem (exception) / Solution pairs ===


P: Not a known field name:DEFAULT

S: Add the query-basic plugin to the plugin.includes property in nutch-default.xml:

  • <property> <name>plugin.includes</name> <value>query-basic|.....


P: java.lang.NullPointerException at java.io.Reader.<init>(Reader.java:61) ... at org.apache.nutch.analysis.CommonGrams.init(CommonGrams.java:152) at

S: The file common-terms.utf8 needs to be in a directory where Nutch can find it (lib or classes?). Its name is set by this property:

  • <property>

    • <name>analysis.common.terms.file</name> <value>common-terms.utf8</value> <description>The name of a file containing a list of common terms that should be indexed in n-grams.</description>

  • </property>


P: Bad mapred.job.tracker: local

S: If you want to run the crawl without HDFS, you can skip start-all.sh.

  • Just run "nutch crawl urlsdir"
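
If the Hadoop daemons are wanted after all, this error appears to mean that mapred.job.tracker still has the value "local", which a JobTracker daemon cannot use as an address. A minimal sketch of an override, assuming it is placed in hadoop-site.xml; the host and port are placeholders, not values from this page:

```xml
<!-- hadoop-site.xml fragment (sketch; host and port are assumptions) -->
<property>
  <name>mapred.job.tracker</name>
  <!-- "local" runs jobs in-process without a JobTracker daemon.
       A daemon setup needs host:port instead. -->
  <value>localhost:9001</value>
</property>
```

With the value left at "local", start the crawl directly and do not start the daemons.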


P: ... getlocalpath NullPointerException

S: Check mapred.local.dir and the other temporary directories in nutch-default.xml / hadoop-default.xml.
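
As an illustration, the temporary directories can be pinned to known, writable locations. The sketch below assumes the override goes into hadoop-site.xml. The paths are hypothetical, and hadoop.tmp.dir is an assumption that should be checked against the hadoop-default.xml of your version:

```xml
<!-- hadoop-site.xml fragment (sketch; paths are hypothetical) -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- base directory for other temporary directories -->
  <value>/var/nutch/tmp</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <!-- must exist and be writable by the user running the crawl -->
  <value>/var/nutch/tmp/mapred/local</value>
</property>
```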


P: extension point: org.apache.nutch.net.URLNormalizer does not exist

S: Check your plugins and the plugin.includes setting, and add urlnormalizer-regex or urlnormalizer-(pass|regex|basic).


P: java.net.UnknownHostException "hostname"

S: Add 127.0.0.1 "hostname" to the /etc/hosts file.


P: ...[null] MalformedURLException

S: Add common-terms.utf8 to the Nutch directory.


P: java.lang.ClassCastException: org.apache.hadoop.io.Text

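S: Wrong Hadoop version. Use the Hadoop version that matches your Nutch build, or apply the patch http://files.pannous.de/org.rar


P: java.lang.NoSuchMethodError: org.apache.hadoop.io.MapFile$Writer.

S: Wrong Hadoop version. Use the Hadoop version that matches your Nutch build, or apply the patch http://files.pannous.de/org.rar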


P: NullPointerException when crawling

S: Set http.agent.name in nutch-site.xml:

  • <property> <name>http.agent.name</name> <value>NutchCVS</value> <description>Our HTTP 'User-Agent' request header.</description> </property>


P: java.io.IOException: config()

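S: Ignore it! ;) Hadoop logs this stack trace at debug level when a Configuration object is created; it is not an error.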


P: nutch crawl ... Job Failed!

S: The possible causes are manifold. Raise the log level in log4j.properties to find the actual one:

  • log4j.rootLogger=ALL, stdout

  • log4j.appender.stdout=org.apache.log4j.ConsoleAppender


P: No scoring plugins - at least one scoring plugin is required!

S: Add "scoring-opic" to the value of the plugin.includes property.
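
Several entries on this page come down to a plugin missing from plugin.includes (query-basic, urlnormalizer-*, scoring-opic). A sketch of a single override that covers all three, assuming it goes into nutch-site.xml. Only those three plugin names come from this page; the rest of the value is an assumption modeled on a typical default and should be compared with your own nutch-default.xml:

```xml
<!-- nutch-site.xml fragment (sketch; value is illustrative, not authoritative) -->
<property>
  <name>plugin.includes</name>
  <!-- a regular expression matched against plugin directory names -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

Each name in the value must correspond to a plugin directory that is actually present in the plugins folder, otherwise the related extension point will still be reported as missing.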


P: ... java.net.SocketTimeoutException: Accept timed out

S: Possibly an RPC problem. Try using Nutch without HDFS (start the crawl without start-all.sh?), or check the ports in the Hadoop configuration file.


P: java.lang.NoClassDefFoundError xyz on Windows

S: Get rid of spaces in your classpath and path variables!


solved_problems (last edited 2009-09-20 23:09:34 by localhost)