RunNutchInEclipse

This is a work in progress. If you find errors or would like to improve this page, just create an account [UserPreferences] and start editing this page :-)

Tested with

Before you start

Setting up Nutch to run in Eclipse can be tricky, and most of the time it is much faster to edit Nutch in Eclipse but run the scripts from the command line (my 2 cents). However, it's very useful to be able to debug Nutch in Eclipse. Then again, you might be quicker just looking at the logs (logs/hadoop.log)...

Steps

Install Nutch

Create a new java project in Eclipse

If you're using the trunk

As of revision 511012 there were a few plugins on the trunk and a couple other files that did not build, and are actually excluded from the ant projects. You may want to remove the following projects from the build structure:

Configure Nutch

Build Nutch

Create Eclipse launcher

Main class: org.apache.nutch.crawl.Crawl
Program arguments: urls -dir crawl -depth 3 -topN 50
VM arguments: -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log

Debug Nutch in Eclipse

Fetcher [line: 371] - run
Fetcher [line: 438] - fetch
Fetcher$FetcherThread [line: 149] - run()
Generator [line: 281] - generate
Generator$Selector [line: 119] - map
OutlinkExtractor [line: 111] - getOutlinks

If things do not work...

Yes, Nutch and Eclipse can be a difficult companionship sometimes ;-)

eclipse: Cannot create project content in workspace

The Nutch source code must be outside the workspace folder. My first attempt was to download the code with Eclipse (svn) into my workspace. When I tried to create the project from existing source, Eclipse would not let me do it while the source code was inside the workspace. With the source code outside my workspace, it works fine.

plugin dir not found

Make sure you set your plugin.folders property correctly. Instead of a relative path you can also use an absolute one, either in nutch-default.xml or, maybe better, in nutch-site.xml:

<property>
  <name>plugin.folders</name>
  <value>/home/....../nutch-0.8/src/plugin</value>
</property>
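To check that the configured path really points at a plugin directory, a small stand-alone check can help. This is only a sketch: the class name PluginFolderCheck is made up for illustration, the default path src/plugin is taken from the tail of the example value above, and it assumes each plugin lives in its own subdirectory containing a plugin.xml descriptor.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch.
public class PluginFolderCheck {

    /**
     * Counts the immediate subdirectories of pluginDir that contain a
     * plugin.xml file. Returns -1 if pluginDir is not a directory.
     */
    public static long countPlugins(Path pluginDir) throws IOException {
        if (!Files.isDirectory(pluginDir)) {
            return -1; // the configured path does not exist or is a file
        }
        try (Stream<Path> children = Files.list(pluginDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(dir -> Files.isRegularFile(dir.resolve("plugin.xml")))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the value of plugin.folders as the first argument;
        // "src/plugin" is only an assumed default.
        Path dir = Paths.get(args.length > 0 ? args[0] : "src/plugin");
        long found = countPlugins(dir);
        if (found < 0) {
            System.out.println("Not a directory: " + dir.toAbsolutePath());
        } else {
            System.out.println(found + " plugin(s) found under " + dir.toAbsolutePath());
        }
    }
}
```

Running it from the same working directory as the Eclipse launcher shows how a relative path would be resolved there, which is often where the "plugin dir not found" error comes from.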

No plugins loaded during unit tests in Eclipse

During unit testing, Eclipse ignored conf/nutch-site.xml in favor of src/test/nutch-site.xml, so you might need to add the plugin directory configuration to that file as well.

Unit tests work in eclipse but fail when running ant in the command line

Suppose your unit tests work perfectly in Eclipse, but every single one fails when running ant test from the command line, including the ones you haven't modified. Check whether you defined the plugin.folders property in hadoop-site.xml. If so, try removing it from that file and adding it directly to nutch-site.xml.
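To see quickly which configuration files define plugin.folders, something like the following sketch can be used. The class name FindProperty is hypothetical, and the default file locations are assumptions (in particular conf/hadoop-site.xml; this page only names the file, not its directory). It relies on the usual property/name/value layout of the Nutch and Hadoop configuration files.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or Hadoop.
public class FindProperty {

    /**
     * Returns the value of the named property in a config file made of
     * property/name/value elements, or null if the property is absent.
     */
    public static String lookup(File configFile, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(configFile);
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() > 0
                    && propertyName.equals(names.item(0).getTextContent().trim())) {
                return values.getLength() > 0
                        ? values.item(0).getTextContent().trim()
                        : "";
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default locations; pass your own paths as arguments.
        String[] files = args.length > 0 ? args : new String[] {
            "conf/hadoop-site.xml", "conf/nutch-site.xml", "src/test/nutch-site.xml"
        };
        for (String name : files) {
            File file = new File(name);
            if (!file.isFile()) {
                System.out.println(name + ": file not found");
                continue;
            }
            String value = lookup(file, "plugin.folders");
            System.out.println(name + ": " + (value == null ? "(not set)" : value));
        }
    }
}
```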

Run ant test again. That should have solved the problem.

If that didn't solve the problem, are you testing a plugin? If so, did you add the plugin to the list of packages in plugin\build.xml, on the test target?

classNotFound

missing org.farng and com.etranslate

You may have problems with some imports in the parse-mp3 and parse-rtf plugins. Because of incompatibility with the Apache license, the required libraries were left out of the sources. You can find them here:

http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib/

http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/

Copy the jar files into each plugin's "lib" directory and refresh the project in Eclipse.
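If you would rather script the copy step, a rough sketch follows. The class name CopyPluginJars is hypothetical, and both directories are supplied by you: the first is wherever you saved the downloaded jars, the second is the plugin's lib directory (for example src/plugin/parse-mp3/lib, going by the URLs above).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch.
public class CopyPluginJars {

    /**
     * Copies every *.jar file found directly in sourceDir into libDir,
     * creating libDir if needed. Returns the names of the copied files.
     */
    public static List<String> copyJars(Path sourceDir, Path libDir) throws IOException {
        Files.createDirectories(libDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                Path target = libDir.resolve(jar.getFileName());
                // Overwrite an older copy of the same jar if one is present.
                Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(jar.getFileName().toString());
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyPluginJars <download-dir> <plugin-lib-dir>");
            return;
        }
        List<String> copied = copyJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied.size() + " jar(s): " + copied);
    }
}
```

After copying, the project still has to be refreshed in Eclipse so the new jars are picked up.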

debugging hadoop classes

Original credits: RenaudRichardet

last edited 2007-03-02 17:49:37 by RicardoJMendez