Nutch 0.9 ships with Hadoop HDFS deactivated by default, so this article does not apply to it.

To get started with Nutch 0.8, or to crawl intranets, it can be a good idea to run Nutch the simple way: without the Hadoop distributed filesystem (HDFS).

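The easiest way to achieve this is to edit your hadoop-default.xml file so that the hdfs scheme is served by the local filesystem implementation instead of the distributed one: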

<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<!-- <value>org.apache.hadoop.dfs.DistributedFileSystem</value> -->
</property>
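
As an alternative sketch, the same override can be kept out of the shipped defaults. This assumes your Hadoop version reads conf/hadoop-site.xml as site-specific overrides on top of hadoop-default.xml, which the Hadoop releases of that era generally did; check the header comment of your own hadoop-default.xml before relying on it. The property name and value are the ones shown above, and only the file and the enclosing configuration element are different:

<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: site-specific overrides (assumed location) -->
<configuration>
<property>
<name>fs.hdfs.impl</name>
<!-- use the local filesystem instead of org.apache.hadoop.dfs.DistributedFileSystem -->
<value>org.apache.hadoop.fs.LocalFileSystem</value>
</property>
</configuration>

Keeping the change in the site file leaves hadoop-default.xml untouched, so switching HDFS back on later only means deleting this one property.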
