Differences between revisions 219 and 220
Revision 219 as of 2011-08-03 10:58:05
Size: 5027
Comment:
Revision 220 as of 2011-08-03 22:08:04
Size: 4988
Comment:
Line 68:
Revision 219:  * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of them. /!\ :This page is in construction: /!\
Revision 220:  * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of them.

Welcome to the Apache Nutch Wiki

http://www.interadvertising.co.uk/files/nutch_logo_medium.gif

Please contribute your knowledge about Nutch here!

Nutch Version 1.3 Administration

Tutorials

  • RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode and post to Apache Solr for search/index.

  • RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy mode. /!\ :TODO: This tutorial is under construction. /!\

  • Hadoop Tutorial - Nutch is built on Hadoop, so a better understanding of Hadoop helps.

  • RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 within Eclipse

Configuration

  • OverviewDeploymentConfigs /!\ :This entire page requires a complete update to reflect the Nutch 1.3 release: /!\

  • NutchConfigurationFiles

  • HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.

  • NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 1.3 intranet crawling configuration.

  • OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.

  • ErrorMessages - What they mean and suggestions for getting rid of them. /!\ :This page requires extensive updating to reflect Nutch 1.3. In addition, the legacy indexing and searching material should be archived. A similar page is also needed for Nutch 2.0, because both the errors and the fixes they require differ in nature. /!\

  • SetupProxyForNutch - using Tinyproxy on Ubuntu

General Information

Nutch Development

Nutch 2.0

Pre Nutch 1.3 and Archive

How to edit this Wiki

This Wiki is a collaborative site; anyone can contribute and share:

  • Create an account by clicking the "Login" link at the top of any page, and picking a username and password.
  • Edit any page by pressing "Edit" at the top or the bottom of the page.

There are some conventions used on the Nutch wiki:

  • /!\ :TODO: /!\ is used to denote sections that definitely need to be cleaned up.

Some general info on using this Wiki Software:

FrontPage (last edited 2018-09-27 15:44:39 by RoannelFernandez)