Differences between revisions 232 and 233
Revision 232 as of 2011-09-23 18:05:16
Size: 4770
Comment:
Revision 233 as of 2011-10-16 22:50:53
Size: 4924
Editor: induct3
Comment:
Line 4: Line 4:
Please contribute your knowledge about Nutch here!
<<TableOfContents(3)>>
Please contribute your knowledge about Nutch here! <<TableOfContents(3)>>
Line 11: Line 10:
Line 13: Line 13:
 *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch being based Hadoop, it helps to have a better understanding of Hadoop.  * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch being based Hadoop, it helps to have a better understanding of Hadoop.
Line 16: Line 16:
 * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search Microsoft Office, PDF etc documentsin a file system hierachy with a Solr backend.
Line 17: Line 19:
 * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect Nutch 1.3 release: /!\   * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect Nutch 1.3 release: /!\
Line 23: Line 25:
 * SetupProxyForNutch - using Tinyproxy on Ubuntu   * SetupProxyForNutch - using Tinyproxy on Ubuntu
Line 29: Line 31:
 * Current [[NutchGotchas|Nutch Gotchas]]   * Current [[NutchGotchas|Nutch Gotchas]]
Line 38: Line 40:
 * [[FAQ]]   * [[FAQ]]
Line 44: Line 46:
 * PluginCentral -- How to write your own plugins and use other people's.   * PluginCentral -- How to write your own plugins and use other people's.

Welcome to the Apache Nutch Wiki

http://www.interadvertising.co.uk/files/nutch_logo_medium.gif

Please contribute your knowledge about Nutch here!

Nutch Version 1.3 Administration

Tutorials

  • NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post to Apache Solr for search/index.

  • Hadoop Tutorial - Since Nutch is based on Hadoop, it helps to have a better understanding of Hadoop.

  • Nutch Hadoop Tutorial - How to set up and run Nutch in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\

  • RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 within Eclipse.

  • Intranet Document Search - Index and search Microsoft Office, PDF, and other documents in a file system hierarchy with a Solr backend.

Configuration

General Information

Nutch Development

Nutch 2.0

Pre Nutch 1.3 and Archive

How to edit this Wiki

This Wiki is a collaborative site; anyone can contribute and share:

  • Create an account by clicking the "Login" link at the top of any page, and picking a username and password.
  • Edit any page by pressing Edit at the top or the bottom of the page.

There are some conventions used on the Nutch wiki:

  • /!\ :TODO: /!\ is used to denote sections that definitely need to be cleaned up.

Some general info on using this Wiki Software:

FrontPage (last edited 2018-09-27 15:44:39 by RoannelFernandez)