Please contribute your knowledge about Nutch here!
General Information
PublicServers running Nutch
Presentations on Nutch
Press Articles
Evaluations of Search Quality
Help Wanted -- organizations hiring Nutch expertise
Commercial Support and developers for hire
Mailing Lists
AcademicArticles that deal with Nutch
Nutch Administration
Tutorial -- Latest step-by-step installation guide for dummies: Nutch 0.9.
Tutorial -- A Step-by-Step guide to getting Nutch up and running. NutchTutorial on the wiki
Nutch - The Java Search Engine (Builds on the basic tutorials. Includes index maintenance scripts)
Nutch Hadoop Tutorial - How to set up Nutch and Hadoop over a cluster of machines
Automating Fetches with Python - How to automate the Nutch fetching process using Python
Upgrading Hadoop Version in Nutch - Basic steps for upgrading Hadoop in Nutch.
Commandline options for 0.7.x
Commandline options for version 0.8
GettingNutchRunningWithUtf8 - For support of non-ASCII characters (Chinese, German, Japanese, Korean).
GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application server (an alternative to Tomcat).
ErrorMessages -- What they mean and suggestions for getting rid of them.
SetupProxyForNutch - using Tinyproxy on Ubuntu
CreateNewFilter - for example to add a category metadata to your index and be able to search for it
RunNutchInEclipse for v0.8
RunNutchInEclipse0.9 for v0.9
Crawl - script to crawl (and possibly recrawl too)
IntranetRecrawl - script to recrawl a crawl
MergeCrawl - script to merge 2 (or more) crawls
SearchOverMultipleIndexes - configuring Nutch to enable searching over multiple indexes
MonitoringNutchCrawls - techniques for keeping an eye on a Nutch crawl's progress.
HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
NonDefaultIntranetCrawlingOptions - Desirable options to add to your intranet crawling configuration.
RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr for search/index
Nutch Development
Becoming a Nutch Developer - Start developing and contributing to Nutch.
PluginCentral -- How to write your own plugins and use other people's.
InternalDocumentation -- How Nutch works.
JavaDocs -- The JavaDocs for Nutch.
MultiLingualSupport - In development.
FixingOpicScoring - In planning.
TaskList -- Tasks for Nutch developers.
Development -- More tasks for Nutch developers.
Committer's_Rules -- Committers should follow these guidelines when deciding which branch to use for committing patches and when to commit.
JavaDemoApplication - A simple demonstration of how to use the Nutch API in a Java application
Nutch 2.0
Nutch2Architecture -- Discussions on the Nutch 2.0 architecture.
Other Resources
Doug's Weblog -- He's the one who originally wrote Lucene and Nutch.
Frutch Wiki -- French Nutch Wiki
The Old Wiki
Search_Theory -- Search Theory & White Papers
Tutorial Hadoop+Nutch 0.8 nightly build -- Roberto Navoni, 24-07-06
FooFactory Nutch and Hadoop related posts
Spinn3r Open Source components (our contribution to the crawling OSS community with more to come).
Larger / better quality Nutch logos -- Re-created Nutch logos available in GIF, PNG & EPS in resolutions up to 1200 x 449