Please contribute your knowledge about Nutch here!
General Information
PublicServers running Nutch
Presentations on Nutch
Press Articles
Evaluations of Search Quality
Help_Wanted organizations hiring Nutch expertise
Commercial Support and developers for hire
Mailing Lists
AcademicArticles that deal with Nutch
Nutch Administration
Tutorial -- Latest step-by-step installation guide for dummies: Nutch 0.9.
Tutorial -- A Step-by-Step guide to getting Nutch up and running.
NutchTutorial on the wiki
Nutch_-_The_Java_Search_Engine (Builds on the basic tutorials. Includes index maintenance scripts)
Nutch Hadoop Tutorial - How to set up Nutch and Hadoop on a cluster of machines
Automating Fetches with Python - How to automate the Nutch fetching process using Python
Upgrading Hadoop Version in Nutch - Basic steps for upgrading Hadoop in Nutch.
Commandline options for 0.7.x
Commandline options for version 0.8
Current CommandLineOptions
GettingNutchRunningWithUtf8 - For support of non-ASCII characters (Chinese, German, Japanese, Korean).
GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application server (an alternative to Tomcat).
ErrorMessages -- What they mean and suggestions for getting rid of them.
SetupProxyForNutch - using Tinyproxy on Ubuntu
CreateNewFilter - for example, to add category metadata to your index and be able to search on it
RunNutchInEclipse for v0.8
RunNutchInEclipse0.9 for v0.9 (Linux and Windows)
RunNutchInEclipse1.0 for v1.0 (Linux and Windows)
Crawl - script to crawl (and possibly recrawl too)
IntranetRecrawl - script to recrawl a crawl
MergeCrawl - script to merge 2 (or more) crawls
SearchOverMultipleIndexes - configuring Nutch to enable searching over multiple indexes
MonitoringNutchCrawls - techniques for keeping an eye on a Nutch crawl's progress
HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
NonDefaultIntranetCrawlingOptions - Desirable options to add to your intranet crawling configuration.
RunningNutchAndSolr - How to configure Nutch to crawl but post to Solr for indexing and search
NutchWithChineseAnalyzer - References to some Chinese articles explaining how to set up Nutch with third-party Chinese analyzers
Nutch Development
Becoming a Nutch Developer - Start developing and contributing to Nutch.
PluginCentral -- How to write your own plugins and use other people's.
InternalDocumentation -- How Nutch works.
JavaDocs -- The JavaDocs for Nutch.
MultiLingualSupport - In development.
FixingOpicScoring - In planning.
TaskList -- Tasks for Nutch developers.
Development -- More tasks for Nutch developers.
Committer's_Rules -- Guidelines committers should follow when deciding which branch to commit patches to and when to commit.
JavaDemoApplication - A simple demonstration of how to use the Nutch API in a Java application
ApacheConUs2009MeetUp - List of topics for MeetUp at ApacheCon US 2009 in Oakland (Nov 2-6)
Nutch 2.0
Nutch2Architecture -- Discussions on the Nutch 2.0 architecture.
NewScoring -- New, stable, PageRank-like webgraph and link-analysis jobs.
NewScoringIndexingExample -- Two full fetch cycles of commands using new scoring and indexing systems.
Other Resources
Doug's Weblog -- He's the one who originally wrote Lucene and Nutch.
Frutch Wiki -- French Nutch Wiki
The Old Wiki
Search_Theory Search Theory & White Papers
Tutorial Hadoop+Nutch 0.8 nightly build, Roberto Navoni, 24-07-06
FooFactory Nutch and Hadoop related posts
Spinn3r Open Source components (our contribution to the crawling OSS community with more to come).
Larger / better quality Nutch logos Re-created Nutch logos available in GIF, PNG & EPS in resolutions up to 1200 x 449
WhelanLabs SearchEngine Manager An all-in-one bundle of Nutch, Tomcat, Cygwin, and a JRE for Microsoft Windows. Includes an installer and a simplified administrative UI.