Archive and Legacy
This section includes all pre-Nutch 1.3 material.
Frutch Wiki -- French Nutch Wiki
The Old Wiki
Experiences with the Nutch search engine -- Video lecture by Doug Cutting
Instructions for running Bixo on EC2 (includes parts of Nutch)
Internal Nutch Documentation
Development and Old Nutch 2.0
MultiLingualSupport - In development.
Nutch2Architecture -- Discussions on the Nutch 2.0 architecture (old)
JavaDemoApplication - A simple demonstration of how to use the Nutch API in a Java application
Pre-Nutch 1.3 Plugin Resources
Nutch <1.3 Tutorials
RunningNutchAndSolr - How to configure Nutch to crawl and post the results to Solr for indexing and search
Tutorial -- A Step-by-Step guide to getting Nutch up and running (<=1.2).
Tutorial -- A Step-by-Step installation guide for dummies: Nutch 0.9.
Nutch_-_The_Java_Search_Engine (Builds on the basic tutorials. Includes index maintenance scripts)
RunNutchInEclipse for v0.8
RunNutchInEclipse0.9 for v0.9 (Linux and Windows)
RunNutchInEclipse1.0 for v1.0 (Linux and Windows)
Upgrading Hadoop Version in Nutch - Basic steps for upgrading Hadoop in Nutch.
Commandline options for 0.7.x
Commandline options for version 0.8
GettingNutchRunningWithUtf8 - For support of non-ASCII characters (Chinese, German, Japanese, Korean).
GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application server (an alternative to Tomcat).
CreateNewFilter - For example, to add category metadata to your index and make it searchable
Automating Fetches with Python - How to automate the Nutch fetching process using Python
MonitoringNutchCrawls - Techniques for keeping an eye on a Nutch crawl's progress.
Crawl - Script to crawl (and possibly recrawl too)
IntranetRecrawl - script to recrawl a crawl
Whole-Web Crawling incremental script - Crawled URLs are searchable at each iteration after merging
MergeCrawl - Script to merge two (or more) crawls