Archive and Legacy
This section includes all pre-Nutch 1.3 material.
Contents
Reference Section
Frutch Wiki -- French Nutch Wiki
The Old Wiki
Experiences with the Nutch search engine, by Doug Cutting (video lecture)
Instructions for running Bixo on EC2 (includes parts of Nutch)
General Information
OldFeatures - Pre Nutch 1.3
Internal Nutch Documentation
NutchFileFormats - some notes by LarsAronsson, 30 June 2004
Development and Old Nutch 2.0
MultiLingualSupport - In development.
Nutch2Architecture -- Discussions on the Nutch 2.0 architecture (old)
JavaDemoApplication - A simple demonstration of how to use the Nutch API in a Java application
Pre-Nutch 1.3 Plugin Resources
Nutch <1.3 Tutorials
RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr for search/index
Tutorial -- A Step-by-Step guide to getting Nutch up and running (<=1.2).
Tutorial -- A Step-by-Step installation guide for dummies: Nutch 0.9.
Nutch_-_The_Java_Search_Engine (Builds on the basic tutorials. Includes index maintenance scripts)
RunNutchInEclipse for v0.8
RunNutchInEclipse0.9 for v0.9 (Linux and Windows)
RunNutchInEclipse1.0 for v1.0 (Linux and Windows)
Configuration
Upgrading Hadoop Version in Nutch - Basic steps for upgrading Hadoop in Nutch.
Command-line options for version 0.7.x
Command-line options for version 0.8
GettingNutchRunningWithUtf8 - For support of non-ASCII characters (Chinese, German, Japanese, Korean).
GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application server (an alternative to Tomcat).
CreateNewFilter - for example, to add category metadata to your index and be able to search on it
Script Administration
Automating Fetches with Python - How to automate the Nutch fetching process using Python
MonitoringNutchCrawls - techniques for keeping an eye on a Nutch crawl's progress.
Crawl - script to crawl (and possibly recrawl too)
IntranetRecrawl - script to recrawl a crawl
Whole-Web Crawling incremental script - crawled URLs are searchable after merging at each iteration
MergeCrawl - script to merge 2 (or more) crawls