Please contribute your knowledge about Nutch here!

== General Information ==
 * [[http://www.nutch.org|Nutch Website]]
 * [[Features]]
 * PublicServers running Nutch
 * [[Presentations]] on Nutch
 * Press [[Articles]]
 * [[Evaluations]] of Search Quality
 * [[Help_Wanted]] organizations hiring Nutch expertise
 * Commercial [[Support]] and developers for hire
 * [[Mailing]] Lists
 * AcademicArticles that deal with Nutch

== Nutch Administration ==
 * DownloadingNutch
 * HardwareRequirements
 * '''[[http://peterpuwang.googlepages.com/NutchGuideForDummies.htm|Tutorial]] -- Latest step-by-step installation guide for dummies: Nutch 0.9.'''
 * [[http://lucene.apache.org/nutch/tutorial.html|Tutorial]] -- A step-by-step guide to getting Nutch up and running.
 * NutchTutorial ''on the wiki''
 * [[Nutch_-_The_Java_Search_Engine]] (Builds on the basic tutorials. Includes index maintenance scripts)
 * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to set up Nutch and Hadoop over a cluster of machines
 * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to automate the Nutch fetching process using Python
 * [[Upgrading_Hadoop|Upgrading Hadoop Version in Nutch]] - Basic steps for upgrading Hadoop in Nutch.
 * [[FAQ]]
 * [[07CommandLineOptions|Commandline]] options for 0.7.x
 * [[08CommandLineOptions|Commandline]] options for version 0.8
 * Current CommandLineOptions
 * OverviewDeploymentConfigs
 * NutchConfigurationFiles
 * GettingNutchRunningWithUtf8 - For support of non-ASCII characters (Chinese, German, Japanese, Korean).
 * GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application server (alternative to Tomcat).
 * GettingNutchRunningWithJetty
 * GettingNutchRunningWithJboss
 * GettingNutchRunningWithUbuntu
 * GettingNutchRunningWithWindows
 * GettingNutchRunningWithMacOsx
 * GettingNutchRunningWithRedHatApplicationServer
 * GettingNutchRunningWithDebian
 * GettingNutchRunningWithSocksProxy
 * ErrorMessages -- What they mean and suggestions for getting rid of them.
 * SetupProxyForNutch - using Tinyproxy on Ubuntu
 * CreateNewFilter - for example, to add category metadata to your index and be able to search for it
 * HowToMakeCustomSearch
 * UpgradeFrom07To08
 * [[Upgrading_from_0.8.x_to_0.9]]
 * RunNutchInEclipse for v0.8
 * [[RunNutchInEclipse0.9]] for v0.9 (Linux and Windows)
 * [[RunNutchInEclipse1.0]] for v1.0 (Linux and Windows)
 * [[Crawl]] - script to crawl (and possibly recrawl too)
 * IntranetRecrawl - script to recrawl a crawl
 * MergeCrawl - script to merge 2 (or more) crawls
 * SearchOverMultipleIndexes - configuring Nutch to enable searching over multiple indexes
 * CrossPlatformNutchScripts
 * MonitoringNutchCrawls - techniques for keeping an eye on a Nutch crawl's progress.
 * [[Nutch_0.9_Crawl_Script_Tutorial]]
 * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
 * NonDefaultIntranetCrawlingOptions - Desirable options to add to your intranet crawling configuration.
 * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr for search/index
 * NutchWithChineseAnalyzer - References to some Chinese articles explaining how to set up Nutch with 3rd party Chinese analyzers

== Nutch Development ==
 * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start developing and contributing to Nutch.
 * PluginCentral -- How to write your own plugins and use other people's.
 * InternalDocumentation -- How Nutch works.
 * [[http://lucene.apache.org/nutch/apidocs/index.html|JavaDocs]] -- The !JavaDocs for Nutch.
 * [[http://lucene.apache.org/nutch/version_control.html|Nutch Version Control]]
 * MultiLingualSupport - ''In development''.
 * FixingOpicScoring - ''In planning''.
 * HowToContribute
 * TaskList -- Tasks for Nutch developers.
 * [[Development]] -- More tasks for Nutch developers.
 * [[Committer's_Rules]] -- Committers should follow these guidelines when deciding which branch to use for committing patches and when to commit.
 * [[Release_HOWTO]]
 * [[Website_Update_HOWTO]]
 * [[Image_Search_Design]]
 * [[NutchOSGi]]
 * [[StrategicGoals]]
 * [[IndexStructure]]
 * [[Getting_Started]]
 * JavaDemoApplication - A simple demonstration of how to use the Nutch API in a Java application
 * InstallingWeb2
 * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 in Oakland (Nov 2-6)

== Nutch 2.0 ==
 * [[Nutch2Architecture]] -- Discussions on the Nutch 2.0 architecture.
 * [[NewScoring]] -- New stable PageRank-like webgraph and link-analysis jobs.
 * [[NewScoringIndexingExample]] -- Two full fetch cycles of commands using new scoring and indexing systems.

== Other Resources ==
 * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's the one who originally wrote Lucene and Nutch.
 * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch Documentation]]
 * [[http://frutch.free.fr/wikini/|Frutch Wiki]] -- French Nutch Wiki
 * The [[http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/Nutch|Old Wiki]]
 * [[Search_Theory]] Search Theory & White Papers
 * [[http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20%3CNavoni%20Roberto%3E|Tutorial Hadoop+Nutch 0.8 night build Roberto Navoni 24-07-06]]
 * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts
 * [[http://spinn3r.com|Spinn3r]] [[http://spinn3r.com/opensource.php|Open Source components]] (our contribution to the crawling OSS community with more to come).
 * [[http://www.interadvertising.co.uk/blog/nutch_logos|Larger / better quality Nutch logos]] Re-created Nutch logos available in GIF, PNG & EPS in resolutions up to 1200 x 449
 * [[http://www.whelanlabs.com/content/SearchEngineManager.htm|WhelanLabs SearchEngine Manager]] An all-in-one, bundled implementation of Nutch, Tomcat, Cygwin, and JRE for Microsoft Windows. Includes an installer and a simplified administrative UI.