Welcome to the Apache Nutch Wiki
Please contribute your knowledge about Nutch here!
Nutch Version 1.3 Administration
RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode and post to Apache Solr for search/index.
RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy mode. :TODO: This tutorial is under construction.
Hadoop Tutorial - Nutch is based on Hadoop, so it helps to have a good understanding of Hadoop.
RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 within Eclipse
OverviewDeploymentConfigs :This entire page requires a complete update to reflect the Nutch 1.3 release:
HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 1.3 intranet crawling configuration.
OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
ErrorMessages -- What they mean and suggestions for getting rid of them. :This requires extensive updating to reflect Nutch 1.3. In addition, the legacy indexing and searching material should be archived:
SetupProxyForNutch - using Tinyproxy on Ubuntu
Features :TODO: This needs to be completely overhauled to reflect Nutch 1.3 features.
Current Nutch Gotchas
PublicServers running Nutch
Presentations on Nutch
Evaluations of Search Quality
Help_Wanted organizations hiring Nutch expertise
Commercial Support and developers for hire
AcademicArticles that deal with Nutch
FAQ :The Indexing and Searching sections require updating or archiving to reflect the new 1.3 release:
Becoming a Nutch Developer - Start developing and contributing to Nutch.
PluginCentral -- How to write your own plugins and use other people's.
InternalDocumentation -- How Nutch works.
MultiLingualSupport - In development.
FixingOpicScoring - In planning.
TaskList -- Tasks for Nutch developers. :Severe update required:
Committer's_Rules -- Committers should follow these guidelines when deciding which branch to use for committing patches and when to commit.
IndexStructure :This page needs a slight update to provide more information on plugins and the data they send to Solr for indexing:
ApacheConUs2009MeetUp - List of topics for MeetUp at ApacheCon US 2009 in Oakland (Nov 2-6)
Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
NewScoring -- New stable PageRank-like webgraph and link-analysis jobs.
NewScoringIndexingExample -- Two full fetch cycles of commands using new scoring and indexing systems.
GORA_HBase -- Configuring Nutch 2.0 with GORA and HBASE
Build Nutch 2.0 in Eclipse -- How to set up your IDE environment comfortably.
ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of them.
Pre Nutch 1.3 and Archive
How to edit this Wiki
This Wiki is a collaborative site; anyone can contribute and share:
- Create an account by clicking the "Login" link at the top of any page, and picking a username and password.
- Edit any page by pressing Edit at the top or the bottom of the page.
There are some conventions used on the Nutch wiki:
:TODO: (/!\ :TODO: /!\ ) is used to denote sections that definitely need to be cleaned up.
Some general info on using this Wiki Software: