Add some video resources
|Line 23:||Line 23:|
| * [[http://user.qzone.qq.com/281032878/blog/1364233492|Chinese Video Tutorial: Nutch Relevant Framework]] - The first free video for Nutch in China.
* [[http://user.qzone.qq.com/281032878/blog/1362131478|Chinese Setting Up And Use Tutorial]] - The best guide on how to set up and use the Nutch relevant framework in China.
|Line 29:||Line 32:|
|Line 81:||Line 85:|
|* [[NutchConfigurationFiles-2.x]] -- Configuration files that are specific to Nutch-2.x|
Welcome to the Apache Nutch Wiki
Please contribute your knowledge about Nutch here!
Nutch Version Administration
Current CommandLineOptions: Command line options for 1.X and 2.X
JavaDocs -- The JavaDocs for the most recent Nutch-1.X release.
JavaDocs -- The JavaDocs for the most recent Nutch-2.X release.
Nutch 1.X tutorial(s)
NutchTutorial - How to configure Nutch to crawl in local mode and post to Apache Solr for search/index.
Nutch 2.X tutorial(s)
Nutch2Tutorial -- How to get Nutch 2.X to use HBase as the persistence layer for Gora
Setting up Nutch 2.0 with MySQL to handle UTF-8 - A step-by-step tutorial
Accumulo, Nutch, and Gora - A step-by-step tutorial
Chinese Video Tutorial: Nutch Relevant Framework - The first free video for Nutch in China.
Chinese Setting Up And Use Tutorial - The best guide on how to set up and use the Nutch relevant framework in China.
Hadoop Tutorial - Because Nutch is based on Hadoop, it helps to have a better understanding of Hadoop.
Nutch Hadoop Tutorial - How to set up and run Nutch in deploy mode over a Hadoop cluster.
RunNutchInEclipse - How to configure, build, crawl and debug Nutch within Eclipse
Intranet Document Search - Index and search Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr backend.
Recrawling with Nutch - How to re-crawl with Nutch.
Ajax-Solr Tutorial: Nutch - Quick and easy guide to getting a nice UI on top of your Nutch crawl data.
OverviewDeploymentConfigs :This full page requires a complete update to reflect recent Nutch releases:
NutchConfigurationFiles: An overview from Nutch developers.
NutchPropertiesCompleteList: A fine-grained account of all Nutch configuration properties.
HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch intranet crawling configuration.
OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.
ErrorMessages -- What they mean and suggestions for getting rid of them. :This requires extensive updating to reflect recent Nutch releases. In addition, the legacy indexing and searching material should be archived:
SetupProxyForNutch - using Tinyproxy on Ubuntu
IndexStructure :This page needs a slight update to provide more information on plugins and the data they send to Solr for indexing:
Features :TODO:This needs to be completely overhauled to reflect recent Nutch features.
Current Nutch Gotchas
PublicServers running Nutch
Presentations on Nutch
Evaluations of Search Quality
Commercial Support and developers for hire
AcademicArticles that deal with Nutch
Becoming a Nutch Developer - Start developing and contributing to Nutch.
PluginCentral -- How to write your own plugins and use other people's.
InternalDocumentation -- How Nutch works.
FixingOpicScoring - In planning.
TaskList -- Tasks for Nutch developers. :Severe update required:
Committer's_Rules -- Committers should follow these guidelines when deciding which branch to use for committing patches and when to commit.
NutchMeetUps - Records of previous Nutch community meetups, hackathons, barcamps etc.
Nutch2Crawling - A description of the crawling jobs and field to database mappings.
Nutch2Architecture - A high level overview of the new architecture and design
Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
NewScoring -- New stable PageRank-like webgraph and link-analysis jobs.
NewScoringIndexingExample -- Two full fetch cycles of commands using new scoring and indexing systems.
Build Nutch 2.0 in Eclipse -- How to set up your IDE environment comfortably.
ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of them.
NutchConfigurationFiles-2.x -- Configuration files that are specific to Nutch-2.x
Pre Nutch 1.3 and Archive
How to edit this Wiki
This Wiki is a collaborative site; anyone can contribute and share:
- Create an account by clicking the "Login" link at the top of any page, and picking a username and password.
- Edit any page by pressing Edit at the top or the bottom of the page.
There are some conventions used on the Nutch wiki:
:TODO: (/!\ :TODO: /!\ ) is used to denote sections that definitely need to be cleaned up.
Some general info on using this Wiki Software: