Inject is an alias for org.apache.nutch.crawl.Injector
This class takes a flat file of URLs and adds them to the crawldb, the store of pages to be crawled. It is useful for bootstrapping the system. The URL file contains one URL per line, optionally followed by custom metadata. Metadata fields are separated from the URL and from each other by tabs, and each key is separated from its value by '='.
Note that some metadata keys are reserved:
nutch.score: sets a custom score for a specific URL
nutch.fetchInterval: sets a custom fetch interval, in seconds, for a specific URL
Other keys, such as userType, carry custom metadata. userType can be any metadata field to which you assign a value. In the example here we use userType to refer to the nature of Nutch as an open source project.
e.g. http://www.xyz.org/ nutch.score=10 nutch.fetchInterval=2592000 userType=open_source
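To make the line format concrete, here is a minimal Python sketch that splits one seed line into its URL and metadata. It is an illustration only, not Nutch's own parsing code: the function name parse_seed_line is hypothetical, and skipping fields that lack an '=' is an assumption made for this example.

```python
def parse_seed_line(line):
    """Split one seed-list line into (url, metadata dict).

    Illustrative helper, not part of Nutch.
    """
    # Fields are tab-separated; the first field is the URL.
    fields = line.rstrip("\n").split("\t")
    url = fields[0].strip()
    metadata = {}
    for field in fields[1:]:
        if not field.strip():
            continue  # ignore empty fields caused by repeated tabs
        key, sep, value = field.partition("=")
        if not sep:
            continue  # assumption: skip fields that have no '='
        metadata[key.strip()] = value.strip()
    return url, metadata


line = ("http://www.xyz.org/\tnutch.score=10"
        "\tnutch.fetchInterval=2592000\tuserType=open_source")
url, meta = parse_seed_line(line)
print(url)
print(meta)
```

In this sketch all values stay as strings; nutch.score and nutch.fetchInterval would still need converting to numbers before use.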
bin/nutch inject <crawldb> <url_dir>
<crawldb>: The directory containing the crawldb
<url_dir>: The directory containing the seed list (the 'flat file' referred to above), usually a text file with one URL per line.
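As a rough sketch of how the two arguments fit together, the Python below writes a tab-separated seed file into a URL directory and assembles the matching inject command. The helper names write_seed_file and inject_command, the file name seed.txt, and the crawldb path crawl/crawldb are all assumptions chosen for illustration; the script only builds the command and does not run Nutch.

```python
import os
import tempfile


def write_seed_file(url_dir, entries, filename="seed.txt"):
    """Write (url, metadata) pairs as one tab-separated line per URL."""
    os.makedirs(url_dir, exist_ok=True)
    path = os.path.join(url_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        for url, metadata in entries:
            # URL first, then key=value fields, all joined by tabs.
            fields = [url] + [f"{k}={v}" for k, v in metadata.items()]
            f.write("\t".join(fields) + "\n")
    return path


def inject_command(crawldb, url_dir):
    """Build the argument list for: bin/nutch inject <crawldb> <url_dir>."""
    return ["bin/nutch", "inject", crawldb, url_dir]


# Hypothetical locations, used only for this example.
url_dir = os.path.join(tempfile.mkdtemp(), "urls")
seed_path = write_seed_file(
    url_dir,
    [("http://www.xyz.org/", {"nutch.score": "10", "userType": "open_source"})],
)
cmd = inject_command("crawl/crawldb", url_dir)
print(" ".join(cmd))
```

To actually run the injection, the command list could be passed to subprocess.run(cmd, check=True) from the Nutch installation directory, assuming bin/nutch is present there.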