"inject" is an alias for "org.apache.nutch.crawl.Injector"

Injects new URLs into the Crawl Database

Usage

Configuration Files

Configuration Values

The following properties directly affect how the Injector injects URLs.

Other Files

Caveats and Notes

Example:

nutch-0.8-dev/bin/nutch inject /path/to/crawldb /path/to/url/dir

Files:
/path/to/url/dir/nutch.txt
/path/to/url/dir/hadoop.txt
/path/to/url/dir/wikis.txt

nutch.txt contents:
http://lucene.apache.org/nutch/
http://lucene.apache.org/nutch/tutorial.html

hadoop.txt contents:
http://lucene.apache.org/hadoop/
http://lucene.apache.org/hadoop/docs/api/

wikis.txt contents:
http://wiki.apache.org/hadoop/
http://wiki.apache.org/nutch/
http://wiki.apache.org/lucene/

In this case the Injector would inject seven URLs (two from nutch.txt, two from hadoop.txt and three from wikis.txt) into the Crawl Database located at /path/to/crawldb.
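
As a rough sketch, the seed directory from this example could be built and sanity-checked with a short shell script. The scratch directory and the url_dir and url_count variables are illustrative assumptions, not part of Nutch. The count is a plain count of non-empty lines, so it ignores any filtering or normalization the Injector might apply. The inject command is left commented out because it needs a working Nutch installation.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location; substitute your real seed directory.
url_dir="$(mktemp -d)/urls"
mkdir -p "$url_dir"

# One URL per line, split across several files as in the example.
cat > "$url_dir/nutch.txt" <<'EOF'
http://lucene.apache.org/nutch/
http://lucene.apache.org/nutch/tutorial.html
EOF

cat > "$url_dir/hadoop.txt" <<'EOF'
http://lucene.apache.org/hadoop/
http://lucene.apache.org/hadoop/docs/api/
EOF

cat > "$url_dir/wikis.txt" <<'EOF'
http://wiki.apache.org/hadoop/
http://wiki.apache.org/nutch/
http://wiki.apache.org/lucene/
EOF

# Count the non-empty lines across all seed files.
url_count=$(cat "$url_dir"/*.txt | grep -c .)
echo "seed URLs found: $url_count"

# With Nutch installed, the inject step from the example would then be:
# nutch-0.8-dev/bin/nutch inject /path/to/crawldb "$url_dir"
```

Counting the seed lines before injecting gives a quick way to confirm that every file in the directory was picked up.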


nutch-0.8-dev/bin/nutch_inject (last edited 2009-09-20 23:09:41 by localhost)