"index" is an alias for "org.apache.nutch.indexer.Indexer"
Creates an index of one or more segments, using information from the crawldb and the linkdb to score pages in the index.
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.indexer.Indexer <index> <crawldb> <linkdb> <segment> ...
<index>: Path to the directory where the index will be created.
<crawldb>: Path to the crawl database directory.
<linkdb>: Path to the link database directory.
<segment>: Path to a segment that has been fetched. More than one segment may be specified.
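As a rough illustration of the argument order above, the sketch below assembles the command line with the "index" alias and prints it instead of running it. The helper name build_index_cmd, the crawl/ directory layout, and the segment names are all hypothetical, chosen only for this example; the install directory is taken from the Usage line.

```shell
#!/bin/sh
# Assumed install location, copied from the Usage line; override as needed.
NUTCH_HOME="${NUTCH_HOME:-nutch-0.8-dev}"

# Hypothetical helper: prints the indexer command line for the given arguments.
#   $1 = <index>, $2 = <crawldb>, $3 = <linkdb>, $4... = one or more <segment>
build_index_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "need <index> <crawldb> <linkdb> and at least one <segment>" >&2
    return 1
  fi
  # "index" is the alias for org.apache.nutch.indexer.Indexer.
  echo "$NUTCH_HOME/bin/nutch index $*"
}

# Illustrative paths only; substitute the directories of your own crawl.
build_index_cmd crawl/indexes crawl/crawldb crawl/linkdb \
  crawl/segments/20060101000000 crawl/segments/20060102000000
```

Once the printed command looks right for your own paths, it can be run directly from the shell.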
Configuration Files
hadoop-default.xml
hadoop-site.xml
nutch-default.xml
nutch-site.xml
Other Files
- None.
Caveats and Notes
- None.