Differences between revisions 1 and 2
Revision 1 as of 2006-03-06 23:20:20
Size: 848
Editor: JeffRitchie
Comment: new page added
Revision 2 as of 2009-09-20 23:09:48
Size: 848
Editor: localhost
Comment: converted to 1.6 markup

"index" is an alias for "org.apache.nutch.indexer.Indexer"

Creates an index of one or more fetched segments, using information from the crawldb and the linkdb to score the indexed pages.

Usage

  • nutch-0.8-dev/bin/nutch org.apache.nutch.indexer.Indexer <index> <crawldb> <linkdb> <segment> ...

    • <index>: Path to the directory where the index will be created.
      <crawldb>: Path to the crawl database directory.
      <linkdb>: Path to the link database directory.
      <segment>: Path to the segment that has been fetched. More than one segment may be specified.
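
As a rough illustration of the usage above, the following sketch assembles the command line using the "index" alias. Every path shown (crawl/indexes, crawl/crawldb, crawl/linkdb, and the two segment directories) is a hypothetical placeholder, and index_cmd is a helper invented for this example, not part of Nutch. It only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
NUTCH_HOME="${NUTCH_HOME:-nutch-0.8-dev}"

# index_cmd is a made-up helper, not part of Nutch.
# It prints the index command line for an index directory, a crawldb,
# a linkdb, and one or more segments. It does not execute anything.
index_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: index_cmd <index> <crawldb> <linkdb> <segment> ..." >&2
    return 1
  fi
  echo "$NUTCH_HOME/bin/nutch index $*"
}

# Two segments are passed here to show that more than one is accepted.
index_cmd crawl/indexes crawl/crawldb crawl/linkdb \
  crawl/segments/segment1 crawl/segments/segment2
```

To actually build the index, run the printed line from a shell once the paths match your own crawl layout.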

Configuration Files

  • hadoop-default.xml
    hadoop-site.xml
    nutch-default.xml
    nutch-site.xml

Other Files

  • None.

Caveats and Notes

  • None.


nutch-0.8-dev/bin/nutch_index (last edited 2009-09-20 23:09:48 by localhost)