Differences between revisions 3 and 4
Revision 3 as of 2006-03-05 00:05:49
Size: 731
Editor: JeffRitchie
Comment:
Revision 4 as of 2009-09-20 23:09:32
Size: 731
Editor: localhost
Comment: converted to 1.6 markup

"readlinkdb" is an alias for the class "org.apache.nutch.crawl.LinkDbReader".

Usage

  • nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.LinkDbReader <linkdb> (-dump <out_dir> | -url <url>)

    • <linkdb>: Path to the linkdb directory.
      -dump <out_dir>: Exports the linkdb to a file in <out_dir>.
      -url <url>: Prints the link information stored for <url> to System.out.
      Exactly one of -dump or -url must be given.
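
As an illustration of the two modes, here is a minimal shell sketch that builds the command line using the "readlinkdb" alias. The helper name readlinkdb_cmd, the paths crawl/linkdb and linkdb_dump, and the example URL are assumptions made for this example and are not part of Nutch. The helper only prints the command, so it can be inspected before it is run.

```shell
# Hypothetical locations; adjust to your installation and crawl layout.
NUTCH_HOME="${NUTCH_HOME:-nutch-0.8-dev}"
LINKDB="${LINKDB:-crawl/linkdb}"

# Print (do not run) the readlinkdb command line for one mode.
# Usage: readlinkdb_cmd -dump <out_dir>
#        readlinkdb_cmd -url <url>
readlinkdb_cmd() {
  case "$1" in
    -dump|-url)
      if [ -z "$2" ]; then
        echo "missing argument for $1" >&2
        return 1
      fi
      echo "$NUTCH_HOME/bin/nutch readlinkdb $LINKDB $1 $2"
      ;;
    *)
      echo "expected -dump <out_dir> or -url <url>" >&2
      return 1
      ;;
  esac
}

# Command to export the whole linkdb into a (hypothetical) output directory:
readlinkdb_cmd -dump linkdb_dump

# Command to look up a single (hypothetical) URL:
readlinkdb_cmd -url http://www.example.com/
```

Once the printed command looks right, it can be pasted into a shell to run it. The helper rejects any mode other than -dump or -url, matching the usage line above.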

Configuration Files

  • hadoop-default.xml
    hadoop-site.xml
    nutch-default.xml
    nutch-site.xml

Other Files

  • None.

Caveats and Notes

  • None.


nutch-0.8-dev/bin/nutch_readlinkdb (last edited 2009-09-20 23:09:32 by localhost)