"updatedb" is the bin/nutch alias for the class "org.apache.nutch.crawl.CrawlDb".

Updates the Crawl DB with information obtained from the Fetcher.

Usage

  • nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.CrawlDb <crawldb> <segment> [-noadditions]

    • <crawldb>: Path to the crawl database.
      <segment>: Path to the segment that has been fetched.
      [-noadditions]: Do not add newly discovered URLs to the crawldb; only entries already present are updated.
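
The usage above can be sketched as a small wrapper script. This is only an illustration: the function name run_updatedb, the DRY_RUN and NUTCH_BIN variables, and the crawldb and segment paths are all hypothetical and not part of Nutch. By default the wrapper only prints the command it would run, so it can be tried without a Nutch installation.

```shell
#!/bin/sh
# Hypothetical wrapper around "bin/nutch updatedb"; not part of Nutch itself.
# NUTCH_BIN and DRY_RUN are names made up for this sketch.
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

run_updatedb() {
  # Expect <crawldb> <segment> and, optionally, -noadditions.
  if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
    echo "usage: run_updatedb <crawldb> <segment> [-noadditions]" >&2
    return 2
  fi
  if [ "$#" -eq 3 ] && [ "$3" != "-noadditions" ]; then
    echo "unknown option: $3" >&2
    return 2
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): print the command instead of executing it.
    printf '%s\n' "$NUTCH_BIN updatedb $*"
  else
    "$NUTCH_BIN" updatedb "$@"
  fi
}

# Example paths are assumptions; substitute your own crawldb and segment.
run_updatedb crawl/crawldb crawl/segments/20070127000546 -noadditions
```

Setting DRY_RUN=0 makes the wrapper invoke bin/nutch for real. Omitting -noadditions gives the default behaviour, in which new URLs are added to the crawldb.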

Configuration Files

  • hadoop-default.xml
    hadoop-site.xml
    nutch-default.xml
    nutch-site.xml

Other Files

  • None.

Caveats and Notes

  • None.

nutch-0.8-dev/bin/nutch_updatedb (last edited 2009-09-20 23:09:36 by localhost)