"updatedb" is an alias for "org.apache.nutch.crawl.CrawlDb"
Updates the Crawl DB with information obtained from the Fetcher.
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.CrawlDb <crawldb> <segment> [-noadditions]
<crawldb>: Path to the crawl database.
<segment>: Path to the segment that has been fetched.
[-noadditions]: Don't add newly discovered URLs to the crawldb; only update entries that already exist.
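The usage above can be sketched as a small dry-run script that prints the command line instead of running it, so it can be tried without a Nutch installation. The crawldb and segment paths and the helper name updatedb_cmd are hypothetical, chosen only for illustration. The "updatedb" alias and the -noadditions flag are taken from this page.

```shell
#!/bin/sh
# Hypothetical locations; substitute the paths used by your own crawl.
NUTCH="nutch-0.8-dev/bin/nutch"
CRAWLDB="crawl/crawldb"
SEGMENT="crawl/segments/20060101000000"

# Print the updatedb command line for a crawldb and a fetched segment.
# An optional third argument (-noadditions) is passed through unchanged.
updatedb_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: updatedb_cmd <crawldb> <segment> [-noadditions]" >&2
    return 1
  fi
  echo "$NUTCH updatedb $*"
}

# Normal update: newly discovered URLs are added to the crawldb.
updatedb_cmd "$CRAWLDB" "$SEGMENT"

# Restricted update: only existing crawldb entries are updated.
updatedb_cmd "$CRAWLDB" "$SEGMENT" -noadditions
```

To run the update for real, execute the printed command from the directory that contains nutch-0.8-dev, after the segment has been fetched.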
Configuration Files
hadoop-default.xml
hadoop-site.xml
nutch-default.xml
nutch-site.xml
Other Files
- None.
Caveats and Notes
- None.