"dedup" is an alias for "org.apache.nutch.indexer.DeleteDuplicates"

Removes duplicate pages from a set of segment indexes.
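
As a rough illustration of what removing duplicate pages can mean, the sketch below keeps one entry per content hash from an in-memory list. It is not the DeleteDuplicates implementation: the DedupSketch class, the Page type, its fields, and the rule of keeping the higher-scoring page are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Hypothetical stand-in for one indexed page; not a Nutch class.
    static final class Page {
        final String url;
        final String contentHash;
        final float score;

        Page(String url, String contentHash, float score) {
            this.url = url;
            this.contentHash = contentHash;
            this.score = score;
        }
    }

    // Keep a single page per content hash. Preferring the higher score
    // is an assumed rule for this sketch, not documented Nutch behavior.
    static List<Page> dedupByHash(List<Page> pages) {
        Map<String, Page> best = new LinkedHashMap<>();
        for (Page p : pages) {
            Page kept = best.get(p.contentHash);
            if (kept == null || p.score > kept.score) {
                best.put(p.contentHash, p);
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page("http://a.example/", "h1", 1.0f),
            new Page("http://b.example/", "h1", 2.0f), // same content as the first page
            new Page("http://c.example/", "h2", 0.5f));
        for (Page p : dedupByHash(pages)) {
            System.out.println(p.url);
        }
    }
}
```

In this sketch the two pages sharing hash "h1" collapse into one entry, and the page with a distinct hash is left alone.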

Usage

Configuration Files

Other Files

Caveats and Notes


nutch-0.8-dev/bin/nutch_dedup (last edited 2009-09-20 23:10:09 by localhost)