dedup is an alias for org.apache.nutch.indexer.DeleteDuplicates
Deletes duplicate documents in a set of Lucene indexes. Two documents are considered duplicates if they have either the same content (compared via MD5 hash) or the same URL.
Usage: bin/nutch org.apache.nutch.indexer.DeleteDuplicates (-local | -ndfs <namenode:port>) [-workingdir <workingdir>] <segmentsDir>
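
The duplicate rule described above (same content hash or same URL) can be illustrated with a minimal Python sketch. This is not the Nutch implementation: the function name `delete_duplicates`, the in-memory list of `(url, content)` pairs, and the choice to keep the first copy encountered are all assumptions made for illustration. The page does not say which copy the real tool keeps.

```python
import hashlib


def delete_duplicates(docs):
    """Return docs with duplicates removed (hypothetical helper, not Nutch code).

    docs is a list of (url, content) string pairs. A document counts as a
    duplicate if an earlier kept document has the same URL or the same
    MD5 hash of its content.
    """
    seen_urls = set()
    seen_hashes = set()
    kept = []
    for url, content in docs:
        # Hash the content so documents are compared by digest, not by full text.
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        if url in seen_urls or digest in seen_hashes:
            # Assumption: the first copy seen wins; later copies are dropped.
            continue
        seen_urls.add(url)
        seen_hashes.add(digest)
        kept.append((url, content))
    return kept


docs = [
    ("http://a.example/", "hello"),
    ("http://b.example/", "hello"),   # same content as the first document
    ("http://a.example/", "other"),   # same URL as the first document
    ("http://c.example/", "world"),
]
unique_docs = delete_duplicates(docs)
```

Here `unique_docs` retains only the first and last entries, since the second shares its content and the third shares its URL with the first.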