
"dedup" is an alias for "org.apache.nutch.indexer.DeleteDuplicates"

Removes duplicate pages from a set of segment indexes.


  • nutch-0.8-dev/bin/nutch org.apache.nutch.indexer.DeleteDuplicates <indexes> ...

    • <indexes>: Paths to one or more directories containing indexes.
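
As a rough illustration of the usage above, the following sketch prints the command line in both its short ("dedup") and long (full class name) forms without running anything. The NUTCH_BIN default and the crawl/indexes path are assumptions made for this example, not values taken from this page; substitute the paths of your own installation and crawl.

```shell
# Sketch only: prints the dedup command lines instead of executing them.
# NUTCH_BIN and the index directory names are assumed, illustrative values.
NUTCH_BIN="${NUTCH_BIN:-nutch-0.8-dev/bin/nutch}"

# Short form, using the "dedup" alias.
dedup_command() {
    echo "$NUTCH_BIN dedup $*"
}

# Long form, using the class the alias stands for.
dedup_command_long() {
    echo "$NUTCH_BIN org.apache.nutch.indexer.DeleteDuplicates $*"
}

# Show both forms for a hypothetical index directory.
dedup_command crawl/indexes
dedup_command_long crawl/indexes
```

To actually run the tool, execute the printed line from a shell in which the nutch script is reachable at that path.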

Configuration Files

  • hadoop-default.xml

Other Files

  • None.

Caveats and Notes

  • None.


nutch-0.8-dev/bin/nutch_dedup (last edited 2009-09-20 23:10:09 by localhost)