Differences between revisions 1 and 2
Revision 1 as of 2006-03-07 00:08:58
Size: 537
Editor: JeffRitchie
Comment: new page added
Revision 2 as of 2009-09-20 23:10:09
Size: 537
Editor: localhost
Comment: converted to 1.6 markup

"dedup" is an alias for "org.apache.nutch.indexer.DeleteDuplicates"

Removes duplicate pages from a set of segment indexes.

Usage

  • nutch-0.8-dev/bin/nutch org.apache.nutch.indexer.DeleteDuplicates <indexes> ...

    • <indexes>: Path to directories containing indexes.
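
A minimal sketch of the invocation above, using the "dedup" alias instead of the full class name. The index directory crawl/indexes, the NUTCH_HOME and INDEXES variables, the RUN switch, and the dedup_cmd helper are all assumptions for illustration, not part of Nutch. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical locations; adjust them to your installation.
NUTCH_HOME="${NUTCH_HOME:-nutch-0.8-dev}"
INDEXES="${INDEXES:-crawl/indexes}"

# Hypothetical helper: prints the dedup command line for one index directory.
dedup_cmd() {
    printf '%s/bin/nutch dedup %s\n' "$NUTCH_HOME" "$1"
}

if [ "${RUN:-0}" = "1" ]; then
    # Actually run the deduplication (requires a Nutch installation).
    "$NUTCH_HOME/bin/nutch" dedup "$INDEXES"
else
    # Dry run: show the command only.
    dedup_cmd "$INDEXES"
fi
```

Setting RUN=1 runs the command for real. Because the usage line ends in "...", it should be possible to pass more than one index directory; the sketch passes only one.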

Configuration Files

  • hadoop-default.xml
  • hadoop-site.xml
  • nutch-default.xml
  • nutch-site.xml

Other Files

  • None.

Caveats and Notes

  • None.

DevelopmentCommandLineOptions

nutch-0.8-dev/bin/nutch_dedup (last edited 2009-09-20 23:10:09 by localhost)