Differences between revisions 4 and 5
Revision 4 as of 2006-01-09 19:15:14
Size: 1020
Editor: JerryRussell
Comment: fixed usage
Revision 5 as of 2009-09-20 23:09:55
Size: 1020
Editor: localhost
Comment: converted to 1.6 markup
segread is an alias for org.apache.nutch.segment.SegmentReader.

This class holds together all the data readers for an existing segment. It also provides some convenience methods for reading from the segment and for repositioning the current pointer.
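The following is a minimal in-memory Java sketch of one object that holds several aligned readers and moves them together. It illustrates the idea under stated assumptions and is not the real SegmentReader. The class name SegmentCursorSketch, its next() and seek() methods, and the string lists that stand in for the content, parse_data and parse_text readers are all hypothetical. A null list stands in for a part that is not read, which is roughly what -nocontent, -noparsedata and -noparsetext suggest.

```java
import java.util.List;

// Hypothetical sketch, NOT the Nutch SegmentReader API: one object holding several
// aligned per-part "readers" (here plain lists) that advance and reposition together.
class SegmentCursorSketch {
    /** One aligned record; a field is null when its part is skipped. */
    static final class Entry {
        final String url, content, parseData, parseText;
        Entry(String url, String content, String parseData, String parseText) {
            this.url = url;
            this.content = content;
            this.parseData = parseData;
            this.parseText = parseText;
        }
    }

    private final List<String> urls;      // always read
    private final List<String> content;   // null if skipped (cf. -nocontent)
    private final List<String> parseData; // null if skipped (cf. -noparsedata)
    private final List<String> parseText; // null if skipped (cf. -noparsetext)
    private int position = 0;             // index of the next entry next() returns

    SegmentCursorSketch(List<String> urls, List<String> content,
                        List<String> parseData, List<String> parseText) {
        this.urls = urls;
        this.content = checkAligned(content, urls.size());
        this.parseData = checkAligned(parseData, urls.size());
        this.parseText = checkAligned(parseText, urls.size());
    }

    // Every part that is read must have one record per URL, or the readers would drift apart.
    private static List<String> checkAligned(List<String> part, int n) {
        if (part != null && part.size() != n) {
            throw new IllegalArgumentException("part has " + part.size() + " records, expected " + n);
        }
        return part;
    }

    long size() {
        return urls.size();
    }

    /** Returns the next aligned entry, or null once the segment is exhausted. */
    Entry next() {
        if (position >= urls.size()) {
            return null;
        }
        int i = position++;
        return new Entry(urls.get(i), get(content, i), get(parseData, i), get(parseText, i));
    }

    /** Repositions all part readers at once so that next() returns entry n. */
    void seek(int n) {
        if (n < 0 || n > urls.size()) {
            throw new IndexOutOfBoundsException("seek " + n);
        }
        position = n;
    }

    private static String get(List<String> part, int i) {
        return part == null ? null : part.get(i);
    }
}
```

A single position shared by every part is what lets seek() reposition all readers at once. A real segment reader would presumably apply the same idea to on-disk files rather than lists.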

Usage: bin/nutch segread [-fix] [-dump] [-dumpsort] [-list] [-nocontent] [-noparsedata] [-noparsetext] (-dir segments | seg1 seg2 ...)
NOTE: at least one segment directory name, or the '-dir' option, is required.
-fix

  • automatically fix corrupted segments

-dump

  • dump segment data in human-readable format

-dumpsort

  • dump segment data in human-readable format, sorted by URL

-list

  • print useful information about segments

-nocontent

  • ignore content data

-noparsedata

  • ignore parse_data data

-noparsetext

  • ignore parse_text data

-dir segments

  • directory containing multiple segments

seg1 seg2 ...

  • segment directories
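The usage line and its NOTE can be read as a small argument contract. The following Java sketch shows one hypothetical way to parse it and is not Nutch's own option handling. The class SegreadArgsSketch and its fields are invented for illustration. Expanding -dir into every immediate subdirectory is an assumption about what "directory containing multiple segments" means. The sketch also does not reject mixing -dir with explicit segment names, although the (-dir segments | seg1 seg2 ...) form suggests the real tool treats them as alternatives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of the documented segread command line, NOT Nutch's parser.
// Option names come from the usage line; class and field names are illustrative.
class SegreadArgsSketch {
    boolean fix, dump, dumpSort, list;             // actions
    boolean noContent, noParseData, noParseText;   // parts to ignore
    final List<Path> segments = new ArrayList<>(); // segment directories to read

    static SegreadArgsSketch parse(String[] args) throws IOException {
        SegreadArgsSketch a = new SegreadArgsSketch();
        boolean sawDir = false;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-fix":         a.fix = true; break;
                case "-dump":        a.dump = true; break;
                case "-dumpsort":    a.dumpSort = true; break;
                case "-list":        a.list = true; break;
                case "-nocontent":   a.noContent = true; break;
                case "-noparsedata": a.noParseData = true; break;
                case "-noparsetext": a.noParseText = true; break;
                case "-dir":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-dir requires a directory argument");
                    }
                    sawDir = true;
                    // Assumption: each immediate subdirectory of the -dir argument is one segment.
                    try (Stream<Path> children = Files.list(Paths.get(args[++i]))) {
                        children.filter(Files::isDirectory).sorted().forEach(a.segments::add);
                    }
                    break;
                default:
                    if (args[i].startsWith("-")) {
                        throw new IllegalArgumentException("unknown option: " + args[i]);
                    }
                    a.segments.add(Paths.get(args[i])); // an explicit seg1 seg2 ... entry
            }
        }
        // Mirrors the NOTE: at least one segment dir name, or the -dir option, is required.
        if (!sawDir && a.segments.isEmpty()) {
            throw new IllegalArgumentException("at least one segment dir name is required, or '-dir' option");
        }
        return a;
    }
}
```

For example, parse(new String[] {"-list", "-dir", "segments"}) would set list and collect each subdirectory of segments, assuming that directory exists. parse(new String[] {"-dump"}) throws an exception, mirroring the NOTE.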

CommandLineOptions

bin/nutch_segread (last edited 2009-09-20 23:09:55 by localhost)