
"segread" is an alias for "org.apache.nutch.segment.SegmentReader"

Reads and exports a segment's data.


  • nutch-0.8-dev/bin/nutch org.apache.nutch.segment.SegmentReader <segment>

    • <segment>: Path to the segment to read.

Configuration Files

  • hadoop-default.xml

Other Files

  • None.

Caveats and Notes

  • Creates a directory called segdump inside <segment>. That directory contains a file named dump and several files prefixed with part-. The dump file holds readable information about the fetched pages and their parsed content. The part- files are consolidated to form the dump file and can be deleted afterwards. Do not 'cat' these files in a terminal: they contain some binary data that can corrupt your terminal. If that happens, you can restore it with 'stty sane' or, if that fails, with 'reset'.
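The following Python helpers are a rough sketch of how to follow the caveat above. They are not part of Nutch. The names safe_preview and remove_part_files are hypothetical, and the code assumes the segdump layout described above: a plain file named dump alongside plain files prefixed with part-. safe_preview prints part of a file with non-printable bytes (such as the ESC character that starts terminal control sequences) escaped as \xNN, so the terminal cannot be corrupted. remove_part_files deletes the part- files, but only if the consolidated dump file already exists.

```python
import os
import string

# Bytes treated as safe to echo to a terminal: printable ASCII plus space,
# tab and newline. Vertical tab, form feed and carriage return are excluded.
SAFE = set(string.printable.encode("ascii")) - set(b"\x0b\x0c\r")


def safe_preview(path, limit=4096):
    """Return up to `limit` bytes of `path`, escaping unsafe bytes as \\xNN."""
    with open(path, "rb") as f:
        data = f.read(limit)
    return "".join(chr(b) if b in SAFE else "\\x%02x" % b for b in data)


def remove_part_files(segdump_dir):
    """Delete part-* files in `segdump_dir`, assuming they were merged into 'dump'.

    Refuses to delete anything if the consolidated 'dump' file is missing.
    Returns the sorted list of removed file names.
    """
    if not os.path.isfile(os.path.join(segdump_dir, "dump")):
        raise FileNotFoundError("no consolidated dump file in " + segdump_dir)
    removed = []
    for name in sorted(os.listdir(segdump_dir)):
        path = os.path.join(segdump_dir, name)
        if name.startswith("part-") and os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python segdump_tools.py <segment>/segdump
    print(safe_preview(os.path.join(sys.argv[1], "dump")))
```

The escaping approach was chosen over filtering so that the preview still shows where the binary data occurs. For routine browsing, any viewer that escapes control characters serves the same purpose.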


