"segread" is an alias for "org.apache.nutch.segment.SegmentReader"
Reads and exports a segment's data.
Usage
nutch-0.8-dev/bin/nutch org.apache.nutch.segment.SegmentReader <segment>
<segment>: Path to the segment to read.
Configuration Files
hadoop-default.xml
hadoop-site.xml
nutch-default.xml
nutch-site.xml
Other Files
- None.
Caveats and Notes
Creates a directory called segdump inside <segment>. Within that directory, a file called dump is created along with several other files prefixed part-. The dump file contains human-readable information about the fetched pages and their parsed content. The part- files are consolidated to form the dump file and can be deleted afterwards.

Do not 'cat' these files in a terminal: they contain binary data that can corrupt your terminal's display. If that happens, you can restore the terminal with 'stty sane', or with 'reset' if that fails.
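If you want to read the dump directly in a terminal, one option is to escape control bytes before they reach the screen. The following is a minimal sketch, not part of Nutch: the script and the names show_dump and sanitize are hypothetical, and it assumes the dump file is mostly UTF-8 text (undecodable bytes are shown as U+FFFD). Control characters other than newline and tab, such as ESC (0x1b), are printed as \xNN escapes instead of being interpreted by the terminal.

```python
import codecs
import sys
import unicodedata
from pathlib import Path


def sanitize(text):
    """Escape control characters (Unicode category "Cc") except newline and tab,
    so bytes such as ESC (0x1b) are shown as text instead of being
    interpreted by the terminal."""
    return "".join(
        ch if ch in "\n\t" or unicodedata.category(ch) != "Cc"
        else "\\x%02x" % ord(ch)
        for ch in text
    )


def show_dump(segment, out=None, chunk_size=64 * 1024):
    """Stream <segment>/segdump/dump to `out` (default: stdout), sanitized.

    Hypothetical helper; assumes the dump is mostly UTF-8 and replaces
    undecodable bytes with U+FFFD.
    """
    out = out if out is not None else sys.stdout
    dump = Path(segment) / "segdump" / "dump"
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    with dump.open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            out.write(sanitize(decoder.decode(chunk)))
    # Flush any bytes still buffered in the decoder.
    out.write(sanitize(decoder.decode(b"", final=True)))


if __name__ == "__main__" and len(sys.argv) > 1:
    show_dump(sys.argv[1])
```

Saved as, for example, show_dump.py (a hypothetical name), it would be run as `python show_dump.py <segment> | less`. Reading in fixed-size chunks keeps memory use flat even when the dump is large.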