"segread" is an alias for "org.apache.nutch.segment.SegmentReader"
Reads and exports a segment's data.
nutch-0.8-dev/bin/nutch org.apache.nutch.segment.SegmentReader <segment>
<segment>: Path to the segment to read.
Caveats and Notes
Creates a directory called segdump inside <segment>. Within that directory it writes a file called dump and several other files prefixed part-. The dump file contains readable information about the fetched pages and their parsed data. The part files are consolidated to form the dump file and can be deleted afterwards. Do not 'cat' these files in a terminal: they contain some binary data that can corrupt the terminal display. If you do end up in that state, you can reset the terminal with 'stty sane' or, if that fails, with 'reset'.
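One way to look at the dump file without risking the terminal is to filter out the binary bytes first. The following is only a minimal sketch, not part of Nutch: the helper name strip_binary is hypothetical, and it assumes a POSIX-style shell with the standard tr utility.

```shell
# Hypothetical helper (not part of Nutch): print a file with every byte
# removed that is not a printable ASCII character, a newline, or a tab.
# LC_ALL=C makes tr work byte by byte, regardless of the locale.
strip_binary() {
  LC_ALL=C tr -cd '[:print:]\n\t' < "$1"
}
```

For example, with the segment path substituted for <segment>, `strip_binary <segment>/segdump/dump | less` pages through the readable part of the dump. Escape sequences lose their control bytes, so they show up as harmless leftover text rather than being interpreted by the terminal.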