invertlinks is an alias for org.apache.nutch.crawl.LinkDb.

This class maintains an inverted link map, listing the incoming links for each URL. Its declaration is:

public class LinkDb extends Configured implements Tool, Mapper<Text, ParseData, Text, Inlinks>
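To illustrate what an inverted link map is, here is a minimal Python sketch. It is not how LinkDb is implemented (LinkDb is a Java MapReduce job); the function name invert_links and the example URLs are hypothetical. It only shows the idea of turning "page -> outgoing links" into "URL -> incoming links".

```python
from collections import defaultdict


def invert_links(outlinks):
    """Map each target URL to the (source URL, anchor text) pairs linking to it."""
    inlinks = defaultdict(list)
    for source_url, links in outlinks.items():
        for target_url, anchor in links:
            # Record the link under its target instead of its source.
            inlinks[target_url].append((source_url, anchor))
    return dict(inlinks)


# Hypothetical parse output: page URL -> list of (outlink URL, anchor text).
outlinks = {
    "http://a.example/": [("http://b.example/", "B home"), ("http://c.example/", "C")],
    "http://b.example/": [("http://c.example/", "see C")],
}

inlinks = invert_links(outlinks)
print(inlinks["http://c.example/"])
```

In this sketch a page that nobody links to (here http://a.example/) simply has no entry in the inverted map.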

Usage:


bin/nutch invertlinks <linkdb> (-dir <segmentsDir> | <seg1> <seg2> ...) [-force] [-noNormalize] [-noFilter]

<linkdb>: The path to the output linkdb to create or update.

-dir <segmentsDir>: This corresponds to the parent directory containing several segments, OR

<seg1> <seg2> ...: A list of segment directories to create an inverted linkdb from.

[-force]: This argument forces an update even if the linkdb appears to be locked (CAUTION advised).

[-noNormalize]: Pass this option to skip normalization of link URLs. The linkdb then holds a true representation of the incoming links as they were found.

[-noFilter]: Pass this option to skip filtering, so that none of the currently configured URLFilters are applied to link URLs.
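The options above can be combined as in the following sketch, which assembles an invertlinks command line from Python. The helper name build_invertlinks_args, its parameter names, and the paths crawl/linkdb and crawl/segments are assumptions made for illustration; only the flags themselves come from the usage line above. The helper follows the usage synopsis and expects either -dir or an explicit segment list.

```python
def build_invertlinks_args(linkdb, segments_dir=None, segments=(),
                           force=False, normalize=True, url_filter=True,
                           nutch="bin/nutch"):
    """Return the argument list for a 'bin/nutch invertlinks' call."""
    # The synopsis offers two alternatives: -dir <segmentsDir> or a segment list.
    if (segments_dir is None) == (not segments):
        raise ValueError("pass either segments_dir or at least one segment")

    args = [nutch, "invertlinks", linkdb]
    if segments_dir is not None:
        args += ["-dir", segments_dir]
    else:
        args += list(segments)

    if force:
        args.append("-force")        # update even if the linkdb looks locked
    if not normalize:
        args.append("-noNormalize")  # keep link URLs as found
    if not url_filter:
        args.append("-noFilter")     # skip the configured URLFilters
    return args


# Hypothetical paths, shown only to illustrate the resulting command line.
print(" ".join(build_invertlinks_args("crawl/linkdb", segments_dir="crawl/segments")))
```

The returned list could be handed to subprocess.run from a Nutch installation directory; the sketch itself only builds and prints the command.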

CommandLineOptions
