readlinkdb is an alias for org.apache.nutch.crawl.LinkDbReader.

This reader class enables us to obtain information from within a linkdb. The two types of information we can retrieve are a dump of the whole linkdb and the information held for a specific URL.

/!\ :TODO: More could be added to the above, e.g. what is the nature and structure of the information we retrieve from a dump of the linkdb and for a specific URL? /!\


bin/nutch readlinkdb <linkdb> (-dump <out_dir> | -url <url>)

<linkdb>: This is the linkdb directory we wish to read and obtain information from.

-dump <out_dir>: This parameter dumps the whole linkdb to a text file in whichever <out_dir> we specify.

-url <url>: The -url argument provides us with information about a specific <url>. This is written to System.out.
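
As a minimal sketch of both modes described above, the two invocations might look like the following. The linkdb path (crawl/linkdb), the output directory (linkdb_dump) and the URL are assumptions chosen for illustration, not values defined by Nutch. The run helper is also hypothetical: it only prints the command when bin/nutch is not found in the current directory, so the snippet can be tried safely outside a Nutch installation.

```shell
#!/bin/sh
# Assumed locations and URL (hypothetical); adjust to your own crawl.
LINKDB="crawl/linkdb"
OUT_DIR="linkdb_dump"
URL="http://www.example.com/"

# Hypothetical helper: execute the command if bin/nutch is available
# in the current directory, otherwise just print what would be run.
run() {
  if [ -x bin/nutch ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Dump the whole linkdb as text into $OUT_DIR.
run bin/nutch readlinkdb "$LINKDB" -dump "$OUT_DIR"

# Print the information held for one URL to standard output.
run bin/nutch readlinkdb "$LINKDB" -url "$URL"
```

The two options are alternatives, so each invocation passes either -dump or -url, not both.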


bin/nutch readlinkdb (last edited 2011-07-01 04:27:19 by LewisJohnMcgibbney)