Command Line Options of bin/nutch
The script bin/nutch is a helper that selects which Java class to run, based on the command it is given.
See each entry for details of the command arguments and options.
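The dispatch idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual bin/nutch script: the function name `build_java_command`, the `classpath` default, and the fallback for unknown commands are assumptions, and the command-to-class mapping shown is only an example that differs between Nutch versions.

```python
# Hypothetical sketch of a command dispatcher; NOT the real bin/nutch script.
import shlex

# Assumed mapping, for illustration only. The real table depends on the
# Nutch version and contains many more commands.
COMMAND_CLASSES = {
    "inject": "org.apache.nutch.crawl.Injector",
    "generate": "org.apache.nutch.crawl.Generator",
    "fetch": "org.apache.nutch.fetcher.Fetcher",
    "updatedb": "org.apache.nutch.crawl.CrawlDb",
}


def build_java_command(argv, classpath="lib/*"):
    """Turn e.g. ['inject', 'crawldb', 'urls'] into a java invocation (as a list)."""
    if not argv:
        raise SystemExit("Usage: nutch COMMAND [args...]")
    command, *args = argv
    # Assumption: an unknown command is treated as a fully qualified class name.
    java_class = COMMAND_CLASSES.get(command, command)
    # Remaining arguments are passed through untouched to the chosen class.
    return ["java", "-cp", classpath, java_class, *args]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(build_java_command(["inject", "crawl/crawldb", "urls"])))
```

The point of the sketch is that the script itself does little work: it maps a short command name to a class and hands the remaining arguments to that class, which is why the options for each command are documented per entry.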
command |
function |
{OLD/Removed} Web page and link database administration, including creation |
|
{OLD/Removed} Adjust database link-analysis scoring |
|
Perform complete crawling and indexing of a set of root URLs |
|
{OLD/Removed} NDFS data node |
|
Deletes duplicate documents in a set of segment indexes |
|
Fetch a segment's pages |
|
{OLD/Removed} Print the fetchlist of a segment |
|
Generate new segments to fetch |
|
Run the indexer on a segment's fetcher output |
|
Inject new URLs into the web page and link database |
|
Merge several segment indexes |
|
Merge crawldbs, with optional filtering |
|
Merges multiple segments & removes duplicates |
|
{OLD/Removed} NDFS name node |
|
{OLD/Removed} NDFS administrative access |
|
Parse contents in one segment |
|
Prunes existing Nutch indexes of unwanted content |
|
Read data from the web page and link db |
|
Read data in an existing segment |
|
Divide data from one segment into several segments |
|
Run a search server for IPC connections |
|
Deletes duplicate documents from Solr |
|
Updates the web page and link db from the segment fetcher output |
|