Nutch 1.X Command Line Options of bin/nutch

The bin/nutch script is a helper that selects and runs the Java class behind each command.

Note: Most commands print their usage help when invoked without parameters.

See each entry for details of the command arguments and options.

command                   function

bin/nutch crawl           One-step crawler for intranets
bin/nutch readdb          Read / dump crawl db
bin/nutch mergedb         Merge crawldbs, with optional filtering
bin/nutch readlinkdb      Read / dump link db
bin/nutch inject          Inject new URLs into the database
bin/nutch generate        Generate new segments to fetch from crawldb
bin/nutch freegen         Generate new segments to fetch from text files
bin/nutch fetch           Fetch a segment's pages
bin/nutch parse           Parse a segment's pages
bin/nutch readseg         Read / dump segment data
bin/nutch mergesegs       Merge multiple segments, with optional filtering and slicing
bin/nutch updatedb        Update crawldb from segments after fetching
bin/nutch invertlinks     Create a linkdb from parsed segments
bin/nutch mergelinkdb     Merge linkdbs, with optional filtering
bin/nutch solrindex       Run the Solr indexer on parsed segments and linkdb
bin/nutch solrdedup       Remove duplicate documents from Solr
bin/nutch solrclean       Remove HTTP 301 and 404 documents from Solr
bin/nutch parsechecker    Check the parser for a given URL
bin/nutch indexchecker    Check the indexing filters for a given URL
bin/nutch domainstats     Calculate domain statistics from crawldb
bin/nutch webgraph        Generate a web graph from existing segments
bin/nutch linkrank        Run a link analysis program on the generated web graph
bin/nutch scoreupdater    Update the crawldb with linkrank scores
bin/nutch nodedumper      Dump the web graph's node scores
bin/nutch plugin          Load a plugin and run the main() of one of its classes
bin/nutch junit           Run the given JUnit test

or

bin/nutch CLASSNAME       Run the class named CLASSNAME
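
The step-by-step commands listed above (inject, generate, fetch, parse, updatedb, invertlinks) are typically chained into a crawl round. The following is a minimal dry-run sketch of such a round, not a definitive script. The crawl/ and urls paths, the SEGMENT placeholder, the helper function names, and the argument order are assumptions that do not come from this page; invoke each command without parameters to confirm its usage for your version. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of one crawl round built from the step-by-step commands.
# Assumptions (not stated on this page): all paths below and the argument
# order of each command. Check each command's usage output first.

NUTCH="${NUTCH:-bin/nutch}"      # the helper script this page describes
CRAWLDB="crawl/crawldb"          # hypothetical crawldb location
LINKDB="crawl/linkdb"            # hypothetical linkdb location
SEGMENTS="crawl/segments"        # hypothetical segments directory
SEEDS="urls"                     # hypothetical directory of seed URL files
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pick the segment directory that sorts last (assumed to be the newest);
# fall back to a placeholder name when no segment exists yet.
latest_segment() {
  seg=$(ls -d "$SEGMENTS"/* 2>/dev/null | tail -n 1)
  echo "${seg:-$SEGMENTS/SEGMENT}"
}

crawl_round() {
  run "$NUTCH" inject "$CRAWLDB" "$SEEDS"              # seed the crawldb
  run "$NUTCH" generate "$CRAWLDB" "$SEGMENTS"         # create a new segment
  segment=$(latest_segment)
  run "$NUTCH" fetch "$segment"                        # fetch the segment's pages
  run "$NUTCH" parse "$segment"                        # parse the segment's pages
  run "$NUTCH" updatedb "$CRAWLDB" "$segment"          # update crawldb from the segment
  run "$NUTCH" invertlinks "$LINKDB" -dir "$SEGMENTS"  # create the linkdb
}

crawl_round
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands instead of printing them.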

Webgraph classes

Useful Plugin Classes

Other Classes

CommandLineOptions (last edited 2012-05-03 18:44:31 by LewisJohnMcgibbney)