Differences between revisions 4 and 5
Revision 4 as of 2006-08-11 16:10:13
Size: 2845
Editor: a84-231-72-32
Comment:
Revision 5 as of 2006-08-11 16:10:59
Size: 2841
Editor: a84-231-72-32
Comment:

Command Line Options of Nutch version 0.8.x

See each entry for details of the command arguments and options.

||command||function||
||[:nutch-0.8-dev/bin/nutch crawl:bin/nutch crawl]||One-step crawler for intranets.||
||[:nutch-0.8-dev/bin/nutch readdb:bin/nutch readdb]||Read / dump crawldb.||
||[:nutch-0.8-dev/bin/nutch readlinkdb:bin/nutch readlinkdb]||Read / dump linkdb.||
||[:nutch-0.8-dev/bin/nutch inject:bin/nutch inject]||Inject new URLs into the crawldb.||
||[:nutch-0.8-dev/bin/nutch generate:bin/nutch generate]||Generate new segments to fetch.||
||[:nutch-0.8-dev/bin/nutch fetch:bin/nutch fetch]||Fetch a segment's pages.||
||[:nutch-0.8-dev/bin/nutch parse:bin/nutch parse]||Parse contents in one segment.||
||[:nutch-0.8-dev/bin/nutch segread:bin/nutch segread]||Read data in an existing segment.||
||[:nutch-0.8-dev/bin/nutch updatedb:bin/nutch updatedb]||Update the crawldb from a segment.||
||[:nutch-0.8-dev/bin/nutch invertlinks:bin/nutch invertlinks]||Create or update a linkdb from a segment or segments.||
||[:nutch-0.8-dev/bin/nutch index:bin/nutch index]||Run the indexer on a segment's fetcher output.||
||[:nutch-0.8-dev/bin/nutch merge:bin/nutch merge]||Merge several segment indexes.||
||[:nutch-0.8-dev/bin/nutch mergedb:bin/nutch mergedb]||Merge several crawldb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergelinkdb:bin/nutch mergelinkdb]||Merge several linkdb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergesegs:bin/nutch mergesegs]||Merge several input segments into one or more output segments. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch dedup:bin/nutch dedup]||Delete duplicate documents in a set of segment indexes.||
||[:nutch-0.8-dev/bin/nutch plugin:bin/nutch plugin]||Load a plugin and run the main() of one of its classes.||
||[:nutch-0.8-dev/bin/nutch server:bin/nutch server]||Run a search server.||
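
The commands in the table are usually run in sequence for a step-by-step crawl. The following is only a sketch of one such round, not a definitive recipe. The `crawl/` output directory, the `urls/` seed directory, the `SEGMENT` placeholder, the `run` and `crawl_once` helpers and the `DRY_RUN` switch are all assumptions made for illustration. The argument order is assumed to follow the Nutch 0.8 tutorial, so check each command's own page before relying on it. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of one crawl round built from the commands in the table above.
# ASSUMPTIONS (not stated on this page): directory names, helper names,
# and the argument order of each command.

NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch script
CRAWL="${CRAWL:-crawl}"       # hypothetical output directory
SEEDS="${SEEDS:-urls}"        # hypothetical directory holding seed URL files
DRY_RUN="${DRY_RUN:-1}"       # 1 = print commands only, anything else = execute

run() {
  # Print the command line; execute it only when DRY_RUN is not 1.
  echo "$@"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

crawl_once() {
  run "$NUTCH" inject "$CRAWL/crawldb" "$SEEDS" || return 1
  run "$NUTCH" generate "$CRAWL/crawldb" "$CRAWL/segments" || return 1

  # Pick the newest segment; in dry-run mode use a placeholder name.
  if [ "$DRY_RUN" = "1" ]; then
    segment="$CRAWL/segments/SEGMENT"
  else
    segment=$(ls -d "$CRAWL"/segments/2* | tail -1)
  fi

  run "$NUTCH" fetch "$segment" || return 1
  run "$NUTCH" updatedb "$CRAWL/crawldb" "$segment" || return 1
  run "$NUTCH" invertlinks "$CRAWL/linkdb" "$segment" || return 1
  run "$NUTCH" index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$segment"
}

crawl_once
```

Running the script with `DRY_RUN=0` from the Nutch installation directory would execute the commands instead of printing them. Repeating the generate, fetch and updatedb steps deepens the crawl before indexing.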

Other useful commands are also available

See each entry for details of the command arguments and options.

||command||function||
||[:nutch-08/bin/nutch org.apache.nutch.searcher.NutchBean:bin/nutch org.apache.nutch.searcher.NutchBean]||Command-line interface for doing searches.||
||[:nutch-08/bin/nutch org.apache.nutch.net.URLFilterChecker:bin/nutch org.apache.nutch.net.URLFilterChecker]||Utility for testing URL filters.||
||[:nutch-08/bin/nutch org.apache.nutch.indexer.HighFreqTerms:bin/nutch org.apache.nutch.indexer.HighFreqTerms]||List the most frequent terms in an index.||

08CommandLineOptions (last edited 2011-07-18 14:56:29 by LewisJohnMcgibbney)