Differences between revisions 3 and 4
Revision 3 as of 2006-08-11 15:40:25
Size: 1848
Editor: a84-231-72-32
Comment:
Revision 4 as of 2006-08-11 16:10:13
Size: 2845
Editor: a84-231-72-32
Comment:
Line 7: Line 7:
||["nutch-0.8-dev/bin/nutch crawl"]||One-step crawler for intranets.||
||["nutch-0.8-dev/bin/nutch readdb"]||Read / dump crawldb.||
||["nutch-0.8-dev/bin/nutch readlinkdb"]||Read / dump linkdb.||
||["nutch-0.8-dev/bin/nutch inject"]||Inject new urls into the crawldb.||
||["nutch-0.8-dev/bin/nutch generate"]||Generate new segments to fetch.||
||["nutch-0.8-dev/bin/nutch fetch"]||Fetch a segment's pages.||
||["nutch-0.8-dev/bin/nutch parse"]||Parse contents in one segment.||
||["nutch-0.8-dev/bin/nutch segread"]||Read data in an existing segment.||
||["nutch-0.8-dev/bin/nutch updatedb"]||Updates the crawldb from a segment.||
||["nutch-0.8-dev/bin/nutch invertlinks"]||Create or update a linkdb from a segment or segments.||
||["nutch-0.8-dev/bin/nutch index"]||Run the indexer on a segment's fetcher output.||
||["nutch-0.8-dev/bin/nutch merge"]||Merge several segment indexes.||
||["nutch-0.8-dev/bin/nutch mergedb"]||Merge several crawldb-s together. Can be used for filtering out specific content.||
||["nutch-0.8-dev/bin/nutch mergelinkdb"]||Merge several linkdb-s together. Can be used for filtering out specific content.||
||["nutch-0.8-dev/bin/nutch mergesegs"]||Merge several input segments into one or more output segments. Can be used for filtering out specific content.||
||["nutch-0.8-dev/bin/nutch dedup"]||Deletes duplicate documents in a set of segment indexes.||
||["nutch-0.8-dev/bin/nutch plugin"]||Load a plugin and run one of its classes main().||
||["nutch-0.8-dev/bin/nutch server"]||Run a search server.||
||[:nutch-0.8-dev/bin/nutch crawl:bin/nutch crawl]||One-step crawler for intranets.||
||[:nutch-0.8-dev/bin/nutch readdb:bin/nutch readdb]||Read / dump crawldb.||
||[:nutch-0.8-dev/bin/nutch readlinkdb:bin/nutch readlinkdb]||Read / dump linkdb.||
||[:nutch-0.8-dev/bin/nutch inject:bin/nutch inject]||Inject new urls into the crawldb.||
||[:nutch-0.8-dev/bin/nutch generate:bin/nutch generate]||Generate new segments to fetch.||
||[:nutch-0.8-dev/bin/nutch fetch:bin/nutch fetch]||Fetch a segment's pages.||
||[:nutch-0.8-dev/bin/nutch parse:bin/nutch parse]||Parse contents in one segment.||
||[:nutch-0.8-dev/bin/nutch segread:bin/nutch segread]||Read data in an existing segment.||
||[:nutch-0.8-dev/bin/nutch updatedb:bin/nutch updatedb]||Updates the crawldb from a segment.||
||[:nutch-0.8-dev/bin/nutch invertlinks:bin/nutch invertlinks]||Create or update a linkdb from a segment or segments.||
||[:nutch-0.8-dev/bin/nutch index:bin/nutch index]||Run the indexer on a segment's fetcher output.||
||[:nutch-0.8-dev/bin/nutch merge:bin/nutch merge]||Merge several segment indexes.||
||[:nutch-0.8-dev/bin/nutch mergedb:bin/nutch mergedb]||Merge several crawldb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergelinkdb:bin/nutch mergelinkdb]||Merge several linkdb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergesegs:bin/nutch mergesegs]||Merge several input segments into one or more output segments. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch dedup:bin/nutch dedup]||Deletes duplicate documents in a set of segment indexes.||
||[:nutch-0.8-dev/bin/nutch plugin:bin/nutch plugin]||Load a plugin and run the main() of one of its classes.||
||[:nutch-0.8-dev/bin/nutch server:bin/nutch server]||Run a search server.||
Line 26: Line 26:

= Other useful commands are also available =

See each entry for details of the command arguments and options.

||'''command'''||'''function'''||
||[:nutch-08/bin/nutch org.apache.nutch.searcher.NutchBean:bin/nutch org.apache.nutch.searcher.NutchBean]||Commandline interface for doing searches.||
||[:nutch-08/bin/nutch org.apache.nutch.net.URLFilterChecker:bin/nutch org.apache.nutch.net.URLFilterChecker]||Utility for testing url filters.||
||[:nutch-08/bin/nutch org.apache.nutch.indexer.HighFreqTerms:bin/nutch org.apache.nutch.indexer.HighFreqTerms]||Lists the most frequent terms in an index||
|| || ||

= Command Line Options of nutch-0.8-dev/bin/nutch =

See each entry for details of the command arguments and options.

||'''command'''||'''function'''||
||[:nutch-0.8-dev/bin/nutch crawl:bin/nutch crawl]||One-step crawler for intranets.||
||[:nutch-0.8-dev/bin/nutch readdb:bin/nutch readdb]||Read / dump crawldb.||
||[:nutch-0.8-dev/bin/nutch readlinkdb:bin/nutch readlinkdb]||Read / dump linkdb.||
||[:nutch-0.8-dev/bin/nutch inject:bin/nutch inject]||Inject new urls into the crawldb.||
||[:nutch-0.8-dev/bin/nutch generate:bin/nutch generate]||Generate new segments to fetch.||
||[:nutch-0.8-dev/bin/nutch fetch:bin/nutch fetch]||Fetch a segment's pages.||
||[:nutch-0.8-dev/bin/nutch parse:bin/nutch parse]||Parse contents in one segment.||
||[:nutch-0.8-dev/bin/nutch segread:bin/nutch segread]||Read data in an existing segment.||
||[:nutch-0.8-dev/bin/nutch updatedb:bin/nutch updatedb]||Updates the crawldb from a segment.||
||[:nutch-0.8-dev/bin/nutch invertlinks:bin/nutch invertlinks]||Create or update a linkdb from a segment or segments.||
||[:nutch-0.8-dev/bin/nutch index:bin/nutch index]||Run the indexer on a segment's fetcher output.||
||[:nutch-0.8-dev/bin/nutch merge:bin/nutch merge]||Merge several segment indexes.||
||[:nutch-0.8-dev/bin/nutch mergedb:bin/nutch mergedb]||Merge several crawldb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergelinkdb:bin/nutch mergelinkdb]||Merge several linkdb-s together. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch mergesegs:bin/nutch mergesegs]||Merge several input segments into one or more output segments. Can be used for filtering out specific content.||
||[:nutch-0.8-dev/bin/nutch dedup:bin/nutch dedup]||Deletes duplicate documents in a set of segment indexes.||
||[:nutch-0.8-dev/bin/nutch plugin:bin/nutch plugin]||Load a plugin and run the main() of one of its classes.||
||[:nutch-0.8-dev/bin/nutch server:bin/nutch server]||Run a search server.||
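
The individual commands above are typically chained into one crawl cycle: inject, generate, fetch, updatedb, invertlinks, index. The sketch below only prints that sequence so it can be reviewed before anything is run. The directory names (`crawl`, `urls`, `crawl/segments/SEGMENT`) and the function name `print_crawl_cycle` are hypothetical, and the argument order shown for each command is an assumption that should be checked against the usage message each command prints when called without arguments.

```shell
#!/bin/sh
# Print (not run) the commands for one step-by-step crawl cycle.
# All paths are hypothetical; the argument order of each command is an
# assumption to verify against that command's own usage message.

NUTCH="${NUTCH:-bin/nutch}"

print_crawl_cycle() {
    crawl_dir="$1"   # base directory for crawldb, linkdb, segments, indexes
    seed_dir="$2"    # directory holding the seed URL lists
    segment="$3"     # segment directory produced by the generate step

    echo "$NUTCH inject $crawl_dir/crawldb $seed_dir"
    echo "$NUTCH generate $crawl_dir/crawldb $crawl_dir/segments"
    echo "$NUTCH fetch $segment"
    echo "$NUTCH updatedb $crawl_dir/crawldb $segment"
    echo "$NUTCH invertlinks $crawl_dir/linkdb $segment"
    echo "$NUTCH index $crawl_dir/indexes $crawl_dir/crawldb $crawl_dir/linkdb $segment"
}

print_crawl_cycle crawl urls crawl/segments/SEGMENT
```

When running the steps by hand, `SEGMENT` stands for the directory that the generate step actually creates under `crawl/segments`. For small intranet crawls the one-step `bin/nutch crawl` command covers the same ground.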

= Other useful commands are also available =

See each entry for details of the command arguments and options.

||'''command'''||'''function'''||
||[:nutch-08/bin/nutch org.apache.nutch.searcher.NutchBean:bin/nutch org.apache.nutch.searcher.NutchBean]||Commandline interface for doing searches.||
||[:nutch-08/bin/nutch org.apache.nutch.net.URLFilterChecker:bin/nutch org.apache.nutch.net.URLFilterChecker]||Utility for testing url filters.||
||[:nutch-08/bin/nutch org.apache.nutch.indexer.HighFreqTerms:bin/nutch org.apache.nutch.indexer.HighFreqTerms]||Lists the most frequent terms in an index||

08CommandLineOptions (last edited 2011-07-18 14:56:29 by LewisJohnMcgibbney)