Nutch 1.X Command Line Options of bin/nutch
The bin/nutch script is a wrapper that maps each command name to the Java class that implements it and runs that class.
Note: Most commands print a usage message when invoked without parameters.
See each entry for details of the command arguments and options.
command | function
crawl | One-step crawler for intranets
readdb | Read / dump crawl db
mergedb | Merge crawldb-s, with optional filtering
readlinkdb | Read / dump link db
inject | Inject new urls into the database
generate | Generate new segments to fetch from crawldb
freegen | Generate new segments to fetch from text files
fetch | Fetch a segment's pages
parse | Parse a segment's pages
readseg | Read / dump segment data
mergesegs | Merge multiple segments, with optional filtering and slicing
updatedb | Update crawldb from segments after fetching
invertlinks | Create a linkdb from parsed segments
mergelinkdb | Merge linkdb-s, with optional filtering
solrindex | Run the solr indexer on parsed segments and linkdb
solrdedup | Remove duplicate documents from solr
solrclean | Remove HTTP 301 and 404 documents from solr
parsechecker | Check the parser for a given url
indexchecker | Check the indexing filters for a given url
domainstats | Calculate domain statistics from crawldb
webgraph | Generate a web graph from existing segments
linkrank | Run a link analysis program on the generated web graph
scoreupdater | Update the crawldb with linkrank scores
nodedumper | Dump the web graph's node scores
plugin | Load a plugin and run the main() of one of its classes
junit | Run the given JUnit test
or
CLASSNAME | Run the class named CLASSNAME
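The step-by-step functions in the table above (inject, generate from crawldb, fetch, parse, update crawldb, create linkdb) can be chained into one crawl round. The following is a minimal sketch, not a definitive script: the function name run_round, the NUTCH variable, and the directory layout (crawldb, segments, linkdb under one crawl directory) are assumptions made for illustration, and the argument order should be confirmed against the usage message each command prints.

```shell
#!/bin/sh
# Sketch of a single crawl round built from the individual bin/nutch commands.
# NUTCH, run_round and the directory names are illustrative assumptions.
NUTCH="${NUTCH:-bin/nutch}"

run_round() {
  crawl_dir="$1"   # e.g. crawl (holds crawldb, segments, linkdb)
  seed_dir="$2"    # directory of text files listing seed URLs

  # Inject the seed URLs into the crawl db.
  $NUTCH inject "$crawl_dir/crawldb" "$seed_dir" || return 1

  # Generate a new segment (fetch list) from the crawl db.
  $NUTCH generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1

  # Segments are named by timestamp, so the last entry is the newest one.
  segment=$(ls -d "$crawl_dir"/segments/* | tail -n 1)

  # Fetch and parse the segment, then fold the results back into the crawl db.
  $NUTCH fetch "$segment" || return 1
  $NUTCH parse "$segment" || return 1
  $NUTCH updatedb "$crawl_dir/crawldb" "$segment" || return 1

  # Build the link db from all parsed segments.
  $NUTCH invertlinks "$crawl_dir/linkdb" -dir "$crawl_dir/segments" || return 1
}
```

Calling `run_round crawl urls` from the Nutch installation directory would run one round; repeating the generate, fetch, parse and updatedb steps deepens the crawl.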
Webgraph classes
bin/nutch org.apache.nutch.scoring.webgraph.WebGraph
bin/nutch org.apache.nutch.scoring.webgraph.Loops
bin/nutch org.apache.nutch.scoring.webgraph.LinkRank
bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater
bin/nutch org.apache.nutch.scoring.webgraph.NodeDumper
bin/nutch org.apache.nutch.scoring.webgraph.NodeReader
bin/nutch org.apache.nutch.scoring.webgraph.LoopReader
bin/nutch org.apache.nutch.scoring.webgraph.LinkDumper
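As a rough illustration of how the first classes in this list fit together, the sketch below builds a web graph from one segment, runs LinkRank on it, and writes the scores back to the crawldb. The option names (-segment, -webgraphdb, -crawldb) are assumed from the usage output of these classes in Nutch 1.x and should be checked by running each class without arguments; the function name score_with_webgraph and the paths are hypothetical. The optional Loops step is left out.

```shell
#!/bin/sh
# Sketch of a webgraph scoring pass. Option names are assumptions to verify
# against each class's usage message; score_with_webgraph is a made-up name.
NUTCH="${NUTCH:-bin/nutch}"

score_with_webgraph() {
  crawl_dir="$1"                      # e.g. crawl
  segment="$2"                        # one parsed segment directory
  webgraphdb="$crawl_dir/webgraphdb"  # assumed location for the web graph

  # Build (or update) the web graph from the segment.
  $NUTCH org.apache.nutch.scoring.webgraph.WebGraph \
    -segment "$segment" -webgraphdb "$webgraphdb" || return 1

  # Run the link analysis on the web graph.
  $NUTCH org.apache.nutch.scoring.webgraph.LinkRank \
    -webgraphdb "$webgraphdb" || return 1

  # Copy the resulting scores into the crawl db.
  $NUTCH org.apache.nutch.scoring.webgraph.ScoreUpdater \
    -crawldb "$crawl_dir/crawldb" -webgraphdb "$webgraphdb" || return 1
}
```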
Useful Plugin Classes
bin/nutch plugin urlnormalizer-regex org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
bin/nutch plugin lib-http org.apache.nutch.protocol.http.api.RobotRulesParser
Other Classes
bin/nutch org.apache.nutch.net.URLFilterChecker
bin/nutch org.apache.nutch.net.URLNormalizerChecker
bin/nutch org.apache.nutch.tools.CrawlDBScanner
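A checker class such as URLFilterChecker can be driven from a file of URLs. The sketch below assumes that the checker reads one URL per line from standard input and accepts the -allCombined option to apply all configured filters; both points should be confirmed from the usage message the class prints. The function name check_urls and the NUTCH variable are hypothetical.

```shell
#!/bin/sh
# Sketch: run a file of URLs through the configured URL filters.
# Assumes URLFilterChecker reads URLs from stdin and supports -allCombined.
NUTCH="${NUTCH:-bin/nutch}"

check_urls() {
  url_file="$1"   # text file with one URL per line
  $NUTCH org.apache.nutch.net.URLFilterChecker -allCombined < "$url_file"
}
```

For example, `check_urls urls/seed.txt` would print the checker's verdict for each URL in that (hypothetical) seed file.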