Nutch Command Line Options of bin/nutch
...
See each entry for details of the command arguments and options.
command | function | 1.x | 2.x |
---|---|---|---|
bin/nutch readdb | Read / dump crawl db | X | X |
bin/nutch mergedb | Merge crawldb-s, with optional filtering | X | |
bin/nutch readlinkdb | Read / dump link db | X | |
bin/nutch inject | Inject new urls into the database | X | X |
bin/nutch hostinject | Inject new urls into the host database | | X |
bin/nutch generate | Generate new segments to fetch from crawldb | X | X |
bin/nutch freegen | Generate new segments to fetch from text files | X | |
bin/nutch fetch | Fetch a segment's pages | X | X |
bin/nutch parse | Parse a segment's pages | X | X |
bin/nutch readseg | Read / dump segment data | X | |
bin/nutch mergesegs | Merge multiple segments, with optional filtering and slicing | X | |
bin/nutch updatedb | Update crawldb (from segments if in 1.x) after fetching | X | X |
bin/nutch updatehostdb | Update hostdb after fetching | | X |
bin/nutch invertlinks | Create a linkdb from parsed segments | X | |
bin/nutch mergelinkdb | Merge linkdb-s, with optional filtering | X | |
bin/nutch elasticindex | Run the Elasticsearch indexer on parsed batches | | X |
bin/nutch solrindex | Run the Solr indexer on parsed segments and linkdb. DEPRECATED: use the index command instead | X | X |
bin/nutch solrdedup | Remove duplicate documents from Solr. DEPRECATED: use the dedup command instead | X | X |
bin/nutch solrclean | Remove HTTP 301 and 404 documents from Solr. DEPRECATED: use the clean command instead | X | |
bin/nutch index | Run the plugin-based indexer on parsed segments and linkdb | X | |
bin/nutch dedup | Deduplicate entries in the crawldb and give them a special status | X | |
bin/nutch clean | Remove HTTP 301 and 404 documents and duplicates from indexing backends configured via plugins | X | |
bin/nutch parsechecker | Check the parser for a given url | X | X |
bin/nutch indexchecker | Check the indexing filters for a given url | X | |
bin/nutch normalizerchecker | Check URL normalizers for given URLs | X | |
bin/nutch domainstats | Calculate domain statistics from crawldb | X | |
bin/nutch webgraph | Generate a web graph from existing segments | X | |
bin/nutch linkrank | Run a link analysis program on the generated web graph | X | |
bin/nutch scoreupdater | Update the crawldb with linkrank scores | X | |
bin/nutch nodedumper | Dump the web graph's node scores | X | |
bin/nutch plugin | Load a plugin and run the main() of one of its classes | X | X |
bin/nutch nutchserver | Run a (local) Nutch server on a user-defined port | | X |
bin/nutch webapp | Run a (local) Nutch WebApp GUI on port 8080 | | X |
bin/nutch junit | Run the given JUnit test | X | X |
bin/nutch commoncrawldump | Dump out Nutch segments into Common Crawl data format | X | |
bin/nutch CLASSNAME | Run the class named CLASSNAME | X | X |
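
The commands above are usually run in sequence rather than alone. As a rough illustration, the sketch below chains the 1.x entries for one crawl round: inject, generate, fetch, parse, updatedb, invertlinks. The argument forms follow common Nutch 1.x usage and should be checked against each command's own entry. The directory names `crawl` and `urls`, the `run` helper, the `crawl_round` function, and the `DRY_RUN` switch are hypothetical and not part of Nutch. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of one Nutch 1.x crawl round (assumed argument forms, see lead-in).
# NUTCH and DRY_RUN are illustrative variables, not Nutch settings.
NUTCH="${NUTCH:-bin/nutch}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# crawl_round CRAWL_DIR SEED_DIR
# CRAWL_DIR holds crawldb, linkdb and segments; SEED_DIR holds seed URL files.
crawl_round() {
    crawl_dir="$1"
    seed_dir="$2"

    run "$NUTCH" inject "$crawl_dir/crawldb" "$seed_dir" || return 1
    run "$NUTCH" generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1

    # generate writes a new timestamped segment directory; take the newest.
    # In dry-run mode nothing exists on disk, so a placeholder name is used.
    if [ "$DRY_RUN" = "0" ]; then
        segment=$(ls -d "$crawl_dir"/segments/* | tail -n 1)
    else
        segment="$crawl_dir/segments/SEGMENT"
    fi

    run "$NUTCH" fetch "$segment" || return 1
    run "$NUTCH" parse "$segment" || return 1
    run "$NUTCH" updatedb "$crawl_dir/crawldb" "$segment" || return 1
    run "$NUTCH" invertlinks "$crawl_dir/linkdb" -dir "$crawl_dir/segments" || return 1
}
```

Calling `crawl_round crawl urls` prints the six commands in order. Setting `DRY_RUN=0` first would execute them against a real Nutch installation.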
Webgraph classes
- bin/nutch org.apache.nutch.scoring.webgraph.WebGraph
- bin/nutch org.apache.nutch.scoring.webgraph.Loops
- bin/nutch org.apache.nutch.scoring.webgraph.LinkRank
- bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater
- bin/nutch org.apache.nutch.scoring.webgraph.NodeDumper
- bin/nutch org.apache.nutch.scoring.webgraph.NodeReader
- bin/nutch org.apache.nutch.scoring.webgraph.LoopReader
- bin/nutch org.apache.nutch.scoring.webgraph.LinkDumper
Useful Plugin Classes
- bin/nutch plugin urlnormalizer-regex org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
Other Classes
- bin/nutch org.apache.nutch.net.URLFilterChecker
- bin/nutch org.apache.nutch.net.URLNormalizerChecker
- bin/nutch org.apache.nutch.tools.CrawlDBScanner
- bin/nutch org.apache.nutch.protocol.RobotRulesParser