...
command | function | version | |
1.x | 2.x | ||
Read / dump crawl db | X | X | |
Merge crawldb-s, with optional filtering | X | ||
Read / dump link db | X | ||
Inject new urls into the database | X | X | |
Generate new segments to fetch from crawldb | X | X | |
Generate new segments to fetch from text files | X | ||
Fetch a segment's pages | X | X | |
Parse a segment's pages | X | X | |
Read / dump segment data | X | ||
Merges multiple segments, with optional filtering and slicing | X | ||
Update crawldb (from segments if in 1.x) after fetching | X | X | |
Update hostdb after fetching | X | ||
Create a linkdb from parsed segments | X | ||
Merges linkdb-s, with optional filtering | X | ||
Run the Elasticsearch indexer on parsed batches | X | ||
Run the Solr indexer on parsed segments and linkdb - DEPRECATED, use the index command instead | X | X | |
Removes duplicate documents from Solr - DEPRECATED, use the dedup command instead | X | X | |
Removes HTTP 301 and 404 documents from Solr - DEPRECATED, use the clean command instead | X | ||
Run the plugin-based indexer on parsed segments and linkdb | X | ||
Deduplicate entries in the crawldb and give them a special status | X | ||
Remove HTTP 301 and 404 documents and duplicates from indexing backends configured via plugins | X | ||
Checks the parser for a given url | X | X | |
Checks the indexing filters for a given url | X | ||
bin/nutch normalizerchecker | Checks URL normalizers for given URLs | X | |
Calculates domain statistics from crawldb | X | ||
Generates a web graph from existing segments | X | ||
Runs a link analysis program on the generated web graph | X | ||
Updates the crawldb with linkrank scores | X | ||
Dumps the web graph's node scores | X | ||
Loads a plugin and runs the main() of one of its classes | X | X | |
Runs a (local) Nutch server on a user-defined port | X | X | |
Runs a (local) Nutch WebApp GUI on port 8080 | X | ||
Runs the given JUnit test | X | X | |
Dump out Nutch segments into Common Crawl data format | X | ||
Runs the class named CLASSNAME | X | X |
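
The inject, generate, fetch, parse and update-crawldb rows in the table are usually chained into one crawl round. Below is a minimal dry-run sketch for 1.x. It assumes the subcommand names inject, generate, fetch, parse and updatedb and the argument order from the Nutch 1.x usage messages. The paths crawl/crawldb, crawl/segments and urls, and the SEGMENT_DIR placeholder, are hypothetical. The script only prints the commands and never invokes Nutch.

```shell
#!/bin/sh
# Dry-run sketch of one Nutch 1.x crawl round: prints commands, runs nothing.
# All paths below are hypothetical examples, not defaults shipped with Nutch.
NUTCH="bin/nutch"
CRAWLDB="crawl/crawldb"
SEGMENTS="crawl/segments"
SEED_DIR="urls"
# Placeholder: "generate" creates a new, timestamp-named directory under
# $SEGMENTS; a real script would look that name up after generating.
SEGMENT="$SEGMENTS/SEGMENT_DIR"

crawl_round() {
  echo "$NUTCH inject $CRAWLDB $SEED_DIR"      # inject new urls into the crawldb
  echo "$NUTCH generate $CRAWLDB $SEGMENTS"    # generate a new segment to fetch
  echo "$NUTCH fetch $SEGMENT"                 # fetch the segment's pages
  echo "$NUTCH parse $SEGMENT"                 # parse the segment's pages
  echo "$NUTCH updatedb $CRAWLDB $SEGMENT"     # update the crawldb from the segment
}

crawl_round
```

To go from dry run to a real round, the echo lines would be replaced by the commands themselves, with SEGMENT resolved to the directory that generate actually created. Check each subcommand's own usage message first, since arguments differ between 1.x and 2.x.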
...