| Command | Function | Version 1.x | Version 2.x |
|---|---|---|---|
| bin/nutch readdb | Read / dump crawl db | X | X |
| bin/nutch mergedb | Merge crawldb-s, with optional filtering | X | |
| bin/nutch readlinkdb | Read / dump link db | X | |
| bin/nutch inject | Inject new URLs into the database | X | X |
| bin/nutch generate | Generate new segments to fetch from crawldb | X | X |
| bin/nutch freegen | Generate new segments to fetch from text files | X | |
| bin/nutch fetch | Fetch a segment's pages | X | X |
| bin/nutch parse | Parse a segment's pages | X | X |
| bin/nutch readseg | Read / dump segment data | X | |
| bin/nutch mergesegs | Merge multiple segments, with optional filtering and slicing | X | |
| bin/nutch updatedb | Update crawldb (from segments if in 1.x) after fetching | X | X |
| bin/nutch updatehostdb | Update hostdb after fetching | | X |
| bin/nutch invertlinks | Create a linkdb from parsed segments | X | |
| bin/nutch mergelinkdb | Merge linkdb-s, with optional filtering | X | |
| bin/nutch elasticindex | Run the Elasticsearch indexer on parsed batches | | X |
| bin/nutch solrindex | Run the Solr indexer on parsed segments and linkdb. DEPRECATED: use the index command instead | X | X |
| bin/nutch solrdedup | Remove duplicate documents from Solr. DEPRECATED: use the dedup command instead | X | X |
| bin/nutch solrclean | Remove HTTP 301 and 404 documents from Solr. DEPRECATED: use the clean command instead | X | |
| bin/nutch index | Run the plugin-based indexer on parsed segments and linkdb | X | |
| bin/nutch dedup | Deduplicate entries in the crawldb and give them a special status | X | |
| bin/nutch clean | Remove HTTP 301 and 404 documents and duplicates from indexing backends configured via plugins | X | |
| bin/nutch parsechecker | Check the parser for a given URL | X | X |
| bin/nutch indexchecker | Check the indexing filters for a given URL | X | |
| bin/nutch normalizerchecker | Check URL normalizers for given URLs | X | |
| bin/nutch domainstats | Calculate domain statistics from crawldb | X | |
| bin/nutch webgraph | Generate a web graph from existing segments | X | |
| bin/nutch linkrank | Run a link analysis program on the generated web graph | X | |
| bin/nutch scoreupdater | Update the crawldb with linkrank scores | X | |
| bin/nutch nodedumper | Dump the web graph's node scores | X | |
| bin/nutch plugin | Load a plugin and run the main() of one of its classes | X | X |
| bin/nutch nutchserver | Run a (local) Nutch server on a user-defined port | X | X |
| bin/nutch webapp | Run a (local) Nutch WebApp GUI on port 8080 | | X |
| bin/nutch junit | Run the given JUnit test | X | X |
| bin/nutch commoncrawldump | Dump Nutch segments into Common Crawl data format | X | |
| bin/nutch CLASSNAME | Run the class named CLASSNAME | X | X |
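
As an illustration of how several of the commands above fit together, here is a minimal sketch of one 1.x-style crawl round (inject, generate, fetch, parse, updatedb, invertlinks). The directory names (`urls`, `crawl/crawldb`, `crawl/linkdb`, `crawl/segments`) and the segment name are hypothetical placeholders, and the argument order follows common 1.x usage as an assumption; running any command without arguments prints its actual usage for your version. By default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of one Nutch 1.x crawl round built from commands in the table.
# By default NUTCH only echoes each command; set NUTCH=bin/nutch to execute them.
NUTCH="${NUTCH:-echo bin/nutch}"

# Hypothetical paths, chosen for illustration only.
CRAWLDB=crawl/crawldb
LINKDB=crawl/linkdb
SEGMENTS=crawl/segments
# Placeholder segment name; in a real run, generate creates a timestamped
# directory under $SEGMENTS and you would pick up the newest one.
SEGMENT="$SEGMENTS/20240101000000"

crawl_round() {
  $NUTCH inject "$CRAWLDB" urls                  # seed the crawldb from a URL directory
  $NUTCH generate "$CRAWLDB" "$SEGMENTS"         # create a new segment to fetch
  $NUTCH fetch "$SEGMENT"                        # fetch the segment's pages
  $NUTCH parse "$SEGMENT"                        # parse the fetched pages
  $NUTCH updatedb "$CRAWLDB" "$SEGMENT"          # fold fetch results back into the crawldb
  $NUTCH invertlinks "$LINKDB" -dir "$SEGMENTS"  # build the linkdb from parsed segments
}

crawl_round
```

Keeping the default as a dry run makes it easy to review the sequence before pointing it at a real installation. Indexing (`bin/nutch index`) is left out because its arguments depend on the configured indexing plugins.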
