The bin/crawl script gives you more control during a crawl. It runs the individual crawl steps (inject -> generate -> fetch -> parse -> updatedb) in sequence.


Nutch 1.x

     Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
        -i|--index      Indexes crawl results into a configured indexer
        -D              A Java property to pass to Nutch calls
        Seed Dir        Directory in which to look for a seeds file
        Crawl Dir       Directory where the crawl/link/segments dirs are saved
        Num Rounds      The number of rounds to run this crawl for
     Example: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/  2
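
The per-round steps named at the top of this page (inject, generate, fetch, parse, updatedb) can be sketched as a small dry-run script. This is only an illustration, not the actual bin/crawl source: the function name crawl_sketch, the NUTCH variable and the SEGMENT placeholder are hypothetical, and the real script does more than this (for example option handling and optional indexing). By default the sketch only prints the bin/nutch commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of the crawl rounds; NOT the real bin/crawl.

# NUTCH is an assumed variable: the command prefix used for every step.
# The default only prints each command; set NUTCH="bin/nutch" to execute.
NUTCH="${NUTCH:-echo bin/nutch}"

crawl_sketch() {
  local seed_dir crawl_dir rounds round segment dir
  seed_dir="$1"
  crawl_dir="${2%/}"   # drop a trailing slash so paths join cleanly
  rounds="$3"

  # inject: seed the crawldb once, before the first round
  $NUTCH inject "$crawl_dir/crawldb" "$seed_dir"

  round=1
  while [ "$round" -le "$rounds" ]; do
    # generate: pick URLs due for fetching and write a new segment
    $NUTCH generate "$crawl_dir/crawldb" "$crawl_dir/segments"

    # Use the newest timestamp-named segment directory if one exists;
    # otherwise keep a placeholder name (the dry-run case).
    segment="$crawl_dir/segments/SEGMENT"
    for dir in "$crawl_dir"/segments/2*/; do
      if [ -d "$dir" ]; then segment="${dir%/}"; fi
    done

    $NUTCH fetch "$segment"                           # download the pages
    $NUTCH parse "$segment"                           # extract text and links
    $NUTCH updatedb "$crawl_dir/crawldb" "$segment"   # merge results into crawldb

    round=$((round + 1))
  done
}
```

Calling crawl_sketch urls/ TestCrawl/ 2 in a directory without an existing TestCrawl prints one inject line followed by four lines (generate, fetch, parse, updatedb) for each of the two rounds.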

Nutch 2.x

Need Assistance?

Please message us on the user mailing list if you find any issues.

bin/crawl (last edited 2015-06-13 18:24:26 by LewisJohnMcgibbney)