generate is an alias for org.apache.nutch.tools.FetchListTool
The generate command creates a new fetchlist from the webdb. The fetchlist contains the URLs that the fetch tool will then fetch.
Usage: bin/nutch org.apache.nutch.tools.FetchListTool (-local | -ndfs <namenode:port>)
<db> <segment_dir> [-refetchonly] [-anchoroptimize linkdb] [-topN N]
[-cutoff cutoffscore] [-numFetchers numFetchers] [-adddays numDays]
Command line parameters:
-topN N, where N is the number of pages to include in the fetchlist.
Normally, the "generate" command prepares a fetchlist from all unfetched pages and from pages whose fetch interval has already expired. If you use -topN, you instead get only the N URLs with the highest scores. These are potentially the most interesting ones, so they should be prioritized in fetching.
- Juho Mäkinen
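
The effect of -topN described in the comment above can be illustrated with a small sketch. This is not Nutch's actual implementation: the function name select_top_n, the example URLs, and the scores are all hypothetical, and the sketch only assumes that each candidate page carries a numeric score.

```python
import heapq

def select_top_n(candidates, n):
    """Return the n highest-scoring (url, score) pairs, best first.

    Hypothetical helper for illustration; not part of Nutch.
    """
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# Hypothetical candidate pages with made-up scores.
candidates = [
    ("http://example.com/a", 0.2),
    ("http://example.com/b", 1.5),
    ("http://example.com/c", 0.9),
    ("http://example.com/d", 1.1),
]

# Keep only the 2 best-scoring URLs, as -topN 2 would conceptually do.
fetchlist = select_top_n(candidates, 2)
for url, score in fetchlist:
    print(url, score)
```

If N is larger than the number of candidate pages, this sketch simply returns all of them.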