Upgrade From Nutch 0.7 To Nutch 0.8

Configuration changes

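  • See the Tutorial for the 0.8 setup. In particular:
    • Put your root URLs in a file inside a directory (urls/whatever_name) instead of in a flat file named urls.
    • Make sure you set http.agent.name in conf/nutch-site.xml.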

Index migration

Unfortunately, data is not portable between these versions. The only way to preserve your webdb is to dump it to a text file and then inject the URLs from that dump into a 0.8 crawldb. Segments cannot be carried over; you will have to refetch them.
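
The dump-and-inject route could be scripted roughly as follows. This is only a sketch: it assumes the 0.7 text dump contains one "URL: ..." line per page record, and the function names, the urls directory, and the seed.txt file name are made up for illustration. Check the format of your own dump before relying on it.

```python
import re
from pathlib import Path

# Assumption: every page record in the 0.7 text dump has a line of the
# form "URL: http://...". Adjust the pattern if your dump looks different.
URL_LINE = re.compile(r"^\s*URL:\s*(\S+)")


def extract_urls(dump_text):
    """Return the unique URLs found in the dump, in first-seen order."""
    seen = set()
    urls = []
    for line in dump_text.splitlines():
        match = URL_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append(match.group(1))
    return urls


def write_seed_file(urls, seed_dir="urls", name="seed.txt"):
    """Write one URL per line to seed_dir/name (hypothetical names)."""
    path = Path(seed_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
    return path
```

The resulting directory can then be handed to the 0.8 inject step; the 0.8 Tutorial shows it in the form bin/nutch inject crawl/crawldb urls. Only the URLs survive this way: fetch times, scores, and link data from the old webdb are not carried over.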
