This page is a quick HOWTO explaining how to easily contribute patches to Nutch. It assumes you are using some form of Unix.

Getting the source code

First of all, you need the Nutch source code.

Create a directory in which you want to store the Nutch source code on your local drive,

> cd somewhereOnYourDisk
> mkdir nutch
> cd nutch

then get the source code on your local drive using SVN.

svn checkout http://svn.apache.org/repos/asf/nutch/

Working time

Now it is time to work.

Feel free to modify the source code and add some (very) nice features using your favorite IDE.

But keep the following points in mind:

  • All public classes and methods should have informative javadoc.
  • Unit tests are encouraged (http://www.junit.org).

Building a patch

First of all, please perform some minimal non-regression tests by:

  • rebuilding the whole Nutch code
  • running the full unit test suite.

Building Nutch

> cd somewhereOnYourDisk/nutch
> ant

After a while, if you see

BUILD SUCCESSFUL

all is ok, but if you see

BUILD FAILED

please read the error messages carefully and check your code.

Unit Tests

> cd somewhereOnYourDisk/nutch
> ant test

After a while, if you see

BUILD SUCCESSFUL

all is ok, but if you see

BUILD FAILED

please read the error messages carefully and check your code.

It is also possible to run individual unit tests, which is useful during development; see bin/nutch junit and WritingPluginExample-1.2#Unit_testing.
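The build and unit-test steps in this section can be chained so that the tests only run when the build succeeds. The following is a minimal sketch, not part of Nutch: pre_patch_check is a hypothetical helper name, and the ANT variable is an assumption introduced here so that the command can be swapped out. Run it from the directory where you normally run ant.

```shell
# pre_patch_check is a hypothetical helper, not something shipped with Nutch.
# ANT is an assumed variable naming the ant executable; it defaults to "ant".
pre_patch_check() {
  ant_cmd=${ANT:-ant}
  # Step 1: rebuild the whole code base; stop here if the build fails.
  "$ant_cmd" || { echo "build failed"; return 1; }
  # Step 2: run the unit tests; stop here if any test fails.
  "$ant_cmd" test || { echo "unit tests failed"; return 1; }
  echo "build and unit tests passed"
}
```

Calling pre_patch_check returns a non-zero status as soon as one step fails, so it can gate the patch creation step that follows.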

Functional Tests

If you are a perfectionist, you can also perform some functional tests by running Nutch. Please refer to the NutchTutorial.

Creating a patch

To create a patch, run the following from the root of the Nutch directory:

svn diff > myBeautifulPatch.patch
vi myBeautifulPatch.patch

or

git diff --no-prefix > myBeautifulPatch.patch
vi myBeautifulPatch.patch

if you are generating it from a Git repository.

This reports all modifications made to the Nutch sources on your local disk and saves them into the myBeautifulPatch.patch file. Then edit the patch file to check that it includes ONLY the modifications you want to add to the Nutch SVN repository.
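A quick way to start that check is to list which files the patch touches before reading it in full. The sketch below is only an illustration: list_patch_files is a hypothetical helper name, and it assumes a unified diff such as the one produced by svn diff or git diff --no-prefix, where each changed file has a header line starting with "+++ ".

```shell
# list_patch_files is a hypothetical helper, not part of Nutch, svn or git.
# It prints each file named in the "+++ " header lines of a unified diff.
list_patch_files() {
  # Keep only lines that begin with "+++ ", drop that marker, and drop the
  # tab-separated suffix that svn adds, e.g. "(working copy)".
  # Caveat: an added source line whose own text begins with "++ " would
  # also match, so treat the output as a hint, not a guarantee.
  awk 'index($0, "+++ ") == 1 {
         name = substr($0, 5)
         sub(/\t.*/, "", name)
         print name
       }' "$1" | sort -u
}
```

For example, list_patch_files myBeautifulPatch.patch prints one path per line; any path you did not intend to change is a sign that the patch needs trimming.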

Remember to generate a patch against a live branch, i.e. trunk for Nutch 1.x and 2.x for Nutch 2.x. The other branches are snapshots of past releases and the code might have evolved since then.

Proposing your work

Finally, patches can be attached either to a message sent to the nutch-dev mailing list or to a bug report in Jira (my preferred way, because it makes it easy to keep track of contributions, but that is a very personal point of view).

Testing and reviewing patches

Patches need careful testing and review to avoid regressions. Reviewing patches is mainly the task of committers, but you are welcome to help, especially if you have run into the same problem and found a Jira issue that is still unresolved but has a patch attached. Review and testing take time and should include the following steps:

  1. try to reproduce the problem. If you're not able to reproduce a problem, it's impossible to test whether a patch really resolves it.
  2. try to get a clear understanding of the problem
  3. have a look at the patch file: ideally, you'll get an understanding of the solution proposed by the patch (no problem if not)
  4. apply the patch (see "Applying patches" below)
  5. build Nutch and test whether the problem has disappeared
  6. check whether the patch breaks other things (adds regressions):
     • run the unit tests
     • test situations which you feel may be affected by the patch
  7. report your findings in Jira or on nutch-dev. It's always better to have one review too many than to introduce a regression because of insufficient testing.

Applying patches

A properly generated patch can be automatically applied to the source tree. The patch utility is one tool to apply patches. Change into the Nutch root folder and run:

> patch -p0 <path_to.patch

Do not ignore the output of patch; it may indicate errors. Applying a patch may fail, completely or partially, if the source code has changed in the meantime. A good starting point to learn more about patches is the Wikipedia article Patch.
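Because a patch may fail partway through, it can help to check whether it would apply before touching the source tree. The sketch below assumes GNU patch, whose --dry-run option reports what would happen without changing any file; other patch implementations may not support it. check_patch is a hypothetical helper name.

```shell
# check_patch is a hypothetical helper, not part of Nutch.
# It assumes GNU patch: --dry-run tests the patch without modifying files.
check_patch() {
  if patch -p0 --dry-run < "$1" > /dev/null; then
    echo "applies cleanly"
  else
    echo "does not apply cleanly"
    return 1
  fi
}
```

Run check_patch path_to.patch from the Nutch root folder. If it reports a clean apply, run the real patch -p0 command above and still read its output.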
