How to Contribute to Hadoop Common
This page describes the mechanics of how to contribute software to Hadoop Common. For ideas about what you might contribute, please see the ProjectSuggestions page.
Getting the source code
First of all, you need the Hadoop source code. The official location for Hadoop is the Apache SVN repository; Git is also supported, and useful if you want to make lots of local changes and keep those changes under some form of private or public revision control.
SVN Access
Get the source code on your local drive using SVN. Most development is done on the "trunk":
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-common-trunk
You may also want to develop against a specific release. To do so, visit http://svn.apache.org/repos/asf/hadoop/common/tags/ and find the release that you are interested in developing against. To check out this release, run:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-X.Y.Z/ hadoop-common-X.Y.Z
If you prefer to use Eclipse for development, there are instructions for setting up SVN access from within Eclipse at EclipseEnvironment.
The Hadoop system is split into three separate projects: common, hdfs, and mapreduce. You'll also need to check out the other subprojects:
svn checkout http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/ hadoop-hdfs-trunk
svn checkout http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/ hadoop-mapred-trunk
Git Access
See GitAndHadoop
Making Changes
Before you start, send a message to the Hadoop developer mailing list, or file a bug report in Jira. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient: it may take folks a while to understand your requirements.
Modify the source code and add some (very) nice features using your favorite IDE.
But take care with the following points:
- All public classes and methods should have informative Javadoc comments.
  - Do not use @author tags.
- Code should be formatted according to Sun's conventions, with one exception:
  - Indent two spaces per level, not four.
- Contributions should pass existing unit tests.
- New unit tests should be provided to demonstrate bugs and fixes. JUnit is our test framework:
  - You must implement a class that extends junit.framework.TestCase and whose class name starts with Test.
  - Define methods within your class whose names begin with test, and call JUnit's many assert methods to verify conditions; these methods will be executed when you run ant test.
  - By default, do not let tests write any temporary files to /tmp. Instead, the tests should write to the location specified by the test.build.data system property.
  - If an HDFS cluster or a MapReduce cluster is needed by your test, please use org.apache.hadoop.dfs.MiniDFSCluster and org.apache.hadoop.mapred.MiniMRCluster, respectively. TestMiniMRLocalFS is an example of a test that uses MiniMRCluster.
  - Place your class in the src/test tree.
  - TestFileSystem.java and TestMapRed.java are examples of standalone MapReduce-based tests.
  - TestPath.java is an example of a non-MapReduce-based test.
  - You can run all the unit tests with the command ant test, or you can run a specific unit test with the command ant -Dtestcase=<class name without package prefix> test (for example, ant -Dtestcase=TestFileSystem test).
Using Ant
Hadoop is built with Ant, a Java build tool. This section will eventually describe how Ant is used within Hadoop. To start, simply read a good Ant tutorial. The following is a good tutorial, though keep in mind that Hadoop isn't structured according to the ways outlined in the tutorial. Use the tutorial to get a basic understanding of Ant, but not to understand how Ant is used for Hadoop:
Good Ant tutorial: http://i-proving.ca/space/Technologies/Ant+Tutorial
Although most Java IDEs ship with a version of Ant, having a command line version installed is invaluable. You can download a version from http://ant.apache.org/.
After installing Ant, you must make sure that its networking support is configured for any proxy you have. Without that the build will not work, because the Hadoop builds will not be able to download their dependencies using Ivy.
Tip: to see how Ant is set up, run
ant -diagnostics
Generating a patch
Unit Tests
Please make sure that all unit tests succeed before constructing your patch and that no new javac compiler warnings are introduced by your patch.
> cd hadoop-common-trunk
> ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar
After a while, if you see
BUILD SUCCESSFUL
all is ok, but if you see
BUILD FAILED
then please examine error messages in build/test and fix things before proceeding.
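One way to check that a patch adds no javac warnings is to save the build output to a file and count the warning lines before and after applying the patch. The sketch below assumes the log was captured with a redirect (for example, by appending > build.log 2>&1 to the ant command above) and that javac warnings show up as lines containing [javac] and "warning: ". The function name and the log file name are illustrative, not part of the Hadoop build.

```shell
# Count javac warning lines in a saved Ant build log.
# Assumption: warnings look like "[javac] Foo.java:12: warning: ...".
count_javac_warnings() {
  # grep -c prints the number of matching lines; it exits non-zero
  # when the count is 0, so '|| true' keeps the function successful.
  grep -c '\[javac\].*warning: ' "$1" || true
}
```

Running count_javac_warnings build.log on a trunk build and again on a patched build gives two numbers to compare; the second should not be larger than the first.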
For unit test development guidelines, see HowToDevelopUnitTests.
Javadoc
Please also check the javadoc.
> ant javadoc
> firefox build/docs/api/index.html
Examine all public classes you've changed to see that documentation is complete, informative, and properly formatted. Your patch must not generate any javadoc warnings.
Creating a patch
Check to see what files you have modified with:
svn stat
Add any new files with:
svn add src/.../MyNewClass.java
svn add src/.../TestMyNewClass.java
In order to create a patch, type (from the base directory of hadoop):
svn diff > HADOOP-1234.patch
This will report all modifications done on Hadoop sources on your local disk and save them into the HADOOP-1234.patch file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.
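A quick first pass when reading the patch is to list the files it touches. The sketch below relies on the fact that svn diff starts each file's section with a line of the form "Index: <path>"; the function name is illustrative.

```shell
# Print the path of every file modified by an svn-style patch.
# Each per-file section of 'svn diff' output begins with "Index: <path>".
patch_files() {
  sed -n 's/^Index: //p' "$1"
}
```

For example, patch_files HADOOP-1234.patch prints one path per line. Any path unrelated to the issue is a sign that the patch contains more than a single fix.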
Please do not:
- reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
- comment out code that is now obsolete: just remove it.
- insert comments around each change, marking the change: folks can use subversion to figure out what's changed and by whom.
- make things public which are not required by end users.
Please do:
- try to adhere to the coding style of files you edit;
- comment code whose function or rationale is not obvious;
- update documentation (e.g., package.html files, this wiki, etc.).
If you need to rename files in your patch:
- Write a shell script that uses 'svn mv' to rename the original files.
- Edit files as needed (e.g., to change package names).
- Create a patch file with 'svn diff --no-diff-deleted --notice-ancestry'.
- Submit both the shell script and the patch file.
This way other developers can preview your change by running the script and then applying the patch.
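A rename script of this kind can be very small. The following is a minimal sketch, not a required format: the function name, the SVN variable, and the paths in the usage note are hypothetical. Setting SVN=echo prints the commands instead of running them, which is handy for checking the list of renames first.

```shell
# Hypothetical rename script to submit alongside a patch.
# SVN defaults to 'svn'; set SVN=echo to preview without renaming anything.
SVN="${SVN:-svn}"

rename_all() {
  # Each input line holds an old path and a new path separated by whitespace.
  while read -r old new; do
    # Skip blank lines.
    [ -n "$old" ] || continue
    # Stop at the first failed rename so problems are noticed early.
    $SVN mv "$old" "$new" || return 1
  done
}
```

The renames are fed on standard input, one pair per line, for example: printf '%s\n' 'src/java/OldName.java src/java/NewName.java' | rename_all (placeholder paths).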
Testing your patch
Before submitting your patch, you are encouraged to run the same tools that the automated Hudson patch test system will run on your patch. This enables you to fix problems with your patch before you submit it. The test-patch Ant target will run your patch through the same checks that Hudson currently does except for executing the core and contrib unit tests.
To use this target, you must run it from a clean workspace (i.e., svn stat shows no modifications or additions). From your clean workspace, run:
ant \
  -Dpatch.file=/path/to/my.patch \
  -Dforrest.home=/path/to/forrest/ \
  -Dfindbugs.home=/path/to/findbugs \
  -Dscratch.dir=/path/to/a/temp/dir \
  -Dsvn.cmd=/path/to/subversion/bin/svn \
  -Dgrep.cmd=/path/to/grep \
  -Dpatch.cmd=/path/to/patch \
  test-patch
The scratch.dir, svn.cmd, grep.cmd, and patch.cmd properties are optional.
At the end, you should get a message on your console that is similar to the comment added to Jira by Hudson's automated patch test system. The scratch directory (which defaults to the value of ${user.home}/tmp) will contain some output files that will be useful in determining what issues were found in the patch.
Some things to note:
- The optional cmd parameters default to the commands found in your PATH environment variable.
- The grep command must support the -o flag (GNU grep does).
- The patch command must support the -E flag.
- You may need to explicitly set ANT_HOME. Running ant -diagnostics will tell you the default value on your system.
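If you are unsure whether your grep supports -o, a small probe like the one below can tell you before you start a long test-patch run. This is only a sketch; the function name is illustrative, and it checks behavior rather than the grep version.

```shell
# Succeed if the given grep (default: the first one on PATH) supports -o,
# i.e. prints only the matching part of a line.
grep_supports_o() {
  [ "$(printf 'abc\n' | "${1:-grep}" -o 'b' 2>/dev/null)" = "b" ]
}
```

Call it as grep_supports_o, or pass the same path you intend to give to -Dgrep.cmd, and check the exit status.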
Applying a patch
To apply a patch that you either generated or downloaded from Jira, run:
patch -p0 < cool_patch.patch
If you just want to check whether the patch applies, run patch with the --dry-run option:
patch -p0 --dry-run < cool_patch.patch
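The two commands can be combined so that a patch is applied only when the dry run succeeds. A minimal sketch, assuming a patch command that supports --dry-run as used above; the function name is illustrative.

```shell
# Apply a patch from the current directory only if a dry run succeeds.
apply_if_clean() {
  if patch -p0 --dry-run < "$1" > /dev/null; then
    patch -p0 < "$1"
  else
    echo "Patch $1 does not apply cleanly; nothing was changed." >&2
    return 1
  fi
}
```

Run it from the base directory of the checkout, for example apply_if_clean cool_patch.patch.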
If you are an Eclipse user, you can apply a patch as follows:
1. Right-click the project name in Package Explorer.
2. Select Team -> Apply Patch.
Changes that span projects
You may find that you need to modify both the common project and MapReduce or HDFS. Or perhaps you have changed something in common, and need to verify that these changes do not break the existing unit tests for HDFS and MapReduce. Hadoop's build system integrates with a local Maven repository to support cross-project development. Use this general workflow for your development:
- Make your changes in common.
- Run any unit tests there (e.g. 'ant test').
- Publish your new common jar to your local mvn repository:
common$ ant clean jar mvn-install
A word of caution: mvn-install pushes the artifacts into your local Maven repository, which is shared by all your projects.
- Switch to the dependent project and make any changes there (e.g., that rely on a new API you introduced in common).
- When you are ready, recompile and test this, using the local mvn repository instead of the public Hadoop repository:
mapred$ ant veryclean test -Dresolvers=internal
The 'veryclean' target will clear the Ivy cache used by any previous builds and force the build to query the upstream repository. Setting -Dresolvers=internal forces Hadoop to check your local build before going outside.
- Finally, create separate patches for your common and hdfs/mapred changes, and file them as separate JIRA issues associated with the appropriate projects.
Contributing your work
Finally, patches should be attached to an issue report in Jira via the Attach File link on the issue's Jira. Please add a comment that asks for a code review following our code review checklist. Please note that the attachment should be granted license to ASF for inclusion in ASF works (as per the Apache License §5).
When you believe that your patch is ready to be committed, select the Submit Patch link on the issue's Jira. Submitted patches will be automatically tested against "trunk" by Hudson, the project's continuous integration engine. Upon test completion, Hudson will add a success ("+1") message or failure ("-1") to your issue report in Jira. If your issue contains multiple patch versions, Hudson tests the last patch uploaded.
Folks should run ant clean test javadoc checkstyle before selecting Submit Patch. Tests should all pass. Javadoc should report no warnings or errors. Checkstyle's error count should not exceed that listed at Checkstyle Errors. Hudson's tests are meant to double-check things, and not be used as a primary patch tester, which would create too much noise on the mailing list and in Jira. Submitting patches that fail Hudson testing is frowned on (unless the failure is not actually due to the patch).
If your patch involves performance optimizations, they should be validated by benchmarks that demonstrate an improvement.
If your patch creates an incompatibility with the latest major release, then you must set the Incompatible change flag on the issue's Jira and fill in the Release Note field with an explanation of the impact of the incompatibility and the necessary steps users must take.
If your patch implements a major feature or improvement, then you must fill in the Release Note field on the issue's Jira with an explanation of the feature that will be comprehensible by the end user.
Once a "+1" comment is received from the automated patch testing system and a code reviewer has set the Reviewed flag on the issue's Jira, a committer should then evaluate it within a few days and either: commit it; or reject it with an explanation.
Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please send a friendly reminder. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.
Should your patch receive a "-1" from the Hudson testing, select the Resume Progress link on the issue's Jira, upload a new patch with the necessary fixes, and then select the Submit Patch link again.
Committers: for non-trivial changes, it is best to get another committer to review your patches before commit. Use the Submit Patch link like other contributors, and then wait for a "+1" from another committer before committing. Please also try to frequently review things in the patch queues.
Jira Guidelines
Please comment on issues in Jira, making your concerns known. Please also vote for issues that are a high priority for you.
Please refrain from editing descriptions and comments if possible, as edits spam the mailing list and clutter Jira's "All" display, which is otherwise very useful. Instead, preview descriptions and comments using the preview button (on the right) before posting them. Keep descriptions brief and save more elaborate proposals for comments, since descriptions are included in Jira's automatically sent messages. If you change your mind, note this in a new comment, rather than editing an older comment. The issue should preserve this history of the discussion.
Stay involved
Contributors should join the Hadoop mailing lists. In particular, join the commit list (to see changes as they are made), the dev list (to join discussions of changes), and the user list (to help others).