Guide for Hadoop Core Committers

This page contains Hadoop Core-specific guidelines for committers.

New committers

New committers are encouraged to first read Apache's generic committer documentation:

The first act of a new core committer is typically to add their name to the credits page. This requires changing the page's XML source. Once done, update the Hadoop website as described under "Committing Documentation" below.


Hadoop committers should review patches submitted by others as often as possible. Ideally, every submitted patch will be reviewed by a committer within a few days. If a committer reviews a patch they have not authored and believes it to be of sufficient quality, they can commit it. Otherwise, the patch should be cancelled with a clear explanation of why it was rejected.

The list of submitted patches is in the Hadoop Review Queue, ordered by time of last modification. Committers should scan the list from top to bottom, looking for patches that they feel qualified to review and possibly commit.

For non-trivial changes, it is best to get another committer to review your own patches before you commit them. Use "Submit Patch" like other contributors, then wait for a "+1" from another committer before committing.


Patches that do not adhere to the guidelines in HowToContribute and to the CodeReviewChecklist should be rejected. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, the patch should first be rejected, and the committer should then attach a new patch for review.


Hadoop uses git for version control. The writable repo is at - TODO

Initial setup

We try to keep our history linear and avoid merge commits. To this end, we highly recommend using git pull --rebase, and it is good practice to have this always turned on. If you haven't done so already, run the following:

$ git config --global branch.autosetuprebase always
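
As a rough sketch of what this setting does: branch.autosetuprebase only applies to tracking branches created after it is set, so some committers may also want pull.rebase, which makes git pull rebase on existing branches too. The pull.rebase setting and the scratch repository below are assumptions added for illustration, not part of the recommendation above. The settings are applied per-repository here so the demo leaves your global configuration alone.

```shell
set -e
# Scratch repository so the demo does not touch your real configuration.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Same setting as above, scoped to this repository only (no --global).
git config branch.autosetuprebase always
# Assumption: also rebase on pull for branches that already exist.
git config pull.rebase true

# Read the values back to confirm they were stored.
autosetup="$(git config --get branch.autosetuprebase)"
pullrebase="$(git config --get pull.rebase)"
echo "branch.autosetuprebase=$autosetup pull.rebase=$pullrebase"
```

Add --global to either git config command to apply it to every repository you use.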

Committing a patch

When you commit a patch, please follow these steps:

  1. CHANGES.txt: Add an entry in CHANGES.txt, at the end of the appropriate section. This should include the JIRA issue ID, and the name of the contributor.

  2. Commit locally: Commit the change locally to the appropriate branch (master, unless the change belongs on a feature branch) using git commit -a -m <commit-message>. The commit message should include the JIRA issue ID, a short description of the change, and the name of the contributor if it is not you. Note: Be sure to get the issue ID right, as this causes JIRA to link to the change in git (use the issue's "All" tab to see these links). Use git status to verify that all the changes are included in the commit. If any changes remain (previously missed files), commit them and squash the commits into one using git rebase -i.

  3. Pull latest changes from remote repo: Pull in the latest changes from the remote branch using git pull --rebase (--rebase is not required if you have set up git pull to always rebase). Verify that this didn't create any merge commits using git log [--pretty=oneline].

  4. Push changes to remote repo: Build and run a test to ensure that everything still works. Push the changes to the remote (main) repo using git push <remote> <branch>.

  5. If the changes were to master, cherry-pick the changes to other appropriate branches via git cherry-pick -x <commit>. The -x option records the source commit. Make sure to resolve any conflicts.

  6. Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but set only a single fix version: the earliest release in which the change will appear. Special case: when committing to a non-mainline branch (such as branch-0.22 or branch-0.23 at the moment), also set the fix version to either 2.x.x or 3.x.x as appropriate.
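
The git portion of the steps above (2 through 5) can be sketched end to end in a throwaway sandbox. Everything named here is hypothetical: the local bare repository standing in for the writable Hadoop repo, the issue id HADOOP-0000, the branch name branch-2, and the wording of the commit message. Treat it as an illustration of the command sequence under those assumptions, not as the project's official procedure.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# A local bare repository stands in for the writable Hadoop repo (hypothetical).
git init -q --bare remote.git
git init -q committer
cd committer
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Committer"
git config user.email "committer@example.org"
git remote add origin ../remote.git

# Seed master and a second branch so there is something to pull and pick to.
echo "Release notes" > CHANGES.txt
git add CHANGES.txt
git commit -q -m "Initial import"
git push -q -u origin master
git branch branch-2
git push -q origin branch-2

# Steps 1-2: add the CHANGES.txt entry, then commit locally with the JIRA id.
echo "HADOOP-0000. Example fix. (Example Contributor via Example Committer)" >> CHANGES.txt
git commit -q -a -m "HADOOP-0000. Example fix. Contributed by Example Contributor."
test -z "$(git status --porcelain)"   # nothing left uncommitted

# Step 3: pull with rebase, then confirm no merge commits would be pushed.
git pull -q --rebase origin master
merges="$(git rev-list --count --merges origin/master..HEAD)"
echo "merge commits to be pushed: $merges"

# Step 4: build and run tests here (omitted in this sketch), then push.
git push -q origin master
fix="$(git rev-parse HEAD)"

# Step 5: cherry-pick to the other branch; -x records the source commit.
git checkout -q branch-2
git cherry-pick -x "$fix" >/dev/null
git push -q origin branch-2
git --no-pager log -1 --pretty=%B
```

The final log output shows the original commit message followed by a "(cherry picked from commit ...)" line, which is what -x adds.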

This How-to-commit video has guidance on the commit process, albeit using svn. Most of the process is still the same, except that we now use git instead.

Committing Documentation

Hadoop's official documentation is authored using Forrest. To commit documentation changes you must have Forrest installed and the forrest executable on your $PATH. Note that the current version (0.8) doesn't work properly with Java 6; use Java 5 instead. Documentation is of two types:

  1. End-user documentation, versioned with releases; and,
  2. The website. This is maintained separately in subversion and republished whenever it changes.

To commit end-user documentation changes to trunk or a branch, ask the user to submit only changes made to the *.xml files in src/docs. Apply that patch, run ant docs to generate the html, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease.

To commit changes to the website and re-publish them:

svn co
cd site
firefox publish/index.html # preview the changes
svn stat                   # check for new pages
svn add                    # add any new pages
svn commit
cd /www/
svn up

Changes to the website (via svn up) might take up to an hour to be reflected on the Apache Hadoop site.

Backporting commits to previous branches

If a patch needs to be backported to previous branches, follow these steps.

  1. Commit the changes to trunk and note down the revision number, say 4001. (The revision number is displayed in the response to your svn commit command.)

  2. Check out the desired branch and execute this command from the root directory.

    svn merge -r 4000:4001 .
    # Now resolve any merge conflicts.  
    # If major edits are needed, produce a new patch and upload it to the JIRA.
    svn diff CHANGES.txt # get all JIRA numbers included in this merge
    svn commit -m "merge <list all JIRA numbers here>"

Please be sure to include JIRA number(s) in the commit message for merge commits. Sometimes developers just put something like "merge -r 4000:4001" in the merge message, which fails to trigger the JIRA/Subversion integration, so the JIRA doesn't record the branch commit. It is important to link to the JIRA number, so that when looking at the JIRA it will be clear that this patch has been merged to this branch.
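
One way to avoid forgetting the JIRA numbers is to extract them from the added lines of the CHANGES.txt diff and build the commit message from that. The sketch below uses a hard-coded sample in place of real svn diff output. The issue ids, the CHANGES.txt line format, and the exact message wording are all hypothetical.

```shell
set -e
# Hypothetical stand-in for the output of: svn diff CHANGES.txt
diff_output='+++ CHANGES.txt (working copy)
+    HADOOP-1234. Fix example bug. (alice via bob)
+    HDFS-567. Another example change. (carol)
     HADOOP-1. Unchanged context line, not part of this merge.'

# Keep only added lines, pull out ids like PROJECT-123, de-duplicate, join with spaces.
jiras="$(printf '%s\n' "$diff_output" | grep '^+' | grep -oE '[A-Z]+-[0-9]+' | sort -u | paste -sd' ' -)"

# Message to pass to: svn commit -m "$message"
message="merge -r 4000:4001 from trunk: $jiras"
echo "$message"
```

With real input, replace the sample text with the actual svn diff output and check the resulting list by eye before committing.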

Patches that break HDFS, YARN and MapReduce

Normally, Jenkins notices the check-in, automatically builds the new versions of the common libraries, and pushes them to Nexus, the Apache Maven repository.

However, to speed up the process, or if Jenkins is not working properly, developers can push builds to Nexus manually. To do so, they need to create the file ~/.m2/settings.xml with contents like:

    <settings>
      <servers>
        <!-- Each <server> also needs an <id> that names the target repository. -->
        <!-- To publish a snapshot of some part of Maven -->
        <server>
          <username> <!-- YOUR APACHE SVN USERNAME --> </username>
          <password> <!-- YOUR APACHE SVN PASSWORD --> </password>
        </server>
        <!-- To publish a website of some part of Maven -->
        <server>
          <username> <!-- YOUR APACHE SSH USERNAME --> </username>
        </server>
        <!-- To stage a release of some part of Maven -->
        <server>
          <username> <!-- YOUR APACHE SVN USERNAME --> </username>
          <password> <!-- YOUR APACHE SVN PASSWORD --> </password>
        </server>
        <!-- To stage a website of some part of Maven -->
        <server>
          <!-- the <id> must match the hard-coded repository identifier in site:stage-deploy -->
          <username> <!-- YOUR APACHE SSH USERNAME --> </username>
        </server>
      </servers>
    </settings>
After you have committed the change to Common, do an "ant mvn-publish" to publish the new jars.

As a security note, since the settings.xml file contains your Apache svn password in the clear, I prefer to leave the settings file encrypted using gpg when I'm not using it. I also don't ever publish from a shared machine, but that is just me being paranoid. :)


Committers should hang out in the #hadoop room for real-time discussions. However, any substantive discussion (as with any off-list project-related discussion) should be reiterated in JIRA or on the developer list.