Guide for Hadoop Core Committers
This page contains Hadoop Core-specific guidelines for committers.
New committers are encouraged to first read Apache's generic committer documentation.
The first act of a new core committer is typically to add their name to the credits page. This requires changing the XML source in http://svn.apache.org/repos/asf/hadoop/common/site/main/author/src/documentation/content/xdocs/who.xml. Once done, update the Hadoop website as described here.
Hadoop committers should, as often as possible, attempt to review patches submitted by others. Ideally, every submitted patch will be reviewed by a committer within a few days. If a committer reviews a patch they did not author and believes it to be of sufficient quality, they can commit it; otherwise, the patch should be cancelled with a clear explanation of why it was rejected.
The list of submitted patches is in the Hadoop Review Queue. This is ordered by time of last modification. Committers should scan the list from top-to-bottom, looking for patches that they feel qualified to review and possibly commit.
For non-trivial changes, it is best to get another committer to review your own patches before commit. Use "Submit Patch" like other contributors, and then wait for a "+1" from another committer before committing.
Patches that do not adhere to the guidelines in HowToContribute and the CodeReviewChecklist should be rejected. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, it should first be rejected, and the committer should then attach a new patch for review.
Commit individual patches
Hadoop uses git for the main source. The writable repository is at https://git-wip-us.apache.org/repos/asf/hadoop.git
We try to keep our history linear and avoid merge commits. To this end, we highly recommend using git pull --rebase, and it is good practice to have rebasing turned on by default. If you haven't done so already, run the following:
$ git config --global branch.autosetuprebase always
Also, we highly recommend setting username and email for git to use:
$ git config [--global] user.name <real-name>
$ git config [--global] user.email <email>@apache.org
More recommendations on how to use git with ASF projects can be found here
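The settings above can be applied and then read back to confirm they took effect. The following is a minimal sketch, assuming bash and git on the PATH; the name and address are placeholders, and HOME is pointed at a scratch directory so the demo leaves your real ~/.gitconfig alone (drop the two sandbox lines to configure your own account). It also shows that branch.autosetuprebase only affects branches created after it is set; for a branch that already exists, such as an existing trunk checkout, git config branch.trunk.rebase true has the same effect.

```shell
#!/usr/bin/env bash
# Sketch: apply the recommended git settings, then read them back.
set -euo pipefail

# Sandbox lines: a throwaway HOME keeps this demo away from your real ~/.gitconfig.
HOME="$(mktemp -d)"; export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global branch.autosetuprebase always
git config --global user.name "Jane Committer"      # placeholder real name
git config --global user.email "jane@apache.org"    # placeholder Apache address

# Print the values git will actually use.
git config --global --get branch.autosetuprebase
git config --global --get user.email

# Branches created from now on are set up to rebase on "git pull".
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "initial commit"
git -C "$repo" branch -q upstream-demo
git -C "$repo" branch -q --track topic upstream-demo
git -C "$repo" config --get branch.topic.rebase     # prints: true
```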
Committing a patch
When you commit a patch, please follow these steps:
CHANGES.txt: Add an entry in CHANGES.txt, at the end of the appropriate section. This should include the JIRA issue ID, and the name of the contributor.
Commit locally: Commit the change locally to the appropriate branch (trunk, unless the change is for a feature branch) using git commit -a -m <commit-message>. The commit message should include the JIRA issue ID, a short description of the change, and the name of the contributor if it is not you. Note: be sure to get the issue ID right, as this is what makes JIRA link to the change in git (use the issue's "All" tab to see these links). Verify that all the changes are included in the commit using git status; keep in mind that git commit -a stages modified tracked files but not new ones, which must be added with git add. If there are any remaining changes (previously missed files), commit them and squash the commits into one using git rebase -i.
Pull latest changes from remote repo: Pull in the latest changes from the remote branch using git pull --rebase (--rebase is not required if you have set up git pull to always rebase). Verify that this didn't create any merge commits using git log [--pretty=oneline].
Push changes to remote repo: Build and run the tests to make sure everything is still kosher, then push the changes to the remote (main) repo using git push <remote> <branch>.
Backporting to other branches: If the changes were to trunk, we might want to apply them to other appropriate branches.
Cherry-pick the changes to the other appropriate branches via git cherry-pick -x <commit-hash>. Cherry-picking reuses the original commit message, and the -x option appends a line recording the source commit. Resolve any conflicts.
- If the conflicts are major, it is preferable to produce a new patch for that branch, review it separately and commit it. When committing an edited patch to other branches, please follow the same steps and make sure to include the JIRA number and description of changes in the commit message.
Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but set only a single fix version: the earliest release in which the change will appear. Special case: when committing to a non-mainline branch (such as branch-0.22 or branch-0.23 at the time of writing), also set the fix version to 2.x.x or 3.x.x as appropriate.
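The git side of the steps above can be rehearsed end to end without touching the real repository. The following is a minimal sketch, assuming bash and git: a local bare repository stands in for the Apache remote, a second clone plays another committer pushing to trunk in the meantime, and HADOOP-12345, the contributor name, and branch-2 are hypothetical placeholders. To fold in a missed file it uses git commit --amend, which has the same effect as squashing with git rebase -i when only the most recent commit is involved.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the commit steps against a throwaway local "remote".
# HADOOP-12345, "Jane Contributor" and branch-2 are hypothetical placeholders.
set -euo pipefail

sandbox="$(mktemp -d)"
HOME="$sandbox"; export HOME                # isolate from personal git settings
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git init -q --bare "$sandbox/origin.git"    # stands in for the Apache repo
git init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Committer"
git config user.email "committer@example.org"
git remote add origin "$sandbox/origin.git"

# Seed trunk and branch-2 on the fake remote, with upstreams set.
git checkout -q -b trunk
echo "Trunk (unreleased changes)" > CHANGES.txt
git add CHANGES.txt
git commit -q -m "Initial import"
git branch branch-2
git push -q -u origin trunk branch-2

# Meanwhile, another committer pushes an unrelated change to trunk.
git clone -q --branch trunk "$sandbox/origin.git" "$sandbox/other"
(
  cd "$sandbox/other"
  echo "class Other {}" > Other.java
  git add Other.java
  git -c user.name=Other -c user.email=other@example.org \
    commit -q -m "HADOOP-99999. Unrelated change."
  git push -q origin trunk
)

# CHANGES.txt + Commit locally.
echo "    HADOOP-12345. Fix the widget. (Jane Contributor via committer)" >> CHANGES.txt
echo "class Fix {}" > Fix.java          # new file: "git commit -a" skips it
git commit -q -a -m "HADOOP-12345. Fix the widget. Contributed by Jane Contributor."
git status --short                      # shows "?? Fix.java", a missed file
git add Fix.java
git commit -q --amend --no-edit         # fold it into the same commit

# Pull latest changes: rebase onto trunk, then confirm there are no merge commits.
git pull -q --rebase
test -z "$(git rev-list --merges "@{u}..HEAD")"
git log --pretty=oneline

# Push changes (build and run the tests before this step in real use).
git push -q origin trunk

# Backporting: -x appends "(cherry picked from commit <hash>)" to the message.
fix="$(git rev-parse HEAD)"
git checkout -q branch-2
git cherry-pick -x "$fix" > /dev/null
git push -q origin branch-2
```

Everything is created under a fresh mktemp directory and the real Apache repository is never contacted. In real use, conflicts reported by git pull --rebase or git cherry-pick -x are resolved before continuing.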
This How-to-commit video has guidance on the commit process, albeit using svn. Most of the process is unchanged, except that we now use git.
Hadoop's official documentation is authored using Forrest. To commit documentation changes you must have Forrest installed and the forrest executable on your $PATH. Note that the current version (0.8) doesn't work properly with Java 6; use Java 5 instead. Documentation is of two types:
- End-user documentation, versioned with releases; and,
- The website, which is maintained separately in Subversion and republished as it changes.
To commit end-user documentation changes to trunk or a branch, ask the contributor to submit only changes made to the *.xml files in src/docs. Apply that patch, run ant docs to generate the HTML, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease.
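Because a documentation patch should only touch *.xml sources under src/docs, it can help to check a contributed patch before applying it. A minimal sketch, assuming bash and git: check_docs_patch is a hypothetical helper name, and it relies on git apply --numstat, which lists the files a patch touches without applying anything.

```shell
#!/usr/bin/env bash
# Sketch: reject documentation patches that touch anything other than
# *.xml files under src/docs. check_docs_patch is a hypothetical helper name.
set -euo pipefail

check_docs_patch() {
  local patch="$1" stats added deleted path status=0
  # --numstat prints "<added>\t<deleted>\t<path>" per file and applies nothing.
  stats="$(git apply --numstat "$patch")" || return 2   # unreadable or empty patch
  while IFS=$'\t' read -r added deleted path; do
    case "$path" in
      src/docs/*.xml) ;;   # end-user doc source ("*" also matches "/" here)
      *) echo "not an end-user doc source: $path (+$added/-$deleted)" >&2
         status=1 ;;
    esac
  done <<< "$stats"
  return "$status"
}

# Usage (the patch file name is a placeholder):
#   check_docs_patch HADOOP-12345.patch && git apply HADOOP-12345.patch && ant docs
```

The helper returns 0 for an acceptable patch, 1 if any file falls outside src/docs/*.xml, and 2 if git cannot parse the patch.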
To commit changes to the website and re-publish them:
svn co https://svn.apache.org/repos/asf/hadoop/common/site
cd site
ant
firefox publish/index.html    # preview the changes
svn stat                      # check for new pages
svn add <new-page>            # add any new pages
svn commit
ssh people.apache.org
cd /www/hadoop.apache.org/common
svn up
Changes to the website (published via svn up) may take up to an hour to appear on the Apache Hadoop site.
Patches that break HDFS, YARN and MapReduce
In general, Jenkins notices the check-in, automatically builds new versions of the common libraries, and pushes them to Nexus, the Apache Maven repository.
However, to speed up the process, or if Jenkins is not working properly, developers can push builds to Nexus manually. To do so, create ~/.m2/settings.xml with contents like the following:
<settings>
  <servers>
    <!-- To publish a snapshot of some part of Maven -->
    <server>
      <id>apache.snapshots.https</id>
      <username> <!-- YOUR APACHE SVN USERNAME --> </username>
      <password> <!-- YOUR APACHE SVN PASSWORD --> </password>
    </server>
    <!-- To publish a website of some part of Maven -->
    <server>
      <id>apache.website</id>
      <username> <!-- YOUR APACHE SSH USERNAME --> </username>
      <filePermissions>664</filePermissions>
      <directoryPermissions>775</directoryPermissions>
    </server>
    <!-- To stage a release of some part of Maven -->
    <server>
      <id>apache.releases.https</id>
      <username> <!-- YOUR APACHE SVN USERNAME --> </username>
      <password> <!-- YOUR APACHE SVN PASSWORD --> </password>
    </server>
    <!-- To stage a website of some part of Maven -->
    <server>
      <id>stagingSite</id> <!-- must match hard-coded repository identifier in site:stage-deploy -->
      <username> <!-- YOUR APACHE SSH USERNAME --> </username>
      <filePermissions>664</filePermissions>
      <directoryPermissions>775</directoryPermissions>
    </server>
  </servers>
</settings>
After you have committed the change to Common, run ant mvn-publish to publish the new jars.
As a security note, since the settings.xml file contains your Apache svn password in the clear, I prefer to leave the settings file encrypted using gpg when I'm not using it. I also don't ever publish from a shared machine, but that is just me being paranoid.
Committers should hang out in the #hadoop room on irc.freenode.net for real-time discussions. However, any substantive discussion (as with any off-list project-related discussion) should be reiterated in JIRA or on the developer list.