Apache Hadoop Hackathon, May 18, 2011
Hosted at Cloudera's San Francisco and Palo Alto offices.
This page is aliased at: http://bit.ly/hadoop-hack-may18
Useful resources
Previous hackathon notes: http://bit.ly/hadoop-hack-may11
Eli's build scripts: https://github.com/elicollins/hadoop-dev
Quick Start
Checking out Hadoop
Git:
  mkdir hadoop-git ; cd hadoop-git
  git clone https://github.com/apache/hadoop-common.git
  git clone https://github.com/apache/hadoop-hdfs.git
  git clone https://github.com/apache/hadoop-mapreduce.git
(or, if we fix ssh:
  #git clone git://git.apache.org/hadoop-common.git
  #git clone git://git.apache.org/hadoop-mapreduce.git
  #git clone git://git.apache.org/hadoop-hdfs.git
)
svn:
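  mkdir hadoop-svn ; cd hadoop-svn
  svn co https://svn.apache.org/repos/asf/hadoop/common/trunk hadoop-common
  svn co https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk hadoop-mapreduce
  svn co https://svn.apache.org/repos/asf/hadoop/hdfs/trunk hadoop-hdfs
(Each checkout names its own target directory; otherwise all three would land in the same "trunk" directory. These URLs are for trunk. For branches, use e.g. /repos/asf/hadoop/common/branches/branch-0.22 )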
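The three Git clones above differ only in the project name, so they can be scripted as a loop. This is a minimal sketch, not an official script: the function name clone_all and the DRY_RUN variable are made up for this example, and by default it only prints the commands.

```shell
#!/bin/sh
# Sketch: clone the three Hadoop subprojects from the GitHub mirrors listed above.
# DRY_RUN is a hypothetical variable for this example; it defaults to 1 (print only).
DRY_RUN="${DRY_RUN:-1}"

clone_all() {
  for project in common hdfs mapreduce; do
    url="https://github.com/apache/hadoop-${project}.git"
    if [ "$DRY_RUN" = "1" ]; then
      # Only show what would be run.
      echo "git clone $url"
    else
      # Stop at the first clone that fails.
      git clone "$url" || return 1
    fi
  done
}
```

Run clone_all inside hadoop-git to preview the commands, then DRY_RUN=0 clone_all to perform the clones.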
Running tests
ant test-core -Dtest.output=yes -Dtestcase=TestEditLog
-Dtest.output=yes prints the test output to the console, which is useful for diagnosing hanging tests.
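To run several test cases one after another, the ant invocation above can be wrapped in a small loop. This is a sketch under assumptions: ant_test_cmd, run_tests, and the RUN variable are names invented for this example, and the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Build the ant command line for one test case, using the flags shown above.
ant_test_cmd() {
  echo "ant test-core -Dtest.output=yes -Dtestcase=$1"
}

# Print (and optionally run) the ant command for each named test case.
# RUN is a hypothetical variable for this sketch; set RUN=1 to execute.
run_tests() {
  for testcase in "$@"; do
    cmd="$(ant_test_cmd "$testcase")"
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
      # Intentionally unquoted so the string splits into command and arguments.
      $cmd || return 1
    fi
  done
}
```

For example, RUN=1 run_tests TestEditLog runs the same command as above and stops at the first failing test case.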
Eclipse: see EclipseEnvironment
Submitting a patch
- Open a jira
- Make change
- Run tests
- git diff --no-prefix > /tmp/HADOOP-1234.txt
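The last step names the patch file after the jira key. A small helper can derive that path and refuse keys that do not look like PROJECT-NUMBER. This is only a sketch: patch_path and make_patch are hypothetical names, and the /tmp location simply follows the example above.

```shell
#!/bin/sh
# Print the patch file path for a jira key such as HADOOP-1234.
# Fails if the key is not of the form PROJECT-NUMBER.
patch_path() {
  if ! printf '%s\n' "$1" | grep -Eq '^[A-Z]+-[0-9]+$'; then
    echo "invalid jira key: $1" >&2
    return 1
  fi
  echo "/tmp/$1.txt"
}

# Write the current working-tree diff to the patch file for the given key.
# Must be run from inside a git checkout.
make_patch() {
  out="$(patch_path "$1")" || return 1
  git diff --no-prefix > "$out" && echo "wrote $out"
}
```

For example, make_patch HADOOP-1234 writes /tmp/HADOOP-1234.txt, ready to attach to the jira.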
Review queues
Common: https://issues.apache.org/jira/secure/IssueNavigator.jspa?requestId=12311124&mode=hide
HDFS: https://issues.apache.org/jira/secure/IssueNavigator.jspa?requestId=12313301&mode=hide
MapReduce: https://issues.apache.org/jira/secure/IssueNavigator.jspa?requestId=12313302&mode=hide
Suggestions for what to work on
Infrastructure improvements
Create a Hudson job that produces a release tarball: https://builds.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-22-Build/
Include 32-bit and 64-bit native libraries in Jenkins tarball builds: https://issues.apache.org/jira/browse/HADOOP-7283
Make it easier for others to contribute
Improve documentation at HowToContribute, EclipseEnvironment
- Write instructions for other IDEs
- What's the most confusing thing you found about the contribution process? How can we improve it?
Help get 0.22 out the door
Close out 0.22 blockers. These are probably best suited to people who already have context.
If those are too hard, check out the other jiras for 0.22: Common, HDFS, MapReduce
Try to use the release (or build from trunk)
- Work on the documentation
- Try out the current documentation
- File jiras and submit fixes for bugs and improvements.
- Eg config options that should be in the docs but are not.
- Or options that have been deprecated and should be removed or updated.
- Write new documentation that's needed (eg on FS config)
Set up a small cluster on your laptop, in VMs, or using Apache Whirr, and bang on it.
Help get trunk in shape
Help out with the SVN unsplit: https://issues.apache.org/jira/browse/HADOOP-7106. Git expertise is welcome!
- Review/commit patches in the review queues (see "Review queues" above)
- Work out the kinks of HBase trunk on HDFS trunk
- Eg HDFS-1103, HDFS-1152, HDFS-1139, HDFS-1056, HDFS-1060.
- Improve error and log messages
- Improve command line usability (eg error messages)