ProjectSuggestions

Here are some suggestions for interesting Hadoop projects. For more information, please inquire on the Hadoop mailing lists. Also, please update and add to these lists.

  1. Test Projects

  2. Research Projects

  3. Tool Investigations

  4. Random Ideas

Test Projects

Rough estimates are given in hours. These estimates assume an existing understanding of Hadoop.

Description | Estimate (hours) | Links
write JUnit test cases that run the Hadoop examples | 16 |
MapRed reliability tests | 40 | HADOOP-2483
HDFS reliability tests | 40 | HADOOP-2483
refactor TestDFSUpgradeFromImage (once HADOOP-1622 is committed) to automatically zip and unzip the supporting DFS image | 6 |
write compatibility tests for reading the same data set from different HDFS versions | 24 |
rewrite (or drop) the flaky TestMiniMRWithDFS unit test | 8 |
write the "system tests" for DFS Upgrade | 20 | DFS Upgrade Test Plan
write new unit tests based on code coverage | 40 | Nightly Code Coverage Report
pipes and libhdfs benchmark tests | 30 |
review FindBugs warnings and fix the reasonable ones | 18 | Nightly FindBugs Warnings
create a distributed JUnit runner on top of Hadoop | 80 | HADOOP-1257
implement a Map-Reduce application which can be used to reliably launch speculative tasks | 40 | HADOOP-2214

Research Projects

Here are some research project ideas, engineering ideas for new participants, and areas where domain experts from other fields might add a lot of value by bringing their perspective into the Hadoop discussion.

Tool Investigations

We are always looking for open source testing tools that add value to our development and build process. Here are some that need to be investigated.

Description | Links
evaluate new unit test frameworks (rewriting some existing tests in the new framework to show the benefits) | JUnit4, TestNG
evaluate mock object frameworks for unit testing | JMockit, EasyMock, JMock
evaluate PMD | PMD
evaluate concurrency test tools | PathFinder, ConTest, CheckUncontendedLock, MultithreadedTC
evaluate Fortify | Fortify Tools
evaluate dashboards like QALab and Panopticode | QALab, Panopticode
evaluate Faban | Faban, Blog
evaluate NCSS, a source code metrics suite | JavaNCSS
evaluate Java PathFinder, a software model checker | JavaPathFinder
evaluate SA4J, a structural dependency analysis tool | SA4J
evaluate JDepend, which generates design quality metrics | JDepend
evaluate Dependency Finder, which generates design quality metrics and dependency graphs | DepFind
evaluate Classycle, which finds class and package cyclic dependencies | Classycle
evaluate XRadar, an extensible code report tool | XRadar
evaluate Crap4j, which combines cyclomatic complexity and code coverage | Crap4j
evaluate Eclipse TPTP, a test and performance tools platform | TPTP
evaluate JUnitFactory | JUnit Factory
evaluate code review applications like Codestriker and Review Board | Codestriker, Review Board
evaluate test automation frameworks | STAF

Random Ideas

Description | Estimate (hours) | Links
Implement an advanced job control framework to help chain multiple Map-Reduce jobs, i.e. investigate and improve upon the existing org.apache.hadoop.mapred.jobcontrol package. | tbd |
Implement a library/framework to support Genetic Algorithms on Hadoop Map-Reduce. | tbd |
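
As a starting point for the job control idea above, here is a rough sketch of the core problem: given a set of chained jobs and their dependencies, work out an order in which they can be run. It deliberately uses no Hadoop classes. The class name JobChainSketch, the method runOrder, and the job names are all made up for illustration, and a real framework would also need to submit the jobs, track their state, and handle failures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders chained jobs so each runs after its dependencies. */
public class JobChainSketch {

    /**
     * Returns job names in an order where every job follows the jobs it depends on.
     * The deps map goes from a job name to the names of jobs that must finish first.
     */
    public static List<String> runOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String job : deps.keySet()) {
            visit(job, deps, done, inProgress, order);
        }
        return order;
    }

    // Depth-first walk: schedule all dependencies of a job before the job itself.
    private static void visit(String job, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(job)) {
            return;
        }
        if (!inProgress.add(job)) {
            // Seeing a job again while it is still being visited means a cycle.
            throw new IllegalStateException("cyclic dependency at job: " + job);
        }
        for (String dep : deps.getOrDefault(job, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(job);
        done.add(job);
        order.add(job);
    }

    public static void main(String[] args) {
        // Made-up three-job chain: "join" needs "parse" and "filter"; "filter" needs "parse".
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("join", List.of("parse", "filter"));
        deps.put("filter", List.of("parse"));
        deps.put("parse", List.of());
        System.out.println(runOrder(deps)); // prints [parse, filter, join]
    }
}
```

An improved framework might build on an ordering like this to run independent jobs in parallel and to re-run only the failed part of a chain.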

last edited 2008-03-22 06:28:38 by NigelDaley