Sound and dynamic concurrency analysis of Hadoop

The purpose of this page is to document how developers can obtain and leverage these tools in Hadoop.

Recently, SureLogic a company offering concurrency analysis tools for Java applications, has donated a license for the following products to the ASF.

It happened as a result of Yahoo! Hadoop team engaging with SureLogic to evaluate their products for Hadoop development. HDFS, MapReduce, and ZooKeeper teams were assessing the tools for three day back in October, 2009. Overall, the results were quite promising and produced some findings already.

Concurrency analysis tools

How to get and install the tools

First of all you need to download and install the tools. Here's the instruction on how to do this. The good news - it shouldn't take more than 5 minutes to get you going. The bad news for non-Eclipse users - you need to get one first. Please refer to the page to find out how to run the tools.

All SureLogic tools have built-in feedback submission feature, so you might consider to share your comments/suggestions with the development team. If you've came across an issue with any of SureLogic tools you might want to file a ticket and monitor its progress. SureLogic has their own issue tracking system which is available here. Also, the discussion group is available.

Now, you'll need a license. Apache committers can download it from here. All you need is your Apache SVN password. Apache contributors, who don't posses SVN password (i.e. non-committers) need to email the request for a copy of the license to 'cos (at) apache (dot) org' or Tim Halloran from SureLogic and the license will be send to you. This slightly cumbersome process will be streamlined in the future.

Hadoop's projects will eventually automatically pull down annotations jar (aka "Promises") at the build time thus the source code with annotations should be compiled just fine. The patches like this are ready but will be applied later.

If you'd like to add SureLogic analysis to your Apache's project, you will need to configure your build to automatically download and use annotation jar file to make your compiler aware about these. It is simple enough to do thanks to Maven. You can take a look to to use its patch for your project as soon as it's committed.

Some examples of annotations and how to use them

HDFS, MapReduce, and ZooKeeper already has some SureLogic annotations in their source code. Here's some examples from HDFS.

First, you need to declare abstract annotation Region and declare all necessary RegionLocks:

@Regions({@Region("HeartbeatState"),
	@Region("DatanodeState")})
@RegionLocks({
	@RegionLock("HeartbeatLock is heartbeats protects HeartbeatState"),
	@RegionLock("DatanodeLock is datanodeMap protects DatanodeState")
})

Then you can start adding the specific lock annotations:

@RequiresLock("DatanodeLock")
private String newStorageID() {
  String newID = null;
  while(newID == null) {
    newID = "DS" + Integer.toString(r.nextInt());
    if (datanodeMap.get(newID) != null)
      newID = null;
  }
  return newID;
}

the annotation @RequireLock is telling the analyzer that the developer is aware about potential issue and he or she is pretty sure that the caller of the method holds named DatanodeLock. This also tells a uninitiated contributor about the pre-requisites which have to be satisfied before the method can be safely called.

Sometimes different methods might have different assumptions of invocation context. Thus one can use as many annotations as needed. E.g. further in the same class

@RequiresLock("HeartbeatLock")
private void updateStats(DatanodeDescriptor node, boolean isAdded) {
...

For more information about how to annotate your code check annotations JavaDoc

How it affects Hadoop development/QA process

SureLogic annotations are used in the source code of core sub-projects at the moment. It is not mandatory for contrib projects this it's up to contributors and committers to decide if they want to use concurrency analysis tools for their components.

SureLogic analysis is going to be included to the test-patch process. This said new patches are required not to raise SureLogic warnings level (similar to the requirements about FindBugs or javac).

Annotations retention policy

Annotations are part of the source code and needed for analysis purpose only. Thus, the promises will be needed only during compile time and for development. A production code won't include this jar file thus there won't be any impact on the final project's bits.

JIRA filing guidelines

If you have found and issue with your application using one of the SureLogic tools above please don't forget to add tag 'surelogic' to the JIRA. It will help to trace concurrency related issues in the future.