How to use injection framework

This page provides details about Hadoop development and testing using the AspectJ-based injection framework.

Introduction

The idea of code injection is fairly simple: it is the infusion of new behavior, or the modification of existing behavior, into the code of a software application. Fault injection, which will be discussed later, is a similar mechanism for adding errors and exceptions into an application's logic in order to achieve higher test coverage and verify the fault tolerance of the system. Different implementations of this idea are available today. Hadoop's injection framework is built on Aspect-Oriented Programming (AOP) as implemented by the AspectJ toolkit.

The document below covers the injection technique both with regard to code/behavior modification and with regard to the injection of programmatic faults (fault injection). It refers to both the injection and fault-injection aspects of the framework unless a narrower case is explicitly mentioned.

Assumptions

For the sake of production code clarity, and to avoid undesirable side effects, instrumented code is kept separate from production code. A set of extra build targets is used to produce injection-related artifacts. These are easily distinguishable by the '-fi' suffix, as in 'Framework Injection' (not to be confused with the 'if' keyword of many programming languages).

The current implementation of the FI framework assumes that the faults it emulates are non-deterministic in nature. That is, the moment a fault happens isn't known in advance and is decided by a coin flip.

Architecture of the Injection Framework

[Figure: arch-view.gif — architectural view of the injection framework]

Configuration Management

Currently, configuration is available only for injected faults. Configuration management allows you to set expectations for faults to happen. The settings can be applied either statically (in advance) or at runtime. The desired level of faults in the framework can be configured in two ways:

Probability Model

This is essentially a coin flipper that regulates the occurrence of faults. The methods of this class draw a random number between 0.0 and 1.0 and check whether it falls between 0.0 and the level configured for the fault in question. If that condition holds, the fault occurs.

Thus, to guarantee that a fault happens, set its level to 1.0; to completely prevent a fault from happening, set its level to 0.0. The default probability level is 0 unless the level is changed explicitly through the configuration file or at runtime. The name of the default level's configuration parameter is fi.*
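The check described above can be sketched in plain Java as follows. The real ProbabilityModel lives in org.apache.hadoop.fi and reads its levels from Hadoop's configuration, so the property lookup and method names below are illustrative assumptions, not the framework's actual implementation.

```java
import java.util.Random;

/** Minimal sketch of the coin-flip logic described above. */
public class CoinFlipSketch {
    private static final Random RANDOM = new Random();

    /** Look up the configured level for a fault, falling back to the
     *  "fi.*" default, which in turn defaults to 0.0 (fault disabled). */
    static double levelFor(String faultName) {
        String level = System.getProperty("fi." + faultName,
                System.getProperty("fi.*", "0.0"));
        return Double.parseDouble(level);
    }

    /** Flip the coin: a random number in [0.0, 1.0) below the
     *  configured level means the fault fires. */
    static boolean injectCriteria(String faultName) {
        return RANDOM.nextDouble() < levelFor(faultName);
    }

    public static void main(String[] args) {
        System.setProperty("fi.hdfs.datanode.BlockReceiver", "1.0");
        // nextDouble() is always < 1.0, so the fault always fires
        System.out.println(injectCriteria("hdfs.datanode.BlockReceiver"));
        System.setProperty("fi.hdfs.datanode.BlockReceiver", "0.0");
        // nextDouble() is never < 0.0, so the fault never fires
        System.out.println(injectCriteria("hdfs.datanode.BlockReceiver"));
    }
}
```

This also illustrates why 1.0 guarantees a fault and 0.0 suppresses it completely: the drawn number lies in [0.0, 1.0), strictly below 1.0 and never below 0.0.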

Injection mechanism: AOP and AspectJ

The foundation of Hadoop's FI framework is the concept of cross-cutting concerns as implemented by AspectJ. The following basic terms are important to remember:

- Aspect: a modular unit of cross-cutting behavior, consisting of pointcuts and advice.
- Join point: a well-defined point in a program's execution, such as a method call or a field access.
- Pointcut: an expression that picks out a set of join points.
- Advice: the code that runs before, after, or around the join points selected by a pointcut.

Predefined Join Points

The following readily available join points are provided by AspectJ:

- method call and method execution
- constructor call and constructor execution
- field reference (get) and field assignment (set)
- exception handler execution
- static initializer execution
- object pre-initialization and object initialization
- advice execution

Aspect Example

This is a fault injection example:

package org.apache.hadoop.hdfs.server.datanode;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fi.ProbabilityModel;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.util.DiskChecker.*;

import java.io.IOException;
import java.io.OutputStream;
import java.io.DataOutputStream;

/**
 * This aspect takes care of the faults injected into the
 * datanode.BlockReceiver class.
 */
public aspect BlockReceiverAspects {
  public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);

  public static final String BLOCK_RECEIVER_FAULT = "hdfs.datanode.BlockReceiver";

  pointcut callReceivePacket() : call (* OutputStream.write(..))
    && withincode (* BlockReceiver.receivePacket(..))
    // to further limit the application of this aspect a very narrow 'target' can be used as follows
    //   && target(DataOutputStream)
    && !within(BlockReceiverAspects+);

  before () throws IOException : callReceivePacket() {
    if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
      LOG.info("Before the injection point");
      Thread.dumpStack();
      throw new DiskOutOfSpaceException("FI: injected fault point at " +
          thisJoinPoint.getStaticPart().getSourceLocation());
    }
  }
}

The aspect has two main parts: the pointcut and the advice.

The pointcut identifies an invocation of the write() method of the java.io.OutputStream class, with any number of parameters and any return type. This invocation has to take place within the body of the method receivePacket() of the BlockReceiver class, which again can have any parameters and any return type. Any invocations of the write() method happening anywhere within the aspect BlockReceiverAspects or its heirs will be ignored.

The advice runs before each matched invocation: if ProbabilityModel.injectCriteria() flips in favor of the fault, it throws a DiskOutOfSpaceException at the join point.

Note 1: This short example doesn't illustrate that you can have more than one injection point per class. In such a case, the names of the faults have to differ if a developer wants to trigger them separately.

Note 2: After the injection step (see Putting It All Together below) you can verify that the faults were properly injected by searching for ajc keywords in a disassembled class file.

Here is a code injection example:

package org.apache.hadoop.security;

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.io.WritableUtils;

privileged aspect AccessTokenHandlerAspects {
  /** Check if a token is expired. For unit tests only.
   *  Returns true when the token is expired, false otherwise. */
  static boolean AccessTokenHandler.isTokenExpired(AccessToken token) throws IOException {
    ByteArrayInputStream buf = new ByteArrayInputStream(token.getTokenID()
        .getBytes());
    DataInputStream in = new DataInputStream(buf);
    long expiryDate = WritableUtils.readVLong(in);
    return isExpired(expiryDate);
  }

  /** Set the token lifetime. For unit tests only. */
  synchronized void AccessTokenHandler.setTokenLifetime(long tokenLifetime) {
    this.tokenLifetime = tokenLifetime;
  }
}

The great thing about this approach is that the methods injected for testing exist only in an instrumented build and never pollute the production code.

Fault Naming Convention and Namespaces

For the sake of a unified naming convention, the following two types of names are recommended for the development of new aspects:

Development Tools

Putting It All Together

Aspects (faults) have to be injected, or woven, into the code before they can be used. To weave the aspects, run:

% ant injectfaults

If an aspect's pointcut doesn't match anything in the code, the weaver reports a warning like the following:

[iajc] warning at
src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
          BlockReceiverAspects.aj:44::0
advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
has not been applied [Xlint:adviceDidNotMatch]

This isn't an error from AspectJ's point of view; Hadoop's build, however, will fail on it in order to preserve the integrity of the source code.

To build a Hadoop jar file with the instrumented classes, run:

% ant jar-fault-inject

To build the instrumented test jars, run:

% ant jar-test-fault-inject

To run the HDFS tests against the fault-injected build, run:

% ant run-test-hdfs-fault-inject

How to Use the Fault Injection Framework

Faults can be triggered as follows:

% ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12

To set a certain level, for example 25%, for all injected faults, use:

% ant run-test-hdfs-fault-inject -Dfi.*=0.25

Faults can also be managed programmatically from a test, as the following example demonstrates:

package org.apache.hadoop.fs;

import org.junit.Test;
import org.junit.Before;
import org.junit.After;

public class DemoFiTest {
  public static final String BLOCK_RECEIVER_FAULT = "hdfs.datanode.BlockReceiver";

  @Before
  public void setUp() {
    // Set up the test's environment as required
  }

  @Test
  public void testFI() {
    // This triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
    //
    // The main logic of your test goes here
    //
    // Now set the level back to 0 (zero) to prevent this fault from happening again
    System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
    // or remove its trigger completely
    System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
  }

  @After
  public void tearDown() {
    // Clean up the test environment
  }
}

As you can see above, the command-line option and the programmatic call do the same thing: both set the probability level of hdfs.datanode.BlockReceiver to 12%. The difference, however, is that the programmatic approach provides more flexibility and allows you to turn a fault off when a test no longer needs it.

It should be clear that random faults aren't the only possible usage scenario for fault injection. Faults might also be initiated by other means, e.g. by setting certain static variables, instantiating particular objects, and so on.
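For instance, a deterministic trigger can be sketched with a static flag that the advice consults instead of a probability. The class and member names below are hypothetical illustrations, not part of the shipped framework.

```java
/** Sketch of a deterministic fault trigger: a test arms and disarms a
 *  flag, and the advice body checks it instead of flipping a coin. */
public class DeterministicFault {
    /** Volatile so a flip is visible across threads. */
    static volatile boolean diskFull = false;

    /** What the advice body would check before throwing. */
    static void maybeFail() {
        if (diskFull) {
            throw new IllegalStateException("FI: injected disk-full fault");
        }
    }

    public static void main(String[] args) {
        maybeFail();            // flag off: nothing happens
        diskFull = true;        // the test arms the fault
        try {
            maybeFail();        // flag on: the fault fires
        } catch (IllegalStateException e) {
            System.out.println("fault fired: " + e.getMessage());
        } finally {
            diskFull = false;   // disarm so later tests are unaffected
        }
    }
}
```

Unlike the probability model, this kind of trigger makes a test fully repeatable: the fault fires on every matched invocation while the flag is set, and never otherwise.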

A couple of hints

Eclipse provides a very convenient environment for AspectJ development. However, as with everything in Eclipse, it has some issues. Here are three simple steps to make your project AspectJ-aware:

After that you should have type completion, syntax highlighting, and cross-references working for both .java and .aj files.

Additional Information and Contacts

These two sources of information are particularly interesting and worth reading:

If you have additional comments or questions for the author, check HDFS-435.

A recent presentation (2/11/2011) about code and fault injection is also available, along with its slides.

HowToUseInjectionFramework (last edited 2011-02-14 19:12:16 by KonstantinIBoudnik)