How to Debug BSP Programs
Debugging distributed programs is always difficult, because very few debuggers will let you connect to a remote program that wasn't run with the proper command line arguments.
1. Start by getting everything running (likely on a small input) in the local runner.
You do this by setting your BSP Master to "local" in your config. The local runner can run
under the debugger and runs on your development machine. A very quick and easy way to set this
config variable is to include the following line just before you run the job:
conf.set("bsp.master.address", "local");
In local mode the job runs with up to 20 threads by default. That many threads can be awkward to debug, so you can lower the number of tasks with this line:
conf.set("bsp.local.tasks.maximum", "2");
This limits the number of tasks to 2.
You may also want to keep the input and output files in the local file system rather than in the Hadoop
Distributed File System (HDFS):
conf.set("fs.default.name", "local");
You can also set these configuration parameters in hama-site.xml. The configuration files should appear
somewhere in your program's class path when the program runs.
2. Run the small input on a single-node cluster. This will smoke out the issues that come with
distribution and the "real" task runner, while leaving you only a single place to look at logs. Besides the task logs, the most
useful ones are the groom server and BSPMaster logs. Make sure you are logging at the INFO level or you will
miss clues like the output of your tasks.
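As a sketch of the hama-site.xml route mentioned in step 1, the three settings could be written as shown here. This assumes hama-site.xml follows the Hadoop-style configuration/property layout; the property names and values are the same ones passed to conf.set in step 1, nothing else is added.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run the job in the local runner instead of on a cluster. -->
  <property>
    <name>bsp.master.address</name>
    <value>local</value>
  </property>
  <!-- Limit the number of tasks started by the local runner. -->
  <property>
    <name>bsp.local.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Use the local file system for input and output rather than HDFS. -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
</configuration>
```

Keeping these values in a file makes it easier to switch back to cluster settings for step 2 without touching the job code.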
Use of Log4J in BSP applications
First, import the logging classes by adding the following import statements at the beginning of your BSP application. These are the Apache Commons Logging interfaces, which pass the messages on to Log4J when it is the configured backend.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
The example below logs INFO-level messages by adding the following line within the bsp() method:
LOG.info(peer.getPeerName() + ": Logging test: " + data);
@Override
public void bsp(
    BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
    throws IOException, SyncException, InterruptedException {
  // LOG, iterations and masterTask are defined outside this method.
  int in = 0;
  for (int i = 0; i < iterations; i++) {
    // Pick a random point in the square [-1, 1) x [-1, 1).
    double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
    // Count the points that fall inside the unit circle.
    if ((Math.sqrt(x * x + y * y) < 1.0)) {
      in++;
    }
  }

  double data = 4.0 * in / iterations;
  // Log this task's estimate before sending it to the master task.
  LOG.info(peer.getPeerName() + ": Logging test: " + data);
  peer.send(masterTask, new DoubleWritable(data));
  peer.sync();
}
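When a task logs a suspicious value, it can help to pull the per-task computation out of bsp() so it can be stepped through in a debugger with no Hama classes involved. The following is a minimal sketch of that idea, not part of Hama: the class name PiSampleDebug and the method estimate are hypothetical, and a seeded java.util.Random replaces Math.random() so that a run can be repeated exactly.

```java
import java.util.Random;

// Hypothetical stand-alone helper for debugging; not part of the Hama examples.
class PiSampleDebug {

    // Same sampling logic as the bsp() body above, but the random source is
    // passed in so that a fixed seed reproduces the same result every time.
    static double estimate(int iterations, Random rnd) {
        int in = 0;
        for (int i = 0; i < iterations; i++) {
            double x = 2.0 * rnd.nextDouble() - 1.0;
            double y = 2.0 * rnd.nextDouble() - 1.0;
            // Count points that fall inside the unit circle.
            if (Math.sqrt(x * x + y * y) < 1.0) {
                in++;
            }
        }
        return 4.0 * in / iterations;
    }

    public static void main(String[] args) {
        // The seed and iteration count are arbitrary values chosen for illustration.
        double data = estimate(10000, new Random(42L));
        System.out.println("debug: Logging test: " + data);
    }
}
```

Because the seed is fixed, a breakpoint inside the loop sees the same sequence of points on every run, which is not the case for the Math.random() version inside the job.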
In Apache Hama's local mode, the INFO messages from bsp() appear on the console:
$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pi
13/05/14 16:02:32 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/05/14 16:02:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/14 16:02:32 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/05/14 16:02:32 INFO bsp.LocalBSPRunner: Setting up a new barrier for 10 tasks!
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:6Logging test: 3.1412
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:9Logging test: 3.1308
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:5Logging test: 3.1304
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:4Logging test: 3.1756
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:3Logging test: 3.1444
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:8Logging test: 3.1452
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:0Logging test: 3.1468
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:1Logging test: 3.1684
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:2Logging test: 3.1256
13/05/14 16:02:32 INFO examples.PiEstimator$MyEstimator: local:7Logging test: 3.114
13/05/14 16:02:35 INFO bsp.BSPJobClient: Current supersteps number: 0
13/05/14 16:02:35 INFO bsp.BSPJobClient: The total number of supersteps: 0
13/05/14 16:02:35 INFO bsp.BSPJobClient: Counters: 7
13/05/14 16:02:35 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/05/14 16:02:35 INFO bsp.BSPJobClient: SUPERSTEPS=0
13/05/14 16:02:35 INFO bsp.BSPJobClient: LAUNCHED_TASKS=10
13/05/14 16:02:35 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/05/14 16:02:35 INFO bsp.BSPJobClient: SUPERSTEP_SUM=10
13/05/14 16:02:35 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=59
13/05/14 16:02:35 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=10
13/05/14 16:02:35 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=10
13/05/14 16:02:35 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=1
Estimated value of PI is 3.14224
Job Finished in 3.141 seconds
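The logged per-task values can be used to cross-check the final result. The sketch below pulls the number after "Logging test: " out of each line and averages the values. The class LogEstimateCheck and its methods are hypothetical helpers written for this page, and the sample lines are the ten task lines from the console output above with the timestamps dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hama: cross-checks logged per-task values.
class LogEstimateCheck {

    // Captures the number that follows "Logging test: " at the end of a line.
    private static final Pattern VALUE =
        Pattern.compile("Logging test: ([0-9]+(?:\\.[0-9]+)?)\\s*$");

    // The ten per-task lines from the console output above (timestamps dropped).
    static List<String> sampleLines() {
        return Arrays.asList(
            "INFO examples.PiEstimator$MyEstimator: local:6Logging test: 3.1412",
            "INFO examples.PiEstimator$MyEstimator: local:9Logging test: 3.1308",
            "INFO examples.PiEstimator$MyEstimator: local:5Logging test: 3.1304",
            "INFO examples.PiEstimator$MyEstimator: local:4Logging test: 3.1756",
            "INFO examples.PiEstimator$MyEstimator: local:3Logging test: 3.1444",
            "INFO examples.PiEstimator$MyEstimator: local:8Logging test: 3.1452",
            "INFO examples.PiEstimator$MyEstimator: local:0Logging test: 3.1468",
            "INFO examples.PiEstimator$MyEstimator: local:1Logging test: 3.1684",
            "INFO examples.PiEstimator$MyEstimator: local:2Logging test: 3.1256",
            "INFO examples.PiEstimator$MyEstimator: local:7Logging test: 3.114");
    }

    // Returns the logged values, skipping lines that do not match the pattern.
    static List<Double> extractValues(List<String> logLines) {
        List<Double> values = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = VALUE.matcher(line);
            if (m.find()) {
                values.add(Double.parseDouble(m.group(1)));
            }
        }
        return values;
    }

    // Arithmetic mean; rejects an empty list rather than returning NaN.
    static double mean(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values to average");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        List<Double> values = extractValues(sampleLines());
        System.out.println(values.size() + " values, mean = " + mean(values));
    }
}
```

For the run above, the ten values sum to 31.4224, so their mean is about 3.14224, which agrees with the "Estimated value of PI" line. That is consistent with the final estimate being the average of the per-task results; a mismatch here would suggest lost or duplicated messages.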