How to

...

Debug BSP

...

Programs

Debugging distributed programs is always difficult, because very few debuggers will let you connect to a remote program that wasn't run with the proper command line arguments.

  1. Start by getting everything running (likely on a small input) in the local runner.
    You do this by setting your BSP Master to "local" in your config. The local runner can run
    under the debugger and runs on your development machine. A very quick and easy way to set this
    config variable is to include the following line just before you run the job:
    conf.set("bsp.master.address", "local");

...

In local mode the job runs with 20 threads by default. That many tasks can be awkward to debug, so you can lower the number of tasks with this line:

conf.set("bsp.local.tasks.maximum", "2");

...


  1. You may also want to keep the input and output files in the local file system rather than in the Hadoop
    distributed file system (HDFS):
    conf.set("fs.default.name", "local");
    You can also set these configuration parameters in hama-site.xml. The configuration files should appear somewhere in your program's
    class path when the program runs.

  2. Run the small input on a single-node cluster. This smokes out the issues that come with
    distribution and the "real" task runner, while leaving you only one place to look for logs. Besides the task logs, the most
    useful ones are the groom and bspmaster logs. Make sure you are logging at the INFO level or you will
    miss clues like the output of your tasks.
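
The local-mode settings from the steps above can be gathered in one place so they are easy to switch on and off while debugging. The following is a minimal sketch, not part of Hama: the class `LocalDebugSettings` and its methods are hypothetical names, and only the three property keys and values come from this page. It has no Hama dependency and hands the pairs to any two-argument setter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper (not part of Hama) that groups the local-debugging settings. */
final class LocalDebugSettings {

  private LocalDebugSettings() {}

  /** Returns the three properties named in the steps above, in insertion order. */
  static Map<String, String> forTasks(int maxTasks) {
    if (maxTasks < 1) {
      throw new IllegalArgumentException("maxTasks must be at least 1");
    }
    Map<String, String> settings = new LinkedHashMap<>();
    settings.put("bsp.master.address", "local");                         // use the local runner
    settings.put("bsp.local.tasks.maximum", Integer.toString(maxTasks)); // fewer tasks to step through
    settings.put("fs.default.name", "local");                            // local file system instead of HDFS
    return settings;
  }

  /** Applies the settings through any (key, value) setter, for example conf::set. */
  static void applyTo(BiConsumer<String, String> setter, int maxTasks) {
    forTasks(maxTasks).forEach(setter);
  }
}
```

Assuming `conf` is the configuration object you already pass to the job, a call such as `LocalDebugSettings.applyTo(conf::set, 2);` just before the job is run would have the same effect as the three separate `conf.set(...)` lines.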

...

Use of Log4J in BSP Applications

First, import the logging classes by adding the following import statements at the beginning of your BSP application. They belong to the Apache Commons Logging API, which can delegate to Log4J.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

The example below logs an INFO-level message by adding the line LOG.info(peer.getPeerName() + ": Logging test: " + data); to the bsp() method of the PiEstimator example.

  public static class MyEstimator extends
      BSP<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> {
   
    ...
    public static final Log LOG = LogFactory.getLog(MyEstimator.class);
    ...

    @Override
    public void bsp(
        BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
        throws IOException, SyncException, InterruptedException {

      int in = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
        if ((Math.sqrt(x * x + y * y) < 1.0)) {
          in++;
        }
      }

      double data = 4.0 * in / iterations;

      LOG.info(peer.getPeerName() + ": Logging test: " + data);
      peer.send(masterTask, new DoubleWritable(data));
      peer.sync();
    }

...


In the distributed mode of Apache Hama, each BSP task creates its own log file under the $HAMA_HOME/logs/tasklogs directory.
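
To see which task log files exist on a node, the directory named above can be walked with the standard Java file API. This is only a sketch under assumptions: the class `TaskLogFinder` is a hypothetical name, the layout below `tasklogs` is not described on this page (so the code simply walks it recursively), and `HAMA_HOME` is assumed to be set as an environment variable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Hama) that lists task log files. */
final class TaskLogFinder {

  private TaskLogFinder() {}

  /** Lists every regular file under <hamaHome>/logs/tasklogs, sorted by path. */
  static List<Path> listTaskLogs(Path hamaHome) throws IOException {
    Path taskLogs = hamaHome.resolve("logs").resolve("tasklogs");
    if (!Files.isDirectory(taskLogs)) {
      return Collections.emptyList(); // no task logs written yet, or wrong home directory
    }
    // Assumption: the sub-directory layout is unknown, so walk the whole tree.
    try (Stream<Path> paths = Files.walk(taskLogs)) {
      return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    String hamaHome = System.getenv("HAMA_HOME");
    if (hamaHome == null) {
      System.err.println("HAMA_HOME is not set");
      return;
    }
    for (Path log : listTaskLogs(Paths.get(hamaHome))) {
      System.out.println(log);
    }
  }
}
```

Running the class on a node prints one path per log file, which gives a quick check that the tasks actually wrote their output there.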

...