How to Debug BSP Programs
Debugging distributed programs is always difficult, because very few debuggers will let you connect to a remote program that wasn't run with the proper command line arguments.
1. Start by getting everything running (likely on a small input) in the local runner.
You do this by setting your BSP Master to "local" in your config. The local runner can run
under the debugger and runs on your development machine. A very quick and easy way to set this
config variable is to include the following line just before you run the job:
conf.set("bsp.master.address", "local");
In local mode the job runs in 20 threads by default. Since that many threads are not always convenient to debug, you can decrease the number of tasks with this line:
conf.set("bsp.local.tasks.maximum", "2");
You may also want to keep the input and output files in the local file system rather than in the Hadoop
distributed file system (HDFS):
conf.set("fs.default.name", "local");
You can also set these configuration parameters in hama-site.xml. The configuration files should appear somewhere in your program's
class path when the program runs.
2. Run the small input on a single-node cluster. This will smoke out all of the issues that happen with
distribution and the "real" task runner, but you only have a single place to look at logs. Besides the task logs, the most
useful ones are the groom and bspmaster logs. Make sure you are logging at the INFO level or you will
miss clues like the output of your tasks.
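For the local-runner settings in step 1, the hama-site.xml route can be sketched as follows. This is an illustration rather than a tool that ships with Hama: the class name LocalDebugSite and its two helper methods are hypothetical, and the program only prints the three properties named above in the Hadoop-style configuration/property/name/value layout that site files use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalDebugSite {

  /** The three local-debugging settings from step 1, in the order they are introduced. */
  public static Map<String, String> localDebugSettings() {
    Map<String, String> settings = new LinkedHashMap<>();
    settings.put("bsp.master.address", "local");
    settings.put("bsp.local.tasks.maximum", "2");
    settings.put("fs.default.name", "local");
    return settings;
  }

  /**
   * Renders settings as a site file. Values are written verbatim, so this
   * sketch assumes they contain no characters that need XML escaping.
   */
  public static String render(Map<String, String> settings) {
    StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
    for (Map.Entry<String, String> entry : settings.entrySet()) {
      xml.append("  <property>\n")
         .append("    <name>").append(entry.getKey()).append("</name>\n")
         .append("    <value>").append(entry.getValue()).append("</value>\n")
         .append("  </property>\n");
    }
    return xml.append("</configuration>\n").toString();
  }

  public static void main(String[] args) {
    // Redirect the output to hama-site.xml in a directory on the class path.
    System.out.print(render(localDebugSettings()));
  }
}
```

Keeping the settings in a file rather than in conf.set calls means the job code does not have to change when you move from the local runner to a cluster.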
Use of Log4J in BSP Applications
First, import the logging classes by adding the following import statements at the beginning of your BSP application. These belong to the Apache Commons Logging API, which hands messages to Log4J when Log4J is on the class path.
```java
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
```
The example below logs an INFO-level message by adding the following line to the bsp() method of the PiEstimator example:
LOG.info(peer.getPeerName() + ": Logging test: " + data);
```java
public static class MyEstimator extends
    BSP<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> {
  // ...
  public static final Log LOG = LogFactory.getLog(MyEstimator.class);
  // ...

  @Override
  public void bsp(
      BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
      throws IOException, SyncException, InterruptedException {
    // Monte Carlo estimate of pi: count random points that fall inside the unit circle.
    int in = 0;
    for (int i = 0; i < iterations; i++) {
      double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
      if ((Math.sqrt(x * x + y * y) < 1.0)) {
        in++;
      }
    }
    double data = 4.0 * in / iterations;
    // Written to this task's log at INFO level.
    LOG.info(peer.getPeerName() + ": Logging test: " + data);
    peer.send(masterTask, new DoubleWritable(data));
    peer.sync();
  }
  // ...
}
```
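When a logged value looks wrong, it can help to run the same arithmetic outside Hama first. The sketch below is not part of the PiEstimator example. It copies the loop from bsp() into a hypothetical static method, estimatePi, and makes two substitutions that should be read as assumptions: a seeded java.util.Random replaces Math.random() so that repeated debugging runs log the same value, and java.util.logging stands in for Commons Logging so that the program has no external dependencies.

```java
import java.util.Random;
import java.util.logging.Logger;

public class EstimatorSketch {
  private static final Logger LOG = Logger.getLogger(EstimatorSketch.class.getName());

  /** Same computation as the bsp() loop, but reproducible for a given seed. */
  public static double estimatePi(int iterations, long seed) {
    Random random = new Random(seed);
    int in = 0;
    for (int i = 0; i < iterations; i++) {
      double x = 2.0 * random.nextDouble() - 1.0;
      double y = 2.0 * random.nextDouble() - 1.0;
      // Comparing the squared distance avoids the square root; the result is the same.
      if (x * x + y * y < 1.0) {
        in++;
      }
    }
    return 4.0 * in / iterations;
  }

  public static void main(String[] args) {
    double data = estimatePi(100_000, 42L);
    // "local" stands in for peer.getPeerName(), which is only available inside a BSP task.
    LOG.info("local: Logging test: " + data);
  }
}
```

Because the seed is fixed, two runs with the same arguments produce the same estimate, which makes it easier to tell a logic error from ordinary sampling noise.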
In distributed mode of Apache Hama, each BSP task processor creates its own log file under the $HAMA_HOME/logs/tasklogs directory.
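To collect those files while debugging, something like the following may be enough. It is only a sketch: it assumes HAMA_HOME is set in the environment, and because this page does not say how files are named or nested under logs/tasklogs, it walks the directory recursively and prints every regular file it finds. The class name TaskLogLister is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskLogLister {

  /** Returns every regular file under hamaHome/logs/tasklogs, sorted by path. */
  public static List<Path> listTaskLogs(Path hamaHome) throws IOException {
    Path taskLogs = hamaHome.resolve("logs").resolve("tasklogs");
    if (!Files.isDirectory(taskLogs)) {
      // No task has written a log yet, or hamaHome points at the wrong place.
      return Collections.emptyList();
    }
    try (Stream<Path> paths = Files.walk(taskLogs)) {
      return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    String home = System.getenv("HAMA_HOME");
    if (home == null) {
      System.err.println("HAMA_HOME is not set");
      return;
    }
    for (Path log : listTaskLogs(Paths.get(home))) {
      System.out.println(log);
    }
  }
}
```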