Getting Started with Hama on YARN (Hadoop 0.23.x)

Requirements

Current Hama and Hadoop require JRE 1.6 or higher, and ssh must be set up between the nodes in the cluster.

For additional information consult our CompatibilityTable.

This tutorial assumes that Hadoop 0.23.0 is already installed correctly. If you haven't done this yet, please follow the official documentation: http://hadoop.apache.org/common/docs/r0.23.0/

How to run the Hama-YARN example

TODO this must be revised once the example has moved the jar.

bin/yarn jar hama-yarn-0.4.0-incubating.jar org.apache.hama.bsp.YarnSerializePrinting

Once the job is running, you should see in the log of the spawned application master that it is launching containers. Once the containers have launched, their logs show a little "Hello World" from the other tasks.

How to write a Hama-YARN job

The BSPModel hasn't changed, but the way to submit a job has.

You just need the following code to submit a Hama-YARN job:

    HamaConfiguration conf = new HamaConfiguration();
    conf.set("yarn.resourcemanager.address", "0.0.0.0:8040");

    YARNBSPJob job = new YARNBSPJob(conf);
    job.setBspClass(HelloBSP.class);
    job.setJarByClass(HelloBSP.class);
    job.setJobName("Serialize Printing");
    job.setMemoryUsedPerTaskInMb(50);
    job.setNumBspTask(2);
    job.waitForCompletion(false);

As you can see, instead of a BSPJob you are starting a YARNBSPJob.

The YARNBSPJob offers an extended API for running on YARN. For example, you can set the amount of memory used by a task with:

job.setMemoryUsedPerTaskInMb(50);

How to configure a job

A job needs some configuration values in order to be submitted successfully to the YARN infrastructure.

The most important configuration value is yarn.resourcemanager.address. It should point to the address (hostname and port) where your ResourceManager runs, for example localhost:8040.
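Since a malformed address is an easy mistake to make, it can help to check the value before passing it to conf.set("yarn.resourcemanager.address", ...). The following is a minimal sketch using only the JDK; the class ResourceManagerAddress and its parse method are hypothetical helpers and not part of Hama or Hadoop.

```java
import java.net.InetSocketAddress;

// Hypothetical helper (not part of Hama or Hadoop): checks that a
// ResourceManager address has the form "hostname:port".
final class ResourceManagerAddress {

    static InetSocketAddress parse(String address) {
        int colon = address.lastIndexOf(':');
        // Reject a missing hostname ("":8040) or a missing port ("localhost:").
        if (colon <= 0 || colon == address.length() - 1) {
            throw new IllegalArgumentException("expected hostname:port but got " + address);
        }
        String host = address.substring(0, colon);
        int port;
        try {
            port = Integer.parseInt(address.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port is not a number in " + address);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range in " + address);
        }
        // Unresolved: no DNS lookup happens here, only the format is checked.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress rm = parse("localhost:8040");
        System.out.println(rm.getHostName() + " " + rm.getPort()); // prints "localhost 8040"
    }
}
```

This only validates the format of the string. Whether a ResourceManager is actually listening at that address is only discovered when the job is submitted.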

Another important configuration value is the amount of memory used by the BSPApplicationMaster. You can configure a base amount of memory for the application master with this configuration key:

hama.appmaster.memory.mb

By default, this is set to 100 MB.

The total amount of memory used by the ApplicationMaster is calculated as follows:

int memoryInMb = 3 * this.getNumBspTask() + conf.getInt("hama.appmaster.memory.mb", 100);

This is because the application master spawns 1 to 3 threads per launched task, each of which should take 1 MB, on top of a base memory usage that defaults to 100 MB. If you face memory issues, you can set hama.appmaster.memory.mb to a higher value.
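To get a feeling for the numbers, the formula above can be worked through in plain Java. This is only a sketch of the arithmetic described in this section; the class AppMasterMemory and its method are hypothetical and do not exist in Hama.

```java
// Hypothetical helper (not part of Hama): reproduces the memory
// calculation for the ApplicationMaster described in the text.
final class AppMasterMemory {

    // Default of hama.appmaster.memory.mb, as stated in the text.
    static final int DEFAULT_BASE_MEMORY_MB = 100;

    // Up to 3 threads per launched task at 1 MB each.
    static final int MB_PER_TASK = 3;

    static int totalMemoryInMb(int numBspTasks, int baseMemoryMb) {
        if (numBspTasks < 0 || baseMemoryMb < 0) {
            throw new IllegalArgumentException("values must not be negative");
        }
        return MB_PER_TASK * numBspTasks + baseMemoryMb;
    }

    public static void main(String[] args) {
        // 2 tasks with the default base memory: 3 * 2 + 100
        System.out.println(totalMemoryInMb(2, DEFAULT_BASE_MEMORY_MB)); // prints 106
        // 50 tasks with a base memory raised to 256 MB: 3 * 50 + 256
        System.out.println(totalMemoryInMb(50, 256)); // prints 406
    }
}
```

For the two-task example job used on this page, the ApplicationMaster would therefore ask for 106 MB with the default setting.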

How to submit a job

General

You have two ways to submit a job: you can either submit it via the shell with a packed jar, or you can submit it from a Java application. In both cases you need the hama-yarn jar in the classpath or inside the jar to run correctly.

Via Shell

bin/yarn jar /path_to_jar org.apache.hama.bsp.YarnSerializePrinting

In this case, the jar at /path_to_jar must either contain the hama-yarn jar, or the hama-yarn jar must already be in the classpath of your Hadoop application. Replace org.apache.hama.bsp.YarnSerializePrinting with the class whose main method runs your Hama job.

Via Java Application

Just like in the section above, you have to configure the address of the ResourceManager. Then you can run the following code from a Java application; just put it into a main method.

    HamaConfiguration conf = new HamaConfiguration();
    conf.set("yarn.resourcemanager.address", "0.0.0.0:8040");

    YARNBSPJob job = new YARNBSPJob(conf);
    job.setBspClass(HelloBSP.class);
    job.setJarByClass(HelloBSP.class);
    job.setJobName("Serialize Printing");
    job.setMemoryUsedPerTaskInMb(50);
    job.setNumBspTask(2);
    job.waitForCompletion(false);

How to change existing Hama Jobs to run on YARN

In case you have the following code

    // BSP job configuration
    HamaConfiguration conf = new HamaConfiguration();
    BSPJob bsp = new BSPJob(conf);
    bsp.waitForCompletion(true);

to submit a Hama job, you can just change BSPJob to YARNBSPJob.

GettingStartedYARN (last edited 2011-12-09 19:55:05 by thomasjungblut)