Getting Started with Hama on YARN
Hama and Hadoop currently require JRE 1.6 or higher, and ssh must be set up between the nodes in the cluster:
- Sun Java JDK 1.6.x or higher
For additional information consult our CompatibilityTable.
This tutorial requires a correctly installed Hadoop 2.x. If you haven't done this yet, please follow the official documentation: https://hadoop.apache.org/docs/stable/
Most of the configs are the same for Hama on YARN as for other deployment modes. See the configuration page for more information. The configs below are specific to Hama on YARN.
This property must be set in order to run Hama on YARN. It tells Hama to run applications on YARN.
The amount of memory used by the BSPApplicationMaster. The total amount of memory used by the ApplicationMaster is calculated as follows: memoryInMb = 3 * BSP_TASK_NUM + hama.appmaster.memory.mb. This is because the application master spawns 1-3 threads per launched task, each of which should take 1 MB, plus a minimum base memory usage of 100 MB. If you face memory issues, you can set this to a higher value.
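The formula above can be checked with a few lines of Java. This is only a sketch of the arithmetic, not code taken from Hama: the class name AppMasterMemory and the method totalMemoryMb are hypothetical, and the base value of 100 MB is the one mentioned in the text.

```java
/** Worked example of the ApplicationMaster memory formula quoted above (hypothetical helper). */
final class AppMasterMemory {

    // Per the text: up to 3 threads per launched task, 1 MB each.
    static final int MB_PER_TASK = 3;

    /**
     * @param bspTaskNum        number of BSP tasks (BSP_TASK_NUM in the formula)
     * @param appMasterMemoryMb value of hama.appmaster.memory.mb
     * @return total ApplicationMaster memory in MB
     */
    static int totalMemoryMb(int bspTaskNum, int appMasterMemoryMb) {
        if (bspTaskNum < 0 || appMasterMemoryMb < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return MB_PER_TASK * bspTaskNum + appMasterMemoryMb;
    }

    public static void main(String[] args) {
        // 2 tasks with a base of 100 MB: 3 * 2 + 100
        System.out.println(totalMemoryMb(2, 100));  // prints 106
        // 50 tasks with the same base: 3 * 50 + 100
        System.out.println(totalMemoryMb(50, 100)); // prints 250
    }
}
```

The per-task share grows linearly, so for jobs with many tasks the base value is the one to raise if the ApplicationMaster runs short of memory.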
Launching Hama on YARN
Make sure to copy yarn-site.xml from HADOOP_CONF_DIR or YARN_CONF_DIR to HAMA_CONF_DIR, because this configuration file is used to connect to YARN.
Launch a Hama application, for example the serialize printing example:
$HAMA_HOME/bin/hama jar hama-yarn-0.7.0-SNAPSHOT.jar org.apache.hama.bsp.YarnSerializePrinting
You should see "Hello BSP Message" written by each spawned container in HDFS, at the output path you defined.
How to write a Hama-YARN job
The BSPModel hasn't changed, but the way to submit a job has.
Basically, you just need the following code to submit a Hama-YARN job:
HamaConfiguration conf = new HamaConfiguration();
YARNBSPJob job = new YARNBSPJob(conf);
job.setBspClass(HelloBSP.class);
job.setJarByClass(HelloBSP.class);
job.setJobName("Serialize Printing");
job.setMemoryUsedPerTaskInMb(50);
job.setNumBspTask(2);
job.waitForCompletion(false);
As you can see, instead of a BSPJob you are starting a YARNBSPJob.
The YARNBSPJob offers an extended API for running on YARN. For example, you can set the amount of memory used by a task with setMemoryUsedPerTaskInMb, as in job.setMemoryUsedPerTaskInMb(50) above.
How to submit a job
There are two ways to submit a job: you can either submit it via the shell with a packed jar, or you can submit it from a Java application. In both cases the hama-yarn jar must be on the classpath or inside the jar for the job to run correctly.
bin/yarn jar /path_to_jar org.apache.hama.bsp.YarnSerializePrinting
In this case, the jar at /path_to_jar must contain the hama-yarn jar, or the hama-yarn jar must already be on the classpath of your Hadoop application. Replace org.apache.hama.bsp.YarnSerializePrinting with the class containing the main method that runs the Hama job.
Via Java Application
Just like in the section above, you have to configure the address of the ResourceManager. Then you can run this from a Java application; just put it into a main method.
HamaConfiguration conf = new HamaConfiguration();
conf.set("yarn.resourcemanager.address", "0.0.0.0:8040");
YARNBSPJob job = new YARNBSPJob(conf);
job.setBspClass(HelloBSP.class);
job.setJarByClass(HelloBSP.class);
job.setJobName("Serialize Printing");
job.setMemoryUsedPerTaskInMb(50);
job.setNumBspTask(2);
job.waitForCompletion(false);
How to change existing Hama Jobs to run on YARN
If you submit a Hama job with code like the following:
// BSP job configuration
HamaConfiguration conf = new HamaConfiguration();
BSPJob bsp = new BSPJob(conf);
bsp.waitForCompletion(true);
you can just change BSPJob to YARNBSPJob.