If you are completely new to Hama, please go directly to the Full Walkthrough section.
...
Getting Started with Hama on YARN
Preparations
Current Hama and Hadoop require JRE 1.7 or higher and ssh to be set up between the nodes in the cluster:
- Hadoop-2.x
- Sun Java JDK 1.7 or higher
For additional information consult our CompatibilityTable.
This tutorial requires Hadoop 2.x to be correctly installed already. If you haven't done this yet, please follow the official documentation: https://hadoop.apache.org/docs/stable/
Configuration
Hama on YARN essentially needs only two properties: the resource manager address and the default filesystem URI. A sample configuration is as follows:
No Format |
---|
<!-- Path to your hama-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>'your resource manager address or hostname':'resource manager port'</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://'your default file system address or hostname':'default file system port'/</value>
</property>
</configuration>
|
See also the configuration page for advanced Hama configuration.
Advanced Properties
Property Name | Default | Meaning |
---|---|---|
hama.appmaster.memory.mb | 100 MB | The base amount of memory used by the BSPApplicationMaster. The total amount of memory used by the ApplicationMaster is calculated as memoryInMb = 3 * BSP_TASK_NUM + hama.appmaster.memory.mb. This is because the application master spawns 1-3 threads per launched task, each of which should take 1 MB, on top of a minimum base memory usage of 100 MB. If you face memory issues, you can set this to a higher value. |
Launching Hama on YARN
Launch the serialize printing example application:
No Format |
---|
$HAMA_HOME/bin/hama jar hama-yarn-0.7.0.jar org.apache.hama.bsp.YarnSerializePrinting
|
You should see in your terminal the "Hello BSP" messages that each spawned container wrote to the output path you defined in HDFS. If your application succeeds, you will get the following output.
No Format |
---|
INFO bsp.YARNBSPJobClient: Application has completed successfully. Breaking monitoring loop
Hello BSP from 1 of 4: cluster-0:16004
Hello BSP from 2 of 4: cluster-1:16006
Hello BSP from 3 of 4: cluster-0:16008
Hello BSP from 4 of 4: cluster-1:16010
Job Finished in 14.838 seconds
|
How to write a Hama-YARN job
BSP job
The BSPModel hasn't changed, but the way to submit a job has.
Basically you just need the following code to submit a Hama-YARN job.
No Format |
---|
HamaConfiguration conf = new HamaConfiguration();
conf.set("yarn.resourcemanager.address", "0.0.0.0:8040");
YARNBSPJob job = new YARNBSPJob(conf);
job.setBspClass(HelloBSP.class);
job.setJarByClass(HelloBSP.class);
job.setJobName("Serialize Printing");
job.setMemoryUsedPerTaskInMb(50);
job.setNumBspTask(2);
job.waitForCompletion(false);
|
...
No Format |
---|
job.setMemoryUsedPerTaskInMb(50); |
How to configure a job
There are some configuration values that the job needs in order to be submitted successfully to the YARN infrastructure.
The most important configuration value is yarn.resourcemanager.address. This should point to the address (hostname and port) where your ResourceManager runs, for example localhost:8040.
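Because a malformed address is an easy mistake to make, it can help to check the value before submitting a job. The following is a minimal sketch, not part of Hama: the class `ResourceManagerAddress` and its methods are hypothetical helpers that only split a `hostname:port` string and verify that the port is in the valid range.

```java
public class ResourceManagerAddress {

    // Hypothetical helper (not a Hama API): returns the hostname part of "host:port".
    static String hostOf(String address) {
        return address.substring(0, separatorIndex(address));
    }

    // Hypothetical helper (not a Hama API): returns the port part of "host:port".
    static int portOf(String address) {
        // parseInt throws NumberFormatException (an IllegalArgumentException) on non-numeric input.
        int port = Integer.parseInt(address.substring(separatorIndex(address) + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return port;
    }

    // Finds the ':' separator and rejects values with a missing host or port.
    private static int separatorIndex(String address) {
        int idx = address.lastIndexOf(':');
        if (idx <= 0 || idx == address.length() - 1) {
            throw new IllegalArgumentException("Expected hostname:port but got: " + address);
        }
        return idx;
    }

    public static void main(String[] args) {
        String address = "localhost:8040"; // example value from the text above
        System.out.println(hostOf(address) + " / " + portOf(address));
    }
}
```

A check like this could run right before the value is passed to `conf.set("yarn.resourcemanager.address", ...)`, so that a typo fails fast on the client instead of during job submission.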
Another important configuration value is the amount of memory used by the BSPApplicationMaster. You can configure a base amount of memory for the application master with this configuration key:
No Format |
---|
hama.appmaster.memory.mb
|
By default, this is set to 100 MB.
The total amount of memory used by the ApplicationMaster is calculated as follows:
No Format |
---|
int memoryInMb = 3 * this.getNumBspTask() + conf.getInt("hama.appmaster.memory.mb", 100);
|
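To get a feel for the numbers, the formula can be evaluated for a couple of task counts. This is a standalone sketch that only mirrors the arithmetic shown above; the class `AppMasterMemory` and the method `totalMemoryMb` are illustrative names and do not exist in Hama.

```java
public class AppMasterMemory {

    // Documented default of hama.appmaster.memory.mb, in MB.
    static final int DEFAULT_BASE_MB = 100;

    // Mirrors the formula above: 3 MB per BSP task plus the configured base amount.
    static int totalMemoryMb(int numBspTasks, int baseMemoryMb) {
        return 3 * numBspTasks + baseMemoryMb;
    }

    public static void main(String[] args) {
        // 2 tasks with the default base: 3 * 2 + 100
        System.out.println(totalMemoryMb(2, DEFAULT_BASE_MB)); // prints 106
        // 50 tasks with the base raised to 256: 3 * 50 + 256
        System.out.println(totalMemoryMb(50, 256)); // prints 406
    }
}
```

With the default base, the per-task share only starts to dominate beyond roughly 33 tasks (3 * 33 = 99), so for small jobs the base value is what matters most.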
Graph job
Hama graph jobs have not changed either, but you need to change a little code, replacing GraphJob with YARNGraphJob, to run a Hama graph job on YARN. See the following link: PageRank on YARN.
Compared to PageRank in the existing graph examples, this code only changes GraphJob to YARNGraphJob. Launching a graph job on YARN works the same way as launching an existing graph job.
How to submit a job
General
...
to submit a Hama job. You can just change the BSPJob to YARNBSPJob.
Full Walkthrough
This walkthrough guides you step by step to a working Hama BSP application on YARN. However, you must have correctly installed Hadoop 0.23.x on your machine.
<TODO>
Make some fancy pictures from Eclipse showing how to get a jar out of it and submit it.
If you want to submit a Hama graph job, only change the BSPJob object to YARNGraphJob.