Partition Function

In the Hama BSP computing framework, the Partition function makes Bulk Synchronous Parallel processing scalable by determining how slices of the input data are distributed among BSP processors. Unlike the Map/Reduce data processing model, many scientific algorithms based on the message-passing Bulk Synchronous Parallel model require that a processor obtain "nearby or related" data from other processors in order to complete the computation. In this case, you can create your own Partition function to determine inter-processor communication and how the data is distributed.

Internally, input data partitioning works in the following sequence:

Create your own Partitioner

Tutorial

....

  BSPJob job = new BSPJob(conf);
  ...
  job.setPartitioner(HashPartitioner.class);
  ...
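To illustrate what a custom partition function might look like, here is a minimal sketch of a range-based partitioner that keeps neighboring keys on the same processor, which can reduce the amount of "nearby" data a processor has to fetch from its peers. The `SketchPartitioner` interface and the `RangePartitioner` class are hypothetical names introduced for this example so that it compiles without Hama on the classpath; they are not Hama's own types, and the actual `Partitioner` contract should be checked against the Hama API before adapting this.

```java
// Hypothetical stand-in for a partitioner contract (an assumption for this
// sketch, not Hama's own interface): map a record to a task index.
interface SketchPartitioner<K, V> {
    int getPartition(K key, V value, int numTasks);
}

// Assigns contiguous key ranges to the same task, so that records with
// neighboring keys end up on the same BSP processor.
class RangePartitioner implements SketchPartitioner<Integer, String> {
    // Exclusive upper bound on the keys; assumed to be known in advance.
    private final int maxKey;

    RangePartitioner(int maxKey) {
        if (maxKey <= 0) {
            throw new IllegalArgumentException("maxKey must be positive");
        }
        this.maxKey = maxKey;
    }

    @Override
    public int getPartition(Integer key, String value, int numTasks) {
        if (key < 0 || key >= maxKey) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        // Scale the key from [0, maxKey) to [0, numTasks); the long cast
        // avoids integer overflow in the multiplication.
        return (int) ((long) key * numTasks / maxKey);
    }
}
```

With `maxKey` set to 100 and four tasks, keys 0 to 24 go to task 0, keys 25 to 49 to task 1, and so on. A class written against the real Hama interface would be registered through `job.setPartitioner(...)` in the same way as `HashPartitioner` above.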

Specify the partition files and directories

If the input is already partitioned, you can skip the pre-partitioning step with the following configuration:

  ...