Partitioning

Partition Function

In the Hama BSP computing framework, the partition function determines how slices of input data are distributed among BSP processors, which is what allows a Bulk Synchronous Parallel job to scale. Unlike the Map/Reduce data processing model, many scientific algorithms based on the message-passing Bulk Synchronous Parallel model require a processor to obtain “nearby or related” data from other processors in order to complete the computation. In this case, processors use the partition function to determine their communication partners, or neighbors.

Internally, input data partitioning proceeds in the following sequence:

- If the user specified a partition function, a "partitioning job" is run as a pre-processing step.
- Each task of the "partitioning job" reads its assigned data block and rewrites the records to the appropriate partition files.
- After pre-partitioning is done, the BSP job is launched.
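
The sequence above can be pictured with a small in-memory sketch. It is not Hama's implementation: the class `PrePartitionSketch`, the method `prePartition`, and the use of plain lists in place of data blocks and partition files are all assumptions made for illustration, and a simple hash of the record stands in for the partition function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, in-memory stand-in for the pre-processing "partitioning job".
class PrePartitionSketch {

    // Returns one bucket per BSP task; each bucket stands in for a partition file.
    static List<List<String>> prePartition(List<List<String>> blocks, int numTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        // Each inner list plays the role of the data block assigned to one partitioning task.
        for (List<String> block : blocks) {
            for (String record : block) {
                // Assumed partition function: non-negative hash modulo the task count.
                int target = (record.hashCode() & Integer.MAX_VALUE) % numTasks;
                partitions.get(target).add(record);
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "a"));
        // After this step, a BSP job with two tasks would read one bucket each.
        System.out.println(prePartition(blocks, 2));
    }
}
```

Because the target bucket depends only on the record, equal records read from different blocks end up in the same partition, which is the property the later BSP job relies on.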

Create your own Partitioner
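
As a minimal sketch of what a custom partitioner might look like: the local `Partitioner` interface below is a stand-in that assumes the shape of Hama's `org.apache.hama.bsp.Partitioner` (a single `getPartition(key, value, numTasks)` method), declared here so the example compiles without Hama on the classpath. `PrefixPartitioner` and its key format (`prefix-suffix`) are hypothetical.

```java
// Local stand-in; assumed to mirror Hama's Partitioner interface.
interface Partitioner<K, V> {
    int getPartition(K key, V value, int numTasks);
}

// Hypothetical partitioner: keys that share the text before the first '-'
// are sent to the same task, so related records stay together.
class PrefixPartitioner implements Partitioner<String, String> {
    @Override
    public int getPartition(String key, String value, int numTasks) {
        int dash = key.indexOf('-');
        String prefix = dash < 0 ? key : key.substring(0, dash);
        // Mask the sign bit rather than calling Math.abs, which
        // returns a negative value for Integer.MIN_VALUE.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numTasks;
    }
}
```

With this sketch, `getPartition("eu-1", ..., 4)` and `getPartition("eu-2", ..., 4)` return the same task index, and the result always lies in the range 0 to `numTasks - 1`.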

Partitioner Configuration

  // Create the job, then register the partitioner class to use for its input.
  BSPJob job = new BSPJob(conf);
  ...
  job.setPartitioner(HashPartitioner.class);
  ...
