You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 40 Next »

Partition Function

In Hama BSP computing framework, the Partition function is used for obtaining scalability of a Bulk Synchronous Parallel processing, and determining how to distribute the slices of input data among BSP processors. Unlike Map/Reduce data processing model, many scientific algorithms based on Message-Passing Bulk Synchronous Parallel model often requires that a processor obtain “nearby or related” data from other processors in order to complete the computation. In this case, you can create your own Partition function for determining processor inter-communication and how to distribute the data.

Internally, Input data-partitioning works as following sequence:

  • If user specified partition function, internally, "partitioning job" is ran as a pre-processing step.
    • Each task of "partitioning job" reads its assigned data block and rewrite them to particular partition files.
    • Job Scheduler assigns partitions to proper task based on partition ID.
  • After prepartitioning done, launch the BSP job.

Create your own Partitioner

Partitioner Configuration

  BSPJob job = new BSPJob(conf);
  ...
  job.setPartitioner(HashPartitioner.class);
  ...
  • No labels