...

The partitioner determines how input data is distributed among the computing workers of a Bulk Synchronous Parallel (BSP) job. Note that, unlike Map/Reduce's partition function, it is not related to output collection.

Input data partitioning proceeds in the following sequence:

  • If the user specifies a partition function, a "partitioning job" is run internally as a pre-processing step.

...

    • Each task of the "partitioning job" reads its assigned data block and rewrites the records to the appropriate partition files.
  • After pre-partitioning is done, the BSP job is launched.

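  // Create the BSP job from an existing configuration object
  BSPJob job = new BSPJob(conf);
  ...
  // Assign input records to tasks by the hash of each record's key
  job.setPartitioner(HashPartitioner.class);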
  ...
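
To make the pre-partitioning step more concrete, here is a minimal, self-contained sketch of how a hash-based partition function could route the records of one input block into per-partition buckets. It does not use Hama's classes: `PartitionSketch`, `SimplePartitioner`, `SimpleHashPartitioner`, and the in-memory buckets (standing in for partition files) are hypothetical names introduced only for illustration, and the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    /** Hypothetical stand-in for a partition function; not Hama's actual interface. */
    interface SimplePartitioner<K> {
        int getPartition(K key, int numTasks);
    }

    /** Hash-based assignment: equal keys always map to the same partition. */
    static final class SimpleHashPartitioner<K> implements SimplePartitioner<K> {
        @Override
        public int getPartition(K key, int numTasks) {
            // floorMod keeps the result in [0, numTasks) even for negative hash codes.
            return Math.floorMod(key.hashCode(), numTasks);
        }
    }

    /**
     * Reads one input block and rewrites its records into per-partition
     * buckets. The buckets are in-memory stand-ins for partition files.
     */
    static List<List<String>> partition(List<String> block,
                                        SimplePartitioner<String> partitioner,
                                        int numTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : block) {
            // Here the record itself is used as the key.
            buckets.get(partitioner.getPartition(record, numTasks)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> block = Arrays.asList("a", "b", "c", "a", "d");
        List<List<String>> buckets =
            partition(block, new SimpleHashPartitioner<>(), 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + ": " + buckets.get(i));
        }
    }
}
```

The point of the sketch is that both occurrences of the record "a" end up in the same bucket, so after pre-partitioning each worker sees all records for the keys assigned to it.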