Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Instead of running a separate job, we inject a partitioning superstep before the first superstep of the job. (This has a dependency on the Superstep API) For graph jobs, we can configure this partitioning superstep class specific to graph partitioning class that partitions and loads vertices. - Suraj Menon
    • Since scaling the number of BSP tasks between supersteps in single job, is not possible, the key is "how to launch the number of tasks differently with the number of file blocks". Even though graph uses own graph partitioning class, the fact that parsing vertex structure should be done at loadVertices() method hasn't changed to avoid unnecessary IO overheads. - Edward J. Yoon
  • The partitions instead of being written to HDFS, which is creating a copy of input files in HDFS Cluster (too costly I believe), should be written to local files and read from. - Suraj Menon
    • If we want to run multiple jobs on same input, for example, friendship graph is input and want to run PageRank job and SSSP job, ...., etc., reuse of partitions should be considered. If partitions are written on local fs, metadata should be managed to reuse them. - Edward J. Yoon

...