You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 38 Next »

Single Source Shortest Paths

  • Uses the SSSP algorithm described in the Google Pregel paper
  • Introduces partitioning and collective communication
  • Lets the user submit his/her own SequenceFile to calculate the SSSP's

Implementation

For detailed questions in terms of implementation have a look at my blog. It describes the algorithm and focuses on the main ideas showing implementation things.

http://codingwiththomas.blogspot.com/2011/05/shortest-path-finding-with-apache-hama.html

Usage

hama/bin/hama jar ../hama-0.x.0-examples.jar sssp <name of the start vertex> <optional: output path> <optional: path of your own sequencefile>

Change "x" to the version you are using!

Note: If you provide your own sequencefile, make sure you've set the output path.

The output path should never be the root path!

Submit your own Graph

You can transform your graph as a adjacency list to fit into the input which Hama is going to parse and calculate the shortest paths between the vertices.

The file that Hama can successfully parse is a SequenceFile that contains both Key and Value as a Text.

  K           /                V 
Vertex[Text] / AdjacentVertex : Weight [Text]

A vertex typically contains a name that uniquely identifies a vertex. So you are associating a key vertex that just contains a name to another vertex (that contains its name) to the weight. Both seperated by ":".

Let's look at this sample:

      K    /   V
    Berlin /  Paris : 25
    Berlin / London : 40
    London / Paris : 10

This will adjacent Berlin to Paris and London, and London to Paris with the given weights.

If you are familiar with MapReduce, this looks like a mapper output that can be easily reduced.

Notes:

Make sure...

  • your names are unique! Otherwise it will result in strange output.
  • it is only ONE file!
  • Key and Value are Text.class fields!
    • Value always contains only ONE vertex
    • Value is separated by ":" to split the name and the weight

Sample adjacency list SequenceFile

You can download a adjacencylist SequenceFile containing 2,031,215 vertices and 24,793,713 edges, named with cities in the world and random weights. It is arround 1.14GB large and can be downloaded here:

http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/files/cities-adjacencylist/adjacencylist.seq2

You can run it with

hama/bin/hama jar ../hama-0.x.0-examples.jar sssp Klewno sample/output PATH_TO_THE_SEQUENCEFILE

Obviously you have to replace "PATH_TO_THE_SEQUENCEFILE" with the path where the sample file is stored.

You can adjust the starting vertex city (here: "Klewno") to the city you want, I'm pretty sure the graph contains it.

Have fun! If you are facing problems, feel free to ask questions on the official mailing list.

  • No labels