Single Source Shortest Paths

  • Uses the SSSP (Single Source Shortest Paths) algorithm described in the Google Pregel paper.
  • Introduces IO usage, partitioning based on hashing of the vertex ID, and collective communication.
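To illustrate what partitioning by hashing the vertex ID means, here is a minimal Python sketch. It is not Hama's partitioner: the function name `partition_for` and the choice of CRC32 as the hash are assumptions made only for this example. The idea is that every vertex name maps deterministically to one of the available tasks.

```python
import zlib


def partition_for(vertex_id: str, num_tasks: int) -> int:
    """Map a vertex ID to a task index in [0, num_tasks) using a stable hash.

    Hypothetical helper for illustration; Hama's own partitioner may differ.
    """
    # crc32 is used because Python's built-in hash() for str is randomized
    # per process, which would make the assignment non-reproducible.
    return zlib.crc32(vertex_id.encode("utf-8")) % num_tasks


vertices = ["Berlin", "Frankfurt", "Munich"]
# Each vertex is assigned to one of two tasks.
assignment = {v: partition_for(v, 2) for v in vertices}
```

Because the mapping depends only on the vertex ID, any task can work out which peer owns a vertex without a lookup table.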

Short summary of the algorithm

...

  • Lets users submit their own SequenceFile to calculate the shortest paths

Implementation

For detailed questions about the implementation, have a look at my blog. It describes the algorithm and walks through the main ideas behind the implementation.

http://codingwiththomas.blogspot.com/2011/05/shortest-path-finding-with-apache-hama.html

...

Usage

No Format
bin/hama jar ../hama-0.x.0-examples.jar sssp <start vertex> <input path> <output path> [number of tasks]

Change "x" to the version you are using!

Note: If you provide your own sequencefile, make sure you've set the output path.

The output path should never be the root path!

You need to provide the name of the start vertex from which the computation starts calculating the shortest paths. See below for how to provide an input file.

Submit your own

...

Graph

You can transform your graph into an adjacency list that fits the input format which Hama parses to calculate the shortest paths between the vertices.

The file that Hama can successfully parse is a TextFile that has the following layout:

No Format

Berlin\tFrankfurt:20\tMunich:50
Frankfurt\tBerlin:20\tMunich:10
Munich

This piece of text connects Berlin to Frankfurt (with an edge weight of 20) and to Munich (with an edge weight of 50). Munich is a dangling node: it has no outlinks. As you can see, a vertex is always on the leftmost side (we call it the key-site), and its outlinks (the vertices it is connected to) follow as further elements, separated by tabs (\t). SSSP needs edge weights, so you must provide each one after the name of the target vertex, separated by a colon ":". The weight must be an integer.
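If your graph already lives in memory, a small script can write it in this layout. The following Python sketch assumes the graph is a dict that maps each vertex name to a dict of neighbours and integer weights. The function name `to_adjacency_lines` is made up for this example and is not part of Hama.

```python
def to_adjacency_lines(graph):
    """Render a graph as tab-separated adjacency lines.

    graph: dict mapping vertex name -> {neighbour name: integer weight}.
    Hypothetical helper for illustration only.
    """
    lines = []
    for vertex, edges in graph.items():
        fields = [vertex]  # the key-site comes first
        for target, weight in edges.items():
            if not isinstance(weight, int):
                raise ValueError(f"weight for {vertex}->{target} must be an integer")
            fields.append(f"{target}:{weight}")
        lines.append("\t".join(fields))
    return lines


graph = {
    "Berlin": {"Frankfurt": 20, "Munich": 50},
    "Frankfurt": {"Berlin": 20, "Munich": 10},
    "Munich": {},  # dangling node, no outlinks
}
lines = to_adjacency_lines(graph)
```

Writing `"\n".join(lines)` to a file gives an input with the same three lines as the sample above.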

Make sure that every vertex that appears as an outlink can also be found somewhere in the file as a key-site. Otherwise it will result in NullPointerExceptions.
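This rule can be checked before submitting the job. The Python sketch below parses lines in the layout described above and reports every outlink target that never appears as a key-site. The names `parse_line` and `missing_key_vertices` are hypothetical helpers, not Hama functions.

```python
def parse_line(line):
    """Split one adjacency line into (vertex, {target: weight})."""
    fields = line.rstrip("\n").split("\t")
    vertex = fields[0]
    edges = {}
    for field in fields[1:]:
        # Split on the last colon so the weight is always the final part.
        target, weight = field.rsplit(":", 1)
        edges[target] = int(weight)
    return vertex, edges


def missing_key_vertices(lines):
    """Return the sorted outlink targets that have no line of their own."""
    graph = dict(parse_line(line) for line in lines if line.strip())
    targets = {t for edges in graph.values() for t in edges}
    return sorted(targets - set(graph))


sample = [
    "Berlin\tFrankfurt:20\tMunich:50",
    "Frankfurt\tBerlin:20\tMunich:10",
    "Munich",
]
missing = missing_key_vertices(sample)
```

An empty result means every outlink has a matching key-site. Any name in the result needs its own line, even if that line holds only the vertex name.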

Then you can run sssp on it with:

No Format

bin/hama jar ../hama-0.x.0-examples.jar sssp Berlin /tmp/input.txt /tmp/sssp-output

Note that based on what you have configured, the paths may be in HDFS or on local disk.

Output

After the job has run, you can see a small snapshot of what the algorithm calculated. For the text file above you should see:

No Format

12/02/24 16:47:48 INFO bsp.BSPJobClient: Current supersteps number: 5
12/02/24 16:47:48 INFO bsp.BSPJobClient: The total number of supersteps: 5
Berlin | 0
Munich | 30
Frankfurt | 20
Job Finished in 4.018 seconds
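To see where these distances come from, here is a simplified, single-process Python sketch of a Pregel-style SSSP. It is not Hama's implementation: there are no peers, no partitioning and no real message passing, and the function name `sssp_supersteps` is made up for this example. It assumes non-negative integer weights and that every outlink target is also a key in the graph.

```python
def sssp_supersteps(graph, start):
    """Compute shortest distances from start by exchanging messages in rounds.

    graph: dict mapping vertex name -> {neighbour name: weight}.
    """
    dist = {vertex: float("inf") for vertex in graph}
    inbox = {start: [0]}  # the start vertex receives distance 0
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            # Only a strictly better distance triggers new messages.
            if candidate < dist[vertex]:
                dist[vertex] = candidate
                for target, weight in graph[vertex].items():
                    outbox.setdefault(target, []).append(candidate + weight)
        inbox = outbox  # messages are delivered in the next round
    return dist


graph = {
    "Berlin": {"Frankfurt": 20, "Munich": 50},
    "Frankfurt": {"Berlin": 20, "Munich": 10},
    "Munich": {},
}
distances = sssp_supersteps(graph, "Berlin")
```

For the sample graph, Munich is first reached directly with cost 50 and then improved to 30 via Frankfurt (20 + 10), which matches the snapshot above.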

A SequenceFile input contains both the key and the value as Text.

No Format

  K           /                V 
Vertex[Text] / AdjacentVertex : Weight [Text]

A vertex typically contains a name that uniquely identifies it. Each record associates a key vertex (given by its name) with one adjacent vertex (also given by its name) and the weight of the edge between them. The name and the weight are separated by ":".

Let's look at this sample:

No Format

      K    /   V
    Berlin /  Paris : 25
    Berlin / London : 40
    London / Paris : 10

This connects Berlin to Paris and London, and London to Paris, with the given weights.

If you are familiar with MapReduce, this looks like a mapper output that can be easily reduced.
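The "reduce" step can be sketched in a few lines of Python. This example does not read a real SequenceFile. It assumes the records are already available as (key, value) string pairs, and the function name `group_records` is hypothetical. Stripping the whitespace around ":" is a tolerance added for this sketch.

```python
from collections import defaultdict


def group_records(records):
    """Group (key, "name : weight") pairs into one adjacency dict per key."""
    adjacency = defaultdict(dict)
    for key, value in records:
        # Split on the last colon: left part is the name, right part the weight.
        name, weight = value.rsplit(":", 1)
        adjacency[key.strip()][name.strip()] = int(weight)
    return dict(adjacency)


records = [
    ("Berlin", "Paris : 25"),
    ("Berlin", "London : 40"),
    ("London", "Paris : 10"),
]
adjacency = group_records(records)
```

Each key ends up with all of its adjacent vertices and weights in one place, which is the adjacency-list view of the same records.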

Notes:

Make sure...

  • your names are unique! Otherwise it will result in strange output.
  • it is only ONE file!
  • Key and Value are Text.class fields!
    • Value always contains only ONE vertex
    • Value is separated by ":" to split the name and the weight

In the job output, the left side shows the vertex name and the right side shows the cost needed to get to that vertex.

Have fun! If you are facing problems, feel free to ask questions on the official mailing list.