Differences between revisions 11 and 12
Revision 11 as of 2012-04-19 15:13:05
Size: 3348
Comment:
Revision 12 as of 2012-09-12 12:46:12
Size: 2848
Comment:
Deletions are marked like this. Additions are marked like this.
Line 5: Line 5:
 * The implementation for the SSSP can be found at [[https://svn.apache.org/repos/asf/incubator/hama/branches/0.4/examples/src/main/java/org/apache/hama/examples/ShortestPaths.java|ShortestPath]].
Line 20: Line 19:
bin/hama jar ../hama-0.4.0-examples.jar sssp <start vertex> <input path> <output path> [number of tasks] bin/hama jar ../hama-0.x.0-examples.jar sssp <start vertex> <input path> <output path> [number of tasks]
Line 43: Line 42:
Now you need to transform the text file using:
{{{
bin/hama jar ../hama-0.4.0-examples.jar sssp-text2seq /tmp/input.txt /tmp/out/
}}}
Line 51: Line 45:
bin/hama jar ../hama-0.4.0-examples.jar sssp Berlin /tmp/out /tmp/sssp-output bin/hama jar ../hama-0.x.0-examples.jar sssp Berlin /tmp/input.txt /tmp/sssp-output
Line 70: Line 64:
In the output sequence file you should get a org.apache.hadoop.io.Text (KEY) and org.apache.hadoop.io.IntWritable (VALUE) pair which is exactly the output from above.

Single Source Shortest Paths

  • The SSSP (abbr. for Single Source Shortest Paths) algorithm described in the Google Pregel paper was used.

  • Introduces IO usage, partitioning based on hashing of vertextID, and collective communication.

Short summary of the algorithm

  • Initialize all vertices' cost to reach it to INFINITY, just the start vertex will have cost 0.
  • Initially send the new cost to all adjacent vertex containing the new cost plus the edge weight between them
  • Reviewing messages: if the cost coming from a message is lower than the actual cost, update the cost and send a message to the adjacent vertices, containing the new cost plus the edge weight between them (similar to the last step)
  • Repeat the last step until no updates can be made anymore.

Usage

bin/hama jar ../hama-0.x.0-examples.jar sssp <start vertex> <input path> <output path> [number of tasks]

You need to provide a start vertex name from where the computation should start calculating the shortest paths, scroll down how to provide an input file for it.

Submit your own graph

You can transform your graph as a adjacency list to fit into the input which Hama is going to parse and calculate the SSSP.

The file that Hama can successfully parse is a TextFile that has the following layout:

Berlin\tFrankfurt:20\tMunich:50
Frankfurt\tBerlin:20\tMunich:10
Munich

This piece of text will adjacent Berlin to Frankfurt (with edge weight of 20) and Munich (with edge weight of 10). Munich is a dangling node, it has no outlinks. As you can see a vertex is always on the leftmost side (we call it the key-site), and the outlinks (to which other vertex it is connected to) are seperated by tabs (\t) as the following elements. SSSP needs edge weights, you must provide them by separating the name of the vertex with a colon ":". The weight must be an integer.

Make sure that every vertex's outlink can somewhere be found in the file as a key-site. Otherwise it will result in weird NullPointerExceptions.

Then you can run sssp on it with:

bin/hama jar ../hama-0.x.0-examples.jar sssp Berlin /tmp/input.txt /tmp/sssp-output

Note that based on what you have configured, the paths may be in HDFS or on local disk.

Output

After the job ran you can see a small snapshot of what the algorithm calculated, for the textfile above you should see:

12/02/24 16:47:48 INFO bsp.BSPJobClient: Current supersteps number: 5
12/02/24 16:47:48 INFO bsp.BSPJobClient: The total number of supersteps: 5
Berlin | 0
Munich | 30
Frankfurt | 20
Job Finished in 4.018 seconds

On the left side you see your vertex name and on the right the cost which is needed to get to that vertex.

SSSP (last edited 2012-09-12 12:46:12 by thomasjungblut)