Page History

Single Source Shortest Paths

Uses the SSSP The SSSP (abbr. for Single Source Shortest Paths) algorithm described in the Google Pregel paper was used.
Introduces IO usage, partitioning based on hashing of vertextID, and collective communication.
Lets the user submit his/her own SequenceFile to calculate the SSSP's

Implementation

For detailed questions in terms of implementation have a look at my blog. It describes the algorithm and focuses on the main ideas showing implementation things.

Short summary of the algorithm

...

Initialize all vertices' cost to reach it to INFINITY, just the start vertex will have cost 0.
Initially send the new cost to all adjacent vertex containing the new cost plus the edge weight between them
Reviewing messages: if the cost coming from a message is lower than the actual cost, update the cost and send a message to the adjacent vertices, containing the new cost plus the edge weight between them (similar to the last step)
Repeat the last step until no updates can be made anymore.

Usage

No Format
hama/bin/hama jar ../hama-0.x.0-examples.jar sssp <name<start ofvertex> the<input start vertex> <optional: outputpath> <output path> <optional: path[number of your own sequencefile>tasks]

Change "x" to the version you are using!

Note: If you provide your own sequencefile, make sure you've set the output path.

The output path should never be the root path!You need to provide a start vertex name from where the computation should start calculating the shortest paths, scroll down how to provide an input file for it.

Submit your own

...

graph

You can transform your graph as a adjacency list to fit into the input which Hama is going to parse and calculate the shortest paths between the verticesSSSP.

The file that Hama can successfully parse is a SequenceFile that contains both Key and Value as a Text.

No Format
K / V Vertex[Text] / AdjacentVertex : Weight [Text]

A vertex typically contains a name that uniquely identifies a vertex. So you are associating a key vertex that just contains a name to another vertex (that contains its name) to the weight. Both seperated by ":".

Let's look at this sample:

No Format
K / V Berlin / Paris : 25 Berlin / London : 40 London / Paris : 10

This will adjacent Berlin to Paris and London, and London to Paris with the given weights.

If you are familiar with MapReduce, this looks like a mapper output that can be easily reduced.

Notes:

Make sure...

your names are unique! Otherwise it will result in strange output.
it is only ONE file!
Key and Value are Text.class fields!
- Value always contains only ONE vertex
- Value is separated by ":" to split the name and the weight

Sample adjacency list SequenceFile

You can download a adjacencylist SequenceFile containing 2,031,215 vertices and 24,793,713 edges, named with cities in the world and random weights. It is arround 1.14GB large and can be downloaded here:

http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/files/cities-adjacencylist/adjacencylist.seq2

You can run it with

TextFile that has the following layout:

No Format
Berlin\tFrankfurt:20\tMunich:50 Frankfurt\tBerlin:20\tMunich:10 Munich

This piece of text will adjacent Berlin to Frankfurt (with edge weight of 20) and Munich (with edge weight of 10). Munich is a dangling node, it has no outlinks. As you can see a vertex is always on the leftmost side (we call it the key-site), and the outlinks (to which other vertex it is connected to) are seperated by tabs (\t) as the following elements. SSSP needs edge weights, you must provide them by separating the name of the vertex with a colon ":". The weight must be an integer.

Make sure that every vertex's outlink can somewhere be found in the file as a key-site. Otherwise it will result in weird NullPointerExceptions.

Then you can run sssp on it with:

No Format

No Format
hama/bin/hama jar ../hama-0.x.0-examples.jar sssp Klewno sample/output PATH_TO_THE_SEQUENCEFILE

Obviously you have to replace "PATH_TO_THE_SEQUENCEFILE" with the path where the sample file is stored.

You can adjust the starting vertex city (here: "Klewno") to the city you want, I'm pretty sure the graph contains it.

 Berlin /tmp/input.txt /tmp/sssp-output

Note that based on what you have configured, the paths may be in HDFS or on local disk.

Output

After the job ran you can see a small snapshot of what the algorithm calculated, for the textfile above you should see:

No Format


12/02/24 16:47:48 INFO bsp.BSPJobClient: Current supersteps number: 5
12/02/24 16:47:48 INFO bsp.BSPJobClient: The total number of supersteps: 5
Berlin | 0
Munich | 30
Frankfurt | 20
Job Finished in 4.018 seconds

On the left side you see your vertex name and on the right the cost which is needed to get to that vertexHave fun! If you are facing problems, feel free to ask questions on the official mailing list.

Page tree

Versions Compared

Old Version 50

New Version Current

Key