
We should remove this

public class PageRank {

  public static void main(String[] argv) {
    int[][] data = { 
      { 2, 3, 4, 5, 7 }, 
      { 1 }, 
      { 1, 2 }, 
      { 2, 3, 5 },
      { 1, 3, 4, 6 },
      { 1, 5 }, 
      { 5 } 
    };

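    // Build the column-stochastic link matrix: data[i] lists the (1-based)
    // pages that page i+1 links to, so a[target][source] = 1 / outdegree(source).
    double[][] a = new double[data.length][data.length];
    for (int i = 0; i < data.length; i++) {
      for (int j = 0; j < data[i].length; j++) {
        a[data[i][j] - 1][i] = 1.0 / data[i].length;
      }
    }
    // Matrix is not defined on this page; it is assumed to come from a
    // linear-algebra library on the classpath.
    // Take the first row of V^T (the first eigenvector) and scale it by its 1-norm.
    double[] eigVector = new Matrix(a).eigenDecomposition().getV().transpose().getArray()[0];
    double norm1 = new Matrix(eigVector, eigVector.length).norm1();
    for (int i = 0; i < eigVector.length; i++) {
      System.out.println(eigVector[i] / norm1);
    }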
  }
}

PageRank

  • Uses the PageRank algorithm described in the Google Pregel paper
  • Introduces partitioning and collective communication
  • Lets users submit their own TextFile to calculate the sites' PageRank

Implementation

For detailed questions about the implementation, have a look at my blog post. It describes the algorithm and focuses on the main ideas behind the implementation.

http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html

Usage

hama/bin/hama jar ../hama-0.x.0-examples.jar page <optional: output path> <optional: input path>

Change "x" to the version you are using!

Note: The output path comes before the input path on the command line, so if you provide your own TextFile, make sure you have set the output path as well.

The output path should never be the root path!

Submit your own Web-graph

You can transform your graph into an adjacency list that fits the input format Hama parses to calculate the PageRank.

The file that Hama can successfully parse is a TextFile that has the following layout:

Site1\tSite2\tSite3
Site2\tSite3
Site3

This piece of text links Site1 to Site2 and Site3, and Site2 to Site3; Site3 is a dangling node (it has no outlinks). As you can see, each line starts with a site (we call it the key-site), followed by its outlinks, all separated by tabs (\t).
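As an illustration of the layout above, here is a minimal sketch that turns an in-memory adjacency list into the tab-separated text. The class name AdjacencyListWriter and the method toTextFile are made up for this example and are not part of Hama; writing the resulting string to a file is left to you.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hama.
public class AdjacencyListWriter {

  /** Emits one line per key-site: the key-site first, then its outlinks, tab-separated. */
  public static String toTextFile(Map<String, List<String>> graph) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, List<String>> entry : graph.entrySet()) {
      sb.append(entry.getKey());
      for (String outlink : entry.getValue()) {
        sb.append('\t').append(outlink);
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the insertion order of the key-sites.
    Map<String, List<String>> graph = new LinkedHashMap<>();
    graph.put("Site1", Arrays.asList("Site2", "Site3"));
    graph.put("Site2", Arrays.asList("Site3"));
    graph.put("Site3", Collections.<String>emptyList()); // dangling node
    System.out.print(toTextFile(graph));
  }
}
```

A dangling node still gets its own line, consisting of just the key-site.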

Make sure that every outlink also appears somewhere in the file as a key-site. Otherwise the job will run into hard-to-diagnose NullPointerExceptions.
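It can help to check this rule before submitting a job. The following is a rough sketch of such a check, assuming the lines of the file have already been read into a list; WebGraphCheck and missingKeySites are hypothetical names for this example, not something Hama ships.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check, not part of Hama.
public class WebGraphCheck {

  /** Returns every outlink that never appears as a key-site (first column). */
  public static Set<String> missingKeySites(List<String> lines) {
    Set<String> keySites = new LinkedHashSet<>();
    List<String[]> rows = new ArrayList<>();
    for (String line : lines) {
      if (line.isEmpty()) {
        continue; // ignore blank lines
      }
      String[] fields = line.split("\t");
      keySites.add(fields[0]);
      rows.add(fields);
    }
    Set<String> missing = new LinkedHashSet<>();
    for (String[] fields : rows) {
      // fields[0] is the key-site, everything after it is an outlink
      for (int i = 1; i < fields.length; i++) {
        if (!keySites.contains(fields[i])) {
          missing.add(fields[i]);
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("Site1\tSite2\tSite3", "Site2\tSite3", "Site3");
    Set<String> missing = missingKeySites(lines);
    System.out.println(missing.isEmpty() ? "graph is complete" : "missing key-sites: " + missing);
  }
}
```

An empty result means every outlink has its own line; anything else lists the sites you still need to add as key-sites.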

Output

The output is a double value between 0.0 and 1.0 for each site; the closer a value is to 1.0, the more "famous" the site.

The ranks of all pages should sum to 1.0; otherwise the algorithm is broken.
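To see why the ranks add up to 1.0, here is a small single-machine sketch of PageRank by power iteration. It is not the Hama implementation: the damping factor of 0.85, the fixed iteration count and the choice to spread the rank of dangling nodes evenly over all pages are assumptions made for this example, and PageRankSketch is a made-up name.

```java
import java.util.Arrays;

// Illustrative single-machine sketch, not the Hama implementation.
public class PageRankSketch {

  /** links[i] holds the 0-based indices of the pages that page i links to. */
  public static double[] pageRank(int[][] links, double damping, int iterations) {
    int n = links.length;
    double[] rank = new double[n];
    Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
    for (int it = 0; it < iterations; it++) {
      double[] next = new double[n];
      double danglingMass = 0.0;
      for (int i = 0; i < n; i++) {
        if (links[i].length == 0) {
          danglingMass += rank[i]; // assumption: redistributed evenly below
        } else {
          double share = rank[i] / links[i].length;
          for (int target : links[i]) {
            next[target] += share;
          }
        }
      }
      for (int i = 0; i < n; i++) {
        next[i] = (1.0 - damping) / n + damping * (next[i] + danglingMass / n);
      }
      rank = next;
    }
    return rank;
  }

  public static void main(String[] args) {
    // Site1 -> Site2, Site3; Site2 -> Site3; Site3 is dangling.
    int[][] links = { { 1, 2 }, { 2 }, {} };
    double[] rank = pageRank(links, 0.85, 50);
    double sum = 0.0;
    for (double r : rank) {
      sum += r;
    }
    System.out.println(Arrays.toString(rank));
    System.out.println("sum is 1.0 within tolerance: " + (Math.abs(sum - 1.0) < 1e-9));
  }
}
```

Each iteration hands out exactly the rank it collected, so the total stays at 1.0 up to floating-point rounding. When checking real output, compare the sum against 1.0 with a small tolerance rather than with ==.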

Have fun! If you are facing problems, feel free to ask questions on the official mailing list.
