This article explains how to convert your graph into a format that the Graph module of Hama can read, so that you can run graph algorithms on it.

This example uses the Google web graph from 2002 (http://snap.stanford.edu/data/web-Google.html) and walks step by step from the download to a PageRank run on this dataset. To run the example code, you will need the Guava libraries (http://code.google.com/p/guava-libraries/).

# to be extended

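    Path txtPath = new Path("/tmp/web-Google.txt");
    Path input = new Path("/tmp/pagerankin.seq");
    HamaConfiguration conf = new HamaConfiguration(new Configuration());
    // obtain the file system that holds the downloaded edge list
    FileSystem fileSystem = FileSystem.get(conf);
    // maps each source vertex to the set of vertices it links to
    HashMultimap<Integer, Integer> map = HashMultimap.create();
    BufferedReader br = new BufferedReader(new InputStreamReader(
        fileSystem.open(txtPath)));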
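    String line = null;
    while ((line = br.readLine()) != null) {
      // the SNAP file begins with '#' header lines, which are not edges
      if (line.startsWith("#")) {
        continue;
      }
      String[] split = line.split("\t");
      map.put(Integer.parseInt(split[0]), Integer.parseInt(split[1]));
    }
    br.close();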

    Set<Entry<Integer, Collection<Integer>>> entries = map.asMap().entrySet();

    // the SequenceFile is written to the `input` path declared above
    VertexWritable.CONFIGURATION = conf;
    SequenceFile.Writer writer = new SequenceFile.Writer(fileSystem, conf,
        input, VertexWritable.class, VertexArrayWritable.class);
    for (Entry<Integer, Collection<Integer>> entry : entries) {
      VertexWritable<Text, DoubleWritable> key = new VertexWritable<Text, DoubleWritable>(
          new DoubleWritable(0.0d), new Text(entry.getKey() + ""), Text.class,
          DoubleWritable.class);
      ArrayList<Integer> arrayList = new ArrayList<Integer>(entry.getValue());
      @SuppressWarnings("unchecked")
      VertexWritable<Text, NullWritable>[] adjacents = new VertexWritable[entry
          .getValue().size()];
      for (int i = 0; i < adjacents.length; i++) {
        adjacents[i] = new VertexWritable<Text, NullWritable>(
            NullWritable.get(), new Text(arrayList.get(i) + ""), Text.class,
            NullWritable.class);
      }
      VertexArrayWritable val = new VertexArrayWritable();
      val.set(adjacents);
      writer.append(key, val);
    }
    writer.close();
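
The grouping step above (edge list to adjacency sets) can be tried without Hadoop or Guava. The following is a minimal sketch using only the JDK; the class `EdgeListSketch` and its method `toAdjacency` are hypothetical names made up for illustration and are not part of Hama. It assumes tab-separated `source<TAB>target` lines and `#` header lines, and it deduplicates targets the way a `HashMultimap` would.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hama or Guava.
class EdgeListSketch {

  /** Groups "source<TAB>target" lines into source -> set of targets. */
  static Map<Integer, Set<Integer>> toAdjacency(List<String> lines) {
    Map<Integer, Set<Integer>> adjacency = new LinkedHashMap<>();
    for (String line : lines) {
      // skip blank lines and '#' header lines (assumed comment marker)
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      String[] split = line.split("\t");
      int source = Integer.parseInt(split[0].trim());
      int target = Integer.parseInt(split[1].trim());
      // a set drops duplicate edges, mirroring HashMultimap semantics
      adjacency.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(target);
    }
    return adjacency;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "# FromNodeId\tToNodeId", "0\t1", "0\t2", "0\t1", "1\t2");
    System.out.println(toAdjacency(sample)); // prints {0=[1, 2], 1=[2]}
  }
}
```

Each entry of the resulting map corresponds to one key/value record written to the SequenceFile: the key is the vertex, the value is its array of adjacent vertices.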

Now we can set up the job with the following code:

    GraphJob pageJob = new GraphJob(conf, PageRank.class);
    pageJob.setJobName("Pagerank");
    pageJob.set("hama.graph.repair", "true");

    pageJob.setVertexClass(PageRankVertex.class);
    pageJob.setInputPath(input);
    pageJob.setOutputPath(new Path("/tmp/pagerank-output/"));

    // set the defaults
    pageJob.setMaxIteration(30);
    // we need to include a vertex in its adjacency list,
    // otherwise the pagerank result has a constant loss
    pageJob.set("hama.graph.self.ref", "true");
    pageJob.setAggregatorClass(AverageAggregator.class);

    pageJob.setVertexIDClass(Text.class);
    pageJob.setVertexValueClass(DoubleWritable.class);
    pageJob.setEdgeValueClass(NullWritable.class);

    pageJob.setInputFormat(SequenceFileInputFormat.class);
    pageJob.setPartitioner(HashPartitioner.class);
    pageJob.setOutputFormat(SequenceFileOutputFormat.class);
    pageJob.setOutputKeyClass(Text.class);
    pageJob.setOutputValueClass(DoubleWritable.class);

    long startTime = System.currentTimeMillis();
    if (pageJob.waitForCompletion(true)) {
      System.out.println("Job Finished in "
          + (System.currentTimeMillis() - startTime) / 1000.0 + " seconds");
    }
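
For intuition about what such a job computes, here is a small single-machine PageRank sketch in plain Java. It is not Hama's `PageRankVertex` implementation: the class name `PageRankSketch`, the damping factor of 0.85, the convergence tolerance, and the way rank from vertices without outgoing links is spread evenly over all vertices are all assumptions chosen for illustration. Iteration stops at a maximum count or once the average change per vertex falls below the tolerance, loosely analogous to the iteration limit and average aggregator configured above.

```java
import java.util.Arrays;

// Illustrative sketch only; not the Hama implementation.
class PageRankSketch {

  /**
   * outLinks[v] lists the vertices that v links to.
   * damping and tolerance are assumed parameters, e.g. 0.85 and 1e-9.
   */
  static double[] pageRank(int[][] outLinks, double damping,
      int maxIterations, double tolerance) {
    int n = outLinks.length;
    double[] rank = new double[n];
    Arrays.fill(rank, 1.0 / n);
    for (int iter = 0; iter < maxIterations; iter++) {
      double[] next = new double[n];
      double danglingMass = 0.0;
      for (int v = 0; v < n; v++) {
        if (outLinks[v].length == 0) {
          // no outgoing links: spread this rank over all vertices later
          danglingMass += rank[v];
          continue;
        }
        double share = rank[v] / outLinks[v].length;
        for (int target : outLinks[v]) {
          next[target] += share;
        }
      }
      double delta = 0.0;
      for (int v = 0; v < n; v++) {
        next[v] = (1 - damping) / n + damping * (next[v] + danglingMass / n);
        delta += Math.abs(next[v] - rank[v]);
      }
      rank = next;
      // stop once the average change per vertex is below the tolerance
      if (delta / n < tolerance) {
        break;
      }
    }
    return rank;
  }

  public static void main(String[] args) {
    // vertices 0 and 1 both link to 2, and 2 links back to 0
    int[][] outLinks = { {2}, {2}, {0} };
    System.out.println(Arrays.toString(pageRank(outLinks, 0.85, 200, 1e-12)));
  }
}
```

With this formulation the ranks always sum to 1, which is why the individual values for a graph with hundreds of thousands of vertices are very small.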

You should see the algorithm converge relatively fast; in my case it took nine supersteps. If you read the results back from the SequenceFile output, you will see the following top-ranked sites (vertex ID = rank):

885605 = 0.00149900065779375
846221 = 0.0010280702392776039
557124 = 8.654234880507804E-4
537039 = 6.634317501245855E-4
163075 = 6.529762251084758E-4
597621 = 6.503367245789417E-4
41909 = 5.845160681337011E-4
551829 = 5.702205338951212E-4
504140 = 5.507901000809657E-4
765334 = 5.432108978490109E-4
486980 = 5.394792436341423E-4
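
A listing like the one above can be produced by sorting the vertex/rank pairs by rank in descending order. The sketch below assumes the pairs have already been read from the output into a `Map<String, Double>`; reading the SequenceFile itself is left out, and the class `TopRanksSketch` and its method `topN` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Hama.
class TopRanksSketch {

  /** Returns the n entries with the highest rank, best first. */
  static List<Map.Entry<String, Double>> topN(Map<String, Double> ranks, int n) {
    return ranks.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .limit(n)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // three of the values from the listing above
    Map<String, Double> ranks = new HashMap<>();
    ranks.put("557124", 8.654234880507804E-4);
    ranks.put("885605", 0.00149900065779375);
    ranks.put("846221", 0.0010280702392776039);
    for (Map.Entry<String, Double> entry : topN(ranks, 2)) {
      System.out.println(entry.getKey() + " = " + entry.getValue());
    }
  }
}
```

For a graph of this size, sorting all entries is acceptable for a one-off check; a bounded priority queue would avoid holding a fully sorted copy.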