You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Dynamic Addition and Removal of Vertexes in Graph API

Introduction

Nowadays more and more people turn to distributed environments to store and analyze Big Data. For that reason there is a growing need in both achieving efficiency and new features. Hama is an upcoming project that gives to researchers and analysts a way to handle big amounts of data, through the BSP computation model.

A very interesting feature of the Hama project is the Graph API for graph analysis. A lot of scientists and companies represent and manage their data with the use of one or more of the many different kinds of graphs (e.g. incremental graphs).

This article will present some new features on the Graph API of Hama, on how to create a dynamic graphs on run time. It also will serve as a technical document of the internals of those features.

Description

At the time being, we will discuss only two different operations (addition and deletion) from both points of the end-user and internal architecture.

Some points that we need to keep in mind, are:

  1. Deletion/Addition is happening after a super step. We are providing methods inside a vertex instance that create/delete a vertex 2. New vertexes need to be distributed through partitioner to be placed on right peers. 3. Keep in mind that a new created vertex and old vertexes, will have the same super step counter. Various algorithms change their behavior though time, an example with an internal counter to serve as a state is needed

Implementation Details

The GraphJobRunner class, contains the major steps for the graph computation. It starts with an initial setup, where the vertexes are loaded to memory, then there is the main computation loop, where the supersteps occur, and in the end there is the cleanup, where of the graph and the results are written to HDFS.

Our main target (concerning the operations +/-) is to enable the functionality of GraphJobRunner::loadVertices() function during a superstep so the user will be able to call a function during a superstep (e.g. addVertex(value, List<> edges)) to add a vertex. The user will also be able to call a delete method (e.g. deleteVertex(vertexID)) to remove vertexes.

An important note here is that adding or removing vertexes is happening in every sync of the peers. This means that during a superstep the additions or deletions will be locally stored and after the end of the superstep will be committed, through messages, to the peers through the partitioner.

If you plan to use the ListVerticesInfo, you will need to sort the list in each step after adding/removing vertices. You can consider using of SortedSet instead of list. – Edward

  • No labels