Hamburg

Motivation

The MapReduce (M/R) programming model is a poor fit for problems in which each portion of the data depends on many other portions and the relations among them are complicated. Such problems force one of the following compromises:

Limiting the job to a single reducer
- When the relations among the data are very complex, it can be very hard to assign intermediate data to the appropriate reducers while respecting the dependencies between graph partitions. Assigning a single reducer is a straightforward way to handle the complex dependencies, but it clearly hurts scalability.
Running many M/R iterations
Making the M/R program more complicated
- To avoid the two inefficient methods above, the M/R program has to be complicated with code that communicates data among data nodes.

These problems are common in many areas; graph problems are a typical example. (A description of a concrete example is still to be written.) We therefore propose a new programming model named Hamburg. The main objective of Hamburg is to support problems based on data whose parts have complex dependencies on one another. This page is an initial draft of our proposal.

Goal

- Follow the scalability concept of the shared-nothing architecture.
- Support computation on data with complex relations, such as graph data.

Hamburg

Hamburg is an alternative to the M/R programming model. It is based on the bulk synchronous parallel (BSP) model. Like M/R, Hamburg takes advantage of the shared-nothing (SN) architecture, so we expect it to scale with little performance degradation as the number of participating nodes increases. A BSP-based Hamburg computation step consists of three sub-steps:

Computation on data that resides in local storage; this is similar to the map operation in M/R.
Each node sends the necessary data to the other nodes.
All processors synchronize, waiting for all communication actions to complete.
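
The three sub-steps above can be sketched as a single-process simulation. This is only an illustration under stated assumptions, not Hamburg's API: the function name `bsp_min_label`, the modulo partitioning rule, and the choice of problem (labelling every vertex with the smallest vertex id in its connected component) are all hypothetical and do not come from the proposal.

```python
from collections import defaultdict


def bsp_min_label(edges, num_workers=2, max_supersteps=100):
    """Label each vertex with the smallest integer id in its component."""
    # Undirected adjacency list. For brevity it is shared here; in a real
    # shared-nothing system each worker would hold only its own vertices' edges.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    def owner(v):
        # Hypothetical partitioning rule: vertex v lives on worker v % num_workers.
        return v % num_workers

    # "Local storage" of each worker: vertex id -> current label.
    local = [dict() for _ in range(num_workers)]
    for v in neighbors:
        local[owner(v)][v] = v

    inbox = [[] for _ in range(num_workers)]
    # Every vertex starts active so that it announces its label once.
    active = [set(part) for part in local]

    for _ in range(max_supersteps):
        outbox = [[] for _ in range(num_workers)]
        for w in range(num_workers):
            # Sub-step 1: computation on locally stored data, using only the
            # messages delivered at the end of the previous superstep.
            for v, candidate in inbox[w]:
                if candidate < local[w][v]:
                    local[w][v] = candidate
                    active[w].add(v)
            # Sub-step 2: communication. Only vertices whose label changed
            # send it to the workers that own their neighbors.
            for v in active[w]:
                for n in neighbors[v]:
                    outbox[owner(n)].append((n, local[w][v]))
            active[w].clear()
        # Sub-step 3: barrier. Messages become visible only after every
        # worker has finished the superstep.
        inbox = outbox
        if not any(inbox):
            break  # no messages in flight: the computation has converged

    labels = {}
    for part in local:
        labels.update(part)
    return labels
```

For example, `bsp_min_label([(0, 1), (1, 2), (3, 4)])` returns `{0: 0, 1: 0, 2: 0, 3: 3, 4: 3}` (as a dictionary; key order may differ). Swapping `inbox` for `outbox` only at the end of the loop body is what plays the role of the barrier in this sequential sketch.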

The main difference between Hamburg and M/R is that Hamburg does not aggregate intermediate data into reducers. Instead, each computation node sends only the necessary data to the other nodes. This is efficient when the total communicated data is smaller than the intermediate data that would be aggregated into reducers.
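
A rough way to see when this condition holds is to count records under a simplified cost model. The model is an assumption made for illustration only: an M/R formulation is taken to emit one intermediate record per edge, while a BSP formulation sends one message only for edges whose endpoints live on different nodes. The helper `communication_volume` and the partitioning function `owner` are hypothetical names.

```python
def communication_volume(edges, owner):
    """Return (cross_partition_edges, total_edges) for an edge list."""
    cross = sum(1 for u, v in edges if owner(u) != owner(v))
    return cross, len(edges)


def owner(v):
    # Hypothetical partitioning: vertices 0-3 on node 0, vertices 4-7 on node 1.
    return 0 if v < 4 else 1


# A path graph 0-1-2-3-4-5-6-7, listed in no particular order.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (3, 4)]
cross, total = communication_volume(edges, owner)
print(cross, total)  # prints "1 7"
```

Under this model only one of the seven edges crosses nodes, so a partitioning that keeps related data together leaves little to communicate. With a poor partitioning the two counts approach each other and the advantage disappears.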

Initial contributors

Edward J. (edwardyoon AT apache.org)
Hyunsik Choi (hyunsik.choi AT gmail.com)

Any volunteers are welcome.

Block Diagram

     Hama / Heart
----------------------
 MapReduce / Hamburg
----------------------
         HDFS
