TableOfContents(4)

This effort is still a "work in progress". Please feel free to add comments. BRBut please make the content less visible by using smaller fonts. – Edward J. Yoon


Overview

Hama is a parallel matrix computational package.

Matrices are basically tables. They are ways of storing numbers and other things. Typical matrix has rows and columns. Actually called a 2-way matrix because it has two dimensions. For example, you might have respondents-by-attitudes. Of course, you might collect the same data on the same people at 5 points in time. In that case, you either have 5 different 2-way matrices, or you could think of it as a 3-way matrix, that is respondent-by-attitude-by-time.

We choosed Hbase which <row, column, timestamp> column-oriented sparse table storage to store the matrices.

See also: \[http://labs.google.com/papers/bigtable-osdi06.pdf Bigtable\], A Distributed Storage System for Structured Data


Parallel Strategies for Dense Matrix

In Map/Reduce programming, user can easily take advantage of the below parallel data layouts, communication paradigms.

Square blocking

The matrix multiplication of the original arrays can be transformed into matrix multiplication of blocks. For example,

C_block(1,1)=A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)

C                 A               B
+-----+-----+     +-----+-----+   +-----+-----+
| x x |     |     | --> | --> |   | | | |     |
| x x |     |     | --> | --> |   | ↓ ↓ |     |
+-----+-----+  =  +-----+-----+ * +-----+-----+
|     |     |     |     |     |   | | | |     |
|     |     |     |     |     |   | ↓ ↓ |     |
+-----+-----+     +-----+-----+   +-----+-----+

So, we can reduce the time of full scan.