This effort is still a "work in progress". Please feel free to add comments. BRBut please make the content less visible by using smaller fonts. – Edward J. Yoon
Overview
Hama is a parallel matrix computational package.
Matrices are basically tables. They are ways of storing numbers and other things. Typical matrix has rows and columns. Actually called a 2-way matrix because it has two dimensions. For example, you might have respondents-by-attitudes. Of course, you might collect the same data on the same people at 5 points in time. In that case, you either have 5 different 2-way matrices, or you could think of it as a 3-way matrix, that is respondent-by-attitude-by-time.
We choosed Hbase which <row, column, timestamp> column-oriented sparse table storage to store the matrices.
- Hama use column-oriented storage of matrices (HBase) , and so compressed column format is a natural choice of sparse storage
- Hama forces the elements of each column to be stored in increasing order of their row index
See also: [http://labs.google.com/papers/bigtable-osdi06.pdf Bigtable], A Distributed Storage System for Structured Data
Parallel Strategies for Dense Matrix
In Map/Reduce programming, user can easily take advantage of the below parallel data layouts, communication paradigms.
- 1D Row Blocked Layout
- 1D Row Block Cyclic Layout
- 2D Row and Row Blocked Layout
- 2D Row and Row Block Cyclic Layout
Square blocking
The matrix multiplication of the original arrays can be transformed into matrix multiplication of blocks. For example,
C_block(1,1)=A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)
C A B +-----+-----+ +-----+-----+ +-----+-----+ | x x | | | --> | --> | | | | | | | x x | | | --> | --> | | ↓ ↓ | | +-----+-----+ = +-----+-----+ * +-----+-----+ | | | | | | | | | | | | | | | | | | ↓ ↓ | | +-----+-----+ +-----+-----+ +-----+-----+
So, we can reduce the time of full scan.