You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 728 Next »

TableOfContents(4)

This effort is still a "work in progress". Please feel free to add comments. BRBut please make the content less visible by using smaller fonts. – Edward J. Yoon


Overview

A parallel matrix computation package.

Package Structure

  • org.apache.hama : Dense and structured sparse matrices
  • org.apache.hama.algebra : Algebraic operations on map/reduce
  • org.apache.hama.io : I/O operations with matrices and vectors
  • org.apache.hama.mapred : Map/Reduce Input/Output Formats
  • org.apache.hama.sparse : Unstructured sparse matrices

Sparse Matrix

NOTE:

  • Sparse matrix operations cannot be optimized
  • Sparse structures which are growable can exceed the initial bandwidth allocation, while those which are not growable are fixed, and over-allocation will cause an error
  • Matrices which are column major typically perform better with column-oriented operations, and likewise for row major matrices. Matrix/vector multiplication is row-major, while transpose multiplication is column-major

Why sparse matrices?

  • Many classes of problems result in matrices with a large number of zeros
  • A sparse matrix is a special class of matrix that allows only the non-zero terms to be stored
  • Reduction in the storage requirements for sparse matrices
  • Significant speed improvement as many calculations involving zero elements are neglected

Storage of sparse matrices

We choosed HBase which column-oriented sparse table storage to reduce storage and complexity.

  • Hama use column-oriented storage of matrices (HBase) , and so compressed column format is a natural choice of sparse storage
  • Hama forces the elements of each column to be stored in increasing order of their row index
  1  0  0       (1,1) = 1           
  0  3  1       (2,2) = 3
  0  0  0       (2,3) = 1

See also: [http://labs.google.com/papers/bigtable-osdi06.pdf Bigtable], A Distributed Storage System for Structured Data

Pseudo code for sparse matrix addition

NOTE:

  • There are no duplicates in the input.
  • No labels