Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: converted to 1.6 markup

...

Table of Contents

This effort is still a "work in progress". Please feel free to add comments. BR

But please make the content less visible by using smaller fonts. – Edward J. Yoon

...

Hama (Hadoop Matrix) is a distributed matrix computation package currently in incubation with Apache. It is a library of matrix operations for large-scale processing and development environments as well as a Map/Reduce framework for a large-scale numerical analysis and data mining, that need the intensive computation power of matrix inversion, e.g., linear regression, PCA, SVM and etc. It will be useful for many scientific applications, e.g., physics computations, linear algebra, computational fluid dynamics, statistics, graphic rendering and many more.

Block Diagram

...

\[http://wiki.apache.org/hama-data/attachments/Architecture/attachments/block.png\]

Implementation

User Interfaces

...

The matrix or network structure that frequently changes should have flexible storage structure for easy update and indicies that point to the appropriate entry. Also, we need a model that uses the concept of column-iterative methods.unmigrated-wiki-markup

HBase is an open-source, distributed, column-oriented store modeled as a google bigtable. Hama uses \[http://hadoop.apache.org/hbase/ Hbase\] to store the matrices and graphs which are represented mathematically.

Matrices are basically tables. They are ways of storing numbers and other things. Typical matrix has rows and columns which is often called a 2-way matrix because it has two dimensions. For example, you might have respondents-by-attitudes. Of course, you might collect the same data on the same people at 5 points in time. In that case, you either have 5 different 2-way matrices, or you could think of it as a 3-way matrix, that is respondent-by-attitude-by-time.

...