This effort is still a "work in progress". Please feel free to add comments.
Introduction
HAMA is a distributed framework on Hadoop for massive matrix and graph computations. It is currently undergoing incubation at the Apache Software Foundation.
Goal
The goal of the Hama project is to provide an easy-to-use programming environment for matrix and graph computing on Hadoop, a distributed system. We are focusing on the following:
- Compatibility
- Scalability
- Flexibility
- Usability and Applicability
The overall architecture of HAMA
The diagram below illustrates the overall architecture of HAMA.
       +--------------------------------------+
       |    Matrix/Graph Computing Program    |  User Applications
       +--------------------------------------+
     +------------------------------------------+
     |      HAMA : BSP, Angrapa, ..., etc       |  Computing Engines
     +------------------------------------------+
+----------------------------------------------------+
|                     ZooKeeper                      |  Distributed Locking Service
+----------------------------------------------------+
+----------------------------------------------------+
|           Hadoop : HDFS, HBase, ..., etc           |  Distributed Storage Systems
+----------------------------------------------------+
BSP framework
The BSP package is an implementation of BSP (Bulk Synchronous Parallel) over Hadoop RPC (sockets).
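As a rough illustration of the BSP model itself (not of Hama's API), the Java sketch below runs two supersteps separated by a barrier. In the first superstep every peer computes a local partial sum and sends it to peer 0. In the second, peer 0 combines the messages it received. The class name BspSketch, the method run, and the use of plain threads with a CyclicBarrier in place of peers communicating over Hadoop RPC are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Illustrative sketch only: threads stand in for BSP peers and a CyclicBarrier
// stands in for barrier synchronization. None of these names are Hama APIs.
public class BspSketch {

    /** Sums all partitions (at least one is required) using two BSP supersteps. */
    public static long run(List<int[]> partitions) {
        int peers = partitions.size();
        // Messages addressed to peer 0; they are only read after the barrier.
        ConcurrentLinkedQueue<Long> inboxOfPeer0 = new ConcurrentLinkedQueue<>();
        CyclicBarrier barrier = new CyclicBarrier(peers);
        long[] result = new long[1];

        Thread[] threads = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            final int peerId = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 1, local computation: sum this peer's partition.
                    long partial = 0;
                    for (int v : partitions.get(peerId)) {
                        partial += v;
                    }
                    // Superstep 1, communication: send the partial sum to peer 0.
                    inboxOfPeer0.add(partial);
                    // Barrier synchronization: wait until every peer has sent.
                    barrier.await();
                    // Superstep 2: peer 0 combines the messages it received.
                    if (peerId == 0) {
                        long total = 0;
                        for (long message : inboxOfPeer0) {
                            total += message;
                        }
                        result[0] = total;
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // join() makes the write to result[0] visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        List<int[]> partitions =
            Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5}, new int[] {6});
        System.out.println("total = " + run(partitions)); // prints "total = 21"
    }
}
```

The barrier is what makes this BSP rather than ordinary message passing: no peer reads its inbox until all peers have finished sending, so each superstep sees a consistent set of messages.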
The BSP package consists of the following components:
Shell/DSL
- Hama DSL (Domain Specific Language) in Groovy – Work in progress
- Hama Shell – Work in progress