...
– Just a thought: considering the reduced activity in HBase, should we not explore ways to avoid HBase? --Prasen
Represent a graph using adjacency matrix
Perform matrix operations
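As a minimal, single-machine sketch of the two steps above (not Hama code), the following builds a dense adjacency matrix from an edge list and multiplies it by itself. The helper names `adjacency_matrix` and `matmul` and the sample edge list are assumptions made for illustration only. Entry (i, j) of the squared matrix counts the walks of length two from vertex i to vertex j.

```python
def adjacency_matrix(num_vertices, edges):
    """Build a dense 0/1 adjacency matrix for a directed graph."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


def matmul(a, b):
    """Plain dense matrix product, computed entry by entry."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical sample graph: 0->1, 1->2, 0->2, 2->3
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = adjacency_matrix(4, edges)

# two_step[i][j] is the number of walks of length 2 from i to j
two_step = matmul(A, A)
```

For this sample graph, `two_step[0][2]` is 1 because the only two-step walk from vertex 0 to vertex 2 goes through vertex 1.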
Hadoop/HBase is designed to process large data sets efficiently by connecting many commodity computers to work in parallel. However, if an algorithm requires inter-node communication, the elapsed run time can get slower as more nodes are added. Consequently, an "effective" algorithm should avoid large amounts of communication.
Algorithms
Dense Matrix-Matrix multiplication
Blocking jobs:
- Collect the blocks to 'collectionTable' from A and B.
- A map task receives a row n as a key, and vector as its value
- The map task emits (blockID, sub-vector)
- A reduce task combines the sub-vectors into a block
- A map task receives a row n as a key, and vector as its value
...
- A map task receives a blockID n as a key, and two submatrices as its value
- A reduce task computes the sum of the blocks
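The blocking and multiplication steps above can be sketched on a single machine as follows. This is an illustration under stated assumptions, not the Hama implementation: plain dictionaries stand in for the 'collectionTable', the function names (`blocking_job`, `multiplication_job`, `assemble`) are hypothetical, and the matrix dimensions are assumed to be multiples of the block size.

```python
from collections import defaultdict


def blocking_job(matrix, block_size):
    """Split a dense matrix into blocks keyed by (block row, block column)."""
    blocks = defaultdict(list)
    # Map: each row arrives as (row index, vector) and is cut into sub-vectors.
    for row_index, vector in enumerate(matrix):
        for start in range(0, len(vector), block_size):
            block_id = (row_index // block_size, start // block_size)
            blocks[block_id].append(vector[start:start + block_size])
    # Reduce: sub-vectors sharing a blockID form one block. Rows arrive in
    # order here; a distributed job would have to sort them by row index.
    return dict(blocks)


def multiply_blocks(a, b):
    """Dense product of two small submatrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def multiplication_job(a_blocks, b_blocks):
    """Multiply matching block pairs, then sum the products per output block."""
    partial = defaultdict(list)
    # Map: block (i, k) of A pairs with block (k, j) of B.
    for (i, k), a_block in a_blocks.items():
        for (k2, j), b_block in b_blocks.items():
            if k == k2:
                partial[(i, j)].append(multiply_blocks(a_block, b_block))
    # Reduce: add up all partial products that belong to the same output block.
    result = {}
    for block_id, products in partial.items():
        total = products[0]
        for p in products[1:]:
            total = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(total, p)]
        result[block_id] = total
    return result


def assemble(blocks, block_size, rows, cols):
    """Copy the output blocks back into one dense matrix."""
    matrix = [[0] * cols for _ in range(rows)]
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                matrix[bi * block_size + r][bj * block_size + c] = value
    return matrix


# Hypothetical 4x4 inputs, split into 2x2 blocks
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
C_blocks = multiplication_job(blocking_job(A, 2), blocking_job(B, 2))
C = assemble(C_blocks, 2, 4, 4)
```

Working on blocks rather than single rows means each task multiplies whole submatrices locally, so fewer and larger records are exchanged between tasks. That fits the goal stated earlier of avoiding large amounts of communication.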
Computes maximum absolute row sum
- https://issues.apache.org/jira/browse/HAMA-171
- A map task receives a row n as a key, and vector as its value
- The map task emits (row, the sum of the absolute values of its entries)
- A reduce task selects the maximum sum
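The map and reduce steps above can be sketched in a few lines. This is a single-machine illustration rather than the code from the JIRA issue; the function names `map_row` and `reduce_max` and the sample matrix are assumptions. The result is the maximum absolute row sum, also known as the infinity norm of the matrix.

```python
def map_row(row_index, vector):
    """Map: emit (row, sum of the absolute values of the row's entries)."""
    return row_index, sum(abs(entry) for entry in vector)


def reduce_max(pairs):
    """Reduce: select the largest row sum among all emitted pairs."""
    return max(row_sum for _, row_sum in pairs)


# Hypothetical sample matrix
matrix = [[1, -2, 3], [-4, 5, -6], [7, 8, -9]]
pairs = [map_row(i, vector) for i, vector in enumerate(matrix)]
norm = reduce_max(pairs)
```

Each map call touches only its own row, so the only data passed to the reducer is one number per row.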
Computes determinant of matrix
...