...

During the setup stage, every peer reads the input dense vector from a file. The framework then automatically partitions the matrix rows using the algorithm provided by the custom partitioner, and each peer performs its local computation. The bsp procedure produces some cells of the result vector, which are written to the output file. The output file is then reread to construct a dense vector instance for further computation.
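
The flow described above can be sketched outside of Hama as a small simulation. This is only an illustration, not the Hama API: the function names are hypothetical, and the partitioner (row index modulo the number of peers) is an assumption standing in for whatever the custom partitioner actually does.

```python
from collections import defaultdict


def partition_rows(rows, num_peers):
    """Assign sparse rows to peers.

    Hypothetical partitioner: row index modulo the number of peers.
    The real custom partitioner may use a different rule.
    """
    parts = defaultdict(list)
    for index, entries in rows.items():
        parts[index % num_peers].append((index, entries))
    return parts


def local_spmv(assigned_rows, vector):
    """Local computation on one peer: one result cell per assigned row."""
    return {
        index: sum(value * vector[col] for col, value in entries)
        for index, entries in assigned_rows
    }


def spmv(rows, vector, num_rows, num_peers):
    """Simulate the peers one after another and merge their result cells."""
    result = [0] * num_rows
    for assigned_rows in partition_rows(rows, num_peers).values():
        for index, cell in local_spmv(assigned_rows, vector).items():
            result[index] = cell
    return result


# Sparse rows as {row index: [(column, value), ...]}.
rows = {
    0: [(0, 1), (2, 6)],
    1: [(1, 4)],
    2: [(1, 2), (2, 3)],
    3: [(0, 3), (3, 5)],
}
vector = [2, 3, 6, 0]
result = spmv(rows, vector, num_rows=4, num_peers=2)
print(result)
```

Every peer needs the whole dense vector but only its own rows, which is why the vector is read by each peer in the setup stage while the matrix is partitioned.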

Implementation

How to get

The implementation can be found in my GitHub repository, and the patch will be available in Apache JIRA (https://issues.apache.org/jira/browse/HAMA-524) as soon as JIRA becomes available. The GitHub repository contains only the classes related to SpMV. Before you start with SpMV, make sure that you have followed this guide and set up the environment variables and so on.

Optional additional setup

I considered two possible use cases of SpMV:

...

The first variable gives quick access to the jar with the Hama examples, which is placed in the Hama home directory. The second variable is the HDFS prefix used for the tests in this tutorial. If you have not defined these variables, just substitute the appropriate values into the following scripts.

Representation of matrices in text format

Users work with SpMV through text files, so this section describes the text format for matrices. All matrices and vectors are represented as follows: each row of the matrix is represented by the row index, the length of the row, the number of non-zero items, and then pairs of index and value. All values inside a row are separated by whitespace, and rows are separated by newlines. Vectors are represented as matrix rows with an arbitrary row index (it is not used). For example:

No Format

[1 0 2]    0 3 2 0 1 2 2
[0 0 0]  = 1 3 0
[0 5 1]    2 3 2 1 5 2 1
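
As a minimal sketch of this encoding, the following helper turns a dense row into one text line, following the field order stated above (row index, row length, number of non-zero items, then index/value pairs). The name `encode_row` is hypothetical and is not part of Hama.

```python
def encode_row(index, dense_row):
    """Encode one dense row as: index, length, non-zero count, (column, value) pairs."""
    nonzeros = [(col, value) for col, value in enumerate(dense_row) if value != 0]
    fields = [index, len(dense_row), len(nonzeros)]
    for col, value in nonzeros:
        fields += [col, value]
    return " ".join(str(field) for field in fields)


matrix = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 5, 1],
]
lines = [encode_row(i, row) for i, row in enumerate(matrix)]
print("\n".join(lines))
```

An all-zero row still produces a line, containing only the row index, the length and a non-zero count of 0.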

Now consider an example. Suppose that you need to compute the following product:

No Format

[1 0 6 0]   [2]   [38] 
[0 4 0 0] * [3] = [12] 
[0 2 3 0]   [6]   [24] 
[3 0 0 5]   [0]   [6]

First of all, you should create the text files for the input matrix and the input vector. The input matrix file should look like this:

No Format

0 4 2 0 1 2 6
1 4 1 1 4
2 4 2 1 2 2 3
3 4 2 0 3 3 5

The vector file should look like this:

No Format

0 4 3 0 2 1 3 2 6
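
To check that the two files above really describe the product shown earlier, the text can be parsed and multiplied locally. This is a plain Python sketch, independent of Hama. The names `parse_row` and `to_dense` are hypothetical helpers, and the parser assumes every line is well formed.

```python
def parse_row(line):
    """Parse 'index length nnz col value col value ...' into its parts."""
    fields = line.split()
    index, length, nnz = int(fields[0]), int(fields[1]), int(fields[2])
    pairs = fields[3:]
    if len(pairs) != 2 * nnz:
        raise ValueError("expected %d index/value pairs: %r" % (nnz, line))
    entries = [(int(pairs[i]), float(pairs[i + 1])) for i in range(0, len(pairs), 2)]
    return index, length, entries


def to_dense(length, entries):
    """Expand (column, value) pairs into a dense list of the given length."""
    dense = [0.0] * length
    for col, value in entries:
        dense[col] = value
    return dense


matrix_text = """\
0 4 2 0 1 2 6
1 4 1 1 4
2 4 2 1 2 2 3
3 4 2 0 3 3 5
"""
vector_text = "0 4 3 0 2 1 3 2 6"

# The row index of the vector line is ignored, as stated in the format description.
_, vector_length, vector_entries = parse_row(vector_text)
vector = to_dense(vector_length, vector_entries)

result = {}
for line in matrix_text.strip().splitlines():
    index, _, entries = parse_row(line)
    result[index] = sum(value * vector[col] for col, value in entries)

print(result)
```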

Usage with RandomMatrixGenerator

...

No Format
0:    hadoop dfs -rmr $SPMV/*/*
1:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/matrix-seq 6 6 0.4 4
2:    hama jar $HAMA_EXAMPLES rmgenerator $SPMV/vector-seq 1 6 0.9 4
3:    hama jar $HAMA_EXAMPLES spmv $SPMV/matrix-seq $SPMV/vector-seq $SPMV/result-seq 4
4:    hadoop dfs -rmr $SPMV/result-seq/part
5:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/matrix-seq $SPMV/matrix-txt
6:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/vector-seq $SPMV/vector-txt
7:    hama jar $HAMA_EXAMPLES matrixtotext $SPMV/result-seq $SPMV/result-txt
8:    hadoop dfs -cat /user/hduser/spmv/matrix-txt/*
   0	 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891
   3	 6 2 5 0.5054043538570098 2 0.03911646523753309
   1	 6 3 4 0.5077528966368161 5 0.5780340816354201 3 0.4626752204959449
   4	 6 2 1 0.6512355661856207 4 0.08804976645891671
   2	 6 2 4 0.7200271909735554 1 0.3510851368183805
   5	 6 2 2 0.5848717104309032 3 0.0889791409798859

9:    hadoop dfs -cat /user/hduser/spmv/vector-txt/*
   0	 6 6 0 0.3365077672167889 1 0.17498609722570935 2 0.32806410950648845 3 0.6016567879100464 4 0.786158850847722 5 0.6856872945972037
10:   hadoop dfs -cat /user/hduser/spmv/result-txt/*
   0	 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 3 0.35938205240905363 4 0.18317827331814918 5 0.24541032101100438
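
The printed result can be checked independently of Hama. The sketch below recomputes the product from the matrix and vector text shown above and compares it with the printed result vector. The numbers are copied from the output above. The helper name `parse_row` and the tolerance used in the comparison are assumptions made for this illustration.

```python
def parse_row(line):
    """Parse 'index length nnz col value ...' into (index, length, {col: value})."""
    fields = line.split()
    index, length, nnz = int(fields[0]), int(fields[1]), int(fields[2])
    pairs = fields[3:3 + 2 * nnz]
    entries = {int(pairs[i]): float(pairs[i + 1]) for i in range(0, 2 * nnz, 2)}
    return index, length, entries


matrix_txt = """\
0 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891
3 6 2 5 0.5054043538570098 2 0.03911646523753309
1 6 3 4 0.5077528966368161 5 0.5780340816354201 3 0.4626752204959449
4 6 2 1 0.6512355661856207 4 0.08804976645891671
2 6 2 4 0.7200271909735554 1 0.3510851368183805
5 6 2 2 0.5848717104309032 3 0.0889791409798859
"""
vector_txt = (
    "0 6 6 0 0.3365077672167889 1 0.17498609722570935 2 0.32806410950648845 "
    "3 0.6016567879100464 4 0.786158850847722 5 0.6856872945972037"
)
result_txt = (
    "0 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 "
    "3 0.35938205240905363 4 0.18317827331814918 5 0.24541032101100438"
)

_, _, vector = parse_row(vector_txt)
_, _, expected = parse_row(result_txt)

# Rows may appear in any order in the file; the row index identifies them.
computed = {}
for line in matrix_txt.strip().splitlines():
    index, _, entries = parse_row(line)
    computed[index] = sum(value * vector.get(col, 0.0) for col, value in entries.items())

max_error = max(abs(computed[i] - expected.get(i, 0.0)) for i in computed)
print("largest difference from the printed result:", max_error)
```

A small non-zero difference is normal, because floating-point sums depend on the order in which the terms are added.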

We got the expected result. Now we will explain the meaning of each line in the code snippet above.

Line 0: Cleans up the directories related to SpMV tests.

...

To use SpMV in this mode, you should provide text files in the appropriate format, as described above. After that, copy these files to HDFS. If you are not comfortable with HDFS, please see this tutorial. Once you have copied the input matrix into matrix-txt and the input vector into vector-txt, you are ready to start. The following code snippet shows how to perform the multiplication in this mode. Explanations are given below.

...