...

The generic algorithm consists of one superstep, because no communication is needed:

  1. Matrix and vector distribution.
  2. Custom partitioning.
  3. Local computation.
  4. Output of the result vector.
  5. Construction of the dense vector.
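The steps above can be simulated locally to see why no communication is needed. If every task holds a full copy of the input vector, each row's dot product can be computed independently. The following is a minimal sketch under stated assumptions, not Hama's implementation:

* Rows go to task `row % TASKS`, which is a stand-in for the framework's partitioner.
* The files use the text layout shown later on this page.
* `spmv_local` is a hypothetical name.

```shell
# Hypothetical local simulation of the one-superstep SpMV (not Hama code).
# Text layout assumed:
#   matrix: <row> <size> <nnz> <index> <value> ...   (one line per row)
#   vector: 0 <size> <nnz> <index> <value> ...       (a single line)
# Usage: spmv_local MATRIX_TXT VECTOR_TXT [TASKS]
spmv_local() {
  matrix=$1; vector=$2; tasks=${3:-2}
  tmp=$(mktemp -d) || return 1
  t=0
  while [ "$t" -lt "$tasks" ]; do
    # Steps 1-2: task t takes the rows with row % tasks == t (assumed partitioning);
    # every task reads the whole vector, so no messages are exchanged.
    # Step 3: local dot products for those rows.
    # Step 4: each task writes its "<row> <value>" pairs to its own part file.
    awk -v t="$t" -v p="$tasks" '
      FILENAME == ARGV[1] { for (i = 4; i < 4 + 2 * $3; i += 2) x[$i] = $(i + 1); next }
      $1 % p == t + 0 {
        s = 0
        for (i = 4; i < 4 + 2 * $3; i += 2) s += $(i + 1) * x[$i]
        printf "%s %.17g\n", $1, s
      }' "$vector" "$matrix" > "$tmp/part-$t"
    t=$((t + 1))
  done
  # Step 5: merge the part files, order them by row index, build the dense vector.
  cat "$tmp"/part-* | sort -n -k1,1 | awk '{ printf "%s%s", (NR > 1 ? " " : ""), $2 } END { print "" }'
  rm -rf "$tmp"
}
```

With the 4x4 example files from the section on arbitrary text files, `spmv_local matrix.txt vector.txt 2` prints `38 12 24 6`. A row that is missing from the matrix file produces no entry, so the sketch assumes every row is listed.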

...

No Format
[1 0 2]    3 2 0 1 2 2
[0 0 0]  = 3 0
[0 5 1]    3 2 1 5 2 1
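To make the row layout above concrete, here is a hypothetical decoder. The name `sparse_to_dense` is ours and is not part of Hama. It assumes each line is `<size> <number of non-zeros>` followed by `<index> <value>` pairs with 0-based indices, as in the example, and it prints the dense row.

```shell
# Hypothetical decoder for the sparse row layout shown above (not part of Hama):
#   <size> <nnz> <index> <value> <index> <value> ...   (indices are 0-based)
# Reads one sparse row per line on stdin and prints the dense row.
sparse_to_dense() {
  awk '{
    for (j = 0; j < $1; j++) row[j] = 0                      # all-zero row of length <size>
    for (i = 3; i < 3 + 2 * $2; i += 2) row[$i] = $(i + 1)   # fill in the non-zeros
    out = ""
    for (j = 0; j < $1; j++) out = out (j ? " " : "") row[j]
    print out
  }'
}
```

Feeding it the three lines on the right-hand side above prints the rows `1 0 2`, `0 0 0` and `0 5 1`.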

Usage with RandomMatrixGenerator

RandomMatrixGenerator, like SpMV, works with the sequence file format. So, to multiply a random matrix by a random vector, we do the following:

* clean up the previous test data;
* generate the matrix and vector;
* run SpMV;
* convert the matrix, vector and result to text files;
* view the matrix, vector and result.

This sequence is shown in the following code snippet:

No Format
1:     hadoop dfs -rmr $SPMV/*/*
2:     hama jar $HAMA_EXAMPLES rmgenerator $SPMV/matrix-seq 6 6 0.4 4
3:     hama jar $HAMA_EXAMPLES rmgenerator $SPMV/vector-seq 1 6 0.9 4
4:     hama jar $HAMA_EXAMPLES spmv $SPMV/matrix-seq $SPMV/vector-seq $SPMV/result-seq 4
5:     hadoop dfs -rmr $SPMV/result-seq/part
6:     hama jar $HAMA_EXAMPLES matrixtotext $SPMV/matrix-seq $SPMV/matrix-txt
7:     hama jar $HAMA_EXAMPLES matrixtotext $SPMV/vector-seq $SPMV/vector-txt
8:     hama jar $HAMA_EXAMPLES matrixtotext $SPMV/result-seq $SPMV/result-txt
9:     hadoop dfs -cat /user/hduser/spmv/matrix-txt/*
   0	 6 3 5 0.24316243288531214 2 0.638622414091597 3 0.5480468710898891
   3	 6 2 5 0.5054043538570098 2 0.03911646523753309
   1	 6 3 4 0.5077528966368161 5 0.5780340816354201 3 0.4626752204959449
   4	 6 2 1 0.6512355661856207 4 0.08804976645891671
   2	 6 2 4 0.7200271909735554 1 0.3510851368183805
   5	 6 2 2 0.5848717104309032 3 0.0889791409798859

10:    hadoop dfs -cat /user/hduser/spmv/vector-txt/*
   0	 6 6 0 0.3365077672167889 1 0.17498609722570935 2 0.32806410950648845 3 0.6016567879100464 4 0.786158850847722 5 0.6856872945972037
11:    hadoop dfs -cat /user/hduser/spmv/result-txt/*
   0	 6 6 0 0.7059786044267415 1 1.0738967463653346 2 0.6274907669206862 3 0.35938205240905363 4 0.18317827331814918 5 0.24541032101100438

We got the expected result. Now let's explain the meaning of each line in the code snippet above.

Line 1: Cleanup of the directories related to the SpMV tests.

Lines 2-3: Generation of the input matrix and vector. In this example we test the multiplication of a 6x6 matrix by a 1x6 vector.

Line 4: Running the SpMV algorithm.

Line 5: Deletion of the part files from the output directory created at line 4. NOTE: matrixtotext will fail if this step is not performed, because result-seq will contain the part folder and matrixtotext does not know how to deal with it yet.

Lines 6-8: Conversion of the input matrix, input vector and result to text format.

Lines 9-11: Showing the results.
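To double-check such a run without relying on reading the output by eye, you can copy the text files locally and recompute the product. For example, use `hadoop dfs -cat $SPMV/matrix-txt/* > matrix.txt`, and do the same for the vector and the result.

The helper below is a hypothetical sketch and is not part of Hama. It assumes that:

* all three files use the `<key> <size> <number of non-zeros> <index> <value> ...` layout printed above;
* an absolute tolerance of 1e-9 is acceptable by default.

```shell
# Hypothetical checker (not part of Hama): recomputes y = A*x from the text
# files and compares it with the result file, entry by entry.
# All files use "<key> <size> <nnz> <index> <value> ..." with 0-based indices.
# Usage: check_spmv MATRIX_TXT VECTOR_TXT RESULT_TXT [TOLERANCE]
check_spmv() {
  awk -v tol="${4:-1e-9}" '
    BEGIN { tol += 0 }
    # 1st file: the input vector (a single line).
    FILENAME == ARGV[1] { for (i = 4; i < 4 + 2 * $3; i += 2) x[$i] = $(i + 1); next }
    # 2nd file: the matrix rows; y[row] is the expected result entry.
    FILENAME == ARGV[2] {
      s = 0
      for (i = 4; i < 4 + 2 * $3; i += 2) s += $(i + 1) * x[$i]
      y[$1] = s
      next
    }
    # 3rd file: the computed result; count entries that differ by more than tol.
    {
      for (i = 4; i < 4 + 2 * $3; i += 2) {
        d = $(i + 1) - y[$i]
        if (d < 0) d = -d
        if (d > tol) bad++
      }
    }
    END {
      if (bad) { print "MISMATCH in " bad " entries"; exit 1 }
      print "OK"
    }' "$2" "$1" "$3"
}
```

The helper prints `OK` and returns 0 when every entry listed in the result file matches. Entries that are absent from the result file are not checked.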

Usage with arbitrary text files

...

No Format
Usage: matrixtotext <input matrix dir> <output matrix dir> [number of tasks (default max)]

Now let's look at an example. To use SpMV in this mode, you should provide text files in the appropriate format, as described above. Imagine that you need to multiply:

No Format

[1 0 6 0]   [2]   [38] 
[0 4 0 0] * [3] = [12] 
[0 2 3 0]   [6]   [24] 
[3 0 0 5]   [0]   [6]

First of all, you should create the appropriate text files for the input matrix and input vector. The input matrix file should look like this:

No Format

0 4 2 0 1 2 6
1 4 1 1 4
2 4 2 1 2 2 3
3 4 2 0 3 3 5

The vector file should look like this:

No Format

0 4 3 0 2 1 3 2 6
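If you would rather not write these files by hand, a small helper can produce them from dense rows. This is a hypothetical sketch, and the function name `to_spmv_txt` is ours, not part of Hama. It reads one dense row per line and prints the `<row> <size> <number of non-zeros> <index> <value> ...` layout used above. The same function therefore also works for the single-line vector file.

```shell
# Hypothetical encoder (not part of Hama): reads one dense row per line on stdin
# and prints "<row> <size> <nnz> <index> <value> ..." with 0-based indices,
# i.e. the layout of the input matrix and vector files above.
to_spmv_txt() {
  awk '{
    nnz = 0; pairs = ""
    for (i = 1; i <= NF; i++)
      if ($i != 0) { nnz++; pairs = pairs " " (i - 1) " " $i }   # keep non-zeros only
    print NR - 1, NF, nnz pairs                                    # row index = line number - 1
  }'
}
```

For example, the following two commands produce exactly the two files shown above:

```shell
printf '1 0 6 0\n0 4 0 0\n0 2 3 0\n3 0 0 5\n' | to_spmv_txt > matrix.txt
echo '2 3 6 0' | to_spmv_txt > vector.txt
```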

After that, copy these files to HDFS. If you are not comfortable with HDFS, please see this tutorial. Once the input matrix is in matrix-txt and the input vector is in vector-txt, we are ready to start. The following code snippet shows how to multiply a matrix by a vector in this mode. Explanations are given below.

...

Line 6: Output of the result vector. You can see that we obtained the expected vector.

Possible improvements

  1. Bug fixing. My main aim now is to make SpMV work stably.
  2. A significant improvement in the total running time of the algorithm can be achieved by creating a custom partitioner class. It would give us load balancing and therefore better efficiency. This is the main opportunity for optimization, because we decided that row-wise matrix access is acceptable. Perhaps it can be achieved by reordering the input or by customizing the partitioning algorithm of the framework.
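As an illustration of the load-balancing idea, here is a hypothetical greedy row assignment. It is not a Hama Partitioner, and the name `balance_rows` is ours. The sketch works as follows:

* It assumes that the cost of a row is proportional to its number of non-zeros.
* It sorts the rows by that count in decreasing order.
* It gives each row to the currently least-loaded task. This is the longest-processing-time heuristic.

```shell
# Hypothetical load-balancing sketch (not a Hama Partitioner).
# Reads matrix rows "<row> <size> <nnz> ..." on stdin and prints "<row> <task>".
# Assumption: the cost of a row is proportional to its number of non-zeros.
# Usage: balance_rows TASKS < matrix.txt
balance_rows() {
  # Longest-processing-time heuristic: heaviest rows first (ties by row index).
  sort -k3,3nr -k1,1n | awk -v p="$1" '
    BEGIN { p += 0; for (t = 0; t < p; t++) load[t] = 0 }
    {
      best = 0                                   # find the least-loaded task
      for (t = 1; t < p; t++) if (load[t] < load[best]) best = t
      load[best] += $3                           # charge the row cost to it
      print $1, best
    }'
}
```

For the 4x4 example matrix and 2 tasks, it prints the row/task pairs `0 0`, `2 1`, `3 0` and `1 1`. That gives loads of 4 and 3 non-zeros. Turning such an assignment into an actual partitioner class would still require the framework's own partitioning API.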

...