Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

Hama 0.7.0-SNAPSHOT vs

...

Giraph 1.

...

2.

...

0

  • 2 nodes cluster (8 cores 16GB)
  • 6 tasks (opt: -Xmx2048m)
  • 40,000 vertices
  • Memory based Queue and Vertices storage (ListVerticesInfo) was used (for both version).

...

Edges per Vertex

...

0.7.0-SNAPSHOT

...

Hama 0.6.3

...

500

...

359.518 seconds

...

284.075 seconds

...

600

...

440.835 seconds

...

347.325 seconds

...

700

...

521.53 seconds

...

368.14 seconds

...

800

...

605.78 seconds

...

Out Of Memory

...

900

...

705.896 seconds

...

Out Of Memory

...

1000

...

794.966 seconds

...

Out Of Memory

...

1100

...

879.476 seconds

...

Out Of Memory

...

1200

...

1032.919 seconds

...

Out Of Memory

...

1300

...

1173.371 seconds

  • node cluster
    • Quanta S210 X22RQ-3
    • CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x 2
    • MEM: 192 GB
    • HDD: 3TB x 1 (HW RAID, Physical HDD x 2)
    • NIC: 1G x 2, 10G x 4
  • Same input data (Random generated JSON format text files) was used
  • Same child opts (-Xmx4048m)
  • Same number of tasks was used
  • Same hash partitioner was used

NOTE: Hama allows you to generate a random graph data in JSON format, which is easily compatible with other similar systems. Try it on your cluster!

Vertices

The max-edges per vertex

Tasks

Hama-0.7.0

Giraph-1.2.0

300000

1000

30

70.425 seconds

154 seconds

300000

900

30

64.388 seconds

141 seconds

300000

800

30

58.355 seconds

130 seconds

300000

700

30

49.309 seconds

114 seconds

300000

600

30

44.334 seconds

104 seconds

300000

500

30

40.321 seconds

92 seconds

300000

400

30

37.304 seconds

81 seconds

300000

300

30

31.248 seconds

72 seconds

Vertices

The max-edges per vertex

Tasks

Hama-0.7.0

Giraph-1.2.0

30000

10000

20

72.485 seconds

130 seconds

30000

9000

20

70.399 seconds

124 seconds

30000

8000

20

58.388 seconds

113 seconds

30000

7000

20

49.39 seconds

104 seconds

30000

6000

20

47.153 seconds

89 seconds

30000

5000

20

40.34 seconds

79 seconds

30000

4000

20

37.292 seconds

72 seconds

30000

3000

20

31.483 seconds

61 seconds

What are the major changes from the last release?

The major improvement changes are in the queue and messaging systems. We now use own outgoing/incoming message manager instead of using Java's built-in queues. It stores messages in serialized form in a set of bundles (or a single bundle) to reduce the memory usage and RPC overhead. Another important improvement is the enhanced graph package. Instead of sending each message individually, we package the messages per vertex and send a packaged message to their assigned destination nodes. The thread-pool executor service also used for each vertex computation. With this, we achieve better performance.

Graph Package: PageRank Benchmarks 0.6.4 vs 0.7.0-SNAPSHOT

  • In 0.7.0-SNAPSHOT, the sender-side sends a list of messages per Vertex, the receiver-side uses normal Queue instead of Sorted Queue
    • CPU and Memory usage has been dramatically decreased!
  • Kryo Serializer used for message serialization

https://lh3.googleusercontent.com/-AjfoPL8c-I0/VRI4metjElI/AAAAAAAAE84/1IkhjzweP1Q/w778-h470-no/hama_1.png

 

Input Size (Bytes)

Total Tasks

Total Edges

Execution Time (0.6.4)

Execution Time (0.7.0-SNAPSHOT)

pagerank

905310

6

200,000

16.71 seconds

13.598 secs

pagerank

7065912

6

2,000,000

19.514 seconds

16.524 seconds

pagerank

857467170

6

200,000,000

1133.622 seconds

202.996 seconds

pagerank

857467170

10

200,000,000

770.589 seconds

145.751 seconds

pagerank

1437500140

10

300,000,000

1337.851 seconds

229.847 seconds

PageRank Benchmark test 0.6.4 vs 0.6.3

See http://blog.datasayer.com/2014/03/apache-hama-benchmark-test-at-lg-cns.html

...

Hama 0.6 RC3

  • Oracle BDA
  • 180 Tasks
  • SDP protocol disabled

...