Comparison of Hadoop RPC and Avro RPC

| Messenger | Physical Nodes | Tasks | MSG bytes | MSG exchanges per Superstep | Supersteps | Job Execution Time |
|---|---|---|---|---|---|---|
| Hadoop RPC | 16 | 100 | 16 | 10 million | 32 | 168.571 seconds |
| Avro RPC | 16 | 100 | 16 | 10 million | 32 | 159.573 seconds |
| Hadoop RPC | 16 | 100 | 16 | 10,000 | 2048 | 711.924 seconds |
| Avro RPC | 16 | 100 | 16 | 10,000 | 2048 | 735.994 seconds |

PageRank (PR)

  • Experimental environments
    • One-rack cluster (9 nodes, 144 cores)
    • 10G network
    • Hadoop 0.20.2
    • Hama 0.4 TRUNK r1235586
    • MR-based Random Graph Generator
    • Task and data partitioning is based on hashing of the vertexID in the graph and on the size of the input data.
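
The hash-based partitioning mentioned in the list above can be illustrated with a minimal sketch. The page does not say which hash function Hama applies to the vertex ID, so CRC32 is used here purely as an assumption, and the names `partition_for` and `partition_sizes` are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(vertex_id: str, num_tasks: int) -> int:
    """Map a vertex ID to a task index.

    CRC32 is an assumed stand-in for whatever hash the real system uses;
    it is chosen because it is stable across Python processes.
    """
    return zlib.crc32(vertex_id.encode("utf-8")) % num_tasks


def partition_sizes(vertex_ids, num_tasks):
    """Count how many vertices land on each task."""
    counts = Counter(partition_for(v, num_tasks) for v in vertex_ids)
    return [counts.get(t, 0) for t in range(num_tasks)]


if __name__ == "__main__":
    # Hypothetical vertex IDs; the benchmark used a generated random graph.
    ids = [f"page-{i}" for i in range(100_000)]
    sizes = partition_sizes(ids, 90)
    print("smallest partition:", min(sizes), "largest partition:", max(sizes))
```

Because the task index depends only on the vertex ID and the task count, every peer can compute the destination of a message locally, without a lookup table.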

| Web pages (10 anchors per page) | Tasks | Supersteps | Job Execution Time |
|---|---|---|---|
| 10 million | 17 | 25 | 2596.858 seconds |
| 10 million | 90 | 25 | 551.194 seconds |
| 20 million | 35 | 25 | 2349.266 seconds |
| 20 million | 90 | 25 | 1122.242 seconds |
| 30 million | 54 | 25 | 2636.32 seconds |
| 30 million | 90 | 25 | 1629.504 seconds |

Comparison with the old version

| Version | Physical Nodes | Tasks | MSG bytes | MSG exchanges per Superstep | Supersteps | Job Execution Time |
|---|---|---|---|---|---|---|
| Hama 0.2-incubating | 16 | 16 | 16 | 16000 | 2048 | 2030.187 seconds |
| Hama 0.4-SNAPSHOT | 16 | 160 | 16 | 160000 | 2048 | 1060.434 seconds |
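
To put the two versions on a common scale, the figures above can be converted into messages exchanged per second. This is only a sketch of the arithmetic: it assumes that "MSG exchanges per Superstep" counts messages across all tasks, which the page does not state explicitly, and the helper name `messages_per_second` is hypothetical.

```python
def messages_per_second(msgs_per_superstep: int, supersteps: int, seconds: float) -> float:
    """Total messages exchanged over the whole job, divided by job time."""
    return msgs_per_superstep * supersteps / seconds


# Values copied from the version comparison table.
hama_02 = messages_per_second(16_000, 2048, 2030.187)
hama_04 = messages_per_second(160_000, 2048, 1060.434)

print(f"Hama 0.2-incubating: {hama_02:,.0f} messages/s")
print(f"Hama 0.4-SNAPSHOT:   {hama_04:,.0f} messages/s")
print(f"ratio: {hama_04 / hama_02:.1f}x")
```

Under that assumption, 0.4-SNAPSHOT moves roughly 19 times as many messages per second, since it exchanges ten times the messages in about half the time. Note that the two runs also differ in task count (16 versus 160), so the ratio reflects both the newer version and the higher parallelism.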

K-Means Clustering

  • Experimental environments
    • One-rack cluster (16 nodes, 256 cores)
    • 10G network
    • Hadoop 0.20.2
    • Hama 0.4 TRUNK r1222075.
    • Job Execution Time includes the generation of a random dataset of N points and assigning the first k as initial centers.
    • Block partitioning has been used.
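
The setup described above (a random dataset of N points, the first k points as initial centers, and block partitioning) can be sketched as follows. This is an illustration under stated assumptions rather than the benchmark's actual code: the function names are hypothetical, points are drawn uniformly from the unit cube, and "block partitioning" is taken to mean contiguous, nearly equal-sized slices per task.

```python
import random


def generate_points(n: int, dim: int, seed: int = 0):
    """Generate n random points with `dim` coordinates each (uniform in [0, 1))."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def initial_centers(points, k: int):
    """Use the first k points as the initial cluster centers."""
    return list(points[:k])


def block_partition(points, num_tasks: int):
    """Split points into num_tasks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(points), num_tasks)
    blocks, start = [], 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)
        blocks.append(points[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    pts = generate_points(n=1_000, dim=2)
    centers = initial_centers(pts, k=10)
    blocks = block_partition(pts, num_tasks=5)
    print(len(centers), [len(b) for b in blocks])
```

Since the dataset generation is counted in the job execution time, the reported times cover more than the clustering iterations themselves.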

| N | k | Dimension | Max Iteration | Tasks | Job Execution Time |
|---|---|---|---|---|---|
| 1 million | 10 | 2 | 10 | 1 | 21.863 seconds |
| 10 million | 10 | 2 | 10 | 5 | 55.339 seconds |
| 10 million | 10 | 2 | 10 | 5 | 45.614 seconds |
| 10 million | 10 | 2 | 10 | 20 | 36.156 seconds |
| 10 million | 10 | 2 | 10 | 40 | 33.199 seconds |
| 100 million | 10 | 2 | 10 | 43 | 81.826 seconds |
| 200 million | 10 | 2 | 10 | 85 | 127.026 seconds |
| 200 million | 10 | 2 | 10 | 85 | 236.682 seconds |

| N | k | Dimension | Max Iteration | Tasks | Job Execution Time |
|---|---|---|---|---|---|
| 1 million | 10 | 10 | 20 | 2 | 44.56 seconds |
| 10 million | 10 | 10 | 20 | 14 | 69.842 seconds |

Single Source Shortest Path Problem

  • Experimental environments
    • One-rack cluster (16 nodes, 256 cores)
    • 10G network
    • Hadoop 0.20.2
    • Hama 0.4 TRUNK r1213634
    • MR-based Random Graph Generator
    • Task and data partitioning is based on hashing of the vertexID in the graph and on the size of the input data.

http://4.bp.blogspot.com/-nsYLwF_l3-c/TucnAg0GBCI/AAAAAAAAAU4/VSeUosa2Q58/s1600/scr.png

| Vertices (x10 edges) | Tasks | Supersteps | Job Execution Time |
|---|---|---|---|
| 10 million | 6 | 5423 | 656.393 seconds |
| 20 million | 12 | 2231 | 449.542 seconds |
| 30 million | 18 | 4398 | 886.845 seconds |
| 40 million | 24 | 5432 | 1112.912 seconds |
| 50 million | 30 | 10747 | 2079.262 seconds |
| 60 million | 36 | 8158 | 1754.935 seconds |
| 70 million | 42 | 20634 | 4325.141 seconds |
| 80 million | 48 | 14356 | 3236.194 seconds |
| 90 million | 54 | 11480 | 2785.996 seconds |
| 100 million | 60 | 7679 | 2169.528 seconds |

Random Communication Benchmark

4 racks

  • 1024 VM nodes (1024 cores)
  • 10G network

Hama 0.4

  • Work in progress

1 rack

  • 16 nodes (256 cores)
  • 10G network

Hama 0.4 (r.1177507)

  • 160 Tasks (10 tasks per node)

| Size of each message | Messages per superstep | Number of supersteps | Job runtime |
|---|---|---|---|
| 16 bytes | 10,000 | 32 | 63.875 seconds |
| 16 bytes | 20,000 | 32 | 81.76 seconds |
| 16 bytes | 30,000 | 32 | 102.879 seconds |
| 16 bytes | 40,000 | 32 | 117.783 seconds |
| 16 bytes | 50,000 | 32 | 129.778 seconds |
| 16 bytes | 60,000 | 32 | 147.876 seconds |
| 16 bytes | 70,000 | 32 | 156.896 seconds |
| 16 bytes | 80,000 | 32 | 184.609 seconds |
| 16 bytes | 90,000 | 32 | 187.035 seconds |
| 16 bytes | 100,000 | 32 | 199.027 seconds |


  • 16 Tasks (1 task per node)

Hama 0.3

Improved compared with 0.2-incubating.

| Size of each message | Messages per superstep | Number of supersteps | 0.2-incubating | 0.3-incubating |
|---|---|---|---|---|
| 16 bytes | 1000 | 512 | 507.837 seconds | 486.838 seconds |
| 16 bytes | 1000 | 1024 | 979.198 seconds | 874.016 seconds |

Hama 0.2

Test 1 (many small messages vs. few large messages)

| Size of each message | Messages per superstep | Number of supersteps | Job runtime |
|---|---|---|---|
| 500 bytes | 100 | 16 | 7.461 seconds |
| 500 bytes | 100 | 32 | 9.397 seconds |
| 500 bytes | 100 | 64 | 14.341 seconds |
| 500 bytes | 100 | 128 | 26.394 seconds |
| 500 bytes | 100 | 256 | 43.411 seconds |
| 500 bytes | 100 | 512 | 84.489 seconds |
| 500 bytes | 100 | 1024 | 156.581 seconds |
| 500 bytes | 100 | 2048 | 308.671 seconds |

| Size of each message | Messages per superstep | Number of supersteps | Job runtime |
|---|---|---|---|
| 10 KB | 5 | 16 | 13.679 seconds |
| 10 KB | 5 | 32 | 23.427 seconds |
| 10 KB | 5 | 64 | 46.398 seconds |
| 10 KB | 5 | 128 | 86.476 seconds |
| 10 KB | 5 | 256 | 171.511 seconds |
| 10 KB | 5 | 512 | 339.608 seconds |
| 10 KB | 5 | 1024 | 675.994 seconds |
| 10 KB | 5 | 2048 | 1872.939 seconds |
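
The two Test 1 configurations appear to be chosen so that the payload per superstep is about the same: 100 messages of 500 bytes versus 5 messages of 10 kilobytes. The sketch below works through that arithmetic for the 2048-superstep runs. It assumes 1 kilobyte = 1000 bytes (the page does not say whether 1000 or 1024 is meant), and the dictionary keys and helper names are hypothetical.

```python
KB = 1000  # assumption: decimal kilobytes; use 1024 if binary kilobytes were meant

# 2048-superstep rows copied from the two Test 1 tables.
many_small = {"size_bytes": 500, "messages": 100, "supersteps": 2048, "runtime_s": 308.671}
few_large = {"size_bytes": 10 * KB, "messages": 5, "supersteps": 2048, "runtime_s": 1872.939}


def payload_per_superstep(run: dict) -> int:
    """Bytes of message payload sent in one superstep."""
    return run["size_bytes"] * run["messages"]


def seconds_per_superstep(run: dict) -> float:
    """Average wall-clock time of one superstep."""
    return run["runtime_s"] / run["supersteps"]


for name, run in (("many small", many_small), ("few large", few_large)):
    print(f"{name}: {payload_per_superstep(run)} bytes/superstep, "
          f"{seconds_per_superstep(run):.3f} s/superstep")
```

With equal payloads, the run with few large messages still takes about six times as long per superstep at 2048 supersteps, which suggests that message size, not just total bytes, affects the cost in Hama 0.2.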

Test 2

| Size of each message | Messages per superstep | Number of supersteps | Job runtime |
|---|---|---|---|
| 16 bytes | 1000 | 16 | 20.365 seconds |
| 16 bytes | 1000 | 32 | 36.386 seconds |
| 16 bytes | 1000 | 64 | 67.404 seconds |
| 16 bytes | 1000 | 128 | 126.503 seconds |
| 16 bytes | 1000 | 256 | 251.602 seconds |
| 16 bytes | 1000 | 512 | 507.837 seconds |
| 16 bytes | 1000 | 1024 | 979.198 seconds |
| 16 bytes | 1000 | 2048 | 2030.187 seconds |
