DataProcessingBenchmarks

  1. Hadoop Map/Reduce Data Processing Benchmarks
    1. Group/Sort
      1. MapReduce Flow
      2. Benchmarks
  2. HBase Matrix Computations Benchmarks
    1. MapReduce Flow


Hadoop Map/Reduce Data Processing Benchmarks

Group/Sort

SQL > select ipaddress, count(*) from access_log group by ipaddress order by count(*) desc limit 0,100;
σ_{count, ipaddress}(γ_{ipaddress, count(ipaddress)}(access_log))

MapReduce Flow
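
A minimal sketch of how a query like the one above is commonly expressed as MapReduce, simulated in-process in Python rather than run on Hadoop. It is not the job used for this benchmark. The function names are hypothetical, and the code assumes the IP address is the first whitespace-separated field of each access_log line. Ties in count are broken by IP address only to make the output deterministic; the SQL does not specify a tie order.

```python
from collections import defaultdict


def map_count(line):
    """Map step: emit (ipaddress, 1) for one access_log line.

    Assumption: the IP address is the first whitespace-separated field.
    """
    fields = line.split()
    if fields:
        yield fields[0], 1


def reduce_count(ipaddress, ones):
    """Reduce step: sum the values emitted for one ipaddress."""
    return ipaddress, sum(ones)


def group_sort(lines, limit=100):
    """Simulate GROUP BY ipaddress ORDER BY count DESC LIMIT `limit`."""
    # Shuffle phase: collect all map output values under their key.
    groups = defaultdict(list)
    for line in lines:
        for ipaddress, one in map_count(line):
            groups[ipaddress].append(one)

    # Reduce phase: one (ipaddress, count) pair per distinct key.
    counts = [reduce_count(ip, ones) for ip, ones in groups.items()]

    # Second pass: order by count descending (ties by ipaddress), keep the top rows.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:limit]
```

On a real cluster the final ordering would typically be a second job, or a single reducer over the much smaller per-address counts, since a global sort needs all counts in one place.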
Benchmarks
1.5 GB access_log on a 10-node cluster

Note: the MySQL column reports only the SQL query time. For a fair comparison, this test should also include the time needed to load the data into MySQL.

http://wiki.apache.org/hadoop-data/attachments/DataProcessingBenchmarks/attachments/C__Users_udanax_Desktop_test-10.png

System         Data                         Machines  Rows       Results  Time
MySQL 5.0.27   B-tree disk table (MyISAM)   1         5,914,669  100      4.43 sec
Hadoop-0.15.2  Text files (access_log)      2         5,914,669  100      172.30 sec
Hadoop-0.15.2  Text files (access_log)      4         5,914,669  100      108.01 sec
Hadoop-0.15.2  Text files (access_log)      6         5,914,669  100      77.41 sec
Hadoop-0.15.2  Text files (access_log)      8         5,914,669  100      66.30 sec
Hadoop-0.15.2  Text files (access_log)      10        5,914,669  100      60.78 sec
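
The Hadoop timings above can be read as a scaling curve. The sketch below computes speedup and parallel efficiency relative to the 2-machine run, using only the times reported in the table. The choice of the 2-machine run as baseline and the `scaling` helper are assumptions made for illustration, not part of the original benchmark.

```python
# Wall-clock times in seconds from the benchmark table, keyed by machine count.
hadoop_seconds = {2: 172.30, 4: 108.01, 6: 77.41, 8: 66.30, 10: 60.78}
mysql_seconds = 4.43  # single-machine MySQL query time (excludes data load)


def scaling(times, baseline_nodes=2):
    """Return (nodes, speedup, efficiency) tuples relative to the baseline run.

    speedup    = baseline time / time on `nodes` machines
    efficiency = speedup / (nodes / baseline_nodes); 1.0 means linear scaling
    """
    base = times[baseline_nodes]
    rows = []
    for nodes in sorted(times):
        speedup = base / times[nodes]
        efficiency = speedup / (nodes / baseline_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


for nodes, speedup, efficiency in scaling(hadoop_seconds):
    vs_mysql = hadoop_seconds[nodes] / mysql_seconds
    print(f"{nodes:>2} machines: {speedup:.2f}x vs 2 machines, "
          f"efficiency {efficiency:.0%}, {vs_mysql:.1f}x the MySQL query time")
```

Going from 2 to 10 machines makes the job roughly 2.8 times faster, well short of the 5 times that linear scaling would give, which suggests fixed job overhead matters at this input size.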


HBase Matrix Computations Benchmarks

MapReduce Flow

last edited 2008-02-20 10:17:41 by udanax