Testing HBase Performance and Scalability


Tool Description

HADOOP-1476 (June 12th, 2007) adds the script org.apache.hadoop.hbase.PerformanceEvaluation to HBase src/test. It runs the tests described in Performance Evaluation, Section 7 of the BigTable paper. See that citation for the test descriptions; they are not repeated below. The script is useful for evaluating HBase performance and how well it scales as region servers are added.

Here is the current usage for the PerformanceEvaluation script:

[stack@aa0-000-12 ~]$ ./hadoop-trunk/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation [--master=host:port] [--miniCluster] <command> <nclients>

Options:
 master          Specify host and port of HBase cluster master. If not present,
                 address is read from configuration
 miniCluster     Run the test on an HBaseMiniCluster

Command:
 randomRead      Run random read test
 randomReadMem   Run random read test where table is in memory
 randomWrite     Run random write test
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test
 scan            Run scan test

Args:
 nclients        Integer. Required. Total number of clients (and HRegionServers)
                 running: 1 <= value <= 500
Examples:
 To run a single evaluation client:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1

If you pass nclients > 1, PerformanceEvaluation starts up a mapreduce job in which each map runs a single loading client instance.
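For example, to run four concurrent sequentialWrite clients as a mapreduce job (using the test jar built as described below; the client count here is just illustrative):

{{{$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar sequentialWrite 4 }}}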

To run the PerformanceEvaluation script, compile the HBase test classes:

{{{$ cd ${HBASE_HOME}
$ ant compile-test }}}

The above ant target compiles all test classes into ${HADOOP_HOME}/build/contrib/hbase/test. It also generates ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar. The latter jar includes all HBase test and src classes and has org.apache.hadoop.hbase.PerformanceEvaluation as its Main-Class. Use the test jar when running PerformanceEvaluation on a hadoop cluster (you'd run the client as a MR job when you want to run multiple clients concurrently).

Here is how to run a single-client PerformanceEvaluation sequentialWrite test:

{{{$ ${HADOOP_HOME}/src/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 }}}

Here is how you would run the same on a hadoop cluster:

{{{$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar sequentialWrite 1 }}}

For the latter, you will likely have to copy your hbase configurations -- e.g. your ${HBASE_HOME}/conf/hbase*.xml files -- to ${HADOOP_HOME}/conf and make sure they are replicated across the cluster so they can be found by the running mapreduce job (in particular, clients need to know the address of the HBase master).
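For example (the exact sync mechanism is up to you; rsync over the hosts listed in conf/slaves is just one option):

{{{$ cp ${HBASE_HOME}/conf/hbase*.xml ${HADOOP_HOME}/conf/
$ for h in `cat ${HADOOP_HOME}/conf/slaves`; do rsync -a ${HADOOP_HOME}/conf/ ${h}:${HADOOP_HOME}/conf/; done }}}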

Note that the mapreduce mode of the testing script works a little differently from single-client mode. It does not delete the test table at the end of each run as is done in single-client mode. Nor does it pre-run the sequentialWrite test before it runs the sequentialRead test (the table needs to be populated with data before sequentialRead can run). For the mapreduce version, the onus is on the operator to run the jobs in the correct order. To delete a table, use the hbase shell and run the drop table command (run 'help;' after starting the shell for instructions).

{{{$ ${HBASE_HOME}/bin/hbase shell }}}
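The exact shell syntax varies between versions -- consult 'help;' -- but with the HQL-style shell of this era, dropping the PerformanceEvaluation table looks something like the following. The table name TestTable is an assumption about what PerformanceEvaluation creates; verify it in your setup first.

{{{hbase> drop table TestTable; }}}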

One Region Server on June 8th, 2007

Here are some first figures for HBase, taken June 8, 2007, in advance of any profiling and before the addition of caching, etc.

This first test ran on a mini test cluster of only four machines, not the 1768 of the BigTable paper. Each node had 8G of RAM and 2x dual-core 2GHz Opterons. Every member ran an HDFS datanode. One node ran the namenode and the HBase master, another the region server, and a third an instance of the PerformanceEvaluation script configured to run one client. Clients write ~1GB of data: one million rows, each row with a single column whose value is 1000 randomly-generated bytes (see the BigTable paper for a fuller description).

||Experiment||HBase||BigTable||
||random reads||68||1212||
||random reads (mem)||Not implemented||10811||
||random writes||847||8850||
||sequential reads||301||4425||
||sequential writes||850||8547||
||scans||3063||15385||

The above table lists how many 1000-byte rows were read or written per second. The BigTable values are from the '1' tablet server column of Figure 6 of the BigTable paper.
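As a worked example of the units, 850 sequential writes per second of 1000-byte values works out to roughly 0.85 MB of value data written per second.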

Except for scanning, we seem to be an order of magnitude off at the moment. Watching the region server during the write tests, it was lightly loaded. At a minimum, there would appear to be issues with liveness/synchronization in need of fixing.

More to follow after more analysis.

One Region Server on September 16th, 2007

Ran the same setup as the first test above, on the same machines. The main performance improvement in hbase is that batch updates are now only sent to the server by the client on commit, where before each batch operation -- start, put, commit -- required a trip to the server. This change cuts the number of trips to the server by at least 2/3rds. Otherwise, client/server communication has been changed to pass raw bytes rather than an object wrapping bytes where it makes sense, for some savings in RPC overhead.
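To make the round-trip savings concrete, here is an illustrative Java sketch. It is not the actual HBase client API of this era; the method names simply mirror the batch operations named above, and the signatures are assumptions.

{{{// Illustrative only -- not the real HBase client API. The method names mirror
// the batch operations described above (start, put, commit) to show where the
// round trips go after the change.
interface BatchingTable {
    long startUpdate(byte[] row);                       // now buffered client-side
    void put(long lockid, byte[] column, byte[] value); // now buffered client-side
    void commit(long lockid);                           // the only server round trip
}

class LoadClient {
    // Writes one row: before the change this cost three RPCs, now it costs one,
    // which is where the "at least 2/3rds" reduction in server trips comes from.
    static void writeRow(BatchingTable table, byte[] row, byte[] column, byte[] value) {
        long lockid = table.startUpdate(row);
        table.put(lockid, column, value);
        table.commit(lockid);   // the whole batch is shipped to the region server here
    }
} }}}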

Here is the loading command run: {{{$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 1 }}}

||Experiment||HBase20070708||HBase20070916||BigTable||
||random reads||68||272||1212||
||random reads (mem)||Not implemented||Not implemented||10811||
||random writes||847||1460||8850||
||sequential reads||301||267||4425||
||sequential writes||850||1278||8547||
||scans||3063||3692||15385||

The above table lists how many 1000-byte rows were read or written per second.

Random reads are almost 4x faster, random and sequential writes are ~50% faster, and scans are about ~20% faster, but there is still a long way to go...

HBase Release 0.15.0

||Experiment||HBase20070708||HBase20070916||0.15.0||BigTable||
||random reads||68||272||264||1212||
||random reads (mem)||Not implemented||Not implemented||Not implemented||10811||
||random writes||847||1460||1277||8850||
||sequential reads||301||267||305||4425||
||sequential writes||850||1278||1112||8547||
||scans||3063||3692||3758||15385||

HBase TRUNK 12/19/2007, r605675

Below are numbers for version r605675, 12/19/2007. It looks like writing got a bit better -- probably because of recent lock refactoring -- but reading speed has almost halved. I'd have expected the read speeds to also have doubled, but I'd guess there is still a HADOOP-2434-like issue over in the datanode.

I've also added numbers for sequential writes, random reads, and next ('scan') reads into and out of a single *open* HDFS mapfile for comparison: i.e. for random reading, we are not opening the file each time, and the mapfile index is loaded into memory. Going by current numbers, pure mapfile writes are slower than the numbers Google posted in the initial BigTable paper and reads are just a bit faster (except when scanning). GFS must be fast.
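For reference, the mapfile comparison boils down to the pattern sketched below: a minimal sketch against the old Hadoop MapFile API, not the actual test code. The path, key format, row count, and use of Text/BytesWritable are illustrative assumptions.

{{{import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "/tmp/pe-mapfile";     // illustrative path
        int rows = 1000000;
        byte[] value = new byte[1000];      // 1000-byte values, as in the PE tests
        new Random().nextBytes(value);

        // Sequential write: MapFile requires keys appended in sorted order,
        // so use zero-padded keys.
        MapFile.Writer writer =
            new MapFile.Writer(conf, fs, dir, Text.class, BytesWritable.class);
        for (int i = 0; i < rows; i++) {
            writer.append(new Text(String.format("%010d", i)), new BytesWritable(value));
        }
        writer.close();

        // Random reads against a single *open* reader: the mapfile index stays in
        // memory and the file is not reopened per get, as described above.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Random rand = new Random();
        BytesWritable result = new BytesWritable();
        for (int i = 0; i < rows; i++) {
            reader.get(new Text(String.format("%010d", rand.nextInt(rows))), result);
        }
        reader.close();
    }
} }}}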

||Experiment Run||HBase20070708||HBase20070916||0.15.0||20071219||mapfile||BigTable||
||random reads||68||272||264||167||685||1212||
||random reads (mem)||Not implemented||Not implemented||Not implemented||Not implemented||-||10811||
||random writes||847||1460||1277||1400||-||8850||
||sequential reads||301||267||305||138||-||4425||
||sequential writes||850||1278||1112||1691||5494||8547||
||scans||3063||3692||3758||3731||25641||15385||

Subsequently I profiled the mapfile PerformanceEvaluation. It turns out generation of the values and keys to insert was taking a bunch of CPU time. After making a fix, key and value generation improved by between 15-25% (the alternative was precomputing keys and values, which would take loads of memory). Rerunning the tests, it looks like there can be a pretty broad range of fluctuation in mapfile numbers between runs. I also noticed that the 0.15.x random reads seem to be 50% faster than TRUNK. Investigate.

HBase 0.1.2 (Candidate) 04/25/2008

Numbers for the 0.1.2 candidate. The 'mapfile', '20071219', and 'BigTable' columns are copied from 'HBase TRUNK 12/19/2007' above. The new columns are for 0.1.2 and for mapfile on hadoop 0.16.3 (the latter test uses the new MapFilePerformanceTest script).

||Experiment Run||20071219||0.1.2||mapfile||mapfile0.16.3||BigTable||
||random reads||167||351||685||644||1212||
||random reads (mem)||Not implemented||Not implemented||Not implemented||-||10811||
||random writes||1400||2330||-||-||8850||
||sequential reads||138||349||-||-||4425||
||sequential writes||1691||2479||5494||6204||8547||
||scans||3731||6278||25641||47662||15385||

We've nearly doubled in most areas over the hbase of 20071219. It's a combination of improvements in hadoop -- particularly scanning -- and in HBase itself (locking was redone, we customized RPC to use codes rather than class names, etc.).

HBase 0.2.0 08/08/2008

Numbers for hbase 0.2.0 on java6 on hadoop 0.17.1, and for hbase 0.2.0 on hadoop 0.18.0 using java7. Also included are numbers for the hadoop mapfile on hadoop 0.16.4, 0.17.1, and 0.18.0 (on java6), as well as rows copied from the tables above so you can compare progress.

||Experiment Run||20071219||0.1.2||0.2.0java6||0.2.0java7||mapfile||mapfile0.16.4||mapfile0.17.1||mapfile0.18.0||BigTable||
||random reads||167||351||428||560||685||644||568||915||1212||
||random reads (mem)||Not implemented||Not implemented||Not implemented||-||-||-||-||-||10811||
||random writes||1400||2330||2167||2218||-||-||-||-||8850||
||sequential reads||138||349||427||582||-||-||-||-||4425||
||sequential writes||1691||2479||2076||1966||5494||6204||5684||5800||8547||
||scans||3731||6278||3737||3784||25641||47662||55692||58054||15385||

HBase 0.19.0RC1 01/16/2009

Numbers for hbase 0.19.0RC1 on hadoop 0.19.0 and java6.

{{{[stack@aa0-000-13 ~]$ ~/bin/jdk/bin/java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)}}}

Tried java7 but no discernible difference (build 1.7.0-ea-b43).

Also included are numbers for the hadoop mapfile. The table includes the last test from above, 0.2.0java6 (on hadoop 0.17.2), for easy comparison.

The cluster was started fresh for each test, and we then waited for all regions to be deployed before starting the tests. This means there is no content in the memcache, so for a test such as random read we are always going to the filesystem, never getting values from the memcache.

||Experiment Run||0.2.0java6||mapfile0.17.1||0.19.0RC1!Java6||0.19.0RC1!Java6!Zlib||0.19.0RC1!Java6,8Clients||mapfile0.19.0||BigTable||
||random reads||428||568||540||80||768||768||1212||
||random reads (mem)||-||-||-||-||-||-||10811||
||random writes||2167||2218||9986||-||-||-||8850||
||sequential reads||427||582||464||-||-||-||4425||
||sequential writes||2076||5684||9892||7182||14027||7519||8547||
||scans||3737||55692||20971||20560||14742||55555||15385||

Some improvement writing and scanning (seemingly faster than the BigTable paper). Random reads still lag. Sequential reads lag badly; a bit of fetch-ahead, as we did for scanning, should help here.

The speedup is a combination of hdfs improvements, hbase improvements including batching when writing and scanning (the BigTable PE description alludes to scans using prefetch), and the use of two JBOD'd disks -- as in the google paper -- where previously, in the tests above, all disks were RAID'd. Otherwise the hardware is the same, similar to the BigTable paper's dual dual-core Opterons, 1G for hbase, etc.

Of note, the mapfile numbers are lower than those of hbase when writing because the mapfile tests write one file, whereas hbase, after the first split, is writing to multiple files concurrently. On the other hand, hbase random read is very like mapfile random read, at least in the single-client case; we're effectively asking the filesystem for a random value from the midst of a file in both cases. The mapfile numbers are useful as a gauge of how much hdfs has come on since the last time we ran the PE.

Block compression (native zlib; an hbase bug won't let you specify anything but the DefaultCodec, e.g. lzo) is a little slower writing, way slower random-reading, but about the same scanning.

The 8 concurrent clients write to a single regionserver instance. Our cluster is four computers. Load was put up by running a MR job as follows: {{{$ ./bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation randomRead 8 }}} The MR job ran two mappers per computer, so 8 clients were running concurrently. Timings were those reported at the head of the MR job page in the UI.

More Information

As of this writing (2011), YCSB is a popular tool for testing performance. See the HBase book for more information: http://hbase.apache.org/book.html
