Numbers comparing MapFile and RFile (TFile+mods has dropped from the running for the moment anyway). The code used to run these tests is available on github. I ran the following on the local filesystem and on a 4-node hdfs cluster:

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.MapFilePerformanceEvaluation
$ ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
}}}

For more context, see New File Format.

Local Filesystem

Mac OS X, 10-byte cells and keys.

MapFile

{{{
2009-02-06 10:40:53,553 INFO [main] hbase.MapFilePerformanceEvaluation(86): Running SequentialWriteBenchmark for 100000 rows.
2009-02-06 10:40:53,621 WARN [main] util.NativeCodeLoader(52): Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2009-02-06 10:40:56,379 INFO [main] hbase.MapFilePerformanceEvaluation(89): Running SequentialWriteBenchmark for 100000 rows took 2713ms.
2009-02-06 10:40:56,380 INFO [main] hbase.MapFilePerformanceEvaluation(86): Running UniformRandomSmallScan for 100000 rows.
2009-02-06 10:41:00,367 INFO [main] hbase.MapFilePerformanceEvaluation(89): Running UniformRandomSmallScan for 100000 rows took 3969ms.
2009-02-06 10:41:00,367 INFO [main] hbase.MapFilePerformanceEvaluation(86): Running UniformRandomReadBenchmark for 100000 rows.
2009-02-06 10:41:07,791 INFO [main] hbase.MapFilePerformanceEvaluation(89): Running UniformRandomReadBenchmark for 100000 rows took 7418ms.
2009-02-06 10:41:07,796 INFO [main] hbase.MapFilePerformanceEvaluation(86): Running GaussianRandomReadBenchmark for 100000 rows.
2009-02-06 10:41:14,303 INFO [main] hbase.MapFilePerformanceEvaluation(89): Running GaussianRandomReadBenchmark for 100000 rows took 6483ms.
2009-02-06 10:41:14,303 INFO [main] hbase.MapFilePerformanceEvaluation(86): Running SequentialReadBenchmark for 100000 rows.
2009-02-06 10:41:15,158 INFO [main] hbase.MapFilePerformanceEvaluation(89): Running SequentialReadBenchmark for 100000 rows took 852ms.
}}}

rfile 8k buffer

{{{
2009-02-06 11:19:03,630 INFO [main] hbase.RFilePerformanceEvaluation(86): Running SequentialWriteBenchmark for 100000 rows.
2009-02-06 11:19:04,512 INFO [main] hbase.RFilePerformanceEvaluation(89): Running SequentialWriteBenchmark for 100000 rows took 835ms.
2009-02-06 11:19:04,516 INFO [main] hbase.RFilePerformanceEvaluation(86): Running UniformRandomSmallScan for 100000 rows.
2009-02-06 11:19:07,075 INFO [main] hbase.RFilePerformanceEvaluation(89): Running UniformRandomSmallScan for 100000 rows took 2424ms.
2009-02-06 11:19:07,078 INFO [main] hbase.RFilePerformanceEvaluation(86): Running UniformRandomReadBenchmark for 100000 rows.
2009-02-06 11:19:13,801 INFO [main] hbase.RFilePerformanceEvaluation(89): Running UniformRandomReadBenchmark for 100000 rows took 6715ms.
2009-02-06 11:19:13,806 INFO [main] hbase.RFilePerformanceEvaluation(86): Running GaussianRandomReadBenchmark for 100000 rows.
2009-02-06 11:19:19,646 INFO [main] hbase.RFilePerformanceEvaluation(89): Running GaussianRandomReadBenchmark for 100000 rows took 5835ms.
2009-02-06 11:19:19,647 INFO [main] hbase.RFilePerformanceEvaluation(86): Running SequentialReadBenchmark for 100000 rows.
2009-02-06 11:19:19,740 INFO [main] hbase.RFilePerformanceEvaluation(89): Running SequentialReadBenchmark for 100000 rows took 89ms.
}}}

HDFS

4-node hdfs cluster, ten-byte keys and cells.

MapFile

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.MapFilePerformanceEvaluation
09/02/06 20:00:01 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 20:00:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
09/02/06 20:00:01 INFO compress.CodecPool: Got brand-new compressor
09/02/06 20:00:01 INFO compress.CodecPool: Got brand-new compressor
09/02/06 20:00:04 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 2754ms.
09/02/06 20:00:04 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 20:00:26 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 22265ms.
09/02/06 20:00:26 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 20:02:31 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 124587ms.
09/02/06 20:02:31 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 20:04:36 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 125150ms.
09/02/06 20:04:36 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 20:04:37 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 960ms.
}}}

rfile 8k buffer using seek+read

First Run

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
09/02/06 20:05:23 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 20:05:24 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 578ms.
09/02/06 20:05:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 20:05:41 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 17492ms.
09/02/06 20:05:41 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 20:07:41 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 119389ms.
09/02/06 20:07:41 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 20:09:36 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 115376ms.
09/02/06 20:09:36 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 20:09:36 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 102ms.
}}}

Second Run

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
09/02/06 20:35:14 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 20:35:15 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 674ms.
09/02/06 20:35:15 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 20:35:29 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 13320ms.
09/02/06 20:35:29 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 20:37:26 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 117903ms.
09/02/06 20:37:26 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 20:39:26 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 119625ms.
09/02/06 20:39:26 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 20:39:26 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 112ms.
}}}

rfile 8k using pread

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
09/02/06 20:44:27 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 20:44:28 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 568ms.
09/02/06 20:44:28 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 20:44:42 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 14239ms.
09/02/06 20:44:42 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 20:46:20 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 97716ms.
09/02/06 20:46:20 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 20:47:54 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 93736ms.
09/02/06 20:47:54 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 20:47:54 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 389ms.
}}}

HDFS using 1k cells like Performance Evaluation

rfile w/ 8k buffer and pread

First Run

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
09/02/06 20:52:36 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 20:52:39 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 2949ms.
09/02/06 20:52:39 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 20:53:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 44921ms.
09/02/06 20:53:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 20:55:07 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 102617ms.
09/02/06 20:55:07 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 21:01:45 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 398033ms.
09/02/06 21:01:45 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 21:01:56 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 10784ms.
}}}

Second Run

{{{
09/02/06 22:10:51 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 22:10:54 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 3151ms.
09/02/06 22:10:54 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 22:11:37 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 42660ms.
09/02/06 22:11:37 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 22:13:18 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 100919ms.
09/02/06 22:13:18 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 22:19:49 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 390413ms.
09/02/06 22:19:49 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 22:19:59 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 10883ms.
}}}

MapFile

First Run

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.MapFilePerformanceEvaluation
09/02/06 21:03:19 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 21:03:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
09/02/06 21:03:19 INFO compress.CodecPool: Got brand-new compressor
09/02/06 21:03:19 INFO compress.CodecPool: Got brand-new compressor
09/02/06 21:03:34 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 14293ms.
09/02/06 21:03:34 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 21:04:03 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 29751ms.
09/02/06 21:04:03 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 21:07:50 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 226938ms.
09/02/06 21:07:50 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 21:11:41 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 230951ms.
09/02/06 21:11:41 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 21:11:44 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 2560ms.
}}}

Second Run

{{{
$ ./bin/hadoop org.apache.hadoop.hbase.MapFilePerformanceEvaluation ; ./bin/hadoop org.apache.hadoop.hbase.RFilePerformanceEvaluation
09/02/06 22:02:09 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 22:02:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
09/02/06 22:02:09 INFO compress.CodecPool: Got brand-new compressor
09/02/06 22:02:09 INFO compress.CodecPool: Got brand-new compressor
09/02/06 22:02:23 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 14016ms.
09/02/06 22:02:23 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 22:02:56 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 32547ms.
09/02/06 22:02:56 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 22:06:50 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 234207ms.
09/02/06 22:06:50 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 22:10:48 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 237558ms.
09/02/06 22:10:48 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 22:10:50 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 2625ms.
}}}

HDFS 1k cells

MapFile

{{{
09/02/06 22:28:58 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 22:28:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
09/02/06 22:28:58 INFO compress.CodecPool: Got brand-new compressor
09/02/06 22:28:58 INFO compress.CodecPool: Got brand-new compressor
09/02/06 22:29:13 INFO hbase.MapFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 14915ms.
09/02/06 22:29:13 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 22:29:46 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 32558ms.
09/02/06 22:29:46 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 22:33:55 INFO hbase.MapFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 249211ms.
09/02/06 22:33:55 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 22:37:49 INFO hbase.MapFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 234521ms.
09/02/06 22:37:49 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 22:37:52 INFO hbase.MapFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 2827ms.
}}}

RFile 64k buffers

{{{
09/02/06 22:37:53 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows.
09/02/06 22:37:56 INFO hbase.RFilePerformanceEvaluation: Running SequentialWriteBenchmark for 100000 rows took 3083ms.
09/02/06 22:37:56 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows.
09/02/06 22:38:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomSmallScan for 100000 rows took 27405ms.
09/02/06 22:38:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows.
09/02/06 22:41:24 INFO hbase.RFilePerformanceEvaluation: Running UniformRandomReadBenchmark for 100000 rows took 180332ms.
09/02/06 22:41:24 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows.
09/02/06 22:44:20 INFO hbase.RFilePerformanceEvaluation: Running GaussianRandomReadBenchmark for 100000 rows took 175614ms.
09/02/06 22:44:20 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows.
09/02/06 22:44:23 INFO hbase.RFilePerformanceEvaluation: Running SequentialReadBenchmark for 100000 rows took 2840ms.
}}}

16 concurrent reading threads

For ten-byte cells and 8k rfile blocks against localfs, MapFile wins. The rfile results there are odd -- pread runs come in slower than seek+read -- so the localfs numbers should probably be discounted as wonky. On hdfs, pread is faster than seek+read for rfile (uniform random reads took ~98 seconds with pread vs. ~118-119 seconds with seek+read), and rfile takes roughly three quarters of the time mapfile does across the random-read tests above. My guess is that we pay for the high-level synchronization in mapfile when operating against the relatively high-latency hdfs.
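The pread vs. seek+read distinction above is the same one POSIX makes: a positioned read takes an absolute offset and leaves the shared file position alone, while seek+read mutates shared state and so forces synchronization between concurrent readers. A minimal sketch of the two access patterns, using Python's `os.pread` against a local file (an illustration of the concept only, not the Hadoop code paths the benchmarks exercise):

```python
import os
import tempfile

# Write a small test file.
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789" * 100)
os.close(fd)

fd = os.open(path, os.O_RDONLY)

# seek+read: moves the descriptor's shared file position, so
# concurrent readers must serialize around it (the MapFile-style
# synchronized access pattern).
os.lseek(fd, 10, os.SEEK_SET)
a = os.read(fd, 10)

# pread: reads at an absolute offset without touching the file
# position, so many threads can share one descriptor in parallel.
b = os.pread(fd, 10, 10)

assert a == b == b"0123456789"
os.close(fd)
os.remove(path)
```

On a high-latency filesystem the win is less about the syscall itself and more about not needing a lock (or a reader per thread) around the seek-then-read pair.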

For 1k cells and 64k rfile blocks, rfile is about twice as fast as mapfile on random accesses. The scan tests come in closer together, with rfile about 20% faster on the seek-and-read-30-rows test. Oddly, mapfile beat rfile in the sequential read benchmark -- you'd think rfile would win that test most of all.
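The ratios can be recomputed from the millisecond figures in the 1k-cell hdfs logs above (a quick sanity-check script, not part of the benchmark code):

```python
# Millisecond timings copied from the first-run 1k-cell hdfs logs above.
mapfile_uniform_ms = 226938   # MapFile UniformRandomReadBenchmark
rfile_uniform_ms = 102617     # RFile 8k pread UniformRandomReadBenchmark
mapfile_scan_ms = 32558       # MapFile UniformRandomSmallScan (combined run)
rfile_scan_ms = 27405         # RFile 64k UniformRandomSmallScan

# RFile vs. MapFile on uniform random reads: roughly 2.2x.
speedup_random = mapfile_uniform_ms / rfile_uniform_ms

# RFile vs. MapFile on the seek-and-read-30-rows scan: roughly 1.19x.
speedup_scan = mapfile_scan_ms / rfile_scan_ms

print(round(speedup_random, 2), round(speedup_scan, 2))  # 2.21 1.19
```

Note the random-read factor of two comes from comparing the 8k-pread rfile run against mapfile; the 64k-buffer rfile run narrows that gap on random reads while widening it on scans.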

Hbase/NewFileFormat/Performance (last edited 2009-09-20 23:54:58 by localhost)