Warning

This doc only applies to HBase 0.20 and beyond. If you are under 0.19.x, please consider upgrading.

This distro doesn't contain all bug fixes (such as the fix for LZO header or block header data falling on a read boundary).

Please get the latest distro with all fixes from https://github.com/toddlipcon/hadoop-lzo

Why compression?

When compression is enabled, the store file (HFile) runs a compression algorithm over its blocks as they are written (during flushes and compactions), so the blocks must be decompressed when they are read.

Since this adds a read-time penalty, why would one enable any compression? There are a few reasons why the advantages of compression can outweigh the disadvantages:

To keep the read-time penalty as low as possible, a real-time compression library is preferred. Out of the box, HBase ships with only Gzip compression, which is fairly slow.

To achieve maximal performance and benefit, you must enable LZO.

Enabling Lzo compression in HBase

Lzo is a GPL'ed native library that ships with most Linux distributions. However, to use it in HBase, you must carry out the following steps:

Ensure the native Lzo base library is available on every node:

Check out the native connector library:

On Mac:

(Parts of this found on http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ )

> port fetch lzo2 # If for some reason LZO2 is already installed, please uninstall first before doing this
> port edit lzo2 # A vim editor should open

// Add the following block of text in the file and save the file.
variant x86_64 description "Build the 64-bit." {
    configure.args-delete     --build=x86-apple-darwin ABI=standard
    configure.cflags-delete   -m32
    configure.cxxflags-delete -m32

    configure.args-append     --build=x86_64-apple-darwin ABI=64
    configure.cflags-append   -m64 -arch x86_64
    configure.cxxflags-append -m64 -arch x86_64
}

## END ##

> port install lzo2 +x86_64

This ensures the library is built in 64-bit mode, because Java 1.6 on Mac is 64-bit only. To make sure your lzo library is x86_64 as well, type:

$ file /usr/local/lib/liblzo2.2.0.0.dylib
/usr/local/lib/liblzo2.2.0.0.dylib: Mach-O 64-bit dynamically linked shared library x86_64
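If you would rather script that check than eyeball the output, here is a minimal sketch. The helper name is_x86_64 is made up for this example, and the default library path is only the one used in the example above; yours may differ (for instance under /opt/local/lib with MacPorts).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given `file` output mentions x86_64.
is_x86_64() {
    case "$1" in
        *x86_64*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed location, taken from the example above; override with LIB=... if needed.
LIB=${LIB:-/usr/local/lib/liblzo2.2.0.0.dylib}

if [ -e "$LIB" ]; then
    if is_x86_64 "$(file "$LIB")"; then
        echo "$LIB looks like an x86_64 build"
    else
        echo "$LIB does not look like an x86_64 build; rebuild lzo2 in 64-bit mode" >&2
    fi
else
    echo "$LIB not found; set LIB to the path of your liblzo2 library" >&2
fi
```

The check only looks for the string x86_64 in what `file` reports, so treat it as a quick sanity test rather than proof that the library will load.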

env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
CFLAGS="-arch x86_64" ant clean compile-native test tar

* Note: If you used MacPorts, /path/to/lzo64 will be replaced by /opt/local (e.g. /opt/local/include and /opt/local/lib )

* Note: If for some reason you are getting compilation errors, you can add the following to the environment variables:

CLASSPATH=$HADOOP_HOME/hadoop-<version>-core.jar

* Note: If you run into permission denied errors during this install, even as root, you can go ahead and change the permissions of the affected files so that the build can complete.

Once the install has completed, a jar file and lib files have been created in the HADOOP-GPL-HOME/build directory. All these files MUST be copied both into your HADOOP_HOME and HBASE_HOME directories using the following commands from the HADOOP-GPL-HOME directory:

> cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0-dev.jar $HADOOP_HOME/lib/
> cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0-dev.jar $HBASE_HOME/lib/
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HADOOP_HOME/lib/native
> tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C $HBASE_HOME/lib/native

To build lzo2 from source in 64 bit mode:

$ CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm
<configure output>
$ make
$ sudo make install

On Linux:

$ export CFLAGS="-m64"

Build the native connector library:

$ ant compile-native
$ ant jar

On Mac, the resulting library should be x86_64, as above. If it is not, add the extra CFLAGS listed above to the call to configure in the compile-native target of build.xml.

Now you have the following results:

 build/hadoop-gpl-compression-0.1.0-dev.jar
 build/native/Linux-amd64-64/lib/libgplcompression.*

Instead of Linux-amd64-64 you might have Linux-i386-32 or Mac_OS_X-x86_64-64 or whatever platform you are actually using.

Copy the results into the hbase lib directory:

$ cp build/hadoop-gpl-compression-0.1.0-dev.jar hbase/lib/
$ cp build/native/Linux-amd64-64/lib/libgplcompression.* hbase/lib/native/Linux-amd64-64/

Note there is an extra 'lib' level in the build, which is not present in the hbase/lib/native/ tree.

(VERY IMPORTANT) Distribute the new files to every machine in your cluster.
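One way to do the distribution is sketched below. It assumes passwordless ssh, rsync on every node, the same HBase install path everywhere, and a hosts file in the style of conf/regionservers (one hostname per line). The function name print_distribute_cmds is made up for this example. It only prints the rsync commands so you can review them before running anything.

```shell
#!/bin/sh
# Print (do not run) one rsync command per host listed in a regionservers-style file.
# Blank lines and lines starting with '#' are skipped.
print_distribute_cmds() {
    hosts_file=$1
    hbase_home=$2
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # Copies the whole lib directory, including lib/native, to the same path on the host.
        echo "rsync -a $hbase_home/lib/ $host:$hbase_home/lib/"
    done < "$hosts_file"
}

# Example (nothing is copied until you pipe the output to sh):
#   print_distribute_cmds "$HBASE_HOME/conf/regionservers" "$HBASE_HOME"
#   print_distribute_cmds "$HBASE_HOME/conf/regionservers" "$HBASE_HOME" | sh
```

Remember that the master is usually not listed in conf/regionservers, so make sure it gets the files too. If you also copied the files into HADOOP_HOME, the same approach applies there.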

Using Lzo

When creating tables in the hbase shell, specify the compression flag per column family:

 create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
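For a table that already exists, a sketch along these lines may work. It is not from the original instructions: it assumes the table has to be disabled before it is altered (as in the 0.20 shell), and that existing store files only pick up the new compression once they are rewritten by a compaction. The helper name lzo_alter_script and the table and family names are placeholders.

```shell
#!/bin/sh
# Print hbase shell commands that switch one column family of an existing table to lzo.
# Arguments: table name, column family name (both supplied by the caller).
lzo_alter_script() {
    table=$1
    family=$2
    cat <<EOF
disable '$table'
alter '$table', {NAME => '$family', COMPRESSION => 'lzo'}
enable '$table'
major_compact '$table'
EOF
}

# Review the output first, then pipe it into the shell, e.g.:
#   lzo_alter_script mytable colfam | ./bin/hbase shell
lzo_alter_script mytable colfam
```

Printing the commands instead of running them keeps the step reviewable. Try it on a test table before touching anything that matters.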

That's it!

Testing Compression is enabled

One more thing: to test that compression is properly enabled, run:

./bin/hbase org.apache.hadoop.hbase.util.CompressionTest

(This presumes at least hbase 0.20.1.) Run as above, it dumps out usage on how to run the CompressionTest. Be sure to run it on all nodes in your cluster to ensure compression is working on all of them.

Here is how you'd run the compression tool on Mac OS X:

$ HBASE_CLASSPATH=/Users/stack/checkouts/hadoop-gpl-compression/build/hadoop-gpl-compression-0.2.0-dev.jar ./bin/hbase -Djava.library.path=/Users/stack/checkouts/hbase/target/hbase-0.95-SNAPSHOT/hbase-0.95-SNAPSHOT/lib/native/Mac_OS_X-x86_64-64/  org.apache.hadoop.hbase.util.CompressionTest ~/Downloads/hfiles/default/7677693808046789623 lzo

Other tools

Todd Lipcon's hadoop-lzo-packager may also help.

Troubleshooting

If you get com.hadoop.compression.lzo.LzoCompressor: java.lang.UnsatisfiedLinkError, check whether the 64-bit lzo libraries were installed in /usr/lib rather than /usr/lib64. In the reported case, a standalone java application could load the lzo library from /usr/lib, but hadoop/hbase would not pick it up. Just copy the liblzo files over and make the appropriate links. (From Samuel Yu up on the mailing list.)
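To see where the lzo libraries actually landed, something like the sketch below may help. The function name list_lzo_libs is made up for this example, and the liblzo2* file name pattern is an assumption based on the library name used earlier on this page.

```shell
#!/bin/sh
# List liblzo2 files found in each given directory.
# With no arguments, checks /usr/lib and /usr/lib64.
list_lzo_libs() {
    [ $# -gt 0 ] || set -- /usr/lib /usr/lib64
    for dir in "$@"; do
        for lib in "$dir"/liblzo2*; do
            # When nothing matches, the glob stays unexpanded; skip it.
            if [ -e "$lib" ]; then
                echo "$lib"
            fi
        done
    done
    return 0
}

list_lzo_libs
```

If the libraries only show up under one of the two directories, that matches the situation described above. Running `file` on each listed path shows whether it is a 32-bit or 64-bit build.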

UsingLzoCompression (last edited 2012-03-22 05:31:38 by adsl-71-141-118-34)