Overview
Hadoop has native implementations of certain components, both for performance reasons and because Java implementations are not available for some of them. These components are packaged in a single, dynamically linked native-hadoop library; on *nix platforms it is libhadoop.so. This section describes how to use the native libraries and how to build them.
Native Hadoop Libraries
Hadoop has the following native components:
- zlib compression codec: a reworked zlib codec that uses NIO direct buffers, which gives a 60%-70% speedup.
- lzo compression codec: implemented because there are no Java bindings for lzo.
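To illustrate what "using direct buffers" means for a zlib codec, here is a minimal sketch (Java 11 or later) that compresses and decompresses data held in direct ByteBuffers with the JDK's own zlib-backed java.util.zip.Deflater and Inflater. It is not Hadoop's native codec; the class DirectBufferZlib and its methods are hypothetical. The idea is that a direct buffer lives outside the Java heap, so native zlib code can typically read from and write to it without an extra copy through a byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: zlib (de)compression over direct NIO buffers
// using the JDK's Deflater/Inflater, not Hadoop's own native codec.
final class DirectBufferZlib {

    /** Compresses the remaining bytes of input into a new direct buffer (flipped, ready to read). */
    static ByteBuffer compress(ByteBuffer input) {
        Deflater deflater = new Deflater();
        try {
            ByteBuffer out = ByteBuffer.allocateDirect(Math.max(64, input.remaining()));
            deflater.setInput(input); // input's position advances as bytes are consumed
            deflater.finish();
            while (!deflater.finished()) {
                if (!out.hasRemaining()) {
                    out = grow(out);
                }
                deflater.deflate(out);
            }
            out.flip();
            return out;
        } finally {
            deflater.end(); // release zlib's native memory
        }
    }

    /** Decompresses a complete zlib stream into a new direct buffer (flipped, ready to read). */
    static ByteBuffer decompress(ByteBuffer compressed, int sizeHint) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            ByteBuffer out = ByteBuffer.allocateDirect(Math.max(64, sizeHint));
            inflater.setInput(compressed);
            while (!inflater.finished()) {
                if (!out.hasRemaining()) {
                    out = grow(out);
                }
                int produced = inflater.inflate(out);
                if (produced == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated stream or preset dictionary required");
                }
            }
            out.flip();
            return out;
        } finally {
            inflater.end();
        }
    }

    /** Copies a full buffer into a direct buffer of twice the capacity, ready for more writes. */
    private static ByteBuffer grow(ByteBuffer full) {
        ByteBuffer bigger = ByteBuffer.allocateDirect(full.capacity() * 2);
        full.flip();
        bigger.put(full);
        return bigger;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] text = "hadoop native zlib ".repeat(100).getBytes(StandardCharsets.US_ASCII);
        ByteBuffer input = ByteBuffer.allocateDirect(text.length);
        input.put(text).flip();

        ByteBuffer compressed = compress(input);
        int compressedSize = compressed.remaining();
        ByteBuffer restored = decompress(compressed, text.length);

        byte[] roundTrip = new byte[restored.remaining()];
        restored.get(roundTrip);
        System.out.println(text.length + " bytes -> " + compressedSize
                + " compressed bytes; round trip ok: " + Arrays.equals(text, roundTrip));
    }
}
```

The output buffers grow on demand, so the size hint passed to decompress only affects how many reallocations happen, not correctness.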
Usage
Using the native-hadoop libraries is fairly simple:
- Check that your platform is supported (see Supported Platforms).
- Either download the prebuilt 32-bit i386-Linux native-hadoop libraries (included in the Hadoop distribution under lib/native) or build them yourself.
- Depending on your needs, make sure zlib (>= 1.2), lzo (>= 2.0), or both are installed for your platform.
That's it!
The bin/hadoop script puts the native-hadoop library on the library path by setting the java.library.path system property (-Djava.library.path=). Setting the LD_LIBRARY_PATH environment variable is an alternative, but the system property is the recommended approach.
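A quick way to see what the JVM does with java.library.path is a small diagnostic that lists the search directories and tries to load a library named hadoop, which System.mapLibraryName turns into libhadoop.so on Linux. This is a minimal sketch: the class NativeLibraryProbe is hypothetical, and we assume (without reproducing Hadoop's loader) that the native-hadoop library is resolved by that name through the standard System.loadLibrary mechanism.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic: shows where the JVM will look for native libraries
// and whether a library named "hadoop" (libhadoop.so on Linux) can be loaded.
final class NativeLibraryProbe {

    /** Splits a java.library.path value into its directory entries. */
    static List<String> libraryPathEntries(String libraryPath) {
        if (libraryPath == null || libraryPath.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(libraryPath.split(Pattern.quote(File.pathSeparator)));
    }

    /** Returns true if System.loadLibrary finds and links the named library. */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName); // searches the java.library.path directories
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String dir : libraryPathEntries(System.getProperty("java.library.path"))) {
            System.out.println("search dir: " + dir);
        }
        String fileName = System.mapLibraryName("hadoop"); // e.g. libhadoop.so on Linux
        System.out.println(fileName + (tryLoad("hadoop") ? " loaded" : " NOT loaded"));
    }
}
```

Run it with the same option the script uses, for example `java -Djava.library.path=<dir containing libhadoop.so> NativeLibraryProbe`, where the directory is a placeholder for your installation.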
To check that everything worked, look in the Hadoop log files for:
{{{
DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
INFO util.NativeCodeLoader - Loaded the native-hadoop library
}}}
If something goes wrong, you will see this instead:
{{{
INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
}}}
If so, recheck the steps above.
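Checking the logs by hand is easy, but the same check can be scripted. The following is a minimal sketch that scans a log file for the two NativeCodeLoader messages quoted on this page; the class NativeLoadLogCheck and its Status values are hypothetical, and it assumes the log lines are in chronological order, so the last matching message reflects the most recent start-up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper that scans Hadoop log lines for the NativeCodeLoader
// messages shown on this page; the matched strings are copied from those examples.
final class NativeLoadLogCheck {

    enum Status { LOADED, FELL_BACK_TO_JAVA, NOT_FOUND }

    static final String LOADED_MSG = "Loaded the native-hadoop library";
    static final String FALLBACK_MSG = "Unable to load native-hadoop library for your platform";

    /** Returns the status implied by the last matching message in the given lines. */
    static Status classify(List<String> lines) {
        Status status = Status.NOT_FOUND;
        for (String line : lines) {
            if (line.contains(LOADED_MSG)) {
                status = Status.LOADED;
            } else if (line.contains(FALLBACK_MSG)) {
                status = Status.FELL_BACK_TO_JAVA;
            }
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NativeLoadLogCheck <hadoop-log-file>");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to a char, so odd bytes in a log cannot cause a decoding error.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println(classify(lines));
    }
}
```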
Supported Platforms
The native-hadoop library is supported on *nix platforms only. It is known not to work on Cygwin or Mac OS X, and it has mainly been used on Linux. Patches to get it working on Cygwin or Mac OS X are welcome!
It has been tested on the following Linux distributions:
On all of the above platforms, a native-hadoop library works with a JVM of the same bitness: a 32-bit library with a 32-bit JVM, and a 64-bit library with a 64-bit JVM.
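One way to catch a 32/64-bit mismatch before it shows up as a load failure is to compare the library file's ELF class with the JVM's data model. This is a minimal sketch under two assumptions: the library is an ELF shared object (as on Linux), and the JVM exposes the HotSpot-specific sun.arch.data.model property. The class BitnessCheck is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical check: compares the bitness of a native library (read from its
// ELF header) with the bitness of the running JVM.
final class BitnessCheck {

    /** Returns 32 or 64 from an ELF header's EI_CLASS byte, or -1 if this is not an ELF file. */
    static int elfBits(byte[] header) {
        if (header.length < 5 || header[0] != 0x7F || header[1] != 'E'
                || header[2] != 'L' || header[3] != 'F') {
            return -1;
        }
        switch (header[4]) {       // EI_CLASS
            case 1: return 32;     // ELFCLASS32
            case 2: return 64;     // ELFCLASS64
            default: return -1;
        }
    }

    /** Bitness of the running JVM; sun.arch.data.model is HotSpot-specific, so -1 if absent. */
    static int jvmBits() {
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model)) return 32;
        if ("64".equals(model)) return 64;
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BitnessCheck <path/to/libhadoop.so>");
            System.exit(2);
        }
        byte[] header;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            header = in.readNBytes(5); // 4-byte ELF magic + EI_CLASS
        }
        int libBits = elfBits(header);
        int jvm = jvmBits();
        System.out.println("library: " + libBits + "-bit, JVM: " + jvm + "-bit, "
                + (libBits != -1 && libBits == jvm ? "match" : "MISMATCH or unknown"));
    }
}
```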
Building Native Hadoop Libraries
The native-hadoop library is written in ANSI C and is built using the GNU autotools chain (autoconf, autoheader, automake, autoscan, libtool). This means it should be straightforward to build on any platform with a standards-compliant C compiler and the GNU autotools chain (see Supported Platforms).
In particular, you need the following packages on the target platform:
- C compiler (e.g. GNU C Compiler)
- GNU Autotools Chain
- zlib development package (stable version >= 1.2.0)
- lzo development package (stable version >= 2.0)
Once you have the prerequisites, use the standard build.xml and set the compile.native flag to true to build the native-hadoop library:
{{{
$ ant -Dcompile.native=true <target>
}}}
The native-hadoop library is not built by default, since not everyone needs it.
That's it! You should see the newly built native-hadoop library in:
{{{
build/native/<platform>/lib
}}}
where <platform> is the combination of the system properties {os.name}-{os.arch}-{sun.arch.data.model}, for example Linux-i386-32.
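The <platform> name can be reproduced from inside a JVM by reading the same three system properties. This is a minimal sketch with a hypothetical PlatformDir class; it uses the property values verbatim and does not attempt any further transformation that the build may or may not apply (for example to values containing spaces).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds the <platform> name from the three JVM system
// properties and shows the corresponding build/native/<platform>/lib directory.
final class PlatformDir {

    /** Joins os.name, os.arch and sun.arch.data.model with '-', e.g. Linux-i386-32. */
    static String platformName(String osName, String osArch, String dataModel) {
        return osName + "-" + osArch + "-" + dataModel;
    }

    /** The directory build/native/<platform>/lib, relative to the source tree. */
    static Path nativeLibDir(String platform) {
        return Paths.get("build", "native", platform, "lib");
    }

    public static void main(String[] args) {
        String platform = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                // HotSpot-specific; on other JVMs this may be null and print as "null"
                System.getProperty("sun.arch.data.model"));
        System.out.println(nativeLibDir(platform));
    }
}
```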
Notes:
- Both the zlib and lzo development packages must be installed on the target platform to build the native-hadoop library; for deployment, however, installing only zlib or only lzo is sufficient if you intend to use just one of them.
- For both building and deploying the native-hadoop library, the zlib and lzo libraries must match the bitness (32-bit or 64-bit) of the JVM on the target platform.