HDFS is a great filesystem for streaming large amounts data across large scale clusters. However the random access latency is typically the same performance you would get in reading from a local drive if the data you are trying to access is not in the operating systems file cache. In other words every access to HDFS is similar to a local read with a cache miss. There have been great performance boosts in HDFS over the past few years but it still can't perform at the level that a search engine needs.

Now you might be thinking that Lucene reads from the local hard drive and performs great, so why wouldn't HDFS perform fairly well on it's own? However most of time the Lucene index files are cached by the operating system's file system cache. So Blur has it's own file system cache allows it to perform low latency data look-ups against HDFS.


On shard server start-up Blur creates 1 or more block cache slabs blur.shard.blockcache.slab.count that are each 128 MB in size. These slabs can be allocated on or off the heap blur.shard.blockcache.direct.memory.allocation. Each slab is broken up into 16,384 blocks with each block size being 8K. Then on the heap there is a concurrent LRU cache that tracks what blocks of what files are in which slab(s) at what offset. So the more slabs of cache you create the more entries there will be in the LRU thus more heap.



Say the shard server(s) that you are planning to run Blur on have 32G of ram. These machines are probably also running HDFS data nodes as well with very high xcievers (dfs.datanode.max.xcievers in hdfs-site.xml) say 8K. If the data nodes are configured with 1G of heap then they may consume up to 4G of memory due to the high thread count because of the xcievers. Next let's say you configure Blur to 4G of heap as well, and you want to use 12G of off heap cache.

In the blur-env.sh file you would need to change BLUR_SHARD_JVM_OPTIONS to include "-XX:MaxDirectMemorySize=12g" and possibly "-XX:+UseLargePages" depending on your Linux setup. If you leave the blur.shard.blockcache.slab.count to the default -1 the shard startup will automatically detect the -XX:MaxDirectMemorySize size and automatically use almost all of the memory. By default the JVm has 64m in reserve for direct memory so by default Blur leaves at least that amount available to the JVM.

Custom Configuration

Again in the blur-env.sh file you would need to change BLUR_SHARD_JVM_OPTIONS to include "-XX:MaxDirectMemorySize=13g" and possibly "-XX:+UseLargePages" depending on your Linux setup. I set the MaxDirectMemorySize to more than 12G to make sure we don't hit the maximum limit and cause a OOM exception, this does not reserve 13G it's a control to not allow more than that. Below is a working example, it also contains GC logging and GC configuration:

     export BLUR_SHARD_JVM_OPTIONS="-XX:MaxDirectMemorySize=13g \
                                    -XX:+UseLargePages \
                                    -Xms4g \
                                    -Xmx4g \
                                    -Xmn512m \
                                    -XX:+UseCompressedOops \
                                    -XX:+UseConcMarkSweepGC \
                                    -XX:+CMSIncrementalMode \
                                    -XX:CMSIncrementalDutyCycleMin=10 \
                                    -XX:CMSIncrementalDutyCycle=50 \
                                    -XX:ParallelGCThreads=8 \
                                    -XX:+UseParNewGC \
                                    -XX:MaxGCPauseMillis=200 \
                                    -XX:GCTimeRatio=10 \
                                    -XX:+DisableExplicitGC \
                                    -verbose:gc \
                                    -XX:+PrintGCDetails \
                                    -XX:+PrintGCDateStamps \
                                    -Xloggc:$BLUR_HOME/logs/gc-blur-shard-server_`date +%Y%m%d_%H%M%S`.log"

Next you will need to setup blur-site.properties by changing blur.shard.blockcache.slab.count to 96. This is telling blur to allocate 96 128MB slabs of memory at shard server start-up. Note, that the first time you do this that the shard servers may take long time to allocate the memory. This is because the OS could be using most of that memory for it's own filesystem caching and it will need to unload it which may cause some IO due the cache synching to disk.

Also the blur.shard.blockcache.direct.memory.allocation is set to true by default, this will tell the JVM to try and allocate the memory off heap. If you want to run the slabs in the heap (which is not recommended) set this value to false.

BlockCacheConfiguration (last edited 2013-06-14 01:12:10 by AaronMcCurry)