Getting Started

Clone

First clone the project and compile it using Maven. Once this is complete, the Blur libraries and dependencies will be copied into the lib directory.
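A quick way to confirm that the build populated the lib directory is sketched below. The helper name `has_jars` is hypothetical, and the sketch assumes the libraries land in lib as `.jar` files, which this guide does not state explicitly.

```shell
# Hypothetical helper: succeeds if the given directory holds at least one .jar file.
# Assumption: the Maven build copies the libraries into lib as .jar files.
has_jars() {
  dir="$1"
  [ -d "$dir" ] || return 1
  for f in "$dir"/*.jar; do
    # If the glob matched nothing, $f is the literal pattern and this test fails.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Run from the project root after the Maven build.
if has_jars lib; then
  echo "lib directory populated"
else
  echo "lib directory missing or empty; re-run the Maven build" >&2
fi
```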

Zookeeper Setup

Set up [Zookeeper][Zookeeper]. It is recommended that all production setups use a clustered Zookeeper environment, following best [practices][replicated_zk].

Hadoop Setup

Blur requires Hadoop to be installed because of library dependencies, but running the Hadoop daemons on the servers is optional.

HDFS Notes

If you are running Blur on a single machine, HDFS is not necessary, but a [single node][single_node] Hadoop setup is still required for its libraries.

Set up Hadoop's HDFS filesystem, which is required for a clustered setup. Though possible, running the Map/Reduce system on the same machines that are running the Blur daemons is not recommended. Follow the Hadoop [cluster setup][cluster_setup] guide.

HDFS Options

HDFS is not required to be installed and running on the same servers as Blur. However, if the source HDFS is being used for heavy Map/Reduce or any other heavy I/O operations, performance could be affected. The storage location for each table is set up independently via a URI (e.g. hdfs://<namenode>:<port>/blur/tables/table/path), so several tables may be online in a Blur cluster, each referencing a different HDFS instance. This assumes that all the HDFS instances are compatible with one another.
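To make the per-table URI concrete, here is a minimal sketch that assembles storage locations from the pattern above. The function name `blur_table_uri`, the host names, and port 9000 are illustrative assumptions, not values taken from Blur or from your cluster.

```shell
# Hypothetical helper: build a table URI following the pattern
# hdfs://<namenode>:<port>/blur/tables/<table path>.
blur_table_uri() {
  namenode="$1"
  port="$2"
  table_path="$3"
  printf 'hdfs://%s:%s/blur/tables/%s\n' "$namenode" "$port" "$table_path"
}

# Two tables stored on two different (placeholder) HDFS instances.
blur_table_uri namenode-a.example.com 9000 table1
blur_table_uri namenode-b.example.com 9000 table2
```

Each table carries its own namenode in the URI, which is what allows one Blur cluster to serve tables from several HDFS instances.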

blur-env.sh Configuration

Next you will need to configure the config/blur-env.sh file. Two exports are required:

    export JAVA_HOME=/usr/lib/j2sdk1.6-sun
    export HADOOP_HOME=/var/hadoop-0.20.2
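Since both exports must point at real installations, a small sanity check can catch typos before the daemons are started. The sketch below is not part of Blur; `check_dir_var` and `check_blur_env` are hypothetical helper names.

```shell
# Hypothetical helper: report whether a required variable names an existing directory.
check_dir_var() {
  name="$1"
  value="$2"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value" >&2
    return 1
  fi
  return 0
}

# Hypothetical wrapper: validate both exports required by blur-env.sh.
# Both variables are checked so that every problem is reported in one run.
check_blur_env() {
  status=0
  check_dir_var JAVA_HOME "${JAVA_HOME:-}" || status=1
  check_dir_var HADOOP_HOME "${HADOOP_HOME:-}" || status=1
  return "$status"
}
```

One way to use it is to source the environment file first and then run the check: `. config/blur-env.sh && check_blur_env`.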

blur-site.properties Configuration

Then you will need to set up the config/blur-site.properties file. The default site configuration:

There are many other options that can be set; see config/blur-default.properties.

shards

Then, in the config/shards file, list the servers that should run as Blur shard servers. By default, shard servers run on port 40020 and bind to the 0.0.0.0 address.

controllers

As with the shards file, list in the config/controllers file the servers that will run as Blur controller servers. By default, controller servers run on port 40010 and bind to the 0.0.0.0 address.

NOTE: If you are going to run a single shard server, running controllers is not required. A single shard server is fully functional on its own. Controllers and shard servers share the same Thrift API, so your code won't have to be modified later to run against a cluster.
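The shards and controllers files can be generated rather than edited by hand. The sketch below assumes a one-host-per-line layout, which this guide does not confirm; the helper name `write_host_list` and all host names are placeholders.

```shell
# Hypothetical helper: write one host name per line into a file.
# Assumption: config/shards and config/controllers use one host per line.
write_host_list() {
  target="$1"
  shift
  # Truncate (or create) the file, then append each host on its own line.
  : > "$target"
  for host in "$@"; do
    printf '%s\n' "$host" >> "$target"
  done
}

# Example calls (host names are placeholders):
#   write_host_list config/shards shard1.example.com shard2.example.com
#   write_host_list config/controllers controller1.example.com
```

For the single shard server case described in the note, only the shards file would need an entry.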

$BLUR_HOME

It is a good idea to add export BLUR_HOME=/var/blur to your .bash_profile.
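If the profile is managed by a script, it helps to add the export only once so that repeated runs do not pile up duplicate lines. A minimal sketch follows; the helper name `add_export_once` is hypothetical.

```shell
# Hypothetical helper: append a line to a profile file unless it is already there.
add_export_once() {
  profile="$1"
  line="$2"
  # Make sure the file exists so grep does not fail on a missing file.
  touch "$profile"
  # -x matches whole lines, -F treats the pattern as a fixed string, -q is silent.
  if ! grep -qxF -e "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example call:
#   add_export_once "$HOME/.bash_profile" 'export BLUR_HOME=/var/blur'
```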

Setup Nodes

Copy the Blur directory to the same location on all servers in the cluster.
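One way to do the copy is with rsync over SSH, driven by the host lists. The sketch below only prints the commands so they can be reviewed first. It assumes one host per line in the list files, passwordless SSH, rsync installed on every server, and paths and host names without spaces; `print_sync_commands` is a hypothetical helper name.

```shell
# Hypothetical helper: print one rsync command per host listed in the given files.
# Nothing is copied; review the output, then pipe it to sh to run it.
print_sync_commands() {
  blur_home="$1"
  shift
  # Merge the host files, de-duplicate, and skip blank lines.
  sort -u "$@" | grep -v '^[[:space:]]*$' | while read -r host; do
    # The trailing slashes copy the directory contents to the same path remotely.
    printf 'rsync -a %s/ %s:%s/\n' "$blur_home" "$host" "$blur_home"
  done
}

# Example call:
#   print_sync_commands /var/blur config/shards config/controllers
```

The local machine is not filtered out of the list, so a host that appears in its own files will receive a copy from itself.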