Getting Started
Clone
First, clone the project and compile it using Maven. Once the build completes, the Blur libraries and dependencies are copied into the lib directory.
Zookeeper Setup
Set up [Zookeeper][Zookeeper]. It is recommended that all production setups use a clustered Zookeeper environment, following best [practices][replicated_zk].
Hadoop Setup
Blur requires Hadoop to be installed because of library dependencies, but running the Hadoop daemons on the servers is optional.
HDFS Notes
If you are running Blur on a single machine, running HDFS is not necessary, but a [single node][single_node] setup is still required for the libraries.
Set up Hadoop's HDFS filesystem, which is required for a clustered setup. Though it is possible, running the Map/Reduce system on the same machines that run the Blur daemons is not recommended. Follow the Hadoop [cluster setup][cluster_setup] guide.
HDFS Options
HDFS is not required to be installed and running on the same servers as Blur. However, if the HDFS instance that stores the tables is also used for heavy Map/Reduce or other heavy I/O operations, Blur performance could be affected. The storage location for each table is set up independently via a URI (e.g. hdfs://<namenode>:<port>/blur/tables/table/path). So several tables may be online in a Blur cluster, and each one could reference a different HDFS instance. This assumes that all the HDFS instances are compatible with one another.
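As a rough illustration of per-table storage locations, the sketch below builds table URIs in the form shown above. The function name `blur_table_uri`, the host names, and the port are hypothetical and chosen only for this example; they are not part of Blur.

```shell
# Hypothetical helper (not part of Blur): build the HDFS URI for a table.
# Arguments: namenode host, namenode port, table path.
blur_table_uri() {
  namenode=$1
  port=$2
  table_path=$3
  # Strip one leading slash so the URI has exactly one separator.
  table_path=${table_path#/}
  printf 'hdfs://%s:%s/%s\n' "$namenode" "$port" "$table_path"
}

# Two tables stored on two different (assumed) HDFS instances.
blur_table_uri namenode1 9000 /blur/tables/table1
blur_table_uri namenode2 9000 /blur/tables/table2
```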
blur-env.sh Configuration
Next, configure the config/blur-env.sh file. Two exports are required:
    export JAVA_HOME=/usr/lib/j2sdk1.6-sun
    export HADOOP_HOME=/var/hadoop-0.20.2
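Before starting any daemons, it can help to confirm that both variables point at real directories. The following is a minimal sketch; `check_dir_var` is a hypothetical helper written for this example, not something shipped with Blur.

```shell
# Hypothetical pre-flight check (not part of Blur): verify that a variable
# is set and points at an existing directory.
check_dir_var() {
  name=$1
  value=$2
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value" >&2
    return 1
  fi
  echo "$name OK: $value"
}

# Report on both required variables; failures are printed to stderr.
check_dir_var JAVA_HOME "${JAVA_HOME:-}" || true
check_dir_var HADOOP_HOME "${HADOOP_HOME:-}" || true
```

Running this after sourcing config/blur-env.sh should show whether the two exports took effect in the current shell.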
blur-site.properties Configuration
Then you will need to set up the config/blur-site.properties file. The default site configuration:
    blur.zookeeper.connection=localhost
    blur.cluster.name=default
Many other options can be set; see config/blur-default.properties.
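To check which value a properties file actually carries for a given key, a small lookup can be handy. This is only a sketch: `get_property` is a hypothetical helper, and it assumes plain `key=value` lines with no whitespace around the key and no line continuations.

```shell
# Hypothetical helper (not part of Blur): print the value of one key from a
# properties file. Only plain key=value lines are handled.
get_property() {
  file=$1
  key=$2
  # Keep everything after the first '=' on the last line whose key matches.
  awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$file"
}

# Example usage against the site configuration:
#   get_property config/blur-site.properties blur.cluster.name
#   get_property config/blur-site.properties blur.zookeeper.connection
```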
shards
Then, in the config/shards file, list the servers that should run as Blur shard servers. By default, shard servers run on port 40020 and bind to the 0.0.0.0 address.
    shard1
    shard2
    shard3
controllers
As with the shards file, list in the config/controllers file the servers that will run as Blur controller servers. By default, controller servers run on port 40010 and bind to the 0.0.0.0 address.
    controller1
    controller2
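The host files and default ports can be combined into a list of endpoints, for example to feed a connectivity check. The sketch below assumes one host name per line; `list_endpoints` is a hypothetical helper, and skipping `#` comment lines is an assumption rather than documented Blur behavior.

```shell
# Hypothetical helper (not part of Blur): print host:port for every host in
# a file that lists one host name per line.
list_endpoints() {
  hosts_file=$1
  port=$2
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in
      ''|'#'*) continue ;;   # skip blank lines and (assumed) comment lines
    esac
    printf '%s:%s\n' "$host" "$port"
  done < "$hosts_file"
}

# Example usage with the default ports named above:
#   list_endpoints config/shards 40020
#   list_endpoints config/controllers 40010
```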
NOTE: If you are going to run a single shard server, running controllers is not required. A single shard server is fully functional on its own. Controllers and shard servers share the same Thrift API, so your code will not have to be modified later to run against a cluster.
$BLUR_HOME
It is a good idea to add export BLUR_HOME=/var/blur to your .bash_profile.
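One way to add the export without duplicating it on repeated runs is sketched here. `add_blur_home` is a hypothetical helper, and /var/blur is simply the path used above.

```shell
# Hypothetical helper (not part of Blur): append the BLUR_HOME export to a
# profile file only if that exact line is not already present.
add_blur_home() {
  profile=$1
  blur_home=$2
  line="export BLUR_HOME=$blur_home"
  # -x matches whole lines only, -F treats the pattern as a fixed string.
  if ! grep -qxF "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example usage:
#   add_blur_home "$HOME/.bash_profile" /var/blur
```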
Setup Nodes
Copy the Blur directory to the same location on all servers in the cluster.
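The copy step could be scripted along the following lines. This sketch only prints the commands it would run, so they can be reviewed first. It assumes rsync and SSH access to every host, and `print_sync_commands` is a hypothetical helper, not a Blur script.

```shell
# Hypothetical helper (not part of Blur): print one rsync command per unique
# host found in the given host files (one host name per line).
print_sync_commands() {
  blur_home=$1
  shift
  # awk drops blank lines; sort -u removes hosts listed in more than one file.
  awk 'NF { print $1 }' "$@" | sort -u | while IFS= read -r host; do
    printf 'rsync -a %s/ %s:%s/\n' "$blur_home" "$host" "$blur_home"
  done
}

# Example usage (review the output before running any of it):
#   print_sync_commands /var/blur config/shards config/controllers
```

Printing rather than executing keeps the sketch safe to try; a server that appears in both files is copied to only once.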