NOTE: This document covers Chukwa trunk development. For a stable release, you should probably follow the Administration Guide instead.
Chukwa is a system for large-scale reliable log collection and processing with Hadoop. The Chukwa design overview discusses the overall architecture of Chukwa. You should read that document before this one. The purpose of this document is to help you install and configure Chukwa.
Chukwa should work on any POSIX platform, but GNU/Linux is the only production platform that has been tested extensively. Chukwa has also been used successfully on Mac OS X, which several members of the Chukwa team use for development.
The only absolute software requirements are Java 1.6 or later and Hadoop 0.20.205+. HICC, the Chukwa visualization interface, requires HBase 0.90.4.
The Chukwa cluster management scripts rely on ssh; these scripts, however, are not required if you have some alternate mechanism for starting and stopping daemons.
A minimal Chukwa deployment has three components:
A Hadoop and HBase cluster on which Chukwa will process data (referred to as the Chukwa cluster).
A collector process that writes collected data to HBase.
One or more agent processes that send monitoring data to the collector. The nodes with active agent processes are referred to as the monitored source nodes.
In addition, you may wish to run the Chukwa Demux jobs, which parse collected data, or HICC, the Chukwa visualization tool.
Chukwa 0.5 architecture diagram: http://people.apache.org/~eyang/docs/chukwa-0.5-arch.png
General Hadoop configuration is described in the Hadoop Configuration documentation.
log4j.appender.DRFA=org.apache.log4j.net.SocketAppender
log4j.appender.DRFA.RemoteHost=localhost
log4j.appender.DRFA.Port=9096
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
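The DRFA appender settings above send log events to a socket on localhost port 9096, the same port on which a SocketAdaptor for Hadoop logs is added later in this document. As a rough sanity check, the following Python sketch parses a properties file and reports where DRFA points. The helper names (parse_properties, drfa_socket_target) are hypothetical and are not part of Chukwa or log4j; the parser handles only simple key=value lines, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def drfa_socket_target(props):
    """Return (host, port) if DRFA is a SocketAppender, otherwise None.

    Raises KeyError if the appender is a SocketAppender but has no Port entry.
    """
    prefix = "log4j.appender.DRFA"
    if props.get(prefix) != "org.apache.log4j.net.SocketAppender":
        return None
    return props.get(prefix + ".RemoteHost"), int(props[prefix + ".Port"])


# The five settings from this section, used here as sample input.
SAMPLE = """\
log4j.appender.DRFA=org.apache.log4j.net.SocketAppender
log4j.appender.DRFA.RemoteHost=localhost
log4j.appender.DRFA.Port=9096
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
"""

if __name__ == "__main__":
    print(drfa_socket_target(parse_properties(SAMPLE)))
```

In practice you would read the text from your own Hadoop log4j.properties file instead of SAMPLE and compare the reported port with the one given to the SocketAdaptor.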
bin/hbase shell < /path/to/CHUKWA_HOME/conf/hbase.schema
The local agent speaks a simple text-based protocol, by default over port 9093. Suppose you want Chukwa to monitor system metrics, Hadoop metrics, and Hadoop logs on localhost:
Type (without quotation marks) "add org.apache.hadoop.chukwa.datacollection.adaptor.sigar.SystemMetrics SystemMetrics 60 0"
Type (without quotation marks) "add SocketAdaptor HadoopMetrics 9095 0"
Type (without quotation marks) "add SocketAdaptor Hadoop 9096 0"
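Because the agent protocol is plain text over TCP, the three commands above can also be sent from a script rather than typed by hand. The Python sketch below is one way to do that, under several assumptions: the agent listens on localhost port 9093, each command is terminated by a newline, and the agent answers each command with a short text reply. The command strings are written without any wiki-style bracket markup. The helper names (build_add_command, send_commands) are hypothetical, and reading the four fields as adaptor class, data type, adaptor parameters, and initial offset is an interpretation of the commands, not something this guide states.

```python
import socket

AGENT_HOST = "localhost"  # assumption: the agent runs on the local machine
AGENT_PORT = 9093         # default control port named in the text


def build_add_command(adaptor, datatype, params, offset=0):
    """Build one 'add' line; the field meanings are an assumed interpretation."""
    return "add {} {} {} {}".format(adaptor, datatype, params, offset)


COMMANDS = [
    build_add_command(
        "org.apache.hadoop.chukwa.datacollection.adaptor.sigar.SystemMetrics",
        "SystemMetrics", 60),
    build_add_command("SocketAdaptor", "HadoopMetrics", 9095),
    build_add_command("SocketAdaptor", "Hadoop", 9096),
]


def send_commands(commands, host=AGENT_HOST, port=AGENT_PORT, timeout=5.0):
    """Send each command as one line and collect whatever text comes back.

    Pairing replies with commands is best effort: one recv() per command,
    with an empty string recorded if nothing arrives before the timeout.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for command in commands:
            sock.sendall((command + "\n").encode("ascii"))
            try:
                data = sock.recv(4096)
            except socket.timeout:
                data = b""
            replies.append(data.decode("ascii", errors="replace").strip())
    return replies


if __name__ == "__main__":
    # Requires a running Chukwa agent on AGENT_HOST:AGENT_PORT.
    for reply in send_commands(COMMANDS):
        print(reply)
```

Running the script against a live agent should have the same effect as typing the three lines into an interactive session; check the printed replies to confirm that each adaptor was accepted.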
For data analytics with Pig, some additional environment setup is required. Pig does not use the same environment variable names as Hadoop, so make sure the following environment variables are set correctly:
The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. To set up HICC, do the following: