In Chukwa, data is collected by adaptors. Adaptors run inside the agent process, and can be dynamically started and stopped.

Adaptors are started with the add command, which can be placed in the initial_adaptors file or issued at the agent's control port.

The general syntax of the add command is as follows:

add [name =] <adaptor_class_name> <datatype> <adaptor specific params> <initial offset>

The name field lets you give the adaptor a meaningful name, so that you can refer to it when stopping it later. If you do not specify a name, Chukwa will autogenerate one by hashing the class name and parameters. The adaptor class name specifies which adaptor to run. The datatype specifies how the collected data should be processed. The initial offset should normally be zero; specifying a nonzero initial offset lets you do advanced things, such as processing only a portion of a file.
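To make the field layout concrete, here is a minimal Python sketch that splits an add command into the fields described above. It is not Chukwa's own parser: the regular expression, the function name parse_add_command, and the dictionary keys are assumptions made for illustration. The sketch treats the last token as the initial offset and everything between the datatype and the offset as the adaptor-specific parameters.

```python
import re

# Hypothetical pattern for: add [name =] <class> <datatype> <params> <offset>
# This is an illustrative sketch, not the pattern used by the Chukwa agent.
ADD_PATTERN = re.compile(
    r"add\s+(?:(?P<name>[^\s=]+)\s*=\s*)?"   # optional "name ="
    r"(?P<adaptor_class>[^\s=]+)\s+"          # adaptor class name
    r"(?P<datatype>\S+)\s+"                   # datatype label
    r"(?:(?P<params>.*\S)\s+)?"               # adaptor-specific params (may be absent)
    r"(?P<offset>\d+)\s*$"                    # initial offset, always the last token
)


def parse_add_command(line):
    """Split an add command into its fields; raise ValueError if it does not fit."""
    match = ADD_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a well-formed add command: {line!r}")
    fields = match.groupdict()
    fields["params"] = fields["params"] or ""   # normalise missing params to ""
    fields["offset"] = int(fields["offset"])
    return fields


if __name__ == "__main__":
    print(parse_add_command("add filetailer.FileTailingAdaptor BarData /foo/bar 0"))
```

When no name is given, the name field comes back as None, which corresponds to the case where Chukwa autogenerates one.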

The adaptors built into Chukwa are as follows:

FileAdaptor: Pushes a whole file, as one Chunk, then exits. Requires the file name as a parameter.

Example: add FileAdaptor FooData /tmp/foo 0


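filetailer.FileTailingAdaptor: Repeatedly tails a file, pushing data as it is appended. Requires the file name as a parameter.

Example: add filetailer.FileTailingAdaptor BarData /foo/bar 0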

DirTailingAdaptor: Takes a directory path and a second adaptor name as mandatory parameters; repeatedly scans that directory and all subdirectories, and starts the indicated adaptor on each file.

Example: add DirTailingAdaptor logs /var/log/ filetailer.CharFileTailingAdaptorUTF8 0

ExecAdaptor: Takes a frequency (in milliseconds) as an optional parameter, followed by the program name as a mandatory parameter. Runs that program repeatedly at the rate specified by the frequency.

Example: add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0

This runs df every minute, labelling the output as Df.
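Commands like the examples in this section can also be generated rather than typed by hand. The following Python sketch builds add commands from their fields and joins them into text suitable for an initial_adaptors file. The helper names format_add_command and render_initial_adaptors and the adaptor name "df" are hypothetical, and the one-command-per-line layout of the file is an assumption of this sketch, not something stated above.

```python
def format_add_command(adaptor_class, datatype, params="", offset=0, name=None):
    """Build: add [name =] <adaptor_class> <datatype> <params> <offset>."""
    prefix = f"{name} = " if name else ""      # optional adaptor name
    parts = [adaptor_class, datatype]
    if params:
        parts.append(params)                   # adaptor-specific parameters
    parts.append(str(offset))                  # initial offset goes last
    return f"add {prefix}" + " ".join(parts)


def render_initial_adaptors(commands):
    """Join commands into file text, assuming one command per line."""
    return "".join(command + "\n" for command in commands)


if __name__ == "__main__":
    commands = [
        format_add_command("filetailer.FileTailingAdaptor", "BarData", "/foo/bar"),
        # "df" is a hypothetical adaptor name chosen for this example.
        format_add_command("ExecAdaptor", "Df", "60000 /bin/df -x nfs -x none", name="df"),
    ]
    print(render_initial_adaptors(commands), end="")
```

Giving the ExecAdaptor an explicit name here means it can be referred to later by that name instead of by an autogenerated hash.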