We describe a general framework for implementing algorithms that detect anomalies in systems (Hadoop or otherwise) monitored by Chukwa, using the data collected by the Chukwa framework, and for visualizing the outcomes of these algorithms. We envision that anomaly detection algorithms for Chukwa-monitored clusters can be most naturally implemented as described here.
The types of operations that this framework would enable fall into the following broad categories:
The tasks described above will be performed in a PostProcess stage which occurs after the Demux. These tasks take as their input the output of the Demux stage, and generate as their output (i) lists of anomalous system elements, (ii) abstract system views, or (iii) visualizable data (e.g. raw datapoints to be fed into visualization widgets). These tasks will be MapReduce or Pig jobs; Chukwa would manage them by accepting a list of MapReduce and/or Pig jobs, and these jobs together would form the anomaly detection workflow.
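As a rough sketch of how such a PostProcess stage might drive a configured list of jobs, consider the following toy Python model. All names here (the driver function, the two example jobs, the record fields) are invented for illustration; the real tasks would be MapReduce or Pig jobs, not Python functions:

```python
# Hypothetical sketch of a PostProcess driver: it runs a configured list of
# anomaly-detection "jobs" (stand-ins for MapReduce/Pig jobs) in sequence,
# each consuming the Demux output. Not Chukwa's actual API.

def find_anomalous_elements(records):
    # (i) flag records whose "duration" field is far above the mean
    durations = [r["duration"] for r in records]
    mean = sum(durations) / len(durations)
    return [r for r in records if r["duration"] > 2 * mean]

def build_abstract_view(records):
    # (ii) summarize records into an abstract per-host system view
    view = {}
    for r in records:
        view[r["host"]] = view.get(r["host"], 0) + 1
    return view

def run_postprocess(demux_output, jobs):
    # Each job reads the Demux output and contributes its own result.
    return {job.__name__: job(demux_output) for job in jobs}

demux_output = [
    {"host": "node1", "duration": 10},
    {"host": "node1", "duration": 12},
    {"host": "node2", "duration": 95},
]
results = run_postprocess(
    demux_output, [find_anomalous_elements, build_abstract_view]
)
```

The point of the sketch is only the shape of the stage: a list of independent jobs, all fed from the same post-Demux data, each producing one of the output categories above.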
For consistency with the Chukwa architecture, the jobs in the anomaly detection workflow would have to accept SequenceFiles of ChukwaRecords as their inputs, and would generate SequenceFiles of ChukwaRecords as their outputs.
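The value of this records-in/records-out contract is composability: any job's output is a valid input for any other job. The following toy Python model (the `ChukwaRecord` stand-in, field names, and both jobs are invented; the real records are Hadoop Writables in SequenceFiles) illustrates why the contract lets jobs chain freely:

```python
# Illustrative model of the ChukwaRecord input/output contract (not the real
# Java API): every job maps a sequence of records to a sequence of records,
# so any job's output can be fed directly into the next job in the workflow.

class ChukwaRecord(dict):
    """Toy stand-in for a ChukwaRecord: a bag of key/value fields."""

def tag_slow_tasks(records):
    # Hypothetical job 1: annotate each record with an "anomalous" field.
    out = []
    for r in records:
        tagged = ChukwaRecord(r)
        tagged["anomalous"] = "true" if int(r["duration"]) > 60 else "false"
        out.append(tagged)
    return out

def keep_anomalies(records):
    # Hypothetical job 2: filter, consuming job 1's output unchanged in form.
    return [r for r in records if r["anomalous"] == "true"]

records = [ChukwaRecord(duration="30"), ChukwaRecord(duration="90")]
# Because both jobs honor the same contract, they compose directly:
anomalies = keep_anomalies(tag_slow_tasks(records))
```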
Finally, the outputs of these tasks would be fed into HICC for visualization. The current approach is to use MDL (Metrics Data Loader) to load the data into an RDBMS of choice, which can then be read by HICC widgets.
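A minimal sketch of this loading step, under simplifying assumptions: the records emitted by a PostProcess job are inserted into an RDBMS table that a widget can then query. SQLite stands in for the RDBMS of choice, and the table and column names are made up for illustration; MDL's actual schema and configuration differ:

```python
# Sketch of the MDL-style load step: PostProcess output records go into an
# RDBMS table (SQLite here for brevity) that a HICC widget could query.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anomalies (host TEXT, metric TEXT, value REAL)")

records = [
    {"host": "node1", "metric": "task_duration", "value": 95.0},
    {"host": "node2", "metric": "task_duration", "value": 12.0},
]
conn.executemany(
    "INSERT INTO anomalies VALUES (:host, :metric, :value)", records
)

# A widget would issue a query like this to fetch datapoints to plot:
rows = conn.execute(
    "SELECT host, value FROM anomalies ORDER BY value DESC"
).fetchall()
```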
Hence, the overall anomaly detection workflow would be as follows:
Current active development on the Chukwa Anomaly Detection Framework focuses on detecting anomalies in Hadoop, based on the following tools/concepts from the CMU Fingerpointing project:
The FSMBuilder component implements SALSA state-machine extraction. It is a MapReduce job which reads SequenceFiles of ChukwaRecords and outputs SequenceFiles of ChukwaRecords, with each ChukwaRecord storing a single state. We describe the workflows for some of the tools below:
Task-level MapReduce progress visualization:

This visualization shows the detailed task-level progress of MapReduce jobs across nodes in the cluster. The steps involved in generating it are:

1. FSMBuilder (MapReduce job, available soon): SALSA is used to extract state-machine views of Hadoop's execution from the post-Demux output, using the JobData/JobHistory records.
2. The output of FSMBuilder is loaded into the RDBMS using MDL.

HDFS data-flow visualization:

This visualization shows the aggregate data-flows across DataNodes in an HDFS instance. The steps involved in generating it are:

1. FSMBuilder (available soon): SALSA is used to extract state-machine views of Hadoop's execution from the post-Demux output, using the ClientTraceDetailed records.
2. The output of FSMBuilder is loaded into the RDBMS using MDL.

State-machine visualization (CHUKWA-282):

This visualization, also based on the JobData/JobHistory records, is generated from the output of the Demux operation. The steps (mostly envisioned to be automated) involved in generating the visualization are:

1. FSMBuilder (currently unavailable, pending feature additions to the PostProcessor to support non-MDL tasks): reads post-Demux data (SequenceFiles of ChukwaRecords of JobData) as input, and writes the state-machine view as SequenceFiles of ChukwaRecords of states.
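To make the FSMBuilder step concrete, here is a small Python sketch of SALSA-style state extraction. The event and field names are invented, and the real FSMBuilder is a Java MapReduce job over ChukwaRecords; the sketch only shows the core idea of folding paired start/end log events into one record per state, from which a task-level progress view can compute per-task durations:

```python
# Hedged sketch of SALSA-style state extraction (field names invented; the
# real FSMBuilder is a MapReduce job over ChukwaRecords). Start/end events
# for each task are folded into a single state record, matching the idea of
# "each ChukwaRecord storing a single state".

def extract_states(events):
    starts, states = {}, []
    for e in sorted(events, key=lambda e: e["time"]):
        key = (e["host"], e["task"])
        if e["event"] == "START":
            starts[key] = e["time"]
        elif e["event"] == "END" and key in starts:
            states.append({
                "host": e["host"],
                "task": e["task"],
                "start": starts.pop(key),
                "end": e["time"],
            })
    return states

def task_durations(states):
    # A task-level progress view plots each task's duration per host.
    return {(s["host"], s["task"]): s["end"] - s["start"] for s in states}

events = [
    {"host": "node1", "task": "map_0", "event": "START", "time": 0},
    {"host": "node1", "task": "map_0", "event": "END", "time": 40},
    {"host": "node2", "task": "map_1", "event": "START", "time": 5},
    {"host": "node2", "task": "map_1", "event": "END", "time": 25},
]
durations = task_durations(extract_states(events))
```

In the real workflow these state records would be written back out as SequenceFiles of ChukwaRecords and loaded via MDL, rather than aggregated in memory.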