Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework, and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring, and analyzing results, in order to make the best use of the collected data.
Documentation
Guide for Chukwa Committers - How to become part of the Chukwa community
Chukwa Release Process - Release process for Chukwa
How to push new information to Chukwa - A tutorial that walks through sending a log file to Chukwa and shows how Chukwa parses records from the datasink file.
Chukwa_Processes_and_Data_Flow - A description of the various processes that operate on Chukwa data and how that data moves through HDFS.
Anomaly Detection Framework with Chukwa - A description of the Anomaly Detection Framework design for Chukwa 0.2.
Presentations
ChukwaPoster.pdf - Chukwa Poster
chukwa_presentation.pdf - An overview of the Chukwa Monitoring System
chukwa_presentation_cca08.pdf - A talk about Chukwa presented by Berkeley graduate students at Cloud Computing and its Applications '08 (CCA08), October 2008.
Download
Chukwa is part of the Hadoop distribution. You can view the source in the Apache Hadoop SVN repository here.
Papers
chukwa_cca08.pdf - Cloud Computing and its Applications (CCA) 2008
Links
JIRA HADOOP-3719 - The original Apache JIRA ticket for contributing Chukwa to Hadoop as a contrib project.
JIRA HADOOP-4709 - A batch update to the Chukwa code in Hadoop/src/contrib. After this update, the Chukwa team will fully embrace the Apache JIRA development model, as suggested in the comments on this JIRA.