This page provides a synopsis of the conference call held to discuss HADOOP-6332.

Attendees

Konstantin Boudnik (Yahoo), Alex Loddengaard (Cloudera), Steve Loughran (HP), Steve Watt (IBM), Stuart Hood (Rackspace)

Definition

Cluster QA - Provide quality assurance by verifying, through runtime testing, that a Hadoop cluster is set up and working correctly. At present there is no way to do this. Existing tests such as TeraSort exercise only certain aspects of the system.

Requirements

Each attendee described their requirements and what they would like to see from the end product.

General Consensus: There is a user need for regression testing (*on what specifically?*)

Steve Loughran :

  • Dynamic cluster testing
  • Cluster QA

Steve Watt :

  • Verify that Hadoop works on IBM Java and on Linux distributions such as RHEL and SLES.
  • Cluster QA
  • Reduce QA time (the existing functional tests take too long; distributed execution might reduce the overall time)

Alex Loddengaard:

  • Cluster QA

Konstantin Boudnik:

  • Cluster QA

Initial Proposal

  • Deployment is a separate concern from Testing. Deployment is outside of the scope of this JIRA. Users will be responsible for setting up their cluster.
  • Deployment information is provided as a set of parameters (possibly environment variables) to the testing framework.
  • Create a 'Cluster QA' set of JUnit-based tests. Start by selecting several of the existing functional tests and porting them to run over the cluster.
  • Package the tests inside a hadoop-version-test.jar (or, alternatively, separate jars for each project split, i.e. mapred-test, hdfs-test, etc.)
  • Design the framework so that it takes the specified parameters and runs the tests in the specified test jar against the identified cluster.
  • Once the cluster is running, the tests could be run against the default cluster with something like: "bin/hadoop -verify"
  • Start collecting data on how long system tests (such as TeraSort) take to run on particular cluster configurations, so we can provide users a benchmark they can use to validate the health of their cluster.
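
To make the parameter-passing idea concrete, here is a minimal sketch of how the proposed framework might resolve cluster deployment information from environment variables, falling back to local defaults. This is an illustration only: the variable names (e.g. HADOOP_TEST_NAMENODE) and the helper class itself are assumptions, not a convention agreed on the call.

```java
import java.util.Map;

// Hypothetical helper for the proposed Cluster QA framework: resolves
// deployment parameters (supplied as environment variables, per the
// proposal) with sensible local-cluster fallbacks. The variable names
// used here are illustrative assumptions.
public class ClusterTestConfig {
    private final Map<String, String> env;

    // Accepts an explicit map so tests can inject values; production
    // code would pass System.getenv().
    public ClusterTestConfig(Map<String, String> env) {
        this.env = env;
    }

    // Returns the configured value, or the default when the variable
    // is unset or empty.
    public String get(String key, String defaultValue) {
        String value = env.get(key);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        ClusterTestConfig cfg = new ClusterTestConfig(System.getenv());
        // A JUnit test could read these values in setUp() and point
        // its FileSystem/JobClient at the cluster under test.
        System.out.println(cfg.get("HADOOP_TEST_NAMENODE", "hdfs://localhost:8020"));
        System.out.println(cfg.get("HADOOP_TEST_JOBTRACKER", "localhost:8021"));
    }
}
```

A ported functional test would then construct its cluster connection from these values in its setup method, rather than assuming an in-process mini-cluster.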

Next Steps

  • Iterate in public (publish this Wiki Link and the Meeting Wiki Page to the mailing lists to solicit feedback)
  • Get community engagement and consensus on approach
  • Provide first drop of code

Available Resources

  • Stephen Watt (70% of his time)
  • Konstantin Boudnik (100% of his time)