Comment: Updated compilation and test instructions

...

Make sure that your code introduces no new warnings into the javac compilation.

To compile with Hadoop 1.x 

> ant clean jar

To compile with Hadoop 2.x 

> ant clean jar -Dhadoopversion=23

The hadoopversion setting accepts two values: 20 and 23. -Dhadoopversion=20, the default, selects the Hadoop 0.20.x and 1.x releases, the older versions built around the JobTracker. -Dhadoopversion=23 selects the Hadoop 0.23.x and 2.x releases, the next-generation versions based on YARN. These replace the single JobTracker, which managed both resources (CPU, memory) and the running of MapReduce applications, with a separate Resource Manager and Application Masters. The exact Hadoop 1.x and 2.x versions that Pig compiles against are configured in ivy/libraries.properties, which is usually updated to track the latest stable releases.
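As a small illustration of the mapping described above, the sketch below prints the compile command for a given Hadoop line without running it. The function name pig_compile_cmd and its argument values are hypothetical and exist only for this example; the ant targets and the -Dhadoopversion flag are the ones shown on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Pig build): print the ant compile
# command for the requested Hadoop line instead of running it.
pig_compile_cmd() {
  case "$1" in
    20)
      # 20 is the default, so no -Dhadoopversion flag is needed.
      echo "ant clean jar"
      ;;
    23)
      echo "ant clean jar -Dhadoopversion=23"
      ;;
    *)
      echo "unsupported hadoopversion: $1 (expected 20 or 23)" >&2
      return 1
      ;;
  esac
}

# Example: pig_compile_cmd 23
```

Printing the command rather than executing it keeps the sketch safe to paste into a shell; pipe the output to sh only after checking it.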

Unit Tests

The full suite of Pig unit tests is very large, and it can be run against multiple Hadoop versions (Hadoop 1.x and Hadoop 2.x) and multiple execution modes (mapreduce, which is the default, spark, and tez). Because a full run takes a very long time, you are not expected to run it before submitting a patch. Instead, run the test classes affected by your patch, and also run test-commit, a core set of tests that takes about 20 minutes. If the fix is specific to a particular execution mode (for example, tez or spark), run the tests with that exectype. The Pig commit build (https://builds.apache.org/job/Pig-trunk-commit), which runs daily, will report any additional failures caused by the committed patch; those can be fixed later in a follow-up patch.

Some of the test goals are:

  • test: full suite of unit tests in mapreduce mode
  • test-tez: full suite of unit tests in tez mode
  • test-commit: core set of tests in mapreduce mode

In the examples below, remove -Dhadoopversion=23 to run the tests with Hadoop 1.x instead of Hadoop 2.x. The tez and spark execution modes are only applicable with Hadoop 2.x.

To run the full suite of test cases in mapreduce mode with Hadoop 2.x (usually not needed unless you are making major changes):

> ant clean test -Dhadoopversion=23

To run the full suite of test cases in tez mode with Hadoop 2.x (usually not needed unless you are making major changes). The test-tez target is a shortcut that adds -Dhadoopversion=23 -Dexectype=tez for you:

> ant clean test-tez

To run a single test case with Hadoop 2.x. Use this to verify new tests that you have added, or to run specific test cases affected by your patch:

> ant clean test -Dtestcase=TestEvalPipeline -Dhadoopversion=23

To run a single test case with Hadoop 2.x and tez as the execution engine:

> ant clean test -Dtestcase=TestEvalPipeline2 -Dhadoopversion=23 -Dexectype=tez
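The two single-test invocations above differ only in the optional exectype. A minimal sketch of a wrapper that assembles the command line is shown below; the function name pig_test_cmd is hypothetical, and the sketch only prints the command, using the -Dtestcase, -Dhadoopversion and -Dexectype properties exactly as they appear on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Pig build): print the ant command
# that runs one test class against Hadoop 2.x.
#   $1 = test class name, e.g. TestEvalPipeline
#   $2 = optional exectype, e.g. tez (omit for the default mapreduce mode)
pig_test_cmd() {
  if [ -z "$1" ]; then
    echo "usage: pig_test_cmd <TestClass> [exectype]" >&2
    return 1
  fi
  cmd="ant clean test -Dtestcase=$1 -Dhadoopversion=23"
  if [ -n "${2:-}" ]; then
    # Only append -Dexectype when a non-default mode was requested.
    cmd="$cmd -Dexectype=$2"
  fi
  echo "$cmd"
}

# Example: pig_test_cmd TestEvalPipeline2 tez
```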

To run the core set of unit tests, follow the steps below. Make sure that all the core unit tests and the tests you wrote succeed before constructing your patch.

> cd trunk
> ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean jar test-commit -Dhadoopversion=23

This should run in around 20 minutes.
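One way to check that a patch introduces no new javac warnings is to save the build output before and after the change and compare warning counts. The sketch below assumes the output was redirected to a log file and that each javac warning appears on a line containing the text "warning:", which is javac's usual format; the function name count_javac_warnings and the log file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count javac warning lines in a saved build log.
#   $1 = path to a log file captured from the ant build
count_javac_warnings() {
  # grep -c prints the number of matching lines. It exits with status 1
  # when nothing matches, so "|| true" keeps the helper usable under set -e.
  grep -c 'warning:' "$1" || true
}

# Example (log file names are placeholders):
#   ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean jar > before.log 2>&1
#   ... apply the patch, rebuild into after.log ...
#   count_javac_warnings before.log
#   count_javac_warnings after.log
```

If the second count is higher than the first, the patch has most likely added warnings that should be fixed before submission.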

...