Getting the source code
First of all, you need the Pig source code.
Get the source code on your local drive using SVN or Git. Most development is done on the "trunk":
svn checkout http://svn.apache.org/repos/asf/pig/trunk/
or
git clone https://github.com/apache/pig.git
Making Changes
Before you start, send a message to the Pig developer mailing list, or file a bug report in Jira. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient; it may take folks a while to understand your requirements.
...
- All public classes and methods should have informative Javadoc comments.
- Do not use @author tags.
- Code should be formatted according to Sun's conventions. We use four spaces (not tabs) for indentation.
- Contributions should pass unit tests.
- New unit tests should be provided to demonstrate bugs and fixes. JUnit is our test framework:
  - You must implement a class that extends junit.framework.TestCase and whose class name contains Test.
  - If an HDFS cluster and/or a MapReduce cluster is needed by your test, add a field of type MiniGenericCluster to the class and initialize it with a statement like the following (the name of the field is not important). TestAlgebraicEval.java is an example of a test that uses a cluster. The test will then run on a cluster created on the local machine.
    MiniGenericCluster cluster = MiniGenericCluster.buildCluster();
  - Define methods within your class and annotate them with @Test, and call JUnit's many assert methods to verify conditions; these methods will be executed when you run ant test.
  - Place your class in the test tree.
  - You can then run the core unit tests with the command ant test-commit. Similarly, you can run a specific unit test with the command ant test -Dtestcase=<ClassName> (for example, ant test -Dtestcase=TestPigFile).
...
Make sure that your code introduces no new warnings into the javac compilation.
To compile with Hadoop 2.x:
Code Block |
---|
> ant clean jar |
The latest Pig codebase only supports Hadoop 2.x, which is based on YARN and has a separate Resource Manager and Application Masters instead of a single JobTracker that managed both resources (CPU, memory) and the running of MapReduce applications. The exact versions of Hadoop 2.x that Pig compiles against are configured in ivy/libraries.properties
and are usually updated to compile against the latest stable releases.
Please note that earlier versions of Pig also supported older Hadoop versions, and there was an option to select a Hadoop version at build time. If you would like to contribute to older release branches (0.16.0 or below), you will have to set the hadoopversion property. It has two values, 20 and 23. -Dhadoopversion=20, which is the default, denotes the Hadoop 0.20.x and 1.x releases, which are the old versions with JobTracker. The other option, -Dhadoopversion=23, denotes the Hadoop 0.23.x and Hadoop 2.x releases.
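On an older release branch, the hadoopversion property goes on the ant command line like any other -D option. The following is a minimal sketch, assuming a POSIX shell. The helper name pig_build_cmd is hypothetical and not part of the Pig build; it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): print the ant build command for a
# given hadoopversion value. Only meaningful on release branches 0.16.0 or
# below, where the property exists; the property accepts 20 or 23.
pig_build_cmd() {
    case "$1" in
        20|23) echo "ant -Dhadoopversion=$1 clean jar" ;;
        *)     echo "hadoopversion must be 20 or 23" >&2; return 1 ;;
    esac
}

# Build against the Hadoop 0.23.x / 2.x line.
pig_build_cmd 23
```

Rejecting any other value up front avoids a long build that was configured with a mistyped property.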
Unit Tests
The full suite of Pig unit tests is large, and the whole suite can be run against multiple execution modes: mapreduce (default), spark, and tez. Since it takes a really long time, you are not expected to run the full suite before submitting a patch. Instead, run and verify the test classes affected by your patch, and also run test-commit, which runs a core set of tests and takes about 20 minutes. If the fix is specific to a particular execution mode (e.g. tez or spark), run the tests with that exectype. The Pig commit build (https://builds.apache.org/job/Pig-trunk-commit), which runs daily, will report any additional failures on the committed patch, and a new patch that fixes those failures can be submitted later. Some of the different test goals are:
- test: full suite of unit tests in mapreduce mode
- test-tez: full suite of unit tests in tez mode
- test-commit: core set of tests in mapreduce mode
To run the full suite of test cases in mapreduce mode with Hadoop 2.x (usually you don't have to run this unless you are making major changes):
Code Block |
---|
> ant clean test |
To run the full suite of test cases in tez mode with Hadoop 2.x, use the command below. This is a shortcut that takes care of adding -Dexectype=tez. Usually you don't have to run this unless you are making major changes.
Code Block |
---|
> ant clean test-tez |
To run a single test case with Hadoop 2.x, use the command below. You can do this to verify the new tests that you have added or to run specific test cases affected by your patch.
Code Block |
---|
> ant clean test -Dtestcase=TestEvalPipeline |
To run a single test case with Hadoop 2.x and tez as the execution engine:
Code Block |
---|
> ant clean test -Dtestcase=TestEvalPipeline2 -Dexectype=tez |
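The single-test invocations above differ only in the class name and the optional exectype, so they can be sketched as one small helper. This is an illustration under assumptions: pig_test_cmd is a hypothetical name, not an existing script in the Pig tree, and it only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): print the ant command for one test class.
# $1 = test class name (required)
# $2 = exectype (optional, e.g. tez)
pig_test_cmd() {
    if [ -z "$1" ]; then
        echo "usage: pig_test_cmd <TestClass> [exectype]" >&2
        return 1
    fi
    cmd="ant clean test -Dtestcase=$1"
    # Without -Dexectype the tests run in the default mapreduce mode.
    if [ -n "$2" ]; then
        cmd="$cmd -Dexectype=$2"
    fi
    echo "$cmd"
}

pig_test_cmd TestEvalPipeline
pig_test_cmd TestEvalPipeline2 tez
```

To actually run the printed command from the trunk directory, wrap the call in eval, for example eval "$(pig_test_cmd TestEvalPipeline2 tez)".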
To run the core set of unit tests, follow the steps below. Please make sure that all the core unit tests and the tests you wrote succeed before constructing your patch.
Code Block |
---|
> cd trunk
> ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean jar test-commit |
This should run in around 20 minutes.
After a while, if you see
Code Block |
---|
BUILD SUCCESSFUL
|
all is ok, but if you see
Code Block |
---|
BUILD FAILED
|
then please examine error messages in build/test and fix things before proceeding.
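If the ant output was saved to a file, the check for the two messages above can be scripted. This is a rough sketch, not part of the Pig build: build_outcome is a hypothetical helper, and it assumes you captured the console output yourself (for instance by piping ant through tee into a log file of your choosing).

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): report the outcome of an ant run
# from a saved log. $1 = path to a file holding ant's console output.
build_outcome() {
    if grep -q "BUILD SUCCESSFUL" "$1"; then
        echo "ok"
    elif grep -q "BUILD FAILED" "$1"; then
        echo "failed: see build/test"
    else
        echo "unknown"
    fi
}

# Demo with a made-up one-line log, only to show the call shape.
log=$(mktemp)
echo "BUILD SUCCESSFUL" > "$log"
build_outcome "$log"
rm -f "$log"
```

The "unknown" branch covers a log that was cut short, for example when the build was interrupted before ant printed its final status line.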
...
Please also check the javadoc.
Code Block |
---|
> ant docs
> firefox build/docs/api/index.html |
Examine all public classes you've changed to see that documentation is complete and informative. Your patch must not generate any javadoc warnings.
...
Check to see what files you have modified with:
Code Block |
---|
svn stat
|
Add any new files with:
Code Block |
---|
svn add src/.../MyNewClass.java
|
In order to create a patch, just type:
Code Block |
---|
svn diff > myBeautifulPatch.patch
|
...
If working from a git repo, please be aware that the default diff format will not apply in SVN repos. Please generate patches with the --no-prefix
option so they apply cleanly.
Code Block |
---|
git diff --no-prefix
|
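Before attaching a patch made from a git checkout, it can help to confirm that it does not carry git's default a/ and b/ path prefixes. The check below is a heuristic sketch: has_git_prefix is a hypothetical helper, and the sample patch content and file name src/Foo.java are invented for the demo.

```shell
#!/bin/sh
# Hypothetical check (not part of Pig): does a patch file carry git's default
# a/ and b/ prefixes? Returns 0 when both "--- a/" and "+++ b/" header lines
# are present. Heuristic only: a real top-level directory named "a" or "b"
# would produce a false positive.
has_git_prefix() {
    grep -q '^--- a/' "$1" && grep -q '^+++ b/' "$1"
}

# Demo with invented header lines for a file that does not exist.
sample=$(mktemp)
printf '%s\n' '--- a/src/Foo.java' '+++ b/src/Foo.java' > "$sample"
if has_git_prefix "$sample"; then
    echo "regenerate with: git diff --no-prefix"
fi
rm -f "$sample"
```

A patch that passes this check is not guaranteed to apply; it only rules out the prefix mismatch described above.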
Testing a patch (Ignore this section for now)
You can run the same tools that the automated Jenkins patch test system will run on a patch. This enables you to fix problems with your patch once Jenkins or a committer points them out. The test-patch
Ant target will run your patch through the same checks that Jenkins currently does except for executing the core and contrib unit tests.
To use this target, you must run it from a clean workspace (i.e. svn stat
shows no modifications or additions). From your clean workspace, run:
Code Block |
---|
ant \
-Dpatch.file=/path/to/my.patch \
-Dforrest.home=/path/to/forrest/ \
-Dfindbugs.home=/path/to/findbugs \
-Djava5.home=/path/to/java5home \
-Dscratch.dir=/path/to/a/temp/dir \ (optional)
-Dsvn.cmd=/path/to/subversion/bin/svn \ (optional)
-Dgrep.cmd=/path/to/grep \ (optional)
-Dpatch.cmd=/path/to/patch \ (optional)
test-patch
|
...
To apply a patch that you either generated or found in JIRA, from the trunk
directory you can issue
Code Block |
---|
patch -p0 <cool_patch.patch
|
If you just want to check whether the patch applies, you can run patch with the --dry-run option:
Code Block |
---|
patch -p0 --dry-run <cool_patch.patch
|
...
Looking for a place to start? A great first step is to peruse the JIRA and find an issue that needs to be resolved. In particular, there is a list of Jiras marked as "newbie". If you're looking for a bigger project, try the Pig Journal, which gives a list of projects the Pig team would like to see worked on.