
Getting Started

Requirements

  1. Java 1.5 or newer, preferably from Sun. Set JAVA_HOME to the root of your Java installation.

  2. Ant build tool: http://ant.apache.org/.

  3. To run unit tests, you also need JUnit: http://junit.sourceforge.net/.

  4. To run Pig programs, you need access to a Hadoop cluster: http://lucene.apache.org/hadoop/. You can also run Pig in "local" mode, which does not require setting up a Hadoop cluster but has severely limited performance.
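
The first two requirements can be checked with a small shell script before you build. This is a minimal sketch assuming a POSIX shell; check_java_home and have_command are hypothetical helper names, and the script only verifies that JAVA_HOME points at an installation with an executable bin/java. It does not check that the Java version is 1.5 or newer.

```shell
#!/bin/sh
# Sketch: check the build prerequisites before running ant.
# check_java_home and have_command are hypothetical helpers, not part of Pig.

# Succeeds if JAVA_HOME is set and contains an executable bin/java.
# Note: the Java version (1.5 or newer) is NOT checked here.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java found under $JAVA_HOME/bin" >&2
        return 1
    fi
    return 0
}

# Succeeds if the named command (for example ant) is on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Example usage:
#   check_java_home && have_command ant || echo "missing prerequisite" >&2
```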

Building Pig

  1. Check out the Pig source from svn: svn co http://svn.apache.org/repos/asf/incubator/pig/trunk

  2. Build the code by running ant from the top directory. If the build succeeds, pig.jar is created in that directory.

  3. If you need to interact with a Hadoop 0.15.x cluster, rebuild against the Hadoop 0.15 code by running ant clean; ant -Dhadoop.jarfile=hadoop15.jar. The clean step is needed whenever you switch Hadoop versions.
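
The build steps can be wrapped in a small script. This is a sketch assuming a POSIX shell run from the top directory of the checked-out trunk; hadoop_ant_args and build_pig are hypothetical helpers, and only the 0.15.x jar switch described above is handled.

```shell
#!/bin/sh
# Sketch: build pig.jar, optionally against the Hadoop 0.15 jar.
# hadoop_ant_args and build_pig are hypothetical helpers, not part of Pig.

# Prints the extra ant argument for a Hadoop version, or nothing for the default.
# Only the 0.15.x case is covered, because it is the only one documented here.
hadoop_ant_args() {
    case "${1:-}" in
        0.15|0.15.*) echo "-Dhadoop.jarfile=hadoop15.jar" ;;
        *) ;;  # default build: no extra argument
    esac
}

# Usage: build_pig [HADOOP_VERSION]
build_pig() {
    extra=$(hadoop_ant_args "${1:-}")
    # Always clean first: required when switching Hadoop versions,
    # and otherwise only costs a full rebuild.
    ant clean || return 1
    if [ -n "$extra" ]; then
        ant "$extra" || return 1
    else
        ant || return 1
    fi
    if [ ! -f pig.jar ]; then
        echo "build finished but pig.jar is missing" >&2
        return 1
    fi
}
```

Always cleaning is a deliberate simplification: it also covers switching back from hadoop15.jar to the default build, at the cost of a slower rebuild.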

Running Pig Programs

There are two ways to run Pig. The first is the pig.pl script, found in the scripts directory of your source tree; it requires Perl on your machine. Invoke it as pig.pl -cp pig.jar:HADOOPSITEPATH, where HADOOPSITEPATH is the directory containing the hadoop-site.xml file for your Hadoop cluster. Example:

pig.pl -cp pig.jar:/hadoop/conf
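
Since HADOOPSITEPATH must be the directory that holds hadoop-site.xml, it can help to check that before building the -cp value. This is a minimal sketch assuming a POSIX shell; pig_classpath is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the -cp value "pig.jar:HADOOPSITEPATH",
# failing if hadoop-site.xml is not in HADOOPSITEPATH.
# pig_classpath is a hypothetical helper, not part of Pig.

# Usage: pig_classpath PIG_JAR HADOOPSITEPATH
pig_classpath() {
    jar=$1
    site_dir=$2
    if [ ! -f "$site_dir/hadoop-site.xml" ]; then
        echo "no hadoop-site.xml in $site_dir" >&2
        return 1
    fi
    echo "$jar:$site_dir"
}

# Example usage (same invocation as above, but only if the check passes):
#   cp=$(pig_classpath pig.jar /hadoop/conf) && pig.pl -cp "$cp"
```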

The second way is to invoke Java directly:

java -cp pig.jar:HADOOPSITEPATH org.apache.pig.Main

This starts pig in the default map-reduce mode. You can also start pig in "local" mode:

java -cp pig.jar org.apache.pig.Main -x local

Or

java -jar pig.jar -x local

Regardless of how you invoke Pig, each of the commands above starts an interactive shell called grunt, where you can run DFS and Pig commands. Documentation for grunt will be posted on the wiki soon. To run Pig in batch mode instead, append the path of your Pig script to any of the commands above. Example:

pig.pl -cp pig.jar:/hadoop/conf myscript.pig

or

java -cp pig.jar:/hadoop/conf org.apache.pig.Main myscript.pig
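
The java invocations on this page differ only in mode, classpath, and an optional script, so they can be collected into one shell function. This is a sketch assuming a POSIX shell and pig.jar in the current directory; pig_run is a hypothetical helper, and the JAVA variable is an assumption added only so the java command can be swapped for echo to preview the command line. It uses only the options shown on this page (-cp, -x local, and a trailing script path).

```shell
#!/bin/sh
# Sketch: one wrapper for the java invocations shown on this page.
#   pig_run mapreduce HADOOPSITEPATH [script.pig]
#   pig_run local [script.pig]
# Without a script, Pig starts the interactive grunt shell.
# pig_run and the JAVA override are hypothetical, not part of Pig.

pig_run() {
    if [ $# -eq 0 ]; then
        echo "usage: pig_run {mapreduce HADOOPSITEPATH|local} [script.pig]" >&2
        return 1
    fi
    mode=$1
    shift
    case "$mode" in
        mapreduce)
            if [ $# -lt 1 ]; then
                echo "usage: pig_run mapreduce HADOOPSITEPATH [script.pig]" >&2
                return 1
            fi
            site_dir=$1
            shift
            # Default map-reduce mode: the classpath includes hadoop-site.xml's directory.
            "${JAVA:-java}" -cp "pig.jar:$site_dir" org.apache.pig.Main "$@"
            ;;
        local)
            # Local mode: no Hadoop configuration on the classpath.
            "${JAVA:-java}" -cp pig.jar org.apache.pig.Main -x local "$@"
            ;;
        *)
            echo "unknown mode: $mode (expected mapreduce or local)" >&2
            return 1
            ;;
    esac
}
```

For example, pig_run mapreduce /hadoop/conf myscript.pig corresponds to the batch example above, and JAVA=echo pig_run local prints the local-mode command instead of running it.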

last edited 2008-05-14 14:33:06 by Jasonattributor