This page provides the information you need to get started running Pig.

Run Modes

Pig has two run modes or exectypes, local and hadoop (currently called mapreduce).

To get a listing of all Pig commands, including the run modes, use:

$ pig -help

Note: A ticket has been entered to change -x, -exectype local|mapreduce to -x, -exectype local|hadoop

Run Ways

You can run Pig three ways, using either local mode or hadoop (mapreduce) mode: from the Grunt shell, from a script file, or from an embedded program.

Note: Also see the Pig Latin exec and run commands.

Sample Code

The sample code files you need to run the examples on this page include pig.jar, id.pig, idlocal.java, and idhadoop.java, plus the passwd data file.

The examples are based on these Pig commands, which extract all user IDs from the /etc/passwd file.

A = load 'passwd' using PigStorage(':'); 
B = foreach A generate $0 as id;
dump B; 
store B into 'id.out';
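
As a rough cross-check of what these statements compute, the same first-field extraction can be sketched with standard Unix tools. This is only an illustration and not part of Pig: the file name passwd.sample and its two records are made-up sample data, and Pig's dump prints each value as a tuple, for example (root), rather than as a bare string.

```shell
# Build a tiny passwd-style file (hypothetical sample data, not a real system file).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/sh' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/sh' > passwd.sample

# Field 1 of a ':'-delimited record corresponds to $0 in the Pig statements.
ids="$(cut -d: -f1 passwd.sample)"
echo "$ids"
```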

Environment

Unix and Windows users need to install and set up Java (including $JAVA_HOME).

Windows users also need to install Cygwin and the Perl package (http://www.cygwin.com/).

To set environment variables, use the right command for your shell: export in Bourne-style shells such as sh and bash, or setenv in csh and tcsh.

The examples use export.
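
A minimal sketch of setting and checking $JAVA_HOME with export. The fallback path /usr/lib/jvm/default-java is a placeholder assumption, not a value from this page; substitute your own Java location.

```shell
# Keep an existing JAVA_HOME; otherwise fall back to a placeholder path (assumption).
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# Report whether a java executable is actually present at that location.
if [ -x "$JAVA_HOME/bin/java" ]; then
  java_status="found"
else
  java_status="missing"
fi
echo "JAVA_HOME=$JAVA_HOME (java $java_status)"
```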

Local Mode

This section shows you how to run Pig in local mode, using the Grunt shell, a Pig script, and an embedded program.

To run Pig in local mode, you only need access to a single machine. To keep things simple, copy the files used in the examples below (pig.jar, passwd, id.pig, and idlocal.java) to your current working directory. You may want to create a temp directory and move to it first.
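
The setup described above could look like the following. The use of mktemp and of /etc/passwd as the input source are assumptions; pig.jar and the sample program files still need to be copied in from wherever you obtained them.

```shell
# Create a scratch directory and move into it.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Copy the system password file in as the 'passwd' input, if it is readable.
if [ -r /etc/passwd ]; then
  cp /etc/passwd passwd
else
  echo "no readable /etc/passwd; supply your own passwd file" >&2
fi
```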

Grunt Shell

To run Pig’s Grunt shell in local mode, follow these instructions.

First, point $PIG_CLASSPATH to the pig.jar file (in your current working directory):

$ export PIG_CLASSPATH=./pig.jar

From your current working directory, run:

$ pig -x local

The Grunt shell is invoked and you can enter commands at the prompt.

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 

Script File

To run a Pig script file in local mode, follow these instructions (which are the same as the Grunt Shell instructions above – you just include the script file).

First, point $PIG_CLASSPATH to the pig.jar file (in your current working directory):

$ export PIG_CLASSPATH=./pig.jar

From your current working directory, run:

$ pig -x local id.pig

The Pig Latin statements are executed and the results are displayed to your terminal screen.
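
The two steps above can be wrapped in a small shell function that fails early when something is missing. This is a sketch only: run_local is a hypothetical helper name, and it assumes that pig is on your PATH and that pig.jar is in the current directory.

```shell
# run_local SCRIPT: run a Pig script in local mode (hypothetical helper).
run_local() {
  script="$1"
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig is not on PATH" >&2
    return 127
  fi
  # Set PIG_CLASSPATH for this one command only.
  PIG_CLASSPATH=./pig.jar pig -x local "$script"
}
```

With the function defined, the script from this section would be run as: run_local id.pig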

Embedded Program

To compile and run an embedded Java/Pig program in local mode, follow these instructions.

From your current working directory, compile the program:

$ javac -cp pig.jar idlocal.java

Note: idlocal.class is written to your current working directory. Include “.” in the class path when you run the program.

From your current working directory, run the program:

Unix:   $ java -cp pig.jar:. idlocal
Cygwin: $ java -cp '.;pig.jar' idlocal

To view the results, check the output file, id.out.
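
The Unix and Cygwin run commands above differ only in the class path separator: ':' on Unix and ';' on Cygwin, where java is a Windows program. A sketch that picks the separator from uname output, assuming Cygwin reports a system name beginning with CYGWIN:

```shell
# Choose the Java class path separator for the current platform.
case "$(uname -s)" in
  CYGWIN*) sep=';' ;;   # Windows JVM expects ';'
  *)       sep=':' ;;   # Unix JVMs expect ':'
esac

classpath="pig.jar${sep}."
# Print the command rather than running it, so the sketch works without Java.
echo "java -cp \"$classpath\" idlocal"
```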

Hadoop Mode

This section shows you how to run Pig in hadoop (mapreduce) mode, using the Grunt shell, a Pig script, and an embedded program.

To run Pig in hadoop (mapreduce) mode, you need access to a Hadoop cluster. You also need to copy the files used in the examples below (pig.jar, passwd, id.pig, and idhadoop.java) to your home or current working directory.

Grunt Shell

To run Pig’s Grunt shell in hadoop (mapreduce) mode, follow these instructions. When you begin the session, Pig will allocate a 15-node cluster. When you quit the session, Pig will deallocate the nodes.

From your current working directory, run:

$ pig
or
$ pig -x mapreduce

The Grunt shell is invoked and you can enter commands at the prompt.

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 

Script File

To run Pig script files in hadoop (mapreduce) mode, follow these instructions (which are the same as the Grunt Shell instructions above – you just include the script file). Again, Pig will automatically allocate and deallocate a 15-node cluster.

From your current working directory, run:

$ pig id.pig
or
$ pig -x mapreduce id.pig

The Pig Latin statements are executed and the results are displayed to your terminal screen.

Embedded Program

To compile and run an embedded Java/Pig program in hadoop (mapreduce) mode, follow these instructions.

First, point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example:

$ export HADOOPDIR=/yourHADOOPsite/conf 
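
Before compiling, it can help to confirm that $HADOOPDIR really contains hadoop-site.xml. A minimal sketch; check_hadoopdir is a hypothetical helper, not a Pig or Hadoop command.

```shell
# check_hadoopdir DIR: succeed only if DIR contains hadoop-site.xml (hypothetical helper).
check_hadoopdir() {
  [ -n "$1" ] && [ -f "$1/hadoop-site.xml" ]
}

if check_hadoopdir "${HADOOPDIR:-}"; then
  echo "HADOOPDIR looks usable: $HADOOPDIR"
else
  echo "hadoop-site.xml not found in '${HADOOPDIR:-}'" >&2
fi
```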

From your current working directory, compile the program:

$ javac -cp pig.jar idhadoop.java

Note: idhadoop.class is written to your current working directory. Include “.” in the class path when you run the program.

From your current working directory, run the program:

Unix:   $ java -cp pig.jar:.:$HADOOPDIR idhadoop
Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idhadoop

To view the results, check the idout directory on your Hadoop system.

RunPig (last edited 2009-09-20 23:38:33 by localhost)