Revision 34 as of 2009-09-20 23:38:33
This page provides the information you need to get started running Pig.
Run Modes
Pig has two run modes or exectypes, local and hadoop (currently called mapreduce).
Local Mode: To run Pig in local mode, you need access to a single machine.
Hadoop (mapreduce) Mode: To run Pig in hadoop (mapreduce) mode, you need access to a Hadoop cluster and an HDFS installation.
To get a listing of all Pig commands, including the run modes, use:
$ pig -help
Note: A ticket has been entered to change -x, -exectype local|mapreduce to -x, -exectype local|hadoop
Run Ways
You can run Pig in three ways, using either local mode or hadoop (mapreduce) mode:
Grunt Shell: Enter Pig commands manually using Pig’s interactive shell, Grunt.
Script File: Place Pig commands in a script file and run the script.
Embedded Program: Embed Pig commands in a host language and run the program.
Note: Also see the Pig Latin exec and run commands.
Sample Code
The sample code files you need to run the examples on this page include:
Script file: id.pig
Embedded program: idlocal.java and idhadoop.java
The examples are based on these Pig commands, which extract all user IDs from the /etc/passwd file.
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
store B into 'id.out';
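As a quick cross-check on what these statements should produce, the first colon-delimited field can also be extracted with standard Unix tools. The following is a minimal sketch, not part of Pig: the file name sample_passwd, its two entries, and the helper extract_ids are made up for illustration.

```shell
# Create a tiny passwd-style sample (hypothetical entries, for illustration only).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > sample_passwd

# Field 1 of each colon-delimited line corresponds to $0 in the Pig statements.
extract_ids() {
  cut -d: -f1 "$1"
}

extract_ids sample_passwd
```

Running extract_ids against your own copy of passwd gives a list of IDs to compare with the values Pig reports.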
Environment
Unix and Windows users need to install and set up Java (including $JAVA_HOME).
Windows users need to install Cygwin and the Perl package (http://www.cygwin.com/).
To set environment variables, use the right command for your shell:
- setenv PIGDIR /pig (tcsh, csh)
- export PIGDIR=/pig (bash, sh, ksh)
The examples use export.
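Before going further, it may help to confirm that $JAVA_HOME is actually set. The sketch below assumes the standard Java layout, in which the java launcher lives in $JAVA_HOME/bin; the function name check_java_home is hypothetical and is not part of Pig.

```shell
# Report whether JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME does not contain bin/java: $JAVA_HOME"
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# '|| true' keeps a failed check from aborting a script that runs with 'set -e'.
check_java_home || true
```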
Local Mode
This section shows you how to run Pig in local mode, using the Grunt shell, a Pig script, and an embedded program.
To run Pig in local mode, you only need access to a single machine. To make things simple, copy these files to your current working directory (you may want to create a temp directory and move to it):
- The /etc/passwd file
- The pig.jar file, created when you build Pig (see BuildPig)
- The sample code files (id.pig and idlocal.java) located on this page
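The staging steps above could look roughly like this. It is a sketch under assumptions: the directory name pig-local-demo is arbitrary, and the locations of pig.jar and the sample files depend on your build and download paths, so those copy commands are left as commented placeholders.

```shell
# Create a scratch directory and move into it.
workdir="${TMPDIR:-/tmp}/pig-local-demo"   # hypothetical directory name
mkdir -p "$workdir"
cd "$workdir"

# The examples load 'passwd' by relative name, so copy the file under that name.
cp /etc/passwd passwd

# pig.jar comes from your Pig build and the sample files from this page.
# The source paths below are placeholders, so these lines stay commented out.
# cp /path/to/pig/pig.jar .
# cp /path/to/downloads/id.pig /path/to/downloads/idlocal.java .

ls
```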
Grunt Shell
To run Pig’s Grunt shell in local mode, follow these instructions.
First, point $PIG_CLASSPATH to the pig.jar file (in your current working directory):
$ export PIG_CLASSPATH=./pig.jar
From your current working directory, run:
$ pig -x local
The Grunt shell is invoked and you can enter commands at the prompt.
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
Script File
To run a Pig script file in local mode, follow these instructions. They are the same as the Grunt Shell instructions above, except that you include the script file.
First, point $PIG_CLASSPATH to the pig.jar file (in your current working directory):
$ export PIG_CLASSPATH=./pig.jar
From your current working directory, run:
$ pig -x local id.pig
The Pig Latin statements are executed and the results are displayed to your terminal screen.
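If the id.pig attachment is not at hand, a stand-in can be created from the shell. This sketch assumes the script holds exactly the four statements listed under Sample Code; the attached file may differ.

```shell
# Write the four statements to id.pig in the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding $0.
cat > id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
store B into 'id.out';
EOF

cat id.pig
```

Quoting the here-document delimiter matters here: with an unquoted EOF the shell would replace $0 with its own name before the text reaches the file.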
Embedded Program
To compile and run an embedded Java/Pig program in local mode, follow these instructions.
From your current working directory, compile the program:
$ javac -cp pig.jar idlocal.java
Note: idlocal.class is written to your current working directory. Include “.” in the class path when you run the program.
From your current working directory, run the program:
Unix: $ java -cp pig.jar:. idlocal
Cygwin: $ java -cp '.;pig.jar' idlocal
To view the results, check the output file, id.out.
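The Unix and Cygwin run commands differ only in the class path separator: a colon on Unix, a semicolon under Cygwin because the Windows Java launcher parses the class path. A small sketch of selecting it automatically follows; the helper name classpath_separator is hypothetical, and the sketch relies on uname -s reporting a name that begins with CYGWIN under Cygwin.

```shell
# Print the class path separator for a given `uname -s` value.
# Cygwin reports names such as CYGWIN_NT-10.0; everything else is treated as Unix.
classpath_separator() {
  case "$1" in
    CYGWIN*) printf ';' ;;
    *)       printf ':' ;;
  esac
}

sep=$(classpath_separator "$(uname -s)")
classpath="pig.jar${sep}."

# Quote the class path so the shell does not treat ';' as a command separator.
echo "java -cp \"$classpath\" idlocal"
```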
Hadoop Mode
This section shows you how to run Pig in hadoop (mapreduce) mode, using the Grunt shell, a Pig script, and an embedded program.
To run Pig in hadoop (mapreduce) mode, you need access to a Hadoop cluster. You also need to copy these files to your home or current working directory.
- The /etc/passwd file
- The pig.jar file, created when you build Pig (see BuildPig)
- The sample code files (id.pig and idhadoop.java) located on this page
Grunt Shell
To run Pig’s Grunt shell in hadoop (mapreduce) mode, follow these instructions. When you begin the session, Pig will allocate a 15-node cluster. When you quit the session, Pig will deallocate the nodes.
From your current working directory, run:
$ pig
or
$ pig -x mapreduce
The Grunt shell is invoked and you can enter commands at the prompt.
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
Script File
To run Pig script files in hadoop (mapreduce) mode, follow these instructions. They are the same as the Grunt Shell instructions above, except that you include the script file. Again, Pig will automatically allocate and deallocate a 15-node cluster.
From your current working directory, run:
$ pig id.pig
or
$ pig -x mapreduce id.pig
The Pig Latin statements are executed and the results are displayed to your terminal screen.
Embedded Program
To compile and run an embedded Java/Pig program in hadoop (mapreduce) mode, follow these instructions.
First, point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example:
$ export HADOOPDIR=/yourHADOOPsite/conf
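A mistyped $HADOOPDIR is easy to miss until the program fails to reach the cluster, so a quick check that the directory really holds hadoop-site.xml can save time. This is a minimal sketch; the function name check_hadoopdir is hypothetical and not part of Pig or Hadoop.

```shell
# Report whether $HADOOPDIR contains the hadoop-site.xml configuration file.
check_hadoopdir() {
  if [ -f "${HADOOPDIR:-}/hadoop-site.xml" ]; then
    echo "found hadoop-site.xml in $HADOOPDIR"
  else
    echo "no hadoop-site.xml in '${HADOOPDIR:-}'"
    return 1
  fi
}

# '|| true' keeps a failed check from aborting a script that runs with 'set -e'.
check_hadoopdir || true
```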
From your current working directory, compile the program:
$ javac -cp pig.jar idhadoop.java
Note: idhadoop.class is written to your current working directory. Include “.” in the class path when you run the program.
From your current working directory, run the program:
Unix: $ java -cp pig.jar:.:$HADOOPDIR idhadoop
Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idhadoop
To view the results, check the idout directory on your Hadoop system.