Grunt Shell

Note: For Pig 0.2.0 or later, some content on this page may no longer be applicable.

Introduction

This document describes the commands supported by Grunt, which can be used in the interactive shell as well as in batch mode. They include DFS commands, Pig commands, and a few others, all of which are covered below.
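
As a rough illustration of batch mode, the sketch below shows a script file that mixes DFS commands with Pig Latin statements. The file name, paths, and aliases are hypothetical, and the sketch assumes that the Pig version in use accepts -- comments.

```pig
-- nightly.pig (hypothetical script name)
-- Move into the data directory and create a place for the results.
cd /data
mkdir results

-- Load the students file (relative to /data) and write a copy under the new directory.
a = load 'students' as (name, age, gpa);
store a into 'results/students_copy';
```

The same lines could also be typed one at a time at the grunt> prompt.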

Commands

This section describes the currently available commands. Within each subsection, commands are listed in alphabetical order. All commands are case-insensitive, and whitespace is not significant.

DFS

This is a basic set of commands for navigating the Hadoop file system.

cat

This command is similar to the Unix cat command and prints the content of one or more files to the screen.

cat <PATH1> <PATH2> ...

If multiple files are specified, they are concatenated together. If a directory is specified, it is recursively traversed and all of its content is concatenated together.

Example:

grunt> cat students
joe smith
john adams
anne white
grunt>

cd

This command is similar to the Unix cd command and can be used to navigate the file system:

cd <DIR>
or
cd

If a directory is specified, it becomes the user's current working directory, and all other operations are performed relative to it. If no directory is specified, the user's home directory (/user/NAME) becomes the current working directory.

Example:

grunt> cd /data
grunt>

copyFromLocal

This command copies a file or a directory from the local file system to DFS.

copyFromLocal <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.

Examples:

grunt> copyFromLocal /data/students students
grunt> ls students
/data/students <r 3> 8270
grunt> copyFromLocal /data/tests new_test
grunt> ls new_test
/data/new_test/test1.data<r 3>   664
/data/new_test/test2.data<r 3>    344
/data/new_test/more_data        <dir>

copyToLocal

This command copies a file or a directory from DFS to the local file system.

copyToLocal <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory (the directory from which the script was executed or the Grunt shell was started) and retain the name of the source file/directory.

Examples:

grunt> copyToLocal students /data
grunt> copyToLocal data /data/mydata
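
The "." form described above can be sketched as follows. The DFS path is hypothetical; the copy should end up in the local directory from which the Grunt shell was started, under the name students.

```pig
grunt> copyToLocal /data/students .
```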

cp

This command is similar to the Unix cp command and copies files or directories within DFS.

cp <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.

Example:

grunt> cp students students_save
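
A short sketch of the "." form for cp, using hypothetical paths: after changing into a backup directory, the file is copied into the current working directory and keeps the name students.

```pig
grunt> cd /data/backup
grunt> cp /data/students .
grunt> ls
```

If the copy succeeds, the final ls should list /data/backup/students.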

ls

This command is similar to the Unix ls command and lists the content of a directory.

ls <DIR>
or
ls

If DIR is specified, the command lists the content of the specified directory. Otherwise, the content of the current working directory is listed.

Example:

grunt> ls /data
/data/DDLs  <dir>
/data/count <dir>
/data/data  <dir>
/data/schema        <dir>
grunt>

mkdir

This command is similar to the Unix mkdir command and creates new directories.

mkdir <DIR>

Any parts of the path that do not exist are created.

Example:

grunt> mkdir data/20070905
grunt> 

If neither the data directory nor the 20070905 directory exists, both are created.

mv

This command is identical to cp except that it removes the source file/directory as soon as it is copied.

mv <SRC PATH> <DST PATH>

Example:

grunt> mv output output2
grunt> ls output
File or directory output does not exist.
grunt> ls output2
/data/output2/map-000000<r 3>     508844
/data/output2/output3     <dir>
/data/output2/part-00000<r 3>     0

pwd

This command is identical to the Unix pwd command and prints the name of the current working directory.

Example:

grunt> pwd
/data
grunt>

rm

This command is similar to the Unix rm command and removes one or more files or directories. Note that it recursively removes a directory even if it is not empty, does not ask for confirmation, and the removed data is not recoverable.

rm <PATH1> <PATH2> ...

Examples:

grunt> rm /data/students
grunt> rm students students_save

Pig

All regular Pig commands can be executed from the shell. See PigLatin for more details.

Other Commands

define

This command defines a parameterized user-defined function. It is used in conjunction with register and is described in PigFunctions.

describe

This command displays the schema of a particular alias. The schema format is described in PigLatinSchemas.

Example:

grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = filter a by name matches 'zach%';
grunt> c = group b by name;
grunt> d = foreach c generate group, COUNT(b.age);
grunt> describe a
a: (name, age, gpa )
grunt> describe b
b: (name, age, gpa )
grunt> describe c
c: (group, b: (name, age, gpa ) )
grunt> describe d
d: (group, count1 )

If you do not specify column names, none appear in the schema, as the example below shows:

grunt> a = load '/data/students';
grunt> b = filter a by $0 matches 'zach%';
grunt> c = group b by $0;
grunt> d = foreach c generate group, COUNT(b.$1);
grunt> describe a;
a: ( )
grunt> describe b;
b: ( )
grunt> describe c;
c: (group: ( ), b: ( ) )
grunt> describe d;
d: (group: ( ), count1 )

dump

This command prints the content of a Pig alias to the screen, which is useful for debugging. It is described in PigLatin.
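
A minimal sketch of using dump while debugging, assuming a hypothetical /data/students file with the three fields shown: the filtered alias is printed to the screen instead of being written to a file.

```pig
grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = filter a by age > 20;
grunt> dump b;
```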

explain

This command displays the execution plan used to compute the specified relation.

grunt> A = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage('\t') as (name, age, gpa);
grunt> B = group A by name;
grunt> C = foreach B generate group, COUNT(A.$1);
grunt> explain C;
Logical Plan:
|---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]->[PROJECT $1]})]} ) 
      |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
            |---LOLoad ( file = /user/pig/tests/data/singlefile/studenttab10k AS name,age,gpa )
-----------------------------------------------
Physical Plan:
|---POMapreduce
    Map : *
    Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Composite(Project(1),Project(1))))))
    Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
    Grouping : Generate(Project(0),*)
    Input File(s) : /user/pig/tests/data/singlefile/studenttab10k
    Properties : pig.input.splittable:true

The output consists of two parts: a logical plan and a physical plan. The logical plan shows the pipeline of operators to be executed to build the relation. The physical plan shows how this pipeline is mapped to the physical backend, which in this case is Hadoop.

help

This command shows available commands.

Example:

grunt> help
Commands:
<pig latin statement>;
store <alias> into <filename> [using <functionSpec>]
dump <alias>
describe <alias>
kill <job_id>
ls <path>
du <path>
mv <src> <dst>
cp <src> <dst>
rm <src>
copyFromLocal <localsrc> <dst>
cd <dir>
pwd
cat <src>
copyToLocal <src> <localdst>
mkdir <path>
cd <path>
define <functionAlias> <functionSpec>
register <udfJar>
debugOn
debugOff
quit

illustrate

This command shows a sample execution of your script.

grunt> A = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage('\t') as (name, age, gpa);
grunt> B = group A by name;
grunt> C = foreach B generate group, COUNT(A.$1);
grunt> illustrate C;

----------------------------------------
| A     | name         | age   | gpa   | 
----------------------------------------
|       | xavi ... bec | 58    | 2.99  | 
|       | xavi ... bec | 23    | 0.59  | 
----------------------------------------
-------------------------------------------------------------------------------
| B     | group        | A: (name, age, gpa )                                 | 
-------------------------------------------------------------------------------
|       | xavi ... bec | {(xavi ... bec, 58, 2.99), (xavi ... bec, 23, 0.59)} | 
-------------------------------------------------------------------------------
---------------------------------
| C     | group        | count1 | 
---------------------------------
|       | xavi ... bec | 2      | 
---------------------------------

The details can be seen in ExampleGenerator.

kill

This command kills a job based on its job ID.

kill <JOBID>

Example:

grunt> kill job_0001

quit

Use this command to exit the shell.

grunt> quit

register

This command registers a jar that contains user-defined functions. It can be used in conjunction with define and is described in PigFunctions.
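
The sketch below shows how register and define might be combined, following the forms listed by help (register <udfJar> and define <functionAlias> <functionSpec>). The jar path, the class com.example.udf.Normalize, its constructor argument, and the alias normalize are all hypothetical, and the sketch assumes that -- comments are accepted.

```pig
-- Make the classes in the hypothetical jar visible to Pig.
register /home/me/myudfs.jar;

-- Bind a short alias to the function together with its constructor argument.
define normalize com.example.udf.Normalize('lower');

-- Use the alias like any other function.
a = load '/data/students' as (name, age, gpa);
b = foreach a generate normalize(name), age, gpa;
```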

set

This command passes key-value pairs to Pig. The format of the command is:

grunt> set <key> '<value>'

Both keys and values are case sensitive.

The following keys are currently supported:

Key      | Value                                       | Description
debug    | on/off                                      | enables/disables debug-level logging
job.name | single-quoted string that contains the name | sets a user-specified name for the job

Examples:

grunt> set debug on
grunt> set debug off
grunt> set job.name 'my job'
grunt>

store

This command stores the content of a Pig alias in a file. It is described in PigLatin.
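
A minimal sketch of store, following the form listed by help (store <alias> into <filename> [using <functionSpec>]). The paths are hypothetical; PigStorage(',') is used here to write comma-separated fields.

```pig
grunt> a = load '/data/students' as (name, age, gpa);
grunt> store a into '/data/students_csv' using PigStorage(',');
```

If the using clause is omitted, Pig falls back to its default storage function.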

Grunt (last edited 2009-09-20 23:38:30 by localhost)