Revision 8 as of 2009-09-20 23:38:30
Grunt Shell
Note: For Pig 0.2.0 or later, some content on this page may no longer be applicable.
Introduction
This document describes the commands supported by Grunt, which can be used in the interactive shell as well as in batch mode. The supported commands include DFS commands, Pig commands, and a few others; all of them are discussed below.
Commands
This section describes the currently available commands. Within each subsection, the commands are listed in alphabetical order. All commands are case insensitive, and white space is not significant.
DFS
This is a basic set of commands that allows you to navigate the Hadoop file system.
cat
This command is similar to the Unix cat command and prints the content of one or more files to the screen.
cat <PATH1> <PATH2> ...
If multiple files are specified, they are concatenated together. If a directory is specified, it is traversed recursively and all of its content is concatenated together.
Example:
grunt> cat students
joe smith
john adams
anne white
grunt>
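The concatenation rule can be illustrated with a local-file-system analogy. The Python sketch below is not Pig's implementation: the helper names `cat_paths` and `demo`, the file names, and the sorted traversal order are assumptions made for this illustration (the page does not say in which order the files of a directory are visited).

```python
import os
import tempfile


def cat_paths(*paths):
    """Concatenate file contents; directories are walked recursively.

    Illustration only: the sorted traversal order is an assumption.
    """
    chunks = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # visit subdirectories in a deterministic order
                for name in sorted(files):
                    with open(os.path.join(root, name)) as handle:
                        chunks.append(handle.read())
        else:
            with open(path) as handle:
                chunks.append(handle.read())
    return "".join(chunks)


def demo():
    """Build a throwaway tree (hypothetical file names) and concatenate it."""
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "data", "more"))
        with open(os.path.join(base, "students"), "w") as handle:
            handle.write("joe smith\n")
        with open(os.path.join(base, "data", "a.txt"), "w") as handle:
            handle.write("john adams\n")
        with open(os.path.join(base, "data", "more", "b.txt"), "w") as handle:
            handle.write("anne white\n")
        return cat_paths(os.path.join(base, "students"),
                         os.path.join(base, "data"))
```

Calling `demo()` passes one file and one directory, which corresponds to `cat <PATH1> <PATH2>` with a mixed argument list.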
cd
This command is similar to the Unix cd command and can be used to navigate the file system:
cd <DIR> or cd
If a directory is specified, it becomes the user's current working directory, and all other operations happen relative to it. If no directory is specified, the user's home directory (/user/NAME) becomes the current working directory.
Example:
grunt> cd /data
grunt>
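The working-directory rule can be sketched as a small path-resolution function. This is only an illustration in Python, not Grunt's code: the function name `change_directory`, the user name `joe`, and the normalization of the resulting path are assumptions.

```python
import posixpath


def change_directory(cwd, user, target=None):
    """Return the working directory that results from `cd target`.

    No argument selects the home directory /user/NAME; a relative path is
    resolved against the current working directory; an absolute path
    replaces it. Normalizing the result is an assumption of this sketch.
    """
    if target is None:
        return posixpath.join("/user", user)
    return posixpath.normpath(posixpath.join(cwd, target))


# Hypothetical session for a user named "joe".
after_cd_data = change_directory("/user/joe", "joe", "/data")
after_cd_count = change_directory(after_cd_data, "joe", "count")
after_plain_cd = change_directory(after_cd_count, "joe")
```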
copyFromLocal
This command copies a file or a directory from the local file system to DFS.
copyFromLocal <SRC PATH> <DST PATH>
If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.
Examples:
grunt> copyFromLocal /data/students students
grunt> ls students
/data/students <r 3> 8270
grunt> copyFromLocal /data/tests new_test
grunt> ls new_test
/data/new_test/test1.data<r 3> 664
/data/new_test/test2.data<r 3> 344
/data/new_test/more_data <dir>
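The meaning of "." as a destination can be sketched as follows. The Python function below is a hypothetical model of the rule stated above, not the code Grunt runs; the name `resolve_destination` and the sample paths are assumptions.

```python
import posixpath


def resolve_destination(src, dst, cwd):
    """Return the path at which the copy is created.

    "." means: create the copy in the current working directory and keep
    the name of the source file or directory. Any other destination is
    resolved against the current working directory (absolute paths win).
    """
    if dst == ".":
        source_name = posixpath.basename(src.rstrip("/"))
        return posixpath.join(cwd, source_name)
    return posixpath.join(cwd, dst)


# Hypothetical examples with /user/joe as the current working directory.
kept_name = resolve_destination("/data/students", ".", "/user/joe")
renamed = resolve_destination("/data/students", "students_save", "/user/joe")
absolute = resolve_destination("/data/students", "/backup/students", "/user/joe")
```

The same rule is described for copyToLocal and cp, so the sketch applies to their destination arguments as well.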
copyToLocal
This command copies a file or a directory from DFS to the local file system.
copyToLocal <SRC PATH> <DST PATH>
If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory (directory from which the script was executed or grunt shell started) and retain the name of the source file/directory.
Examples:
grunt> copyToLocal students /data
grunt> copyToLocal data /data/mydata
cp
This command is similar to the Unix cp command and copies files or directories within DFS.
cp <SRC PATH> <DST PATH>
If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.
Example:
grunt> cp students students_save
ls
This command is similar to the Unix ls command and lists the content of a directory.
ls <DIR> or ls
If DIR is specified, the command lists the content of the specified directory. Otherwise, the content of the current working directory is listed.
Example:
grunt> ls /data
/data/DDLs <dir>
/data/count <dir>
/data/data <dir>
/data/schema <dir>
grunt>
mkdir
This command is similar to the Unix mkdir command and creates new directories.
mkdir <DIR>
If parts of the path do not exist, they are created.
Example:
grunt> mkdir data/20070905
grunt>
If neither the data directory nor the 20070905 directory exists, both are created.
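The "create missing parents" behavior has a close analogue in Python's standard library. The sketch below runs against a temporary local directory rather than DFS; the helper names `make_directory` and `demo` are assumptions, and it illustrates the rule rather than Grunt's implementation.

```python
import os
import tempfile


def make_directory(base, relative_path):
    """Create relative_path under base, including any missing parents."""
    target = os.path.join(base, relative_path)
    # os.makedirs raises if the leaf directory already exists; this page
    # does not say what grunt does in that case.
    os.makedirs(target)
    return target


def demo():
    """Create data/20070905 in an empty directory and check both levels."""
    with tempfile.TemporaryDirectory() as base:
        target = make_directory(base, os.path.join("data", "20070905"))
        parent_exists = os.path.isdir(os.path.join(base, "data"))
        leaf_exists = os.path.isdir(target)
        return parent_exists, leaf_exists
```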
mv
This command is identical to cp, except that it removes the source file/directory as soon as it is copied.
Example:
grunt> mv output output2
grunt> ls output
File or directory output does not exist.
grunt> ls output2
/data/output2/map-000000<r 3> 508844
/data/output2/output3 <dir>
/data/output2/part-00000<r 3> 0
pwd
This command is identical to the Unix pwd command and prints the name of the current working directory.
Example:
grunt> pwd
/data
grunt>
rm
This command is similar to the Unix rm command and removes one or more files or directories. Note that it recursively removes a directory even if it is not empty, it does not ask for confirmation, and the removed data is not recoverable.
rm <PATH1> <PATH2> ...
Examples:
grunt> rm /data/students
grunt> rm students students_sav
Pig
All regular pig commands can be executed from the shell. See PigLatin for more details.
Other Commands
define
Defines a parameterized user-defined function. Used in conjunction with register. Described in PigFunctions.
describe
This command displays the schema of a particular alias. The schema format is described in PigLatinSchemas.
Example:
grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = filter a by name matches 'zach%';
grunt> c = group b by name;
grunt> d = foreach c generate group, COUNT(b.age);
grunt> describe a
a: (name, age, gpa )
grunt> describe b
b: (name, age, gpa )
grunt> describe c
c: (group, b: (name, age, gpa ) )
grunt> describe d
d: (group, count1 )
If you don't specify names for the columns, no column names appear in the schema, as the example below shows:
grunt> a = load '/data/students';
grunt> b = filter a by $0 matches 'zach%';
grunt> c = group b by $0;
grunt> d = foreach c generate group, COUNT(b.$1);
grunt> describe a;
a: ( )
grunt> describe b;
b: ( )
grunt> describe c;
c: (group: ( ), b: ( ) )
grunt> describe d;
d: (group: ( ), count1 )
dump
Dumps the content of a Pig alias to the screen. Useful for debugging. Described in PigLatin.
explain
This command displays the execution plan used to compute the specified relation.
grunt> A = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage('\t') as (name, age, gpa);
grunt> B = group A by name;
grunt> C = foreach B generate group, COUNT(A.$1);
grunt> explain C;
Logical Plan:
|---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]->[PROJECT $1]})]} )
|---LOCogroup ( GENERATE {[PROJECT $0],[*]} )
|---LOLoad ( file = /user/pig/tests/data/singlefile/studenttab10k AS name,age,gpa )
-----------------------------------------------
Physical Plan:
|---POMapreduce
Map : *
Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Composite(Project(1),Project(1))))))
Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
Grouping : Generate(Project(0),*)
Input File(s) : /user/pig/tests/data/singlefile/studenttab10k
Properties : pig.input.splittable:true
The output consists of two parts: logical plan and physical plan. The logical plan shows a pipeline of operators to be executed to build the relation. The physical plan shows how this is mapped to the physical backend; in this case - Hadoop.
help
This command shows available commands.
Example:
grunt> help
Commands:
<pig latin statement>;
store <alias> into <filename> [using <functionSpec>]
dump <alias>
describe <alias>
kill <job_id>
ls <path>
du <path>
mv <src> <dst>
cp <src> <dst>
rm <src>
copyFromLocal <localsrc> <dst>
cd <dir>
pwd
cat <src>
copyToLocal <src> <localdst>
mkdir <path>
cd <path>
define <functionAlias> <functionSpec>
register <udfJar>
debugOn
debugOff
quit
illustrate
This command shows a sample execution of your script.
grunt> A = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage('\t') as (name, age, gpa);
grunt> B = group A by name;
grunt> C = foreach B generate group, COUNT(A.$1);
grunt> illustrate C;
----------------------------------------
| A | name | age | gpa |
----------------------------------------
| | xavi ... bec | 58 | 2.99 |
| | xavi ... bec | 23 | 0.59 |
----------------------------------------
-------------------------------------------------------------------------------
| B | group | A: (name, age, gpa ) |
-------------------------------------------------------------------------------
| | xavi ... bec | {(xavi ... bec, 58, 2.99), (xavi ... bec, 23, 0.59)} |
-------------------------------------------------------------------------------
---------------------------------
| C | group | count1 |
---------------------------------
| | xavi ... bec | 2 |
---------------------------------
The details can be seen in ExampleGenerator.
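To make the three tables easier to read, the following Python sketch reproduces the same group-and-count pipeline on a few in-memory rows. It only models the data flow and is not how Pig executes the script. The names in the sample rows are hypothetical placeholders (the names in the output above are truncated), and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical (name, age, gpa) rows standing in for relation A.
ROWS = [
    ("student_x", 58, 2.99),
    ("student_x", 23, 0.59),
    ("student_y", 30, 3.10),
]


def group_by_name(rows):
    """Model of B = group A by name: map each name to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


def count_per_group(groups):
    """Model of C = foreach B generate group, COUNT(A.$1)."""
    # A.$1 projects the second field (age) from every tuple in the bag;
    # this sketch simply counts the projected values.
    return {name: len([row[1] for row in bag]) for name, bag in groups.items()}


B = group_by_name(ROWS)
C = count_per_group(B)
```

Here `B` corresponds to the table with one bag per group, and `C` to the final table with one count per group.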
kill
This command kills a job identified by its job id.
kill <JOBID>
Example:
grunt> kill job_0001
quit
This command should be used to exit the shell.
grunt> quit
register
Registers a jar that contains user-defined functions. Can be used in conjunction with define. Described in PigFunctions.
set
This command passes key-value pairs to Pig. The format of the command is:
grunt> set <key> '<value>'
Both keys and values are case sensitive.
The following keys are currently supported:
| Key | Value | Description |
| debug | on/off | enables/disables debug level logging |
| job.name | single quoted string that contains the name | sets a user-specified name for the job |
Examples:
grunt> set debug on
grunt> set debug off
grunt> set job.name 'my job'
grunt>
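The rules for set (case-sensitive keys and values, two supported keys, a single-quoted job name) can be summarized in a small parser. This Python sketch is a hypothetical model for illustration, not Grunt's parser; the function name `parse_set` and the error messages are assumptions.

```python
import shlex

# Keys listed in the table above; matching is case sensitive.
SUPPORTED_KEYS = {"debug", "job.name"}


def parse_set(line):
    """Parse a `set <key> <value>` line and return (key, value)."""
    tokens = shlex.split(line)  # strips the single quotes around the value
    # The command name itself is case insensitive, like all grunt commands.
    if len(tokens) != 3 or tokens[0].lower() != "set":
        raise ValueError("expected: set <key> <value>")
    _, key, value = tokens
    if key not in SUPPORTED_KEYS:
        raise ValueError("unsupported key: " + key)
    if key == "debug" and value not in ("on", "off"):
        raise ValueError("debug must be on or off")
    return key, value


debug_setting = parse_set("set debug on")
job_name_setting = parse_set("set job.name 'my job'")
```

With this model, `set Debug on` is rejected because the key does not match `debug` exactly.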
store
This command stores the content of a Pig alias to a file. Described in PigLatin.