
Note: For Pig 0.2.0 or later, some content on this page may no longer be applicable.

Built-in functions

We have a modest library of built-in functions. Feel free to contribute your own.

Storage Functions

These functions are used for loading/storing data.

  • PigStorage - for loading/storing text files with delimited records. Note that PigStorage can only store flat tuples, i.e., tuples having atomic fields. If you want to store nested data, use BinStorage instead.

  • BinStorage - for loading/storing arbitrarily nested data. It can also be used for loading intermediate results that were previously stored using it.

  • TextLoader - for loading unstructured text files. Each line is loaded as a tuple with a single field which is the entire line. It cannot be used for storing data.

  • PigDump - for storing arbitrarily nested data in human-readable format.
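The difference between PigStorage and TextLoader can be modeled roughly in Python. This is only an illustrative sketch, not Pig code: the helper names are hypothetical, and a tab is assumed as the field delimiter.

```python
# Illustrative Python model only; these helper names are hypothetical, not Pig APIs.

def pigstorage_load(line, delimiter="\t"):
    """Split one delimited record into a flat tuple of atomic fields."""
    return tuple(line.rstrip("\n").split(delimiter))


def pigstorage_store(tup, delimiter="\t"):
    """Serialize a flat tuple; nested fields are rejected, mirroring the
    restriction that PigStorage can only store flat tuples."""
    for field in tup:
        if isinstance(field, (tuple, list)):
            raise ValueError("nested field: a BinStorage-style format is needed")
    return delimiter.join(str(field) for field in tup)


def textloader_load(line):
    """Load the entire line as a tuple with a single field."""
    return (line.rstrip("\n"),)


record = "alice\t25\tNYC\n"
flat = pigstorage_load(record)        # one field per delimited value
whole = textloader_load(record)       # one field holding the whole line
round_trip = pigstorage_store(flat)   # back to a delimited text record
```

Storing a tuple such as ("alice", [("x",)]) with this sketch raises an error, which corresponds to the case where the page recommends BinStorage.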

Filter Functions

  • IsEmpty - tests whether a bag is empty
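As a rough illustration of how such a filter function behaves, the sketch below models a bag as a Python list of tuples and drops groups whose bag is empty. The names is_empty and grouped are hypothetical and exist only for this example.

```python
# Illustrative Python model only; a bag is represented as a list of tuples.

def is_empty(bag):
    """Return True when the bag contains no tuples."""
    return len(bag) == 0


# Hypothetical grouped data: (group key, bag of tuples).
grouped = [("a", [(1,), (2,)]), ("b", [])]

# Keep only the groups whose bag is not empty.
non_empty = [(key, bag) for key, bag in grouped if not is_empty(bag)]
```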

Eval Functions

  • COUNT - computes the number of elements in a bag (also known as the "cardinality" of a bag)

  • SUM - computes the sum of the numeric values in a single-column bag

  • AVG - computes the average of the numeric values in a single-column bag

  • MIN/MAX - computes the minimum/maximum of the numeric values in a single-column bag

  • ARITY - computes the number of fields in a tuple (also known as the "arity" of a tuple)

  • TOKENIZE - splits a string and outputs a bag of words

  • DIFF - compares the two fields of a tuple with arity 2. If the fields are DataBags, it emits any tuples that are in one of the DataBags but not the other. If the fields are values, it emits tuples with the values that do not match.
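The semantics of the eval functions listed above can be sketched in Python. This is a simplified model under stated assumptions, not Pig's implementation: a bag is a list of tuples, a single-column bag is a list of 1-tuples, TOKENIZE is assumed to split on whitespace, AVG of an empty bag is assumed to yield nothing, and all function names are hypothetical.

```python
# Illustrative Python model only; not Pig's implementation.

def column(bag):
    """Extract the values of a single-column bag."""
    return [t[0] for t in bag]

def count(bag):
    """Number of elements in the bag (its cardinality)."""
    return len(bag)

def total(bag):
    """Sum of the numeric values in a single-column bag (models SUM)."""
    return sum(column(bag))

def avg(bag):
    """Average of the values; None for an empty bag (an assumption)."""
    return total(bag) / count(bag) if bag else None

def minimum(bag):
    return min(column(bag))

def maximum(bag):
    return max(column(bag))

def arity(tup):
    """Number of fields in a tuple."""
    return len(tup)

def tokenize(text):
    """Split a string into a bag of single-field word tuples
    (whitespace splitting is assumed here)."""
    return [(word,) for word in text.split()]

def diff(pair):
    """Compare the two fields of a tuple with arity 2."""
    left, right = pair
    if isinstance(left, list) and isinstance(right, list):
        # Both fields are bags: emit tuples found in only one of them.
        only_left = [t for t in left if t not in right]
        only_right = [t for t in right if t not in left]
        return only_left + only_right
    # Both fields are plain values: emit them only if they do not match.
    return [] if left == right else [(left,), (right,)]


bag = [(3,), (1,), (2,)]
stats = (count(bag), total(bag), avg(bag), minimum(bag), maximum(bag))
words = tokenize("hello big world")
bag_diff = diff(([(1,), (2,)], [(2,), (3,)]))
value_diff = diff((5, 7))
```

In this model, DIFF on two bags behaves like a symmetric difference, and DIFF on two equal values produces an empty bag.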

Group Functions

There are no built-in group functions yet, because users usually just want to group by the values of fields. If you want all tuples to go in the same group, you can use GROUP <alias> ALL. Similarly, you can use GROUP <alias> ANY if you don't care how tuples are grouped. See PigLatin.
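The contrast between grouping by a field value and grouping everything together can be sketched in Python. This is a conceptual model only: group_by and group_all are hypothetical helpers, the "all" key used for the single group is an assumption made for this sketch, and GROUP <alias> ANY is not modeled.

```python
# Illustrative Python model only; not Pig code.
from collections import defaultdict


def group_by(tuples, field_index):
    """Group tuples by the value of one field."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[field_index]].append(t)
    return dict(groups)


def group_all(tuples):
    """Put every tuple into one group (the key name is an assumption)."""
    return {"all": list(tuples)}


rows = [("alice", 1), ("bob", 2), ("alice", 3)]
by_name = group_by(rows, 0)    # one group per distinct name
everything = group_all(rows)   # a single group holding all tuples
```

A single all-inclusive group is what makes whole-relation aggregates possible, for example counting every tuple in rows with one COUNT-style call.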

PigBuiltins (last edited 2009-09-20 23:38:08 by localhost)