
Pig Wiki

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program is a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra style operations such as join, filter, and project; (2) functional-programming style operations such as map and reduce.
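To make the dataflow idea concrete, here is a minimal Python sketch of a three-node graph built from relational-style operations. It is not Pig Latin and does not use any Pig API; the helper names (`filter_rows`, `project`, `join`) and the sample data are hypothetical and chosen only for illustration.

```python
# Hypothetical miniature dataflow; these helpers are illustrative, not Pig APIs.

def filter_rows(rows, predicate):
    """Relational-style filter: keep rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, fields):
    """Relational-style projection: keep only the named fields."""
    return [{f: row[f] for f in fields} for row in rows]


def join(left, right, key):
    """Relational-style inner equi-join on a shared key (simple hash join)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Made-up input data standing in for two large files.
users = [
    {"user": "alice", "age": 31},
    {"user": "bob", "age": 17},
]
visits = [
    {"user": "alice", "url": "/home"},
    {"user": "alice", "url": "/docs"},
    {"user": "bob", "url": "/home"},
]

# Each assignment is one node of the graph; the variables passed along are its edges.
adults = filter_rows(users, lambda row: row["age"] >= 18)
adult_visits = join(adults, visits, "user")
result = project(adult_visits, ["user", "url"])
print(result)
```

Each intermediate name (`adults`, `adult_visits`, `result`) corresponds to one node, and because no node depends on its own output the graph stays acyclic.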

Pig compiles these dataflow programs into sequences of map-reduce jobs and executes them using Hadoop. Pig Latin programs can also be executed in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
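As a rough illustration of what a single map-reduce job does, the following Python sketch runs a group-and-count step entirely in one process, loosely analogous to running without a cluster. It is an assumption-laden toy, not Pig's compiler or Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample records are hypothetical.

```python
from collections import defaultdict

# Toy single-process map-reduce; names here are illustrative, not Hadoop or Pig APIs.


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_user(record):
    """Mapper: emit (user, 1) for each visit record."""
    user, _url = record
    yield (user, 1)


def sum_counts(_key, values):
    """Reducer: total the counts collected for one user."""
    return sum(values)


# Made-up visit log: (user, url) pairs.
visits = [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")]

counts = reduce_phase(shuffle(map_phase(visits, emit_user)), sum_counts)
print(counts)
```

A dataflow with several grouping or join steps would need more than one such job chained together, which is why a program may compile to a sequence of jobs rather than a single one.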

News

Do you Pig? Most of Yahoo does!

New to Pig? Try our fresh-off-the-press Pig Tutorial!

Need Pig functions? Take a look at our brand new Piggy Bank!

Interested in Pig Guts? We are completely redesigning the Pig execution and optimization framework. This work includes (1) creating new operator representations at the various layers (logical, physical, map-reduce) to facilitate optimization and (2) streamlining the execution pipeline. See the PigOptimizationWishList and PigExecutionModel for the design details. Implementation is already underway ...

General Information

User Documentation

Developer Documentation

Related Resources

Hadoop Wiki

last edited 2008-07-31 21:36:08 by CorinneC