Pig Wiki
Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra-style operations such as join, filter, and project; (2) functional-programming-style operators such as map and reduce.
Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
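To make the dataflow idea concrete, here is a minimal Python sketch of a pipeline in which each step is one node that transforms data. It is not Pig Latin and not Pig's API: the function names (filter_op, project_op, group_op, reduce_op, run_dataflow), the sample records, and the 30-second threshold are all hypothetical, chosen only to illustrate how relational-style and functional-style operations chain together.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical in-memory records standing in for a very large input file.
visits = [
    {"user": "amy", "url": "a.com", "secs": 120},
    {"user": "bob", "url": "b.com", "secs": 5},
    {"user": "amy", "url": "c.com", "secs": 40},
    {"user": "bob", "url": "a.com", "secs": 300},
]

def filter_op(rows, pred):
    """Relational-style node: keep only rows that satisfy pred."""
    return [r for r in rows if pred(r)]

def project_op(rows, fields):
    """Relational-style node: keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]

def group_op(rows, key):
    """Collect rows that share the same value of key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return dict(groups)

def reduce_op(groups, fn, initial):
    """Functional-style node: fold each group's rows into a single value."""
    return {k: reduce(fn, rows, initial) for k, rows in groups.items()}

def run_dataflow(rows):
    """Wire the nodes into a small chain: filter, project, group, reduce."""
    long_visits = filter_op(rows, lambda r: r["secs"] >= 30)
    slim = project_op(long_visits, ["user", "secs"])
    by_user = group_op(slim, "user")
    return reduce_op(by_user, lambda acc, r: acc + r["secs"], 0)

total_secs_per_user = run_dataflow(visits)
```

In this sketch every node runs eagerly in one process, which loosely resembles the "local" mode described above. Pig itself compiles such a graph into map-reduce jobs rather than evaluating it step by step like this.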
NEWS
We are undertaking a complete redesign of the execution and optimization framework, including:
new (hopefully cleaner) operator representations at the various layers (logical, physical, map-reduce) to facilitate optimization along the lines of PigOptimizationWishList, and
streamlining the execution pipeline.
See PigExecutionModel for the design details; implementation of this design is underway.
General Information
PigOverview - an overview of Pig's capabilities
Pig talks:
Pig paper: pdf
An interview with one of Yahoo's most prominent Pig users, including his take on Pig Latin vs. SQL: video
User Documentation
PigLatin - the language manual
PigFunctions - Guide for writing your own Pig functions
Grunt - shell manual
ExampleGenerator - Guide for using the ILLUSTRATE command to help debug scripts
Developer Documentation
How tos
Road map
Specification Proposals
PigPerformance - current performance numbers