Pig Wiki
Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra style operations such as join, filter, and project; (2) functional-programming style operators such as map and reduce.
Pig compiles these dataflow programs into sequences of map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
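To make the idea of a dataflow running as a map-reduce job more concrete, here is a minimal Python sketch. It is not Pig Latin and does not use Pig's or Hadoop's APIs; the record layout, the threshold, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `run_local`) are assumptions made up for illustration. It shows a filter and a projection running in a map step, followed by a group-and-count in a reduce step, all inside one process, loosely in the spirit of "local" mode.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, seconds_on_page)
visits = [
    ("alice", "a.com", 120),
    ("bob", "b.com", 30),
    ("alice", "c.com", 200),
    ("carol", "a.com", 150),
]


def map_phase(records):
    """Filter and project: keep long visits and emit (user, 1) pairs."""
    for user, url, seconds in records:
        if seconds > 100:      # relational-style filter (threshold is illustrative)
            yield user, 1      # projection down to the grouping key


def shuffle(pairs):
    """Group mapped values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group: count the long visits per user."""
    return {key: sum(values) for key, values in groups.items()}


def run_local(records):
    """Run the whole pipeline in a single process."""
    return reduce_phase(shuffle(map_phase(records)))


result = run_local(visits)
print(result)
```

Each function stands for one node of the dataflow graph. A real Pig program would express the same steps declaratively in Pig Latin and leave the split into map and reduce phases to the compiler.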
News
Do you Pig? Most of Yahoo does!
100s of users!
1000s of jobs!
30% of Hadoop jobs are via Pig
New to Pig? Try our fresh-off-the-press Pig Tutorial!
Need Pig functions? Take a look at our brand-new Piggy Bank!
Interested in Pig Guts? We are completely redesigning the Pig execution and optimization framework. This work includes (1) creating new operator representations at the various layers (logical, physical, map-reduce) to facilitate optimization and (2) streamlining the execution pipeline. See the PigOptimizationWishList and PigExecutionModel for the design details. Implementation is already underway ...
General Information
PigOverview - An overview of Pig's capabilities
PigTalksPapers - Pig talks, papers, interviews
User Documentation
Getting Started
PigTutorial - Begin here ... everything is set up for you
BuildPig - How to build Pig
RunPig - How to run Pig
Pig System
Grunt - The shell manual
Pig Latin - The language manual
Pig Functions - Built-ins, Piggy Bank, write your own
Javadocs - Refer to the Javadocs for embedded Pig and UDFs
Pig FAQs - The answer to your question may be here
Developer Documentation
How-tos
Road map
Specification Proposals
Performance
PigPerformance (current performance numbers)