Apache Pig Wiki
Note: Pig is switching to a new wiki system. We will keep the old wiki running, but new material should go on the new wiki.
Apache Pig is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions, but you can also create your own user-defined functions for special-purpose processing.
Pig Latin programs run in a distributed fashion on a cluster (programs are compiled into Map/Reduce jobs and executed using Hadoop). For quick prototyping, Pig Latin programs can also run in "local mode" without a cluster (all processing takes place in a single local JVM).
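To make the "sequence of data transformations" idea concrete, here is a rough sketch in plain Python, not Pig Latin. It mirrors the filter, group, and apply-a-function steps on a small in-memory list. The sample records, the `is_large` helper, and the threshold are all hypothetical and chosen only for illustration. The comments name the Pig Latin operators that play a comparable role.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes_transferred) records.
records = [
    ("alice", 1200),
    ("bob", 300),
    ("alice", 800),
    ("carol", 5000),
    ("bob", 2500),
]


def is_large(record, threshold=1000):
    """Stand-in for a user-defined function; the threshold is an assumption."""
    return record[1] >= threshold


# Step 1: keep only matching records (comparable to Pig Latin's FILTER).
large = [r for r in records if is_large(r)]

# Step 2: collect records by key (comparable to GROUP).
groups = defaultdict(list)
for user, size in large:
    groups[user].append(size)

# Step 3: apply a function to each group (comparable to FOREACH ... GENERATE with SUM).
totals = {user: sum(sizes) for user, sizes in groups.items()}

print(totals)
```

In Pig, each step would be one statement in a script, and the same script could run either on a cluster or in local mode. This sketch only imitates the shape of the dataflow on a single machine.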
Do you Pig? At Yahoo! 40% of all Hadoop jobs are run with Pig. Come join us!
Why Pig Latin instead of SQL? Pig Latin: A Not-So-Foreign Language ...
Want to contribute but don't know where to kick in? Here is our journal of projects we have worked on, are working on, and hope to work on. Find a project that interests you and jump on in.
Pig has been available as part of Amazon's Elastic MapReduce since August 2009.
PigTalksPapers - Pig talks, papers, interviews
PoweredBy - a (partial) list of companies using Pig
Online Pig Training - Complete with video lectures, exercises, and a pre-configured virtual machine. Developed by Cloudera and Yahoo!
PiggyBank - User-defined functions (UDFs) contributed by Pig users!
PigTools - Tools Pig users have built around and on top of Pig.
PigInteroperability - How to make Pig work with other platforms you may be using, such as HBase and Cassandra.
Penny - A distributed debugging framework for Pig.
- How tos
- Road map
- Specification Proposals
NestedLogicalPlan (draft version)
PigMix (current benchmark results)
PigIllustrate (revival proposal)
PigErrorHandlingInScripts (discussion of proposal)