## Please edit system and help pages ONLY in the moinmaster wiki! For more
## information, please see MoinMaster:MoinPagesEditorGroup.
##master-page:FrontPage
#format wiki
#language en
#pragma section-numbers off

= Apache Pig Wiki =

[[http://incubator.apache.org/pig/|Apache Pig]] is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions, but you can also create your own user-defined functions to do special-purpose processing.

Pig Latin programs run in a distributed fashion on a cluster (programs are compiled into Map/Reduce jobs and executed using Hadoop). For quick prototyping, Pig Latin programs can also run in "local mode" without a cluster (all processing takes place in a single local JVM).

Do you Pig? At Yahoo!, 40% of all Hadoop jobs are run with Pig. Come join us!

== News ==

'''Why Pig Latin instead of SQL?''' [[http://www.cs.cmu.edu/~olston/publications/sigmod08.pdf|Pig Latin: A Not-So-Foreign Language ...]]

'''Pig Has Grown Up!''' On 10/22/08, Pig graduated from the [[http://incubator.apache.org/|Incubator]] and joined [[http://hadoop.apache.org/|Apache Hadoop]] as a subproject.

'''Pig is Getting Faster!''' Pig is now 2 to 6 times faster for many queries. We've created a set of benchmarks and run them against the Pig 0.1.0 release (modified to run on Hadoop 0.18) and against the current trunk (previously the `types` branch). Joins and ORDER BYs in particular showed large performance gains. For complete details, see PigMix.

'''Interested in Pig Guts?''' We are completely redesigning the Pig execution and optimization framework. For design details, see PigOptimizationWishList and PigExecutionModel.

'''Want to contribute but don't know where to jump in?''' Here is a [[http://wiki.apache.org/pig/ProposedProjects|list of projects]] we would like to see done. We need new blood!

'''Pig available as part of Amazon's Elastic !MapReduce''', as of August 2009.

== General Information ==

 * [[http://hadoop.apache.org/pig/|Official Apache Pig Website]]
 * PigTalksPapers - Pig talks, papers, interviews

== User Documentation ==

 * [[http://hadoop.apache.org/pig/|User Documentation]]
 * [[http://www.cloudera.com/hadoop-training-pig-introduction|Online Pig Training]] - Complete with video lectures, exercises, and a pre-configured virtual machine. Developed by Cloudera and Yahoo!
 * PiggyBank - User-defined functions (UDFs) contributed by Pig users!

== Developer Documentation ==

 * How tos
  * HowToDocumentation
  * HowToContribute
  * HowToCommit
  * HowToRelease
  * PigDeveloperCookbook
 * Road map
  * ProposedRoadMap
 * Specification Proposals
  * PigTypesFunctionalSpec
  * PigTypesDesign
  * UserDefinedOrdering
  * PigAbstractionLayer
  * PigExecutionModel
  * PigStreamingFunctionalSpec
  * ParameterSubstitution
  * PigOptimizationWishList
  * NestedLogicalPlan (''draft version'')
  * PigErrorHandling
  * PigMultiQueryPerformanceSpecification
  * PigSkewedJoinSpec
  * PigAccumulatorSpec
  * PigSampler
 * Performance
  * PigPerformance (current performance numbers)

== Related Resources ==

 * [[http://hadoop.apache.org/core/|Hadoop Core]]
 * [[http://wiki.apache.org/hadoop/|Hadoop Wiki]]

== Contrib Projects ==

 * [[http://wiki.apache.org/pig/zebra|Zebra]]