
Pig Wiki

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program is a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra style operations such as join, filter, and project; (2) functional-programming style operators such as map and reduce.
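To make the dataflow idea concrete, here is a minimal sketch in plain Python (not Pig Latin, and not how Pig is implemented). It chains three relational-style steps (filter, join, project) over small in-memory tables. The helper names and the sample data are hypothetical and exist only for illustration.

```python
# Illustrative only: plain Python standing in for a small dataflow graph.
# filter_rows, join_rows, project and the sample tables are hypothetical names.

def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def join_rows(left, right, key):
    """Inner equi-join of two lists of dicts on a shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def project(rows, fields):
    """Keep only the named fields of each row."""
    return [{f: row[f] for f in fields} for row in rows]

users = [
    {"user": "alice", "age": 31},
    {"user": "bob", "age": 17},
]
visits = [
    {"user": "alice", "url": "/home"},
    {"user": "bob", "url": "/about"},
    {"user": "alice", "url": "/docs"},
]

# Each assignment is one node of the graph; its inputs are the edges.
adults = filter_rows(users, lambda r: r["age"] >= 18)
joined = join_rows(adults, visits, "user")
result = project(joined, ["user", "url"])
print(result)
```

Each intermediate name (adults, joined, result) plays the role of a node whose inputs are the outputs of earlier nodes, which is the shape of program the paragraph above describes.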

Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
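As a rough illustration of what "a sequence of map-reduce jobs" running in a single process can look like, the following Python sketch chains two toy jobs in memory. It is an assumption-laden teaching aid, not Pig's compiler output and not Hadoop's API; run_map_reduce, run_pipeline, and the sample records are hypothetical.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Run one toy map-reduce job entirely in memory."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key (the "shuffle").
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def run_pipeline(records, stages):
    """Run (mapper, reducer) jobs in order, feeding each output to the next job."""
    data = records
    for mapper, reducer in stages:
        data = list(run_map_reduce(data, mapper, reducer).items())
    return data

visits = [("alice", "/home"), ("bob", "/about"), ("alice", "/docs")]

# Job 1: count visits per user.
count_per_user = (lambda rec: [(rec[0], 1)], lambda key, values: sum(values))
# Job 2: count how many users share each visit count.
users_per_count = (lambda rec: [(rec[1], 1)], lambda key, values: sum(values))

histogram = dict(run_pipeline(visits, [count_per_user, users_per_count]))
print(histogram)
```

The second job consumes the first job's output, which is why a single dataflow program may need more than one map-reduce job.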

NEWS

We are undertaking a complete redesign of the execution and optimization framework, including:

  1. new (hopefully cleaner) operator representations at the various layers (logical, physical, map-reduce) to facilitate optimization along the lines of PigOptimizationWishList, and

  2. streamlining the execution pipeline.

See PigExecutionModel for the design details; implementation of this design is underway.

General Information

User Documentation

Developer Documentation

Related Resources

Hadoop Wiki

last edited 2008-05-06 14:28:04 by PiSong