This document describes the design for the error handling feature in Pig. The design follows the requirements and functional specification. The error handling design is influenced by Mika Raento's excellent online resource.

Pig's architecture is designed to support several back-ends. Currently, the only supported back-end is Hadoop. The front-end is ideally back-end agnostic. For a breakdown of the front-end components refer to the functional specification.

Error Handling


PigException will serve as the base class for all Pig exceptions and will be the exception that Pig throws to external systems. Presently, the Pig APIs throw IOException, so PigException will extend IOException to maintain backward compatibility. FrontendException will extend PigException and serve as the umbrella for all front-end exceptions. The task-specific exceptions from the front-end components will subclass FrontendException to ensure clarity and to enable future extensions.
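The hierarchy described above can be sketched roughly as follows. This is a minimal illustration, not the actual Pig source: the constructors are reduced to a single message argument, and `ExampleTypeCheckException`, `HierarchyDemo`, and `legacyApiCall` are hypothetical names introduced only for this example.

```java
import java.io.IOException;

// Base class for exceptions that Pig exposes to external systems.
// Extending IOException keeps callers that already catch IOException working.
class PigException extends IOException {
    PigException(String message) {
        super(message);
    }
}

// Umbrella for all front-end exceptions.
class FrontendException extends PigException {
    FrontendException(String message) {
        super(message);
    }
}

// Hypothetical task-specific exception, named here only for illustration.
class ExampleTypeCheckException extends FrontendException {
    ExampleTypeCheckException(String message) {
        super(message);
    }
}

class HierarchyDemo {
    // Stands in for an existing Pig API method declared to throw IOException.
    static void legacyApiCall() throws IOException {
        throw new ExampleTypeCheckException("incompatible types in expression");
    }

    // Returns the simple class name of whatever a legacy-style handler caught.
    static String catchAsIOException() {
        try {
            legacyApiCall();
            return "nothing thrown";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

Because every class in the chain is ultimately an IOException, a caller written against the current API signature still catches the new exceptions without any change.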


PigException contains the following attributes and methods.


Attributes

  1. retriable: A boolean variable that indicates whether the exception is retriable
  2. errorSource: An enum that represents the source of the error. The enum can be extended in the future to include more information, such as the sub-component. The values for this enum type will be:
    1. User input
    2. Bug
    3. User environment
    4. Remote environment
  3. errorCode: An integer that represents the error. Used for documentation and automation
  4. detailedMsg: A string that holds detailed information pertinent to the Pig developer. It will contain details that are not required by the user


Methods

  1. retriable: Returns true if the exception is retriable; false otherwise
  2. getErrorCode: Returns the error code associated with this exception
  3. getDetailedMessage: Returns the string detailedMsg
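A minimal sketch of how these attributes and methods might fit together is shown below. The constructor signature, the enum constant names, and the `getErrorSource` accessor are assumptions made for illustration, and the error code used in the usage note is a made-up value. The fallback to `BUG` reflects the default that this design states for exceptions that report no source.

```java
import java.io.IOException;

// The four error sources listed in the design (constant names are assumed).
enum ErrorSource {
    USER_INPUT,
    BUG,
    USER_ENVIRONMENT,
    REMOTE_ENVIRONMENT
}

class PigException extends IOException {
    private final boolean retriable;
    private final ErrorSource errorSource;
    private final int errorCode;
    private final String detailedMsg;

    // Assumed constructor; the design does not specify one.
    PigException(String message, int errorCode, ErrorSource errorSource,
                 boolean retriable, String detailedMsg) {
        super(message); // user-facing text, later read with getMessage()
        this.errorCode = errorCode;
        // A missing source is classified as a bug by default.
        this.errorSource = (errorSource == null) ? ErrorSource.BUG : errorSource;
        this.retriable = retriable;
        this.detailedMsg = detailedMsg;
    }

    boolean retriable() {
        return retriable;
    }

    int getErrorCode() {
        return errorCode;
    }

    String getDetailedMessage() {
        return detailedMsg;
    }

    // Assumed accessor; the design lists the attribute but no getter for it.
    ErrorSource getErrorSource() {
        return errorSource;
    }
}
```

For example, `new PigException("Invalid alias: A", 1000, null, false, "alias lookup failed")` would carry the user-facing text in `getMessage()`, the developer text in `getDetailedMessage()`, and would report `BUG` as its source because none was supplied.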

Methods of interest from IOException and its super classes

  1. getMessage: User facing message will be reported using getMessage()
  2. getCause: When exceptions are chained, the cause of each exception is retrieved using getCause()
  3. getStackTrace: Useful for determining the root cause and contains details required by the developer
  4. initCause: Used for chaining exceptions
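The chaining methods listed above can be exercised with plain JDK exceptions. The sketch below is only an illustration: `ChainingDemo`, `wrap`, and `rootCause` are hypothetical helpers, not part of Pig.

```java
import java.io.IOException;

class ChainingDemo {
    // Wraps a lower-level failure in an exception carrying a user-facing message.
    static IOException wrap(String userMessage, Throwable cause) {
        IOException wrapped = new IOException(userMessage);
        wrapped.initCause(cause); // initCause may be called at most once per exception
        return wrapped;
    }

    // Follows getCause() links until the first exception in the chain is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

A handler can then show `getMessage()` of the outermost exception to the user while logging the stack trace of `rootCause(...)` for the developer.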

As mentioned earlier, the source of an exception is classified into one of four categories. Each exception should report the appropriate source based on its context. An exception that does not report a source will be treated as a bug by default.


    java.lang.RuntimeException: Unexpected data type 74 found in stream.
            at
            at org.apache.pig.builtin.BinStorage.bytesToInteger(
            at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(
            at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(
            at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(
            at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(
            at
            at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$
            at
            at
            at org.apache.hadoop.mapred.TaskTracker$Child.main(

Warning message aggregation


Hadoop provides the ability to aggregate counters for the entire application. Counter values have to be changed via the Hadoop reporter. A new interface, PigLogger, will be used to abstract the logging of warning messages. A back-end specific PigHadoopLogger will implement this interface and, if warning message aggregation is turned on, aggregate warning messages using Hadoop counters. The EvalFunc class will include a new PigHadoopLogger attribute to allow UDF authors to log and aggregate warning messages. The key for the warning aggregation will be one of many pre-defined keys in Pig. If warning message aggregation is turned off, the warning messages are sent to STDERR and will appear in Hadoop's STDERR logs.


    ...
    public abstract class EvalFunc<T>  {
        // UDFs must use this to report progress
        // if the exec is taking more than 300 ms
        protected PigProgressable reporter;

        // UDFs must use this to log and aggregate
        // warning messages
        protected PigHadoopLogger log;
    ...
    }
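The on/off aggregation behavior described above can be sketched without a Hadoop dependency. In the sketch below, the `warn` signature is an assumption, `InMemoryAggregatingLogger` is a hypothetical stand-in for PigHadoopLogger that uses an in-memory map where the real class would use Hadoop counters, and the two warning keys are made-up examples of pre-defined keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Pre-defined warning keys; these two names are made up for the example.
enum ExampleWarning {
    DIVIDE_BY_ZERO,
    FIELD_DISCARDED_TYPE_CONVERSION_FAILED
}

// Abstracts the logging of warning messages; the method signature is assumed.
interface PigLogger {
    void warn(Object source, String message, Enum<?> warningKey);
}

// Stand-in for PigHadoopLogger: an in-memory map replaces Hadoop counters.
class InMemoryAggregatingLogger implements PigLogger {
    private final boolean aggregate;
    private final Map<Enum<?>, Long> counters = new LinkedHashMap<>();

    InMemoryAggregatingLogger(boolean aggregate) {
        this.aggregate = aggregate;
    }

    @Override
    public void warn(Object source, String message, Enum<?> warningKey) {
        if (aggregate) {
            // Aggregation on: increment the counter for this key instead of printing.
            counters.merge(warningKey, 1L, Long::sum);
        } else {
            // Aggregation off: write the message to STDERR.
            System.err.println(source.getClass().getName() + ": " + message);
        }
    }

    // Number of warnings aggregated so far under the given key.
    long count(Enum<?> warningKey) {
        return counters.getOrDefault(warningKey, 0L);
    }
}
```

With aggregation on, a UDF that hits the same problem on every record produces one counter value per key rather than one log line per record, which is the motivation for keying warnings on a fixed set of pre-defined values.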


Currently, the type checker uses a collector to collect error and warning messages. The use of the collector has to be extended for each subsystem in the front-end.

Open questions

  1. ParseException is thrown by the parser. Ensuring that ParseException is subclassed from FrontendException requires the generated file to be checked into the source repository. Instead, the ParseException is wrapped inside FrontendParseException and rethrown.

  2. Lexical errors in Grunt will result in a TokenMgrError, resulting in a possible exit from Grunt.

  3. Error messages reported by the parser will not be overridden with custom error messages until we move to a bottom-up parser.
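The wrap-and-rethrow approach from the first item can be sketched as follows. This is an illustration under simplifying assumptions: `ParseException` here is a stand-in for the parser-generated class, `FrontendException` extends IOException directly for brevity (in the design it extends PigException), and `ParserFacade` with its trivial semicolon check is a hypothetical placeholder for the generated parser.

```java
import java.io.IOException;

// Stand-in for the parser-generated exception class, which is not edited by hand.
class ParseException extends Exception {
    ParseException(String message) {
        super(message);
    }
}

// Simplified: extends IOException directly instead of going through PigException.
class FrontendException extends IOException {
    FrontendException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Wrapper that brings parser errors into the front-end hierarchy.
class FrontendParseException extends FrontendException {
    FrontendParseException(ParseException cause) {
        super(cause.getMessage(), cause);
    }
}

class ParserFacade {
    // Hypothetical placeholder for the generated parser: rejects any
    // statement that does not end with a semicolon.
    static void generatedParse(String query) throws ParseException {
        if (!query.trim().endsWith(";")) {
            throw new ParseException("Statement is missing a terminating ';': " + query);
        }
    }

    // Front-end entry point: callers only ever see FrontendException.
    static void parse(String query) throws FrontendException {
        try {
            generatedParse(query);
        } catch (ParseException e) {
            throw new FrontendParseException(e); // original stays reachable via getCause()
        }
    }
}
```

The generated class stays untouched, and the original ParseException remains available to developers through `getCause()`.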


References

  1. Santhosh Srinivasan, "Pig Error Handling Requirements," October 30, 2008.

  2. Santhosh Srinivasan, "Pig Error Handling Functional Specification," December 8, 2008.

  3. Mika Raento, "What should Exceptions look like?" July 30, 2006.

PigErrorHandlingDesign (last edited 2009-09-20 23:38:19 by localhost)