
Objective

This document provides development guidelines and other information for contributors to the Pig project.

Error Handling

There are several types of errors in Pig:

  • Internal Errors. These are internal problems in the Pig code that would be handled with asserts in languages like C/C++. An example would be an unchecked NULL pointer. The desired behavior is to notify the user of the failure and to log the stack trace to the client side log file.

  • Frontend Errors. These are errors that occur on the Pig client. Examples include a failure to connect to HOD, or a failure to access the metadata repository once support for it is added. The proper behavior in this case is to retry a few times and then apply the same handling as for Internal Errors. A special case of Frontend Errors is User Errors:

    • User Errors. These include invalid syntax, working with non-existent data, referring to non-existent relationships, etc. The desired behavior is to show a meaningful error message to the user and abort the processing. (This does not mean that the interactive shell exits, though.)

  • Backend Errors. These are errors that occur on the backend during program execution. An example would be a failure to read a DFS file. The desired behavior in this case is to propagate the error from the backend to the frontend and then handle it in the same way as an Internal Error.

It is helpful to be able to separate different types of errors in our code. Here is the proposal on how to handle them:

As a general rule, follow the custom of the current source file when throwing new Exceptions. When in doubt, enquire with other developers on the mailing list.

  • For Internal Errors, throw RuntimeException or a subclass of it. Catch the exception in main, log it, including the stack, to the client side log. Write a failure message to stderr pointing to the log file.

  • For Frontend Errors, throw FrontendException or a subclass of FrontendException. Catch the exception in main, log it, including the stack, to the client side log. Write a failure message to stderr pointing to the log file. The front end consists of multiple components: parser, type checker, optimizer, translators, etc. All errors from these components are categorized as front-end errors. Components that are part of the front end throw specific exceptions that capture the context. For example, the parser throws a ParseException and the type checker throws a TypeCheckerException. The exceptions thrown in the front end are listed below. More information is available in the error handling functional specification (see References).

    1. FrontendException: Generic front-end exception (subclass of PigException). Also used for indicating semantic errors and to wrap user errors from the generated class ParseException

    2. JobCreationException: Used for indicating errors during Map Reduce job creation (subclass of FrontendException)

    3. LogicalToPhysicalTranslatorException: Used for indicating errors during logical plan to physical plan translation (subclass of VisitorException)

    4. MRCompilerException: Used for indicating errors during map reduce plan compilation from physical plan (subclass of VisitorException)

    5. OptimizerException: Used for indicating errors during logical plan optimization (subclass of FrontendException)

    6. PigException: Generic exception in Pig (subclass of IOException and the superclass of all exceptions in Pig)

    7. PlanException: Used for indicating errors during plan/graph operations (subclass of FrontendException)

    8. PlanValidationException: Used for indicating errors during plan validation (subclass of VisitorException)

    9. SchemaMergeException: Used for indicating errors during schema merges (subclass of FrontendException)

    10. TypeCheckerException: Used for indicating errors due to type checking (subclass of VisitorException)

    11. VisitorException: Generic exception used for indicating errors when visiting a plan (subclass of FrontendException)

    • For User Errors, throw a ParseException that contains a meaningful message. Wrap the ParseException in the generic FrontendException and rethrow it. For batch and interactive processing, catch the exception in Grunt and log the exception message to stderr. The developer can do the same in the embedded case. In debug mode, log the exception stack to the client side log.

  • For Backend Errors, there will need to be a backend-specific way to get the error from the backend to the frontend. Once this is done, log the error to the client side log file and throw ExecuteException. Catch the exception in main, log it, including the stack, to the client side log. Write a failure message to stderr pointing to the log file.
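
The handling rules above can be sketched as a single dispatch routine. This is a minimal, self-contained illustration and not Pig's actual main: the nested exception classes are stand-ins for the Pig classes of the same name, and Script, handle, logWithStack, the in-memory log, and the wording of the messages are all assumptions made for this example. Debug-mode logging of user errors is left out.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

class ErrorDispatchSketch {
    // Minimal stand-ins for the Pig classes named above; not the real implementations.
    static class PigException extends IOException {
        PigException(String message, Throwable cause) { super(message, cause); }
    }
    static class FrontendException extends PigException {
        FrontendException(String message, Throwable cause) { super(message, cause); }
    }
    static class ExecuteException extends PigException {
        ExecuteException(String message, Throwable cause) { super(message, cause); }
    }
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    /** Hypothetical hook standing in for "parse and run a script". */
    interface Script {
        void run() throws Exception;
    }

    /** Appends the exception and its stack trace to the (in-memory) client side log. */
    static void logWithStack(StringBuilder clientLog, Throwable t) {
        StringWriter stack = new StringWriter();
        t.printStackTrace(new PrintWriter(stack, true));
        clientLog.append(stack.toString());
    }

    /** Runs the script and returns the line main would write to stderr ("" on success). */
    static String handle(Script script, StringBuilder clientLog, String logFile) {
        try {
            script.run();
            return "";
        } catch (FrontendException fe) {
            if (fe.getCause() instanceof ParseException) {
                // User error: show the meaningful message only, no stack trace.
                return fe.getCause().getMessage();
            }
            logWithStack(clientLog, fe);
            return "Frontend error. Details at logfile: " + logFile;
        } catch (ExecuteException ee) {
            // Backend error that was propagated to the frontend.
            logWithStack(clientLog, ee);
            return "Backend error. Details at logfile: " + logFile;
        } catch (RuntimeException re) {
            // Internal error.
            logWithStack(clientLog, re);
            return "Internal error. Details at logfile: " + logFile;
        } catch (Exception e) {
            // Anything else is treated like an internal error in this sketch.
            logWithStack(clientLog, e);
            return "Internal error. Details at logfile: " + logFile;
        }
    }

    public static void main(String[] args) {
        StringBuilder clientLog = new StringBuilder();
        String stderrLine = handle(() -> {
            throw new FrontendException("Error during parsing",
                    new ParseException("Syntax error near 'LOAD'"));
        }, clientLog, "pig.log");
        System.err.println(stderrLine);
    }
}
```

The catch order matters: the user-error check sits inside the FrontendException branch because, per the rules above, a ParseException reaches the caller only as the cause of a FrontendException.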

Using the exception classes

PigException and its subclasses have an extensive suite of constructors. The most important ones are:

  1. PigException (String message, int errCode, byte errSrc)

  2. PigException (String message, int errCode, byte errSrc, Throwable cause)

  • The parameter message denotes the error message associated with the exception. The error message should be easy to read and minimize the use of technical jargon to the extent possible.

  • The parameter errCode is the error code unique to the error message. The classification of error codes is documented in the error handling functional specification (see References).

  • The parameter errSrc indicates the source or nature of the error. It must be one of:

    1. PigException.BUG

    2. PigException.INPUT

    3. PigException.REMOTE_ENVIRONMENT

    4. PigException.USER_ENVIRONMENT

  • In cases where exceptions are wrapped or embedded, the fourth parameter cause will be a reference to the originating exception.

A couple of examples will illustrate the point.

    throw new PigException("Expected bytearray but received int", 1300, PigException.INPUT);

    // alternatively
    int errCode = 1300;
    String msg = "Expected bytearray but received int";
    throw new PigException(msg, errCode, PigException.INPUT);

    try {
        ...
    } catch (RemoteException re) {
        throw new PigException("Internal error. Caught some other exception", 5500, PigException.REMOTE_ENVIRONMENT, re);
    }

    // alternatively
    try {
        ...
    } catch (RemoteException re) {
        int errCode = 5500;
        String msg = "Internal error. Caught some other exception";
        throw new PigException(msg, errCode, PigException.REMOTE_ENVIRONMENT, re);
    }
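
To show how the three pieces of information (message, error code, error source) and the optional cause travel with the exception, here is a small self-contained sketch. The nested PigException is a stand-in that mirrors only the two constructors listed above; the numeric values of the constants, the accessors getErrorCode and getErrorSource as written here, and the helpers sourceName and summarize are assumptions made for this example and should not be read as Pig's actual implementation.

```java
import java.io.IOException;

class PigExceptionSketch {
    /** Stand-in mirroring the two constructors listed above; not Pig's real class. */
    static class PigException extends IOException {
        // Placeholder values chosen for this sketch only.
        static final byte BUG = 1;
        static final byte INPUT = 2;
        static final byte REMOTE_ENVIRONMENT = 3;
        static final byte USER_ENVIRONMENT = 4;

        private final int errCode;
        private final byte errSrc;

        PigException(String message, int errCode, byte errSrc) {
            this(message, errCode, errSrc, null);
        }

        PigException(String message, int errCode, byte errSrc, Throwable cause) {
            super(message, cause);
            this.errCode = errCode;
            this.errSrc = errSrc;
        }

        int getErrorCode() { return errCode; }
        byte getErrorSource() { return errSrc; }
    }

    /** Hypothetical helper: maps an error source constant to a readable name. */
    static String sourceName(byte errSrc) {
        switch (errSrc) {
            case PigException.BUG: return "BUG";
            case PigException.INPUT: return "INPUT";
            case PigException.REMOTE_ENVIRONMENT: return "REMOTE_ENVIRONMENT";
            case PigException.USER_ENVIRONMENT: return "USER_ENVIRONMENT";
            default: return "UNKNOWN";
        }
    }

    /** Hypothetical helper: one-line summary of the exception, naming the cause if any. */
    static String summarize(PigException pe) {
        String summary = "ERROR " + pe.getErrorCode()
                + " [" + sourceName(pe.getErrorSource()) + "]: " + pe.getMessage();
        Throwable cause = pe.getCause();
        if (cause != null) {
            summary += " (caused by " + cause.getClass().getSimpleName() + ")";
        }
        return summary;
    }

    public static void main(String[] args) {
        PigException plain = new PigException(
                "Expected bytearray but received int", 1300, PigException.INPUT);
        PigException wrapped = new PigException(
                "Internal error. Caught some other exception", 5500,
                PigException.REMOTE_ENVIRONMENT, new IllegalStateException("connection lost"));
        System.out.println(summarize(plain));
        System.out.println(summarize(wrapped));
    }
}
```

Because the cause is passed through to the superclass constructor, the originating exception stays reachable through getCause() after wrapping, which is what lets a top-level handler log the full chain.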

Pig and Eclipse

To use Pig with the Eclipse IDE, see Eclipse_Environment.

References

  1. Santhosh Srinivasan, "Pig Error Handling Functional Specification", January 23, 2009, http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification

PigDeveloperCookbook (last edited 2009-09-20 23:38:32 by localhost)