As part of the OLAPOperations effort, we'd like to add support for null ordering. Jira issue DERBY-2887 tracks this effort.

Rationale

In earlier versions of the SQL standard, the ordering of NULL values was implementation-defined.

The SQL 2003 standard allows the user to explicitly specify how to sort NULL values, by adding the new NULLS FIRST or NULLS LAST specification in the ORDER BY clause.

In Derby versions 10.0 through 10.3, NULL values always sort high, and thus are always ordered after all non-NULL values in ascending order and before all non-NULL values in descending order. In the language of the standard, NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.

Syntax

In the SQL 2003 standard, the <sort specification list> has the following grammar:

<sort specification list> ::=
    <sort specification> [ { <comma> <sort specification> }... ]

<sort specification> ::=
    <sort key> [ <ordering specification> ] [ <null ordering> ]

<sort key> ::= <value expression>

<ordering specification> ::=
    ASC
  | DESC

<null ordering> ::=
    NULLS FIRST
  | NULLS LAST

The new element here is the <null ordering> specification, which is optional.

If the <null ordering> is not specified, then an implementation-defined <null ordering> is used. For Derby, this will continue to be NULLS LAST if the sort is ASCending, and NULLS FIRST if the sort is DESCending.

The tokens FIRST and LAST are already reserved keywords in the Derby SQL grammar; the new token NULLS needs to be added as a new non-reserved keyword.

FIRST and LAST, not LOW and HIGH

Note that the standard carefully uses the terms FIRST and LAST, rather than LOW and HIGH. This is intentional, and highlights that the null ordering interacts with the ascending/descending nature of the sort. It may make it more clear to look at this truth table:

ORDER BY says

ASC

DESC

NULLS FIRST

less

greater

NULLS LAST

greater

less

where "less" means "NULL values shall be considered to be less than non-NULL values", and "greater" means "NULL values shall be considered to be greater than non-NULL values.

Interaction of NULLS FIRST/LAST and ASC/DESC

The results below indicate what I believe to be the intended behavior of NULLS FIRST when combined with ASC/DESC. Note that NULLS FIRST means "the NULL values should be the first ones in the result", regardless of whether the result is ordered ASC or DESC, and NULLS LAST behaves correspondingly.

ij> select * from t1;
C1         |C2
-----------------------
NULL       |NULL
1          |1
NULL       |NULL
2          |1
3          |1
10         |10

6 rows selected
ij> select * from t1 order by c1; -- since it was not specified, the implementation chooses NULLS LAST here
C1         |C2
-----------------------
1          |1
2          |1
3          |1
10         |10
NULL       |NULL
NULL       |NULL

6 rows selected
ij> select * from t1 order by c1 NULLS LAST;
C1         |C2
-----------------------
1          |1
2          |1
3          |1
10         |10
NULL       |NULL
NULL       |NULL

6 rows selected
ij> select * from t1 order by c1 nulls first;
C1         |C2
-----------------------
NULL       |NULL
NULL       |NULL
1          |1
2          |1
3          |1
10         |10

6 rows selected
ij> select * from t1 order by c1 desc; -- since it was not specified, the implementation chooses NULLS FIRST here
C1         |C2
-----------------------
NULL       |NULL
NULL       |NULL
10         |10
3          |1
2          |1
1          |1

6 rows selected
ij> select * from t1 order by c1 desc nulls last;
C1         |C2
-----------------------
10         |10
3          |1
2          |1
1          |1
NULL       |NULL
NULL       |NULL

6 rows selected
ij> select * from t1 order by c1 desc nulls first;
C1         |C2
-----------------------
NULL       |NULL
NULL       |NULL
10         |10
3          |1
2          |1
1          |1

6 rows selected

Null Ordering versus Ordered Nulls

Null Ordering and Ordered Nulls are similar, but not identical, concepts. Ordered Nulls terminology is used when we are discussing whether or not we want the Derby engine to use three-valued logic while performing comparison operations. When using three-valued logic for comparison operations, comparison operators always return the special value unknown if either of the two operands is NULL. Thus the Derby comparison operations can either operate in Ordered Nulls mode, or in three-valued logic mode. Null Ordering is only relevant in Ordered Nulls mode, and it further refines Ordered Nulls mode to describe whether the NULL values should be compared lower than non-NULL values or higher than non-NULL values.

Current Implementation

Currently, Derby always sorts NULL values as greater than non-NULL values.

This logic appears to be located in the various subclasses of org.apache.derby.iapi.types.DataType, such as NumberDataType, SQLDate, and SQLChar.

These classes all implement the compare(DataValueDescriptor other) method, and each class contains logic similar to this snippet from SQLDate:

/*
 * thisNull otherNull   return
 *  T       T           0   (this == other)
 *  F       T           -1  (this < other)
 *  T       F           1   (this > other)
 */
if (thisNull || otherNull)
{
    if (!thisNull)      // otherNull must be true
        return -1;
    if (!otherNull)     // thisNull must be true
        return 1;
    return 0;
}

(As an aside, I wonder if there once was a time when NULL values sorted low in Derby, rather than sorting HIGH. The reason I think this might have been true is that the JavaDoc for DataValueDescriptor.compare() is incorrect, and says that null will be treated as less than all other values, and also the code comment in NumberDataType.java is backward. Thirdly, the code in SQLBoolean.java and in XML.java seems backward, which might be an un-noticed bug because the boolean data type is not surfaced to users in current Derby, and the XML type is new and not much used yet?)

Proposed Changes

There are several parts to implementing the <null ordering> feature for Derby:

OLAPNullOrdering (last edited 2009-09-20 22:11:52 by localhost)