In the Derby "cost-based optimization" code you will find that the code often manipulates information about tables by using "table numbers". Table numbers are integers which stand in as surrogates for particular tables during the optimization process.

Note that by the time we start optimization, constructs like views and synonyms have already been transformed and replaced by their underlying "real" tables. Transformations and table resolution occur during the "binding" and "preprocessing" stages of query compilation, and both of those stages occur before cost-based optimization begins. So at this point a view will be represented by a ProjectRestrictNode whose child is a SelectNode, and a synonym will be represented by whatever FromTable the synonym actually refers to.

Table numbers are also assigned during binding, so by the time we get to the code involved with costing and plan selection, every FromTable (aka "Optimizable") in the entire query that requires a table number will have one assigned. In some cases a table number is not necessary, and the value will then be -1. Additionally, any column reference which points to one of those FromTables will have the table number for that FromTable stored locally (namely, in ColumnReference.tableNumber).

Note that when a ColumnReference is "remapped" to point to a different FromTable, its local information--including tableNumber--is updated accordingly. Note also that a "FromTable" is not required to be a base table--rather, anything that can be specified in the FROM list of a SELECT query will be represented by some instance of FromTable, whether it be a subquery, a base table, a union node, etc. Every FromTable has its own "table number", with the exception of ProjectRestrictNodes. For a PRN, if the PRN's child is itself a FromTable (as opposed to, say, a SelectNode) then the PRN's table number will be -1 and any attempt to "get" the PRN's table number will return the table number of the PRN's child. If the PRN's child is not a FromTable, then the PRN will have its own table number.
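The ProjectRestrictNode rule described above can be sketched roughly as follows. This is only an illustration: the class names (FromTableLike, ProjectRestrictLike, SelectLikeNode) are hypothetical stand-ins, not Derby's actual classes, and the real code carries far more state than a single integer.

```java
/** Simplified stand-ins for illustration; these are NOT Derby's actual classes. */
class PrnTableNumberSketch {
    static final int UNSET = -1;

    /** Any node that can appear in the query tree. */
    static class Node { }

    /** A node that is not a FromTable, for example a SELECT. */
    static class SelectLikeNode extends Node { }

    /** Anything that can appear in a FROM list. */
    static class FromTableLike extends Node {
        protected final int tableNumber;

        FromTableLike(int tableNumber) {
            this.tableNumber = tableNumber;
        }

        int getTableNumber() {
            return tableNumber;
        }
    }

    /** Hypothetical model of the PRN behavior described in the text. */
    static class ProjectRestrictLike extends FromTableLike {
        private final Node child;

        /** Child is a FromTable: the PRN keeps no number of its own (-1). */
        ProjectRestrictLike(FromTableLike child) {
            super(UNSET);
            this.child = child;
        }

        /** Child is not a FromTable: the PRN carries its own number. */
        ProjectRestrictLike(SelectLikeNode child, int ownTableNumber) {
            super(ownTableNumber);
            this.child = child;
        }

        @Override
        int getTableNumber() {
            // Delegate to the child when the child is itself a FromTable.
            if (child instanceof FromTableLike) {
                return ((FromTableLike) child).getTableNumber();
            }
            return tableNumber;
        }
    }
}
```

In this sketch, a PRN wrapped around a FromTable numbered 3 reports 3, while a PRN over a SELECT-like child reports whatever number it was given when it was created.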

The thing to note here is that "table number" is strictly a language-created, compilation time value to allow binding, preprocessing, optimization, and code generation to distinguish between the various FromTables in the original query. A table number is not stored on disk and it is independent of the access path decisions (including whether or not an index is used) made by the optimizer. Furthermore, there is no link between a given table number and the actual on-disk table that it points to. Table number 0 could be for T1 in one query, T2 in another query, and T100 in a third query.
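To make the "no link to the on-disk table" point concrete, here is a minimal sketch that numbers FROM-list entries separately for each statement. The class and method names are hypothetical, and the assumption that numbers are handed out from 0 in FROM-list order is taken from the T1 / T1 X1 example on this page rather than from the Derby source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; not Derby code. */
class PerStatementNumbering {
    /**
     * Assigns 0, 1, 2, ... to the FROM-list entries of ONE statement.
     * The counter is local, so it restarts for every compilation.
     */
    static Map<Integer, String> number(String... fromListEntries) {
        Map<Integer, String> byNumber = new LinkedHashMap<>();
        int next = 0;
        for (String entry : fromListEntries) {
            byNumber.put(next++, entry);
        }
        return byNumber;
    }
}
```

With this model, number("T1") maps 0 to T1, while number("T2", "T1") maps 0 to T2 and 1 to T1: the same number refers to different tables in different statements.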

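As a simple (but admittedly meaningless) example, take a query that joins T1 to itself, with "T1" and "T1 X1" in the FROM list and the predicate t1.i = x1.j in the WHERE clause.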

At bind time Derby will assign every item in the FROM list a table number. So in this case, "T1" gets table number 0 and "T1 X1" gets table number 1. The fact that both FromTables are really pointing to the same base table doesn't matter. For the duration of compilation/optimization, they are represented by two different instances of FromTable and are considered two different "tables", each having its own table number. (For the record, in this particular example the different FromTables will in fact point to the same underlying tableDescriptor field.)

Given that, the predicate t1.i = x1.j will have a left ColumnReference pointing to a FromBaseTable representing T1 with table number "0" and a right ColumnReference pointing to a different FromBaseTable representing X1 (i.e. T1 again) with table number "1".
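The self-join example can be modeled with a few small classes. Everything here is a hypothetical simplification (TableDescriptorLike, FromBaseTableLike and ColumnReferenceLike are stand-ins, not the Derby classes of similar names); the sketch only shows two FROM-list entries sharing one descriptor while carrying different table numbers.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not Derby code. */
class SelfJoinSketch {
    /** Stand-in for the catalog description of an on-disk table. */
    static class TableDescriptorLike {
        final String name;

        TableDescriptorLike(String name) {
            this.name = name;
        }
    }

    /** Stand-in for one FROM-list entry that refers to a base table. */
    static class FromBaseTableLike {
        final String exposedName;            // "T1" or the correlation name "X1"
        final TableDescriptorLike descriptor;
        int tableNumber = -1;                // unset until "bind"

        FromBaseTableLike(String exposedName, TableDescriptorLike descriptor) {
            this.exposedName = exposedName;
            this.descriptor = descriptor;
        }
    }

    /** Stand-in for a column reference that stores its source's table number. */
    static class ColumnReferenceLike {
        final String columnName;
        final int tableNumber;

        ColumnReferenceLike(String columnName, FromBaseTableLike source) {
            this.columnName = columnName;
            this.tableNumber = source.tableNumber;
        }
    }

    /** Builds the FROM list "T1, T1 X1" and numbers the entries in order. */
    static List<FromBaseTableLike> bindFromList(TableDescriptorLike t1) {
        List<FromBaseTableLike> fromList = new ArrayList<>();
        fromList.add(new FromBaseTableLike("T1", t1));
        fromList.add(new FromBaseTableLike("X1", t1));
        for (int i = 0; i < fromList.size(); i++) {
            fromList.get(i).tableNumber = i;
        }
        return fromList;
    }
}
```

After bindFromList runs, both entries share one descriptor object but carry table numbers 0 and 1, so the two sides of t1.i = x1.j can be told apart by table number alone.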

If the optimizer then decides to use an index for T1, the table number doesn't change--the optimizer just decides that for "the FromBaseTable whose table number is 0 we will use an index". In fact, once assigned, the table number for a specific FromTable remains the same for the duration of the compilation of the statement.
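One way to picture this separation is a table number fixed at construction next to an access path that the optimizer may change freely. The names below (OptimizableLike, AccessPath, rememberBestPath) are hypothetical and chosen only for this sketch.

```java
/** Illustrative only; not Derby code. */
class AccessPathSketch {
    enum AccessPath { HEAP_SCAN, INDEX_SCAN }

    static class OptimizableLike {
        private final int tableNumber;                      // fixed once assigned
        private AccessPath bestPath = AccessPath.HEAP_SCAN; // may change during costing

        OptimizableLike(int tableNumber) {
            this.tableNumber = tableNumber;
        }

        int getTableNumber() {
            return tableNumber;
        }

        AccessPath getBestPath() {
            return bestPath;
        }

        /** Records the optimizer's choice; the table number is untouched. */
        void rememberBestPath(AccessPath path) {
            this.bestPath = path;
        }
    }
}
```

Switching the entry numbered 0 from a heap scan to an index scan changes only bestPath; getTableNumber() still returns 0.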

The only time a FromTable's table number can be -1 during optimization is if it is in fact a ProjectRestrictNode with a non-FromTable subquery as its child.

There are a number of classes that extend FromTable. For many of these classes, table numbers are always set during binding (and thus will not be -1 when it comes time to optimize).

In some cases, classes which extend FromTable are only instantiated AFTER optimization has completed, and thus even though their table numbers can be -1, that won't affect optimization. Two classes for which this is true are IndexToBaseRowNode and SingleChildResultSetNode.

There are two subclasses of FromTable that can be instantiated during preprocessing but that may not have their table numbers set (these are the ones of interest to the current discussion):

For details on how table numbers are used in the various phases of Derby optimization, interested readers may want to read the ReferencedTableMaps and/or PredicatePushdown pages of this wiki.

OptimizerTableNumbers (last edited 2009-09-20 22:11:51 by localhost)