This web page describes how to use the optional optimizerTracing and optimizerTracingViews tools to produce and view an xml-formatted trace of the optimizer's analysis of a query. This page starts out with a short example of how to use the tools. A more detailed explanation of the tools follows the example.
These tools were introduced by work on DERBY-6211.
You load the optimizerTracing tool as follows:
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );
Then you issue the queries which you want to trace. When you are done tracing queries, you unload the tool as follows...
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'myTrace.xml' );
...where myTrace.xml is the name of the file which will hold the trace output. You can then browse the trace file with your favorite tool. I like using the Firefox browser because it lets me expand and collapse xml elements in order to highlight important information and suppress noise.
At this point, you can load the optimizerTracingViews tool, which lets you search and filter the trace output using SQL. The tool installs the planCost view (backed by a table function). By querying this view, you can display the plan shapes which the optimizer considered and the costs which the optimizer calculated for those plans. Here's how you load this tool...
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'myTrace.xml' );
...where myTrace.xml is the name of the trace output file dumped by the optimizerTracing tool. A sample query against planCost might look like this:
select distinct summary, estimatedCost from planCost where complete and qbID = 1 order by summary;
When you are done querying plan shapes and costs, you unload the planCost view as follows:
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );
The rest of this web page explains these tools in greater detail.
Before describing the tools, it's useful to understand a couple concepts which are important to the Derby optimizer:
Statement - The optimizer performs a separate analysis for each statement. Once a statement has been optimized, its compiled plan is stored in the Derby statement cache. The next time you run the statement, compilation will be short-circuited and the already-compiled plan will simply be read from the statement cache. Short-circuiting applies to optimization too. If the statement has already been compiled, then optimization will be skipped. You can force recompilation by adding a vacuous space somewhere in the statement text.
Query Block - The optimizer further divides each statement into one or more query blocks and then optimizes each query block separately. Query blocks are created in the following situations:
- Each branch of a UNION is treated as a separate query block.
- Materialized views are treated as separate query blocks.
- Materialized subqueries are treated as separate query blocks.
- Outer joins which can't be transformed into inner joins are treated as separate query blocks.
Optimizable - For each query block, the optimizer analyzes the query block's row sources, called optimizables. These are the tables, table functions, and nested query blocks which must be joined.
Join Order - For each query block, the optimizer considers all possible left-to-right orders of the optimizables. A left-to-right join order gives rise to a corresponding left-deep join tree. Join orders start out as partial, incomplete lists of optimizables and are built up incrementally. If a partial join order survives short-circuiting (described below), the optimizer adds another optimizable to the right end of the partial join order and continues analysis.
Slot - Each position in the join order is called a slot.
Conglomerate - A table is stored on disk as a number of files, also called conglomerates. Each table has a heap conglomerate, which is just the raw, unordered set of rows. There is a separate, btree conglomerate for every index on the table. There is also a separate, btree conglomerate for every primary, unique, and foreign key on the table.
Join Strategy - A join strategy indicates how an optimizable joins to the optimizables to its left in the join order. Derby currently supports two join strategies: NestedLoop and HashJoin. The NestedLoop strategy means that the whole optimizable is read for every composite row which percolates up out of the slots to the left. The HashJoin strategy means that the whole optimizable is read once and cached in a temporary table (hopefully in memory) indexed by some useful key; as each composite row percolates up out of the joined slots to the left, a key is built from the left row and used to probe into the temporary table on the right.
Decoration - For each slot in a join order, the optimizer considers every possible combination of conglomerate and join strategy. Conglomerates are only relevant for optimizables which are tables. Join strategies are only relevant for the right slots of the join order; the leftmost slot does not have a meaningful join strategy. A ( conglomerate, join strategy ) pair is called a decoration.
Short-circuiting - Most join orders are never completely evaluated. The cost of a partial join order is always less than the cost of a more complete join order. Once the optimizer finds the cheapest set of decorations for a partial join order, the optimizer compares that cost to the the cheapest complete plan found so far. The optimizer abandons the whole join order if the cheapest, partial decorated join order costs more than the cheapest complete join order found so far.
The optimizerTracing tool produces xml output which includes the following important elements:
optimizerTrace - This is the outermost (top) element.
statement - The top element's children are all statement elements. Each of these represents a single SQL statement.
queryBlock - The statement element's children are its query blocks. Each query block is represented by its own queryBlock element.
joinOrder - A queryBlock element has several kinds of child elements. The most important are the joinOrder elements. Each partial or complete join order considered by the optimizer is represented by a separate joinOrder element.
decoration - Under the joinOrder element are a series of decoration child elements, one for each decoration considered for a slot in the join order.
planCost - For every partial or complete join order which it considers, the optimizer calculates a cost. This is represented as a planCost element, a child of the joinOrder element. For every query block, the optimizer remembers the cheapest plan found for it. This is represented as a planCost child of the queryBlock. So planCost elements appear as children of both joinOrder and queryBlock elements.
pcSummary - Every partial or complete plan can be viewed as a human readable summary. These are represented as pcSummary elements. They are children of the corresponding planCost elements.
A plan summary (pcSummary element) looks like this:
( ( "SYS"."SYSSCHEMAS_HEAP" * "SYS"."SYSTABLES_INDEX1" ) * "SYS"."SYSCOLUMNS_INDEX1" )
It shows how the optimizables join to each other, including the details of the chosen decorations. Tables appear as schema-qualified conglomerate names. The * operator means NestedLoop and the # operator means HashJoin. Parentheses indicate the left-deep nesting of the plan.
The planCost View
The optimizerTracingViews tool creates a planCost view on the optimizer trace file. The rows in the view represent information from the planCost elements in the trace file. That view has the following shape:
text varchar( 32672 ) - The SQL text of the statement being traced.
stmtID int - An id for the statement. Statements are numbered starting at 1.
qbID int - An id for a query block inside a statement. Query blocks are numbered starting at 1.
complete boolean - True if the planCost element corresponds to a complete join order, that is, one representing all of the optimizables in the query block.
summary varchar( 32672 ) - The compact pcSummary text described above.
type varchar( 50 ) - The type of plan being considered. Costs are calculated for various types of plans, including "withoutSortAvoidance" and "withSortAvoidance". For each query block, a "bestPlan" is marked.
estimatedCost double - The cost which the optimizer calculated for the plan. Higher numbers represent more expensive (less desirable) plans.
estimatedRowCount bigint - The number of rows which the optimizer thinks will be returned by the partial or complete plan.
Here is a sample script showing how to use these tools:
connect 'jdbc:derby:trunk/generated/toursdb/toursdb'; call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' ); select * from cities, countries where cities.country_iso_code = countries.country_iso_code and 1=2 order by city_name; select * from sys.sysschemas s, sys.systables t, sys.syscolumns c where s.schemaid = t.schemaid and t.tableid = c.referenceid and 1=2 order by s.schemaname, t.tablename, c.columnname; call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'myTrace.xml' ); call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'myTrace.xml' ); select distinct summary, estimatedCost from planCost where complete and qbID = 1 order by summary; call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );