Differences between revisions 2 and 3
Revision 2 as of 2009-04-11 19:19:46
Size: 6623
Comment: Capture useful suggestion from Mamta
Revision 3 as of 2009-09-20 22:11:50
Size: 6625
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked like this. Additions are marked like this.
Line 62: Line 62:
For example, if we only check for the join order of the "targeted" subquery X2, meaning we pass ["T2", "T1"] into the proposed method and ignore X1 altogether, then the test would IN-correctly PASS for the following query plan: For example, if we only check for the join order of the "targeted" subquery X2, meaning we pass [[T2", "T1]] into the proposed method and ignore X1 altogether, then the test would IN-correctly PASS for the following query plan:
Line 89: Line 89:
Sometimes, it can be convenient to control the join order. For example, when debugging a problem, it can be useful to force the optimizer to consider only a particular join order. In [https://issues.apache.org/jira/browse/DERBY-3926 DERBY-3926] such a case arose, and Mamta noted that it was useful to control the Optimizer's behavior using query overrides: Sometimes, it can be convenient to control the join order. For example, when debugging a problem, it can be useful to force the optimizer to consider only a particular join order. In [[https://issues.apache.org/jira/browse/DERBY-3926|DERBY-3926]] such a case arose, and Mamta noted that it was useful to control the Optimizer's behavior using query overrides:

Understanding Join Order displays in the Statement Execution Plan

One of the results of running the query optimizer is the determination of the join order. For a particular join there is a "left" table and a "right" table, and overall there is an ordering of the various intermediate joins into an overall join order.

Looking at the output from the statement execution plan one might ask :

  • what is the overall join order?
  • and, for a particular join, which table is the left table and which is the right table?

As a general rule, when looking at a given query plan the join order is reflected by the order in which you see the table names as you scroll from the top of the plan downward.

In the plan "T1" appears first (i.e. closer to the beginning of the plan), then "T3" appears afterward, so that's a general indication that the join order is "T1", "T3".

Note that the query plan shows the underlying base table names, not correlation names or view names or other ways of aliasing base tables, so with respect to the top-level query in question, i.e. to:

  select x1.j, x2.b from
    (select distinct i,j from t1) x1,
    (select  distinct a,b from t3) x2
  where x1.i = x2.a
  order by x1.j, x2.b

the join order is _technically_ { X1, X2 }. But the query plan only shows base table access, so we have to look to see what tables X1 and X2 access. In this case X1 accesses T1 while X2 accesses T3, so when we scan the plan and see { T1, T3 }, that effectively implies a join order of { X1, X2 }.

On a more general level, the "scan downward" approach to finding the join order works because the query plan is written in terms of "left" and "right" result sets, as Knut Anders mentioned. If I'm joining three tables T1, T2, T3 and the join order chosen by the optimizer is {T2, T3, T1} the final query tree will look something like:

       JOIN_0
        /  \
    JOIN_1  T1
      /  \
     T2   T3

Notice how each join node has a "left" and a "right" child. The query plan is generated in depth-first traversal order (starting with the root), so the query plan for the above tree would look something like:

JoinResultSet_0:
+++ LeftResultSet:
++++++ JoinResultSet_1:
+++++++++ LeftResultSet:  T2
+++++++++ RightResultSet: T3
+++ RightResultSet:       T1

From this we can see that the order in which the tables appear in the query plan (reading top to bottom) will match the order that comes from reading the leaf nodes of the join tree left-to-right, and that in turn reflects the join order chosen by the optimizer.

Extracting join order information automatically from the RUNTIMESTATISTICS output

Kathey proposed enhancing the RuntimeStatisticsParser to automatically determine join order by searching for the specified strings in order and return true if they are there in the order they are in the array.

Army observed that when using this to write automated tests, the test author must be careful to ensure that the argument array passed to the new function includes *ALL* tables in the query, not just a subset. If the query is of the form:

  select ... from
    (select ... from t2, t1, t3 where ...) X1
    (select ... from t1, t2 where ...) X2
  where ...

Assume a test wants to verify that the tables in subquery X2 have a join order of { T2, T1 }, but doesn't really care about the join order of the subquery in X1, nor does it care about the order of X1 w.r.t. X2. You'd *still* have to make sure that the array passed into the ordered search method includes the join order for X1, as well, otherwise the test might incorrectly pass.

For example, if we only check for the join order of the "targeted" subquery X2, meaning we pass T2", "T1 into the proposed method and ignore X1 altogether, then the test would IN-correctly PASS for the following query plan:

  ProjectRestrict:
  +++ JoinNode_0:
  ++++++ LeftResultSet:                  <== This corresponds to X1
  +++++++++ JoinNode_1:
  ++++++++++++ LeftResultSet:
  +++++++++++++++ JoinNode_2:
  ++++++++++++++++++ LeftResultSet: T3
  ++++++++++++++++++ RightResultSet: T2
  ++++++++++++ RightResultSet: T1
  ++++++ RightResultSet:                 <== This corresponds to X2
  +++++++++ JoinNode_3:
  ++++++++++++ LeftResultSet: T1
  ++++++++++++ RightResultSet: T2

If you just search for "T1" followed by "T2", the test will pass because the join order for X1 matches--but that's wrong because it's really X2 that we wanted to check.

If instead of {"T1", "T2"} you pass in {"T3", "T2", "T1", "T2", "T1"}--i.e. include *ALL* tables in the query, even the ones that aren't necessarily targeted--then I think you'd get the desired behavior. The downside to this is that the test will fail if a join order about which we "don't care" changes (ex. the join order for X1 in this case). But that's how things work today with the canon-based test, as well, so even if it's not ideal, at least it wouldn't really be any worse...

To get the ideal behavior (where the test fails if and only if the "targeted" subquery's join order is not what is expected) with the proposed orderedSearchStrings() approach, one would have to ensure that the table names used in the targeted subquery do not appear anywhere else in the query. My guess is that you would have to rewrite a good number of tests to guarantee that, which would probably be non-trivial.

This page was compiled with information posted to the derby-dev mailing list by Manjula, Kathey, Army and Knut.

Controlling Join Order in a test situation

Sometimes, it can be convenient to control the join order. For example, when debugging a problem, it can be useful to force the optimizer to consider only a particular join order. In DERBY-3926 such a case arose, and Mamta noted that it was useful to control the Optimizer's behavior using query overrides:

SELECT table1.id, m0.value, m1.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table2 m1 -- DERBY-PROPERTIES index=key3
, table2 m0 -- DERBY-PROPERTIES index=key3
, table1
WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;

By controlling the join order, you may be able to get a simple repro case (ie you don't have to go through countless iteration of optimizer for all different join orders and different predicate pulling down in different join orders), which may make it easier to focus on the problem join order.

QueryPlanJoinOrder (last edited 2009-09-20 22:11:50 by localhost)