Derby Bug 47 (
http://issues.apache.org/jira/browse/DERBY-47), "Some possible improvements to IN optimization", draws attention to a problem with IN lists that contain parameter markers, though there do appear to be problems with IN lists that contain literals that are not closely grouped in the range of values for the column being filtered. Derby Bug 713 (
http://issues.apache.org/jira/browse/DERBY-713), "Query optimizer should not make poor choices when optimizing IN and WHERE clauses", focuses on the same issue.
I attached a program to Derby Bug 47, Derby47PerformanceTest.java, that evaluates a number of different SELECT statements that achieve the same result as evaluating:
SELECT * FROM someTable WHERE aNonUniqueColumn IN (?, ?, ..., ?)
These alternatives are:
|
Strategy summaries |
|
|
Literals |
1 query, using WHERE c IN ('literal[1]', ..., 'literal[N]') |
|
Literal |
N queries, using WHERE c = 'literal[i]' |
|
Markers |
1 query, using WHERE c IN (?, ..., ?) |
|
Marker |
N queries, using WHERE c = ? |
|
Join Temp |
1 query, store parameters in a temp table, use join query, then delete parameters |
|
Join Scratch |
1 query, store parameters in a table, use join query, then delete parameters |
|
Join Savepoint |
1 query, set savepoint, store parameters in a table, use join query, then rollback savepoint |
|
Nested Temp |
1 query, store parameters in a temp table, use nested query, then delete parameters |
|
Nested Scratch |
1 query, store parameters in a table, use nested query, then delete parameters |
|
Nested Savepoint |
1 query, set savepoint, store parameters in a table, use nested query, then rollback savepoint |
Attached to this page are two optimizer traces:
literals-10-trace.txt for 10 parameters using the Literals strategy.
markers-10-trace.txt for 10 parameters using the Markers strategy.
What is most interesting is how the cost estimate for Markers is so wrong compared to the actual time taken to execute the queries:
|
Strategy |
cost |
rowCount |
Total ms (avg) |
|
Literals |
110472 |
7500 |
1021 |
|
Markers |
14 |
3.4 |
3075 |
The predicate in the Markers query is transformed to effectively be:
WHERE (column >= lowestINListValue AND column <= greatestINListValue AND column IN (?, ..., ?)
This immediately implies that we're examining all values of the column between lowestINListValue (known as the start key) and greatestINListValue (the stop key), even though there is no reason to believe these values are closely clustered. Furthermore, it implies that we aren't doing N probes of the BTree index (i.e. one for each entry in the IN list), but rather that we're scanning a potentially large range of the BTree, applying the IN predicate to each row in the selected range (i.e. if we've got 4 entries in the IN list, and 10,000 index rows to examine, we'd need to perform up to 40,000 comparisons, instead of performing 4 probes of the BTree).
Debugging through the execution of the Markers query, I was very surprised to discover that the situation is worse than the prior paragraph states: After fetching an index row in the range [start key, stop key], Derby fetches the base table row before executing the predicate to filter out unneeded base table rows. This means that in the example above, we load 10,000 base table rows, when we only expect to get 4 * some small number of rows returned. Sigh.
For the case evaluated in the test program, it would be better to treat the IN list rather like the outer table of a nested loop join (as the results below demonstrate), where each parameter in the IN list is used to "probe" the index.
Here are the timings for all of the strategies for different numbers of parameters (the ID Count column below), on a table of 100,000 rows, using embedded Derby. There were 29689 unique values for the column used in the where clause, so on average each value was in a bit more than 3 rows.
|
|
Literals |
Literal |
Markers |
Marker |
Join Temp |
Join Scratch |
Join Savepoint |
Nested Temp |
Nested Scratch |
Nested Savepoint |
||||||||||
|
ID Count |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
Total ms |
ms / ID |
|
1 |
40 |
40 |
50 |
50 |
10 |
10 |
0 |
0 |
70 |
70 |
10 |
10 |
0 |
0 |
1352 |
1352 |
1262 |
1262 |
1142 |
1142 |
|
2 |
646 |
323 |
60 |
30 |
390 |
195 |
0 |
0 |
60 |
30 |
20 |
10 |
10 |
5 |
1262 |
631 |
1152 |
576 |
1112 |
556 |
|
3 |
1041 |
347 |
120 |
40 |
1952 |
651 |
10 |
3 |
70 |
23 |
10 |
3 |
10 |
3 |
962 |
321 |
901 |
300 |
891 |
297 |
|
4 |
882 |
221 |
50 |
13 |
2964 |
741 |
0 |
0 |
160 |
40 |
0 |
0 |
10 |
3 |
926 |
232 |
901 |
225 |
1082 |
271 |
|
5 |
941 |
188 |
70 |
14 |
3285 |
657 |
0 |
0 |
60 |
12 |
10 |
2 |
10 |
2 |
926 |
185 |
911 |
182 |
896 |
179 |
|
6 |
821 |
137 |
111 |
19 |
2483 |
414 |
10 |
2 |
110 |
18 |
10 |
2 |
10 |
2 |
911 |
152 |
891 |
149 |
871 |
145 |
|
7 |
986 |
141 |
150 |
21 |
3225 |
461 |
10 |
1 |
30 |
4 |
20 |
3 |
10 |
1 |
942 |
135 |
876 |
125 |
881 |
126 |
|
8 |
941 |
118 |
160 |
20 |
2664 |
333 |
0 |
0 |
40 |
5 |
10 |
1 |
10 |
1 |
991 |
124 |
891 |
111 |
896 |
112 |
|
9 |
931 |
103 |
121 |
13 |
3630 |
403 |
0 |
0 |
20 |
2 |
10 |
1 |
10 |
1 |
906 |
101 |
891 |
99 |
931 |
103 |
|
10 |
1021 |
102 |
100 |
10 |
3075 |
308 |
10 |
1 |
30 |
3 |
10 |
1 |
20 |
2 |
896 |
90 |
982 |
98 |
886 |
89 |
|
20 |
1051 |
53 |
150 |
8 |
3765 |
188 |
10 |
1 |
40 |
2 |
20 |
1 |
20 |
1 |
911 |
46 |
906 |
45 |
891 |
45 |
|
30 |
1041 |
35 |
251 |
8 |
3896 |
130 |
20 |
1 |
40 |
1 |
40 |
1 |
50 |
2 |
931 |
31 |
906 |
30 |
901 |
30 |
|
40 |
1082 |
27 |
310 |
8 |
4687 |
117 |
20 |
1 |
40 |
1 |
30 |
1 |
60 |
2 |
922 |
23 |
921 |
23 |
921 |
23 |
|
50 |
1051 |
21 |
380 |
8 |
4566 |
91 |
20 |
0 |
40 |
1 |
40 |
1 |
50 |
1 |
942 |
19 |
912 |
18 |
901 |
18 |
|
60 |
1036 |
17 |
450 |
8 |
4587 |
76 |
20 |
0 |
50 |
1 |
50 |
1 |
50 |
1 |
921 |
15 |
1061 |
18 |
912 |
15 |
|
70 |
1076 |
15 |
551 |
8 |
4977 |
71 |
30 |
0 |
70 |
1 |
50 |
1 |
50 |
1 |
921 |
13 |
922 |
13 |
921 |
13 |
|
80 |
1082 |
14 |
551 |
7 |
5397 |
67 |
40 |
1 |
40 |
1 |
50 |
1 |
80 |
1 |
931 |
12 |
951 |
12 |
1142 |
14 |
|
90 |
1111 |
12 |
596 |
7 |
5458 |
61 |
30 |
0 |
60 |
1 |
50 |
1 |
100 |
1 |
921 |
10 |
921 |
10 |
936 |
10 |
|
100 |
1086 |
11 |
641 |
6 |
5733 |
57 |
40 |
0 |
60 |
1 |
60 |
1 |
70 |
1 |
916 |
9 |
992 |
10 |
961 |
10 |
|
150 |
1122 |
7 |
951 |
6 |
6935 |
46 |
60 |
0 |
60 |
0 |
90 |
1 |
100 |
1 |
936 |
6 |
966 |
6 |
966 |
6 |
|
200 |
1131 |
6 |
1652 |
8 |
7981 |
40 |
70 |
0 |
80 |
0 |
171 |
1 |
140 |
1 |
951 |
5 |
981 |
5 |
991 |
5 |
|
250 |
1171 |
5 |
1973 |
8 |
9184 |
37 |
70 |
0 |
110 |
0 |
140 |
1 |
190 |
1 |
1012 |
4 |
991 |
4 |
1006 |
4 |
|
300 |
1242 |
4 |
2323 |
8 |
11006 |
37 |
100 |
0 |
121 |
0 |
190 |
1 |
260 |
1 |
981 |
3 |
1106 |
4 |
1051 |
4 |
|
350 |
1181 |
3 |
2774 |
8 |
12328 |
35 |
110 |
0 |
130 |
0 |
195 |
1 |
241 |
1 |
1102 |
3 |
1292 |
4 |
1071 |
3 |
|
400 |
1202 |
3 |
2929 |
7 |
13109 |
33 |
150 |
0 |
150 |
0 |
260 |
1 |
311 |
1 |
966 |
2 |
1262 |
3 |
1096 |
3 |
|
450 |
1247 |
3 |
3249 |
7 |
14772 |
33 |
151 |
0 |
160 |
0 |
971 |
2 |
341 |
1 |
966 |
2 |
1077 |
2 |
1111 |
2 |
|
500 |
1306 |
3 |
3596 |
7 |
16093 |
32 |
160 |
0 |
190 |
0 |
320 |
1 |
341 |
1 |
986 |
2 |
1152 |
2 |
1116 |
2 |
|
750 |
1432 |
2 |
5758 |
8 |
23054 |
31 |
235 |
0 |
255 |
0 |
471 |
1 |
621 |
1 |
1056 |
1 |
1256 |
2 |
1272 |
2 |
|
1000 |
1782 |
2 |
7821 |
8 |
31466 |
31 |
301 |
0 |
351 |
0 |
571 |
1 |
666 |
1 |
1086 |
1 |
1297 |
1 |
1432 |
1 |
|
1250 |
1993 |
2 |
9514 |
8 |
41631 |
33 |
411 |
0 |
425 |
0 |
1021 |
1 |
1062 |
1 |
1136 |
1 |
1392 |
1 |
1602 |
1 |
|
1500 |
2484 |
2 |
11216 |
7 |
54029 |
36 |
456 |
0 |
500 |
0 |
1091 |
1 |
1192 |
1 |
1161 |
1 |
1763 |
1 |
2383 |
2 |
|
1750 |
2859 |
2 |
13389 |
8 |
70354 |
40 |
520 |
0 |
596 |
0 |
1172 |
1 |
1402 |
1 |
1572 |
1 |
1993 |
1 |
2143 |
1 |
|
2000 |
3555 |
2 |
15172 |
8 |
83072 |
42 |
581 |
0 |
691 |
0 |
1312 |
1 |
1292 |
1 |
1247 |
1 |
1933 |
1 |
2383 |
1 |
|
2250 |
4356 |
2 |
17095 |
8 |
107638 |
48 |
701 |
0 |
761 |
0 |
1322 |
1 |
2063 |
1 |
1291 |
1 |
2403 |
1 |
2433 |
1 |
|
2500 |
5468 |
2 |
18777 |
8 |
123211 |
49 |
732 |
0 |
842 |
0 |
1933 |
1 |
2043 |
1 |
1362 |
1 |
2334 |
1 |
2734 |
1 |