Differences between revisions 8 and 9
Revision 8 as of 2006-10-08 17:09:04
Size: 7814
Editor: JamesSynge
Comment:
Revision 9 as of 2009-09-20 22:12:02
Size: 7814
Editor: localhost
Comment: converted to 1.6 markup
No differences found!

Derby Bug 47 (http://issues.apache.org/jira/browse/DERBY-47), "Some possible improvements to IN optimization", draws attention to a problem with IN lists that contain parameter markers, though there do appear to be problems with IN lists that contain literals that are not closely grouped in the range of values for the column being filtered. Derby Bug 713 (http://issues.apache.org/jira/browse/DERBY-713), "Query optimizer should not make poor choices when optimizing IN and WHERE clauses", focuses on the same issue.

I attached a program to Derby Bug 47, Derby47PerformanceTest.java, that evaluates a number of different SELECT statements that achieve the same result as evaluating:

 SELECT * FROM someTable WHERE aNonUniqueColumn IN (?, ?, ..., ?) 

These alternatives are:

Strategy summaries

Literals

1 query, using  WHERE c IN ('literal[1]', ..., 'literal[N]') 

Literal

N queries, using  WHERE c = 'literal[i]' 

Markers

1 query, using  WHERE c IN (?, ..., ?) 

Marker

N queries, using  WHERE c = ? 

Join Temp

1 query, store parameters in a temp table, use join query, then delete parameters

Join Scratch

1 query, store parameters in a table, use join query, then delete parameters

Join Savepoint

1 query, set savepoint, store parameters in a table, use join query, then rollback savepoint

Nested Temp

1 query, store parameters in a temp table, use nested query, then delete parameters

Nested Scratch

1 query, store parameters in a table, use nested query, then delete parameters

Nested Savepoint

1 query, set savepoint, store parameters in a table, use nested query, then rollback savepoint

Attached to this page are two optimizer traces:

literals-10-trace.txt for 10 parameters using the Literals strategy.

markers-10-trace.txt for 10 parameters using the Markers strategy.

What is most interesting is how the cost estimate for Markers is so wrong compared to the actual time taken to execute the queries:

Strategy

cost

rowCount

Total ms (avg)

Literals

110472

7500

1021

Markers

14

3.4

3075

The predicate in the Markers query is transformed to effectively be:

 WHERE (column >= lowestINListValue AND column <= greatestINListValue AND column IN (?, ..., ?) 

This immediately implies that we're examining all values of the column between lowestINListValue (known as the start key) and greatestINListValue (the stop key), even though there is no reason to believe these values are closely clustered. Furthermore, it implies that we aren't doing N probes of the BTree index (i.e. one for each entry in the IN list), but rather that we're scanning a potentially large range of the BTree, applying the IN predicate to each row in the selected range (i.e. if we've got 4 entries in the IN list, and 10,000 index rows to examine, we'd need to perform up to 40,000 comparisons, instead of performing 4 probes of the BTree).

Debugging through the execution of the Markers query, I was very surprised to discover that the situation is worse than the prior paragraph states: After fetching an index row in the range [start key, stop key], Derby fetches the base table row before executing the predicate to filter out unneeded base table rows. This means that in the example above, we load 10,000 base table rows, when we only expect to get 4 * some small number of rows returned. Sigh.

For the case evaluated in the test program, it would be better to treat the IN list rather like the outer table of a nested loop join (as the results below demonstrate), where each parameter in the IN list is used to "probe" the index.

Here are the timings for all of the strategies for different numbers of parameters (the ID Count column below), on a table of 100,000 rows, using embedded Derby. There were 29689 unique values for the column used in the where clause, so on average each value was in a bit more than 3 rows.

Literals

Literal

Markers

Marker

Join Temp

Join Scratch

Join Savepoint

Nested Temp

Nested Scratch

Nested Savepoint

ID Count

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

Total ms

ms / ID

1

40

40

50

50

10

10

0

0

70

70

10

10

0

0

1352

1352

1262

1262

1142

1142

2

646

323

60

30

390

195

0

0

60

30

20

10

10

5

1262

631

1152

576

1112

556

3

1041

347

120

40

1952

651

10

3

70

23

10

3

10

3

962

321

901

300

891

297

4

882

221

50

13

2964

741

0

0

160

40

0

0

10

3

926

232

901

225

1082

271

5

941

188

70

14

3285

657

0

0

60

12

10

2

10

2

926

185

911

182

896

179

6

821

137

111

19

2483

414

10

2

110

18

10

2

10

2

911

152

891

149

871

145

7

986

141

150

21

3225

461

10

1

30

4

20

3

10

1

942

135

876

125

881

126

8

941

118

160

20

2664

333

0

0

40

5

10

1

10

1

991

124

891

111

896

112

9

931

103

121

13

3630

403

0

0

20

2

10

1

10

1

906

101

891

99

931

103

10

1021

102

100

10

3075

308

10

1

30

3

10

1

20

2

896

90

982

98

886

89

20

1051

53

150

8

3765

188

10

1

40

2

20

1

20

1

911

46

906

45

891

45

30

1041

35

251

8

3896

130

20

1

40

1

40

1

50

2

931

31

906

30

901

30

40

1082

27

310

8

4687

117

20

1

40

1

30

1

60

2

922

23

921

23

921

23

50

1051

21

380

8

4566

91

20

0

40

1

40

1

50

1

942

19

912

18

901

18

60

1036

17

450

8

4587

76

20

0

50

1

50

1

50

1

921

15

1061

18

912

15

70

1076

15

551

8

4977

71

30

0

70

1

50

1

50

1

921

13

922

13

921

13

80

1082

14

551

7

5397

67

40

1

40

1

50

1

80

1

931

12

951

12

1142

14

90

1111

12

596

7

5458

61

30

0

60

1

50

1

100

1

921

10

921

10

936

10

100

1086

11

641

6

5733

57

40

0

60

1

60

1

70

1

916

9

992

10

961

10

150

1122

7

951

6

6935

46

60

0

60

0

90

1

100

1

936

6

966

6

966

6

200

1131

6

1652

8

7981

40

70

0

80

0

171

1

140

1

951

5

981

5

991

5

250

1171

5

1973

8

9184

37

70

0

110

0

140

1

190

1

1012

4

991

4

1006

4

300

1242

4

2323

8

11006

37

100

0

121

0

190

1

260

1

981

3

1106

4

1051

4

350

1181

3

2774

8

12328

35

110

0

130

0

195

1

241

1

1102

3

1292

4

1071

3

400

1202

3

2929

7

13109

33

150

0

150

0

260

1

311

1

966

2

1262

3

1096

3

450

1247

3

3249

7

14772

33

151

0

160

0

971

2

341

1

966

2

1077

2

1111

2

500

1306

3

3596

7

16093

32

160

0

190

0

320

1

341

1

986

2

1152

2

1116

2

750

1432

2

5758

8

23054

31

235

0

255

0

471

1

621

1

1056

1

1256

2

1272

2

1000

1782

2

7821

8

31466

31

301

0

351

0

571

1

666

1

1086

1

1297

1

1432

1

1250

1993

2

9514

8

41631

33

411

0

425

0

1021

1

1062

1

1136

1

1392

1

1602

1

1500

2484

2

11216

7

54029

36

456

0

500

0

1091

1

1192

1

1161

1

1763

1

2383

2

1750

2859

2

13389

8

70354

40

520

0

596

0

1172

1

1402

1

1572

1

1993

1

2143

1

2000

3555

2

15172

8

83072

42

581

0

691

0

1312

1

1292

1

1247

1

1933

1

2383

1

2250

4356

2

17095

8

107638

48

701

0

761

0

1322

1

2063

1

1291

1

2403

1

2433

1

2500

5468

2

18777

8

123211

49

732

0

842

0

1933

1

2043

1

1362

1

2334

1

2734

1

DerbyBug47 (last edited 2009-09-20 22:12:02 by localhost)