...

PigMix is a set of queries used to test Pig performance from release to release. There are queries that test latency (how long does it take to run this query?) and queries that test scalability (how many fields or records can Pig handle before it fails?). In addition, it includes a set of map reduce Java programs that run equivalent map reduce jobs directly. These are used to measure the performance gap between using map reduce directly and using Pig. In June 2010 we released PigMix2, which adds 5 more queries to the original 12 in PigMix to measure the performance of new Pig features. We publish the results of both PigMix and PigMix2.

...

Usage

To run PigMix ..., run the following commands from PIG_HOME:

Code Block

ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy   # generate the test dataset
ant -Dharness.hadoop.home=$HADOOP_HOME pigmix          # run the PigMix benchmark

You can optionally set HADOOP_CONF_DIR before running.

To change the default size of the test dataset, edit test/perf/pigmix/conf/config.sh.

Note that PigMix is checked in to Pig 0.12 and later. To run it with an earlier version of Pig, go to https://issues.apache.org/jira/browse/PIG-200 and use PIG-200-0.12.patch.

Runs

PigMix

The following table lists PigMix runs. All of these runs were done on a cluster with 26 slaves plus one machine acting as the name node and job tracker. The cluster was running Hadoop 0.18.1. (TODO: Need to get specific hardware info on those machines.)

The first runs compared two versions of Pig: top of trunk and top of the types branch, both as of Nov 21, 2008.

Each test was run three times for each version and the results were averaged.

tot = top of trunk
totb = top of types branch
Version | Map Reduce Java code | tot 11/21/08 | totb 11/21/08 | totb 1/20/09 | tot 2/23/09
Date run | 11/22/08 | 11/21/08 | 11/21/08 | 1/20/09 | 2/23/09
L1 explode | 116 | 261 | 283 | 218 | 205
L2 fr join | 41 | 1665 | 253 | 168 | 89
L3 join | 97 | 1912 | 320 | 258 | 254
L4 distinct agg | 68 | 254 | 193 | 110 | 116
L5 anti-join | 90 | 1535 | 281 | 209 | 112
L6 large group by key | 61 | 294 | 226 | 126 | 120
L7 nested split | 72 | 243 | 204 | 107 | 102
L8 group all | 56 | 462 | 194 | 104 | 103
L9 order by 1 field | 286 | 5294 | 867 | 851 | 444
L10 order by multiple fields | 634 | 1403 | 565 | 469 | 447
L11 distinct + union | 120 | 316 | 255 | 164 | 154
L12 multi-store | 150 | fails | 781 | 499 | 804
Total time | 1791 | 13638 | 4420 | 3284 | 2950
Compared to Hadoop | 1.0 | 7.6 | 2.5 | 1.8 | 1.6
Weighted Average | 1.0 | 11.2 | 3.26 | 2.20 | 1.97

The totb run of 1/20/09 includes the change that makes BufferedPositionedInputStream use its own buffer instead of relying on Hadoop to buffer.

In the tot run of 2/23/09, top of trunk is what had been on the types branch (that is, proto 0.2.0). This run includes fragment-replicate join and the reworked partitioning for order by.

The run of 5/28/09 is placed in a separate table because the underlying cluster changed, so the map reduce tests had to be rerun. This is the same code base that became 0.3.0.

Version | Map Reduce Java code | tot 5/27/09
Date run | 5/28/09 | 5/28/09
L1 explode | 119 | 205
L2 fr join | 44 | 110
L3 join | 113 | 314
L4 distinct agg | 76 | 153
L5 anti-join | 96 | 128
L6 large group by key | 67 | 148
L7 nested split | 67 | 133
L8 group all | 64 | 115
L9 order by 1 field | 329 | 563
L10 order by multiple fields | 607 | 532
L11 distinct + union | 106 | 203
L12 multi-store | 139 | 159
Total time | 1826 | 2764
Compared to Hadoop | N/A | 1.5
Weighted average | N/A | 1.83


Run date: June 28, 2009, run against top of trunk as of that day.
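Note that, starting with this table, the column order is reversed (Pig first, then Java map reduce), and the multiplier is the Pig run time divided by the Java run time.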

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 204 | 117.33 | 1.74
PigMix_2 | 110.33 | 50.67 | 2.18
PigMix_3 | 292.33 | 125 | 2.34
PigMix_4 | 149.67 | 85.33 | 1.75
PigMix_5 | 131.33 | 105 | 1.25
PigMix_6 | 146.33 | 65.33 | 2.24
PigMix_7 | 128.33 | 82 | 1.57
PigMix_8 | 126.33 | 63.67 | 1.98
PigMix_9 | 506.67 | 312.67 | 1.62
PigMix_10 | 555 | 643 | 0.86
PigMix_11 | 206.33 | 136.67 | 1.51
PigMix_12 | 173 | 161.67 | 1.07
Total | 2729.67 | 1948.33 | 1.40
Weighted avg | | | 1.68

Run date: August 27, 2009, run against top of trunk as of that day.

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 218 | 133.33 | 1.635
PigMix_2 | 99.333 | 48 | 2.07
PigMix_3 | 272 | 127.67 | 2.13
PigMix_4 | 142.33 | 76.333 | 1.87
PigMix_5 | 127.33 | 107.33 | 1.19
PigMix_6 | 135.67 | 73 | 1.86
PigMix_7 | 124.67 | 78.333 | 1.59
PigMix_8 | 117.33 | 68 | 1.73
PigMix_9 | 356.33 | 323.67 | 1.10
PigMix_10 | 511.67 | 684.33 | 0.75
PigMix_11 | 180 | 121 | 1.49
PigMix_12 | 156 | 160.67 | 0.97
Total | 2440.67 | 2001.67 | 1.22
Weighted avg | | | 1.53

Run date: October 18, 2009, run against top of trunk as of that day.
With this run we introduced a new measure, the weighted average. The multiplier we had been publishing takes the total time to run all 12 Pig Latin scripts and compares it to the total time to run all 12 Java map reduce programs. This is a valid measure, because it shows the total time needed to do all these operations on both platforms. Its drawback is that it gives more weight to long-running operations (such as joins and order bys) and masks the performance of faster operations such as group bys. The new "weighted average" computes the multiplier of each Pig Latin script against its Java program separately, adds up these multipliers, and divides by 12, so every test is weighted equally. In past runs the weighted average had lagged the overall multiplier significantly (for example, in the August 27 run above it was 1.5 while the total multiplier was 1.2). In this latest run it still lags somewhat, but the gap has shrunk noticeably.
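Written out, the two measures compare as follows. The notation is introduced here for illustration and does not come from the original runs: $P_i$ and $J_i$ are the Pig and Java run times of test $i$, and $n$ is the number of tests (12 for PigMix, 17 for PigMix2).

```latex
M_{\text{total}} = \frac{\sum_{i=1}^{n} P_i}{\sum_{i=1}^{n} J_i},
\qquad
M_{\text{weighted}} = \frac{1}{n} \sum_{i=1}^{n} \frac{P_i}{J_i}
```

A single long-running test can dominate both sums in $M_{\text{total}}$. In $M_{\text{weighted}}$ it contributes only one of $n$ equal terms.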

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 135.0 | 133.0 | 1.02
PigMix_2 | 46.67 | 39.33 | 1.19
PigMix_3 | 184.0 | 98.0 | 1.88
PigMix_4 | 71.67 | 77.67 | 0.92
PigMix_5 | 70.0 | 83.0 | 0.84
PigMix_6 | 76.67 | 61.0 | 1.26
PigMix_7 | 71.67 | 61.0 | 1.17
PigMix_8 | 43.33 | 47.67 | 0.91
PigMix_9 | 184.0 | 209.33 | 0.88
PigMix_10 | 268.67 | 283.0 | 0.95
PigMix_11 | 145.33 | 168.67 | 0.86
PigMix_12 | 55.33 | 95.33 | 0.58
Total | 1352.33 | 1357 | 1.00
Weighted avg | | | 1.04
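The following is a minimal Python sketch showing how the two summary rows of the October 18, 2009 table can be recomputed from its per-test times. It assumes the times are copied by hand from that table. The names `OCT_18_2009`, `total_multiplier` and `weighted_average` are illustrative only and are not part of PigMix or its harness.

```python
"""Recompute the Total and Weighted avg multipliers of a PigMix results table."""

from statistics import mean

# (Pig run time, Java run time) for PigMix_1 .. PigMix_12, copied by hand
# from the October 18, 2009 table (hypothetical helper data, not a PigMix file).
OCT_18_2009 = [
    (135.0, 133.0), (46.67, 39.33), (184.0, 98.0), (71.67, 77.67),
    (70.0, 83.0), (76.67, 61.0), (71.67, 61.0), (43.33, 47.67),
    (184.0, 209.33), (268.67, 283.0), (145.33, 168.67), (55.33, 95.33),
]


def total_multiplier(runs):
    """Sum of all Pig times divided by the sum of all Java times."""
    pig_total = sum(pig for pig, _ in runs)
    java_total = sum(java for _, java in runs)
    return pig_total / java_total


def weighted_average(runs):
    """Plain mean of the per-test Pig/Java multipliers, so every test counts equally."""
    return mean(pig / java for pig, java in runs)


if __name__ == "__main__":
    print(f"Total multiplier: {total_multiplier(OCT_18_2009):.2f}")
    print(f"Weighted avg:     {weighted_average(OCT_18_2009):.2f}")
```

When run as a script, this prints 1.00 and 1.04, which match the Total and Weighted avg rows. Summing the rounded per-test Pig times gives 1352.34 rather than the published 1352.33. This rounding difference does not change the two-decimal result.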

Run date: January 4, 2010, run against 0.6 branch as of that day

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 138.33 | 112.67 | 1.23
PigMix_2 | 66.33 | 39.33 | 1.69
PigMix_3 | 199 | 83.33 | 2.39
PigMix_4 | 59 | 60.67 | 0.97
PigMix_5 | 80.33 | 113.67 | 0.71
PigMix_6 | 65 | 77.67 | 0.84
PigMix_7 | 63.33 | 61 | 1.04
PigMix_8 | 40 | 47.67 | 0.84
PigMix_9 | 214 | 215.67 | 0.99
PigMix_10 | 284.67 | 284.33 | 1.00
PigMix_11 | 141.33 | 151.33 | 0.93
PigMix_12 | 55.67 | 115 | 0.48
Total | 1407 | 1362.33 | 1.03
Weighted Avg | | | 1.09

PigMix2

Run date: May 29, 2010, run against top of trunk as of that day.

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 122.33 | 117 | 1.05
PigMix_2 | 50.33 | 42.67 | 1.18
PigMix_3 | 189 | 100.33 | 1.88
PigMix_4 | 75.67 | 61 | 1.24
PigMix_5 | 64 | 138.67 | 0.46
PigMix_6 | 65.67 | 69.33 | 0.95
PigMix_7 | 88.33 | 84.33 | 1.05
PigMix_8 | 39 | 47.67 | 0.82
PigMix_9 | 274.33 | 215.33 | 1.27
PigMix_10 | 333.33 | 311.33 | 1.07
PigMix_11 | 151.33 | 157 | 0.96
PigMix_12 | 70.67 | 97.67 | 0.72
PigMix_13 | 80 | 33 | 2.42
PigMix_14 | 69 | 86.33 | 0.80
PigMix_15 | 80.33 | 69.33 | 1.16
PigMix_16 | 82.33 | 69.33 | 1.19
PigMix_17 | 286 | 229.33 | 1.25
Total | 2121.67 | 1929.67 | 1.10
Weighted Avg | | | 1.15

Run date: June 11, 2011, run against top of trunk as of that day.

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 130 | 139 | 0.94
PigMix_2 | 66 | 48.67 | 1.36
PigMix_3 | 138 | 107.33 | 1.29
PigMix_4 | 106 | 78.33 | 1.35
PigMix_5 | 135.67 | 114 | 1.19
PigMix_6 | 103.67 | 74.33 | 1.39
PigMix_7 | 77.67 | 77.33 | 1.00
PigMix_8 | 56.33 | 57 | 0.99
PigMix_9 | 384.67 | 280.33 | 1.37
PigMix_10 | 380 | 354.67 | 1.07
PigMix_11 | 164 | 141 | 1.16
PigMix_12 | 109.67 | 187.33 | 0.59
PigMix_13 | 78 | 44.33 | 1.76
PigMix_14 | 105.33 | 111.67 | 0.94
PigMix_15 | 89.67 | 87 | 1.03
PigMix_16 | 87.67 | 75.33 | 1.16
PigMix_17 | 171.33 | 152.33 | 1.12
Total | 2383.67 | 2130 | 1.12
Weighted Avg | | | 1.16

Pig 0.9.2

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 146 | 147 | 0.993197278911565
PigMix_2 | 73 | 61 | 1.19672131147541
PigMix_3 | 134 | 158 | 0.848101265822785
PigMix_4 | 91 | 87 | 1.04597701149425
PigMix_5 | 81 | 153 | 0.529411764705882
PigMix_6 | 91 | 81 | 1.12345679012346
PigMix_7 | 71 | 86 | 0.825581395348837
PigMix_8 | 56 | 61 | 0.918032786885246
PigMix_9 | 302 | 192 | 1.57291666666667
PigMix_10 | 312 | 226 | 1.38053097345133
PigMix_11 | 207 | 222 | 0.932432432432432
PigMix_12 | 96 | 163 | 0.588957055214724
PigMix_13 | 76 | 127 | 0.598425196850394
PigMix_14 | 94 | 157 | 0.598726114649682
PigMix_15 | 86 | 92 | 0.934782608695652
PigMix_16 | 80 | 82 | 0.975609756097561
PigMix_17 | 196 | 176 | 1.11363636363636
Total | 2192 | 2271 | 0.965213562
Weighted Avg | | | 0.951558634

Pig 0.10.1

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 147 | 146 | 1.00684931506849
PigMix_2 | 74 | 62 | 1.19354838709677
PigMix_3 | 140 | 158 | 0.886075949367089
PigMix_4 | 87 | 86 | 1.01162790697674
PigMix_5 | 81 | 153 | 0.529411764705882
PigMix_6 | 92 | 262 | 0.351145038167939
PigMix_7 | 76 | 86 | 0.883720930232558
PigMix_8 | 62 | 61 | 1.01639344262295
PigMix_9 | 303 | 187 | 1.62032085561497
PigMix_10 | 303 | 232 | 1.30603448275862
PigMix_11 | 188 | 218 | 0.862385321100917
PigMix_12 | 101 | 157 | 0.643312101910828
PigMix_13 | 82 | 132 | 0.621212121212121
PigMix_14 | 99 | 158 | 0.626582278481013
PigMix_15 | 82 | 91 | 0.901098901098901
PigMix_16 | 82 | 82 | 1
PigMix_17 | 206 | 177 | 1.1638418079096
Total | 2205 | 2448 | 0.900735294117647
Weighted Avg | | | 0.919032977

Pig 0.11.1

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 163 | 141 | 1.15602836879433
PigMix_2 | 66 | 61 | 1.08196721311475
PigMix_3 | 141 | 158 | 0.892405063291139
PigMix_4 | 87 | 86 | 1.01162790697674
PigMix_5 | 82 | 158 | 0.518987341772152
PigMix_6 | 92 | 81 | 1.1358024691358
PigMix_7 | 82 | 87 | 0.942528735632184
PigMix_8 | 63 | 62 | 1.01612903225806
PigMix_9 | 320 | 207 | 1.54589371980676
PigMix_10 | 311 | 226 | 1.37610619469027
PigMix_11 | 184 | 218 | 0.844036697247706
PigMix_12 | 97 | 158 | 0.613924050632911
PigMix_13 | 78 | 127 | 0.614173228346457
PigMix_14 | 101 | 158 | 0.639240506329114
PigMix_15 | 87 | 91 | 0.956043956043956
PigMix_16 | 82 | 87 | 0.942528735632184
PigMix_17 | 203 | 176 | 1.15340909090909
Total | 2239 | 2282 | 0.981156879929886
Weighted Avg | | | 0.967107783

Pig 0.12 (4/4/2013)

Test | Pig run time | Java run time | Multiplier
PigMix_1 | 168 | 142 | 1.1830985915493
PigMix_2 | 71 | 62 | 1.14516129032258
PigMix_3 | 141 | 158 | 0.892405063291139
PigMix_4 | 93 | 87 | 1.06896551724138
PigMix_5 | 87 | 158 | 0.550632911392405
PigMix_6 | 93 | 81 | 1.14814814814815
PigMix_7 | 77 | 87 | 0.885057471264368
PigMix_8 | 62 | 57 | 1.08771929824561
PigMix_9 | 310 | 192 | 1.61458333333333
PigMix_10 | 311 | 221 | 1.40723981900452
PigMix_11 | 190 | 217 | 0.875576036866359
PigMix_12 | 102 | 158 | 0.645569620253165
PigMix_13 | 77 | 133 | 0.578947368421053
PigMix_14 | 101 | 343 | 0.294460641399417
PigMix_15 | 87 | 86 | 1.01162790697674
PigMix_16 | 82 | 82 | 1
PigMix_17 | 207 | 177 | 1.16949152542373
Total | 2259 | 2441 | 0.925440393281442
Weighted Avg | | | 0.974040267

Features Tested

Based on a sample of user queries, PigMix includes tests for the following features.

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
-- flatten emits one row per element of the page_links bag (the "explode")
B = foreach A generate user, (int)action as action, (map[])page_info as page_info,
    flatten((bag{tuple(map[])})page_links) as page_links;
C = foreach B generate user,
    (action == 1 ? page_info#'a' : page_links#'b') as header;
D = group C by user parallel 40;
E = foreach D generate group, COUNT(C) as cnt;
store E into 'L1out';

Script L2

This script tests a join small enough to run as a fragment-replicate join (feature 7).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, estimated_revenue;
alpha = load '/user/pig/tests/data/pigmix/power_users' using PigStorage('\u0001') as (name, phone,
        address, city, state, zip);
beta = foreach alpha generate name;
-- in a replicated join every input after the first is held in memory,
-- so the small power_users relation (beta) is listed last
C = join B by user, beta by name using 'replicated' parallel 40;
store C into 'L2out';

Script L3

This script tests a join too large for fragment-replicate. It also contains a join followed by a group by on the same key, something that Pig could potentially optimize by not regrouping.

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, (double)estimated_revenue;
alpha = load '/user/pig/tests/data/pigmix/users' using PigStorage('\u0001') as (name, phone, address,
        city, state, zip);
beta = foreach alpha generate name;
C = join beta by name, B by user parallel 40;
D = group C by $0 parallel 40;
E = foreach D generate group, SUM(C.estimated_revenue);
store E into 'L3out';

Script L4

This script covers foreach generate with a nested distinct (feature 10).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, action;
C = group B by user parallel 40;
D = foreach C {
    aleph = B.action;
    beth = distinct aleph;
    generate group, COUNT(beth);
}
store D into 'L4out';

Script L5

This script performs an anti-join. It is useful because it uses cogroup for something other than a regular join (feature 9).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user;
alpha = load '/user/pig/tests/data/pigmix/users' using PigStorage('\u0001') as (name, phone, address,
        city, state, zip);
beta = foreach alpha generate name;
C = cogroup beta by name, B by user parallel 40;
-- keep only the page_views users that have no matching row in users
D = filter C by COUNT(beta) == 0;
E = foreach D generate group;
store E into 'L5out';

Script L6

This script covers the case where the group by key is a significant percentage of the row (feature 12).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, action, (int)timespent as timespent, query_term, ip_addr, timestamp;
C = group B by (user, query_term, ip_addr, timestamp) parallel 40;
D = foreach C generate flatten(group), SUM(B.timespent);
store D into 'L6out';

Script L7

This script covers having a nested plan with splits (feature 11).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term,
            ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, timestamp;
C = group B by user parallel 40;
D = foreach C {
    morning = filter B by timestamp < 43200;
    afternoon = filter B by timestamp >= 43200;
    generate group, COUNT(morning), COUNT(afternoon);
}
store D into 'L7out';

Script L8

This script covers group all (feature 13).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue;
C = group B all;
D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into 'L8out';

Script L9

This script covers order by of a single value (feature 15).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = order A by query_term parallel 40;
store B into 'L9out';

Script L10

This script covers order by of multiple values (feature 15).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent:int, query_term, ip_addr, timestamp,
        estimated_revenue:double, page_info, page_links);
B = order A by query_term, estimated_revenue desc, timespent parallel 40;
store B into 'L10out';

Script L11

This script covers distinct and union, as well as reading from a wide row while using only one field (features 1 and 14).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user;
C = distinct B parallel 40;
alpha = load '/user/pig/tests/data/pigmix/widerow' using PigStorage('\u0001');
beta = foreach alpha generate $0 as name;
gamma = distinct beta parallel 40;
D = union C, gamma;
E = distinct D parallel 40;
store E into 'L11out';

Script L12

This script covers multi-store queries (feature 16).

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, action, (int)timespent as timespent, query_term,
    (double)estimated_revenue as estimated_revenue;
split B into C if user is not null, alpha if user is null;
split C into D if query_term is not null, aleph if query_term is null;
E = group D by user parallel 40;
F = foreach E generate group, MAX(D.estimated_revenue);
store F into 'highest_value_page_per_user';
beta = group alpha by query_term parallel 40;
gamma = foreach beta generate group, SUM(alpha.timespent);
store gamma into 'total_timespent_per_term';
beth = group aleph by action parallel 40;
gimel = foreach beth generate group, COUNT(aleph);
store gimel into 'queries_per_action';

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, estimated_revenue;
alpha = load '/user/pig/tests/data/pigmix/power_users_samples' using PigStorage('\u0001') as (name, phone, address, city, state, zip);
beta = foreach alpha generate name, phone;
C = join B by user left outer, beta by name parallel 40;
store C into 'L13out';

Script L14 (PigMix2 only)

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views_sorted' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, estimated_revenue;
alpha = load '/user/pig/tests/data/pigmix/users_sorted' using PigStorage('\u0001') as (name, phone, address, city, state, zip);
beta = foreach alpha generate name;
-- a merge join requires both inputs to be sorted on the join key
C = join B by user, beta by name using 'merge';
store C into 'L14out';

Script L15 (PigMix2 only)

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, action, estimated_revenue, timespent;
C = group B by user parallel 40;
D = foreach C {
    beth = distinct B.action;
    rev = distinct B.estimated_revenue;
    ts = distinct B.timespent;
    generate group, COUNT(beth), SUM(rev), (int)AVG(ts);
}
store D into 'L15out';

Script L16 (PigMix2 only)

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, estimated_revenue;
C = group B by user parallel 40;
D = foreach C {
    E = order B by estimated_revenue;
    F = E.estimated_revenue;
    generate group, SUM(F);
}

store D into 'L16out';

Script L17 (PigMix2 only)

...

Code Block
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/widegroupbydata' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links, user_1, action_1, timespent_1, query_term_1, ip_addr_1, timestamp_1,
        estimated_revenue_1, page_info_1, page_links_1, user_2, action_2, timespent_2, query_term_2, ip_addr_2, timestamp_2,
        estimated_revenue_2, page_info_2, page_links_2);
B = group A by (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, user_1, action_1, timespent_1, query_term_1, ip_addr_1, timestamp_1,
        estimated_revenue_1, user_2, action_2, timespent_2, query_term_2, ip_addr_2, timestamp_2,
        estimated_revenue_2) parallel 40;
C = foreach B generate SUM(A.timespent), SUM(A.timespent_1), SUM(A.timespent_2), AVG(A.estimated_revenue), AVG(A.estimated_revenue_1), AVG(A.estimated_revenue_2);
store C into 'L17out';

Features not yet covered: 5 (bzip data)

Data Generation

For details of data generation, see https://issues.apache.org/jira/browse/PIG-200.
See DataGeneratorHadoop for information on running the data generator in Hadoop mode.