PigMix is a set of queries used to test Pig performance from release to release. Some queries test latency (how long does it take to run this query?), and others test scalability (how many fields or records can Pig handle before it fails?). PigMix also includes a set of MapReduce Java programs that run the equivalent jobs directly. These are used to measure the performance gap between using MapReduce directly and using Pig. In June 2010 we released PigMix2, which adds 5 queries to the original 12 in order to measure the performance of new Pig features. We will publish the results of both PigMix and PigMix2.

Usage

To run PigMix, run the following commands from PIG_HOME:

ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy   # generate the test dataset
ant -Dharness.hadoop.home=$HADOOP_HOME pigmix          # run the PigMix benchmark

You can optionally set HADOOP_CONF_DIR before running these commands.
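The two steps can be wrapped in a small script so that the benchmark only starts if data generation succeeded. This is a minimal sketch, not part of PigMix: the function name run_pigmix, the ANT override variable, and the example paths are hypothetical and only there for illustration. It assumes it is run from PIG_HOME with HADOOP_HOME set.

```shell
#!/bin/sh
# Sketch only: run both PigMix ant targets in order, stopping at the first failure.
# run_pigmix and the ANT variable are illustrative names, not part of PigMix.
set -eu

run_pigmix() {
    # ANT defaults to "ant"; setting ANT=echo prints the arguments instead (dry run).
    ant_cmd=${ANT:-ant}
    "$ant_cmd" -Dharness.hadoop.home="$HADOOP_HOME" pigmix-deploy   # generate the test dataset
    "$ant_cmd" -Dharness.hadoop.home="$HADOOP_HOME" pigmix          # run the PigMix benchmark
}

# Example use (paths are placeholders; adjust them to your installation):
#   export HADOOP_HOME=/opt/hadoop
#   export HADOOP_CONF_DIR=/etc/hadoop/conf   # optional
#   cd "$PIG_HOME" && run_pigmix
```

Because of set -e, a failure in pigmix-deploy stops the script before the benchmark target is invoked.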

To change the default size of the test dataset, edit test/perf/pigmix/conf/config.sh.

Note that PigMix is checked in to Pig 0.12 and later. If you want to run it with an earlier version of Pig, go to https://issues.apache.org/jira/browse/PIG-200 and use PIG-200-0.12.patch.

Runs

PigMix

The following table lists PigMix runs. All of these runs were done on a cluster with 26 slaves plus one machine acting as the name node and job tracker. The cluster was running Hadoop version 0.18.1. (TODO: Need to get specific hardware info on those machines).

...

Data Generation

If you want to know the details of data generation, see https://issues.apache.org/jira/browse/PIG-200 for how to generate the data. See DataGeneratorHadoop for information on how to run the data generator in Hadoop mode.