Local Execution Mode using LocalJobRunner from Hadoop

Currently, a separate local execution engine handles local execution mode. Its physical plan has to be built separately and executed differently from the map reduce plan, which also means maintaining different operators for local and map reduce execution.

Instead, we can reuse Hadoop's LocalJobRunner to execute the same map reduce physical plans locally. We compile the logical plan into a map reduce physical plan and create the JobControl object corresponding to that plan. We then only need a separate launcher that submits the job to the LocalJobRunner instead of to an external JobTracker.
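The launcher idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual launcher: MRPlan, Launcher, ClusterLauncher, LocalLauncher and localConf are hypothetical names, and java.util.Properties stands in for Hadoop's JobConf so the sketch runs without Hadoop on the classpath. The one Hadoop detail it relies on is the classic convention that setting mapred.job.tracker to "local" makes the job client run jobs in-process through LocalJobRunner. Pointing fs.default.name at the local file system is an additional assumption about what local mode should read and write.

```java
import java.util.Properties;

// Hypothetical sketch: none of these types are Pig or Hadoop classes.
class LocalLaunchSketch {

    /** Stand-in for a compiled map reduce physical plan. */
    static final class MRPlan {
        final String name;
        MRPlan(String name) { this.name = name; }
    }

    /** A launcher decides where the jobs of an already compiled plan run. */
    interface Launcher {
        String launch(MRPlan plan, Properties conf);
    }

    /** Copies the configuration and points it at the in-process runner. */
    static Properties localConf(Properties base) {
        Properties conf = new Properties();
        conf.putAll(base);
        // Classic Hadoop convention: "local" selects LocalJobRunner.
        conf.setProperty("mapred.job.tracker", "local");
        // Assumption: local mode reads and writes the local file system.
        conf.setProperty("fs.default.name", "file:///");
        return conf;
    }

    /** Submits to whatever job tracker the configuration names. */
    static final class ClusterLauncher implements Launcher {
        public String launch(MRPlan plan, Properties conf) {
            return plan.name + " -> " + conf.getProperty("mapred.job.tracker");
        }
    }

    /** Same plan, same submission path; only the configuration differs. */
    static final class LocalLauncher implements Launcher {
        public String launch(MRPlan plan, Properties conf) {
            return new ClusterLauncher().launch(plan, localConf(conf));
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.job.tracker", "jobtracker.example.com:9001");
        MRPlan plan = new MRPlan("wordcount");
        System.out.println(new ClusterLauncher().launch(plan, conf));
        System.out.println(new LocalLauncher().launch(plan, conf));
    }
}
```

The point of the design is that both launchers receive the same compiled plan, so no separate local physical plan or local operators are needed; the local launcher differs only in the configuration it hands to the submission code.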

Pros

Cons

Pi Song had some interesting observations:

I guess the choice is harder now :) It now depends on what we want to do for the full-blown foreach. Since I would like to implement choice (ii), I would vote for using LocalJobRunner.

[pi] I think whether to do dynamic execution engine selection might not be a factor in this decision making process.

The main point is “Does LocalJobRunner perform as well as LocalEngine in most cases?”. My concern would be the case where we have a lot of small inner bags in our processing.

I vote (i) to neutralize your vote.