How to transition Hadoop to JDK7 and JDK8

There has been discussion on common-dev about how we're going to bump the minimum supported JDK version for Hadoop.

A number of proposals have surfaced, but let's start by going over some of the concerns to frame the discussion.

Background

Our compatibility guidelines are documented at http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html

branch-2 currently still supports JDK6. It also works on JDK7, and there are anecdotal reports that it works on JDK8 too, with a little work.

From a vendor point of view, both Cloudera and Hortonworks released branch-2 based distros this year (2014), so they will likely be doing development on branch-2 for the next year or two. Both already seem to ask customers to use JDK7 with their branch-2 based releases (CDH5 and HDP2).

Looking at the big picture, it's reasonable to believe that users of Apache Hadoop would be better served if we prioritized operational aspects such as rolling upgrades and wire compatibility for a couple of years. Since not everyone has moved to hadoop-2 yet, talk of more incompatibility between hadoop-2 and hadoop-3, or between hadoop-3 and hadoop-4, within the next 12 months would be a big issue for users, especially for those who depend on rolling upgrades and wire compatibility.

Some improvements have also been made to support multiple versions of MR on the same cluster, each with its own classpath. See MAPREDUCE-4421 and MAPREDUCE-1700 for further details. Furthermore, we also allow user applications to use a JDK different from the one Hadoop itself runs on. For example, HDFS/YARN can run on JDK7 today while MR applications run on JDK6, if users so choose.

Proposals

The proposals are listed in roughly chronological order. Names are not attached, to keep this impartial.

Proposal A

JDK7 also reaches EOL in April 2015, so we would like to avoid a repeat of the JDK6 situation, where we end up stuck on JDK7 after it has reached EOL.

Proposal B

Proposal C

Proposal D

Discussion

Proposal A

Proposal B

Proposal C

Proposal D