Working with Hadoop under Eclipse

Here are instructions for setting up a development environment for Hadoop under the Eclipse IDE. Please feel free to make additions or modifications to this page.

This document assumes you already have Eclipse downloaded, installed, and configured to your liking. It also assumes that you are aware of the HowToContribute page and have given that a read.

Quick Start

We will begin by downloading the Hadoop source. The hadoop-common source tree has three subprojects underneath it that you will see after you pull down the source code: hadoop-common, hdfs, and mapreduce.

Let's begin by getting the latest source from Git. (Note that a copy is also mirrored on GitHub, but it lags slightly behind the Apache read-only Git repository.)

git clone git://git.apache.org/hadoop-common.git

This will create a hadoop-common folder in your current directory. If you "cd" into that folder, you will see all the available subprojects. Now we will build the code to get it ready for importing into Eclipse.
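As a quick check that the clone produced the expected layout, the sketch below lists the subprojects in a checkout. The helper name list_maven_modules is hypothetical (it is not part of Hadoop), and it assumes each subproject carries its own pom.xml at its top level, which may not hold for every branch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the name of each
# immediate subdirectory of a checkout that contains its own pom.xml.
# Assumption: every subproject is a Maven module with a top-level pom.xml.
list_maven_modules() {
    checkout_dir="${1:-.}"
    if [ ! -d "$checkout_dir" ]; then
        echo "not a directory: $checkout_dir" >&2
        return 1
    fi
    for pom in "$checkout_dir"/*/pom.xml; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -f "$pom" ] || continue
        basename "$(dirname "$pom")"
    done
}
```

For example, running list_maven_modules hadoop-common from the directory where you cloned should print one subproject name per line.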

From the directory you just "cd"-ed into (also known as the top-level directory of a branch or trunk checkout), run:

$ mvn install -DskipTests
$ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

Note: This may take a while the first time, as all libraries are fetched from the internet, and the whole build is performed.
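If you repeat these steps often, the two Maven commands above can be wrapped in a small script. The following is only a sketch: the DRY_RUN switch and the function names run and prepare_eclipse_projects are introduced here for illustration and are not part of the Hadoop build. With DRY_RUN left at its default of 1, the commands are printed rather than executed, so you can review them first.

```shell
#!/bin/sh
# Sketch of the two build steps above, meant to be run from the
# top-level directory of the checkout.
# DRY_RUN is a hypothetical switch introduced for this example:
# when it is 1 (the default), commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_eclipse_projects() {
    # Build and install all modules locally, skipping the test run.
    # Stop here if the build fails, so that project files are not
    # generated from a broken build.
    run mvn install -DskipTests || return 1
    # Generate the Eclipse project files and fetch sources and Javadocs.
    run mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
}
```

Calling prepare_eclipse_projects prints the two commands. Setting DRY_RUN=0 before the call executes them for real.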

In Eclipse

After the above, do the following to finally have projects in Eclipse ready and waiting for you to go on that scratch-itching development spree:

For Common

For HDFS

For MapReduce

For YARN

Note: In the case of MapReduce, the testjar package is broken. This is expected, since it is part of a test case that checks for incorrect packaging, and it can safely be ignored.

To run tests from Eclipse you need to additionally do the following:

Footnotes