Working with Hadoop under Eclipse

Here are instructions for setting up a development environment for Hadoop under the Eclipse IDE. Please feel free to make additions or modifications to this page.

This document assumes you already have Eclipse downloaded, installed, and configured to your liking. It also assumes that you are aware of the HowToContribute page and have given that a read.

Quick Start

We will begin by downloading the Hadoop source. Once you pull down the code, you will see several subprojects under the top-level directory of the hadoop-common source tree, including hadoop-common-project, hadoop-hdfs-project, hadoop-mapreduce-project and hadoop-yarn-project.

Get the latest source from Git. (Note: there is a copy mirrored on GitHub, but it lags the Apache read-only Git repository slightly.)

$ git clone git://git.apache.org/hadoop-common.git

This will create a hadoop-common folder in your current directory. If you "cd" into that folder, you will see all the available subprojects. Now we will build the code to get it ready for importing into Eclipse.

From the directory you just "cd"-ed into (also known as the top-level directory of a branch or trunk checkout), run:

$ mvn install -DskipTests
$ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

Note: This may take a while the first time, as all libraries are fetched from the internet, and the whole build is performed.
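Before switching to Eclipse, it can help to check what `mvn eclipse:eclipse` actually generated. The plugin writes a `.project` and `.classpath` file into each module directory, and the name stored in `.project` is the name the Import dialog shows. The following is a minimal sketch, assuming a POSIX shell with `find` and `sed`; the function name `list_eclipse_projects` is hypothetical and not part of Hadoop or Maven.

```shell
# Hypothetical helper: list every Eclipse project generated by
# `mvn eclipse:eclipse` under a checkout, one per line as
# "<project name><TAB><module directory>".
# Assumes the project name is the first <name>...</name> element in each
# .project file (later <name> elements belong to build commands).
list_eclipse_projects() {
  root="${1:-.}"
  find "$root" -name .project -type f | sort | while IFS= read -r f; do
    dir=$(dirname "$f")
    # Extract the text of the first <name> element only.
    name=$(sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$f" | head -n 1)
    printf '%s\t%s\n' "$name" "$dir"
  done
}
```

Run it from the top-level directory, for example `list_eclipse_projects .`; the names printed for the hadoop-common-project, hadoop-hdfs-project, hadoop-mapreduce-project and hadoop-yarn-project directories should correspond to the projects you select during the import steps.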

In Eclipse

After the above, do the following to finally have projects in Eclipse ready and waiting for you to go on that scratch-itching development spree:

For Common

  • File -> Import...
  • Choose "Existing Projects into Workspace"
  • Select the hadoop-common-project directory as the root directory
  • Select the hadoop-annotations, hadoop-auth, hadoop-auth-examples, hadoop-nfs and hadoop-common projects
  • Click "Finish"
  • File -> Import...
  • Choose "Existing Projects into Workspace"
  • Select the hadoop-assemblies directory as the root directory
  • Select the hadoop-assemblies project
  • Click "Finish"
  • To get the projects to build cleanly:
      ◦ Add target/generated-test-sources/java as a source directory for hadoop-common
      ◦ You may have to remove the JRE System Library and add it back to avoid errors due to access restrictions
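Re-running `mvn eclipse:eclipse` regenerates `.classpath`, so a source folder added by hand in the Java Build Path dialog can be lost. As an alternative to the dialog, here is a minimal shell sketch that adds the generated test-source folder to a project's `.classpath` only if it is not already listed. It assumes the generated file keeps `</classpath>` on a line of its own and that test sources should compile to `target/test-classes`; the function name `add_generated_test_src` is hypothetical.

```shell
# Hypothetical helper: register target/generated-test-sources/java as a
# source folder in <project dir>/.classpath
# (e.g. hadoop-common-project/hadoop-common).
# Assumes "</classpath>" sits on its own line, as in files generated by
# `mvn eclipse:eclipse`. Safe to run repeatedly.
add_generated_test_src() {
  cp_file="$1/.classpath"
  src_path="target/generated-test-sources/java"
  entry="  <classpathentry kind=\"src\" path=\"$src_path\" output=\"target/test-classes\"/>"

  if [ ! -f "$cp_file" ]; then
    echo "no .classpath in $1; run mvn eclipse:eclipse first" >&2
    return 1
  fi

  # Leave the file alone if the folder is already registered.
  if grep -q "path=\"$src_path\"" "$cp_file"; then
    return 0
  fi

  # Print the new entry just before the closing </classpath> line.
  # Rewrites via a temp file, so no in-place editing flag is needed.
  tmp_file="$cp_file.tmp"
  awk -v entry="$entry" '/<\/classpath>/ { print entry } { print }' "$cp_file" > "$tmp_file" &&
    mv "$tmp_file" "$cp_file"
}
```

For example, from the top-level directory run `add_generated_test_src hadoop-common-project/hadoop-common` after each `mvn eclipse:eclipse`, then refresh the project in Eclipse (F5).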

For HDFS

  • File -> Import...
  • Choose "Existing Projects into Workspace"
  • Select the hadoop-hdfs-project directory as the root directory
  • Select the hadoop-hdfs project
  • Click "Finish"

For MapReduce

  • File -> Import...
  • Choose "Existing Projects into Workspace"
  • Select the hadoop-mapreduce-project directory as the root directory
  • Select the hadoop-mapreduce-project project
  • Click "Finish"

For YARN

  • File -> Import...
  • Choose "Existing Projects into Workspace"
  • Select the hadoop-yarn-project directory as the root directory
  • Select the hadoop-yarn-project project
  • Click "Finish"

Note: for MapReduce, Eclipse reports the testjar package as broken. This is expected, since it is part of a test case that checks for incorrect packaging, and can safely be ignored.

To run tests from Eclipse, you also need to do the following:

  • In the project's Properties, select Java Build Path, then the Libraries tab
  • Click "Add External Class Folder" and select the build directory of the current project

Footnotes

  • With the new release of the m2e plug-in, this no longer works, because nearly all of the build's targets are unsupported by the new "connector framework" (yes, it is a giant mess). The fallback is to use m2eclipse, or simply to generate the Eclipse project files via mvn as described on this page.