Hadoop Eclipse Plug-in

JIRA MAPREDUCE-1280 contains a version of the plug-in that works with Hadoop 0.20.2 and Eclipse 3.5/3.6.

The Hadoop Eclipse Plug-in provides tools that ease the experience of developing Map/Reduce programs on Hadoop. Among other things, the plug-in provides support for defining and managing Map/Reduce locations, as described in the sections below.

Requirements

To ensure safe behavior of the plug-in, you should consider a number of facts and recommendations. More will come here soon.


Overview

Here is an overview of the Eclipse environment for Hadoop:

drawing:EclipseMain

The environment is accessible through the "Map/Reduce perspective" (the blue elephant icon in the top-right side of the main window). To open this perspective, select the menu: Window, Open Perspective, Other, and finally Map/Reduce. This perspective is roughly the standard Java development environment extended with Hadoop-specific additions such as the Map/Reduce locations view described below.

Hadoop Map/Reduce locations

The location view allows the creation, editing, and deletion of Map/Reduce locations.

To create a new location, click the "New Hadoop location..." button in the view toolbar or use the contextual menu.

drawing:LocationContextualMenu

A wizard pops up and asks for the location parameters.

drawing:NewLocationWizard

You must at least fill in the location name and the host and port of the Map/Reduce master (the JobTracker) and the DFS master (the NameNode).

The Map/Reduce and the DFS masters are colocated by default (i.e. they run on the same host).
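As an illustration, a location definition might look like the following. All names, hosts, and ports here are hypothetical; use your cluster's actual values, which typically match the mapred.job.tracker and fs.default.name settings in its configuration:

  Location name:     my-cluster
  Map/Reduce master: master.example.com, port 9001
  DFS master:        master.example.com, port 9000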

A SOCKS proxy can be configured if your machine cannot reach the Map/Reduce location directly. See the section "How to use SOCKS proxies" for more details.

Not implemented yet: user name, validation button, load from file button.

How to use SOCKS proxies

For example, to set up a local SOCKS proxy on port 21080 that tunnels through an SSH server PROXY:

  ssh -D 21080 PROXY
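
The plug-in (or any local Hadoop client) must then be told to route its RPC connections through that proxy. A minimal sketch of the client-side settings, assuming the SocksSocketFactory class shipped with Hadoop and the standard hadoop.socks.server property:

<property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
    <description>
      Route client RPC connections through a SOCKS proxy.
    </description>
</property>
<property>
    <name>hadoop.socks.server</name>
    <value>localhost:21080</value>
    <description>
      Host and port of the SOCKS proxy opened with ssh -D above.
    </description>
</property>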

Note that when using a SOCKS proxy in a local client like the Eclipse plug-in, you should ensure that your Hadoop cluster does not inherit your proxy settings, or the Hadoop daemons will not be able to communicate with each other. To override proxy settings on the Hadoop nodes, add the following property setting to each node's hadoop-site.xml:

<property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.StandardSocketFactory</value>
    <final>true</final>
    <description>
      Prevent proxy settings set up by clients in their job configs from affecting our connectivity.
    </description>
</property>

The standard socket factory produces RPC sockets with direct connections (i.e. without going through any proxies), and marking the property "final" prevents your job configuration from overriding its value.

How to build and install the plug-in

To build the Eclipse plug-in, you need the Hadoop source files and a working Eclipse installation (version 3.3+). When compiling Hadoop, the Eclipse plug-in is built if the build finds the Eclipse installation path in the ant property "eclipse.home". The build framework looks for this property in ${hadoop-src-root}/src/contrib/eclipse-plugin/build.properties and in $HOME/eclipse-plugin.build.properties.

A typical $HOME/eclipse-plugin.build.properties file would contain the following entry:

  eclipse.home=/path/to/eclipse

The plug-in is then built when compiling Hadoop: run ant clean package from the ${hadoop-src-root} directory, which will produce ${hadoop-src-root}/build/contrib/eclipse-plugin/hadoop-${version}-eclipse-plugin.jar. A full sequence is sketched below.
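
Put together, and assuming the property file from the previous step is in place, the build looks like this (paths are placeholders for your actual directories):

  cd ${hadoop-src-root}
  ant clean package
  ls build/contrib/eclipse-plugin/hadoop-*-eclipse-plugin.jar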

To install the generated plug-in in your Eclipse environment, first remove all previous versions of the plug-in from your Eclipse environment, then copy the hadoop-${version}-eclipse-plugin.jar file generated as described above into your ${eclipse.home}/plugins/ directory. When you restart Eclipse, the Map/Reduce perspective should be available.
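
In shell terms, the installation amounts to something like the following (again with placeholder paths):

  rm -f ${eclipse.home}/plugins/hadoop-*-eclipse-plugin.jar
  cp ${hadoop-src-root}/build/contrib/eclipse-plugin/hadoop-${version}-eclipse-plugin.jar ${eclipse.home}/plugins/

Then restart Eclipse so it picks up the new plug-in.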