Eclipse is a powerful IDE for Java development. Since Hadoop and MapReduce programming is done in Java, it is better to do our programming in a well-featured Integrated Development Environment (IDE). So, in this post, we are going to learn how to install Eclipse on an Ubuntu machine and configure it for Hadoop and MapReduce programming. Let's start with downloading and installing Eclipse on the Ubuntu machine.

1. Install Eclipse:

Download the latest version of Eclipse IDE for Java EE Developers from the Eclipse downloads page at http://www.eclipse.org/downloads/. In this post, we describe the installation of Eclipse Kepler, which is the latest version at the time of writing.

Extract the *.tar.gz file into your preferred installation directory, usually /opt/eclipse.
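The extraction can be done from the terminal. The archive file name below is an assumption based on the Kepler Java EE package; substitute the name of the file you actually downloaded:

```shell
# Extract the downloaded archive into /opt (this creates /opt/eclipse).
# The archive name is an assumption; replace it with your downloaded file.
sudo tar -xzf eclipse-jee-kepler-SR2-linux-gtk-x86_64.tar.gz -C /opt/
```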

Set up an ECLIPSE_HOME environment variable in the .bashrc file pointing to the installation directory, and add the installation directory to the existing list of directories in the PATH environment variable.

To perform the above actions, add the below two entries into the .bashrc file:

export ECLIPSE_HOME="/opt/eclipse"
export PATH="$ECLIPSE_HOME:$PATH"

Now we can start Eclipse from the terminal with the $ eclipse command.
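For the new variables to take effect in the current shell session, .bashrc needs to be reloaded first; a quick sketch:

```shell
# Reload .bashrc so ECLIPSE_HOME and the updated PATH take effect
source ~/.bashrc

# Launch Eclipse in the background, leaving the terminal free
eclipse &
```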

2. Eclipse Configuration for Hadoop/Mapreduce:

Eclipse configuration for Hadoop can be done in two ways. One is by creating an Eclipse plugin for the Hadoop version currently in use and copying it into Eclipse's plugins folder. The other is by installing the Maven plugin for integrating Eclipse with Hadoop and performing the necessary setup.

Creation of Hadoop Eclipse Plugin:

Below are the steps for creating a customized Hadoop Eclipse plugin for the Hadoop version currently in use. In this post, we have created the plugin for the hadoop-2.3.0 release.

Prerequisites:

1. ant – We need the ant build tool installed on our machine to create the plugin jar file. To install ant on an Ubuntu machine, use the below command.

$ sudo apt-get install ant

2. git – git needs to be installed on our machine to clone, from GitHub, the source code required to build the jar file. git can be installed with the below command.

$ sudo apt-get install git

Plugin creation:

Download the required source code from GitHub into your preferred location.
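The clone and build steps might look like the following. The repository URL and directory layout are assumptions based on the commonly used hadoop2x-eclipse-plugin project, and the -Declipse.home property may or may not be required depending on the build script; adjust everything to match the source you actually cloned:

```shell
# Clone the plugin source from GitHub (repository URL is an assumption)
git clone https://github.com/winghc/hadoop2x-eclipse-plugin.git
cd hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin

# Build the plugin jar; -Dversion selects the Hadoop release number and
# -Dhadoop.home points at the local Hadoop installation directory
ant jar -Dversion=2.3.0 -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/ -Declipse.home=/opt/eclipse
```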

In the ant jar command, the -Dversion=2.3.0 property specifies the version number of the Hadoop release. It is specific to the hadoop-2.3.0 release; the same source files can be used for other releases as well by changing the version number in this parameter and providing the appropriate Hadoop home directory.

In this example, Hadoop's home directory is specified with the -Dhadoop.home=/usr/lib/hadoop/hadoop-2.3.0/ property. Change this to match your Hadoop installation directory.

We have also changed the libraries.properties file in the hadoop-eclipse-plugin/ivy/ directory to avoid version mismatch errors (the required versions of some jar files are not present in the Hadoop home directory).

For building the Eclipse plugin for the hadoop-2.3.0 release, the above source code and commands work well with no changes needed. Changes are needed only if we want to generate the plugin for other versions.

5. Now copy the plugin jar file from hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar to the /opt/eclipse/plugins directory.
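The copy itself is a single command, using the paths from the steps above (sudo is assumed to be needed for writing into /opt):

```shell
# Copy the freshly built plugin jar into Eclipse's plugins folder
sudo cp hadoop-eclipse-plugin/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.3.0.jar /opt/eclipse/plugins/
```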

6. After restarting Eclipse, the Map/Reduce perspective will be available.

Maven plugin for Integration of Eclipse with Hadoop:

Prerequisites:

1. For this option, Maven needs to be installed on our machine; this can be done with the below command if it is not installed already.
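On Ubuntu, Maven can be installed from the standard package repositories, following the same pattern as the ant and git installs above:

```shell
sudo apt-get install maven
```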

2. Install the m2e plugin by navigating to Help –> Install New Software. Enter http://download.eclipse.org/technology/m2e/releases into the "Work with" box, select the plugin, click the Next button, and complete the installation.

As discussed above, option 1 (creating the Hadoop Eclipse plugin) is easier than resolving the errors in option 2. So we preferred option 1: we created hadoop-eclipse-plugin-2.3.0 and copied it into the /opt/eclipse/plugins folder.

For the development of an example MapReduce program (WordCount) in the Eclipse IDE, please refer to the next post –> Sample Mapreduce Program In Eclipse.

10 thoughts on “Eclipse Configuration for Hadoop”

Hi,
I have compiled this for hadoop 2.4.1 and loaded the plugin into Eclipse Kepler. However, I am not able to add any locations to it. The UI that takes the location name and MRv2 master host/port details does not appear at all. The behaviour is the same with Eclipse Luna.

Can you please let me know how you got your hadoop-eclipse plugin: whether you created it as shown in option 1 of the post, or downloaded it from somewhere on the net? There will be version inconsistencies if you try to use different versions of Hadoop and its Eclipse plugin.

Hi, there may be some missing jar files in your Eclipse installation; please find them based on the compilation error messages and copy them into the eclipse/plugins folder. In particular, check whether org.eclipse.jdt.ui.wizards is available or not.

Yes, it shows inconsistent behaviour while adding DFS locations, and adding DFS locations to Eclipse is not always preferred anyway. So Eclipse is best used for coding and building jars in one place, and then finally copying the jar file to the datanode from which we plan to submit the job.

There is an issue with the integration of DFS locations into Eclipse, but this plugin works well for the Map/Reduce perspective. You can use it for writing MapReduce programs and building jars. Once the jars are created, they can be copied to any datanode from which we want to submit our job ($ hadoop jar my.jar MainClass <input> <output>).