How to install Apache Hadoop 2.6.0 in Ubuntu (Single node setup)

Processing large datasets calls for parallel computation, and that is where Apache Hadoop comes in (the name comes from a toy elephant). Hadoop is one of the most actively contributed Apache projects, so each new release adds features and fixes bugs. As a result, the installation steps differ slightly from those for previous versions. Here I try to cover the full Hadoop installation for Big Data enthusiasts who want to install Apache Hadoop on their Ubuntu Linux machine.

This blog post teaches how to install Apache Hadoop 2.6 on an Ubuntu machine. (You can follow the same steps on an Ubuntu Server machine.) Before you start, I recommend that you know the basic Linux commands, which will help with routine operations during the installation.

Prerequisites

Installing Oracle Java 8

Apache Hadoop is a Java framework, so Java must be installed on the machine for Hadoop to run. Hadoop supports Java versions newer than 5 (i.e. Java 1.5), so you can also use Java 6 or 7 instead of Java 8.
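Before moving on, it helps to confirm that a Java runtime is actually on the PATH. The snippet below is a minimal sketch of such a check; the function name check_java is only an illustrative helper and is not part of Hadoop or Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Java runtime is available on the PATH.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # "java -version" writes to stderr, so redirect it to stdout
    java -version 2>&1 | head -n 1
    return 0
  fi
  echo "java not found on PATH" >&2
  return 1
}

check_java || echo "Install a JDK before continuing with the Hadoop setup"
```

If the check fails, install Java first and run it again before continuing.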

Installing SSH
SSH (“Secure SHell”) is a protocol for securely accessing one machine from another. Hadoop uses SSH to reach the slave nodes and to start and manage all HDFS and MapReduce daemons.

vignesh@pingax:~$ sudo apt-get install openssh-server

Now that SSH is installed on the Ubuntu machine, we can connect to this machine remotely as well as connect from it to other machines.

Configuring SSH
Once SSH is installed, you can connect to other machines or allow other machines to connect to this one. Since we have only a single machine, we can try connecting to the machine itself over SSH. To do this, we need to generate an RSA key pair and append the public key (i.e. id_rsa.pub) to the authorized_keys file in this machine's SSH directory.
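The key setup described above can be sketched as follows. This is a minimal example under stated assumptions: the key has no passphrase, the function name setup_local_ssh_key and its optional directory argument are illustrative, and the default directory is the usual ~/.ssh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a passphrase-less RSA key pair (if absent)
# and authorize its public key for logins to this same machine.
setup_local_ssh_key() {
  local ssh_dir="${1:-$HOME/.ssh}"   # optional argument, defaults to ~/.ssh
  local pub_key
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    # -N "" sets an empty passphrase, -f names the private key file
    ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
  fi
  pub_key="$(cat "$ssh_dir/id_rsa.pub")"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
  # Append the public key only if it is not already authorized
  if ! grep -qxF -- "$pub_key" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$pub_key" >> "$ssh_dir/authorized_keys"
  fi
}

# Usage (run explicitly):
#   setup_local_ssh_key
#   ssh localhost
```

After calling the function, ssh localhost should log in without asking for a password. For a remote machine, ssh-copy-id followed by the user and hostname of that machine is one common way to install the same public key there.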

If you are configuring SSH for another machine (i.e. from the master node to a slave node), you have to update the above command by adding the hostname of the slave machine.

Disabling IPv6
Since Hadoop doesn’t work on IPv6, we should disable it. Another reason is that Hadoop has been developed and tested on IPv4 stacks, so Hadoop nodes will be able to communicate as long as the cluster uses IPv4. (Once you have disabled IPv6, you need to reboot the machine for the change to take effect. If you don’t know how to reboot from the command line, use sudo reboot.)

To disable IPv6 on your Linux machine, you need to add the IPv6-disabling settings at the end of /etc/sysctl.conf.
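The settings in question are the standard Linux disable_ipv6 sysctl keys. The sketch below appends them to a file passed as an argument, so it can be tried on a scratch copy first; the function name append_ipv6_disable is an illustrative helper, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the IPv6-disabling sysctl keys to the given file.
append_ipv6_disable() {
  local conf_file="$1"
  cat >> "$conf_file" <<'EOF'
# Turn off IPv6 on all interfaces for Hadoop
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
}

# Example on a scratch copy (run explicitly):
#   cp /etc/sysctl.conf /tmp/sysctl.conf
#   append_ipv6_disable /tmp/sysctl.conf
```

Writing to /etc/sysctl.conf itself needs root privileges. After the reboot, cat /proc/sys/net/ipv6/conf/all/disable_ipv6 should print 1 if IPv6 is disabled.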

Tip: You can use nano, gedit, or vi to edit any of the text files in this configuration.

Installation Steps

Download the latest Apache Hadoop release from Apache mirrors
First, download Apache Hadoop 2.6.0 (i.e. hadoop-2.6.0.tar.gz) or the latest version from the Apache download mirrors. You can also pick the current stable release to get the latest features and recent bug fixes. Choose the location where you want to place your Hadoop installation; I have chosen /usr/local/hadoop.
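The download and unpack step can be sketched as below. This is only an illustration under stated assumptions: it fetches from the Apache archive server rather than a mirror, it assumes GNU tar (the Ubuntu default), and the function names hadoop_tarball_url and install_hadoop are hypothetical helpers.

```shell
#!/usr/bin/env bash
HADOOP_VERSION="2.6.0"
INSTALL_DIR="/usr/local/hadoop"   # location chosen in this post

# Build the tarball URL for a release (assumes the Apache archive layout).
hadoop_tarball_url() {
  local version="$1"
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
}

# Download the tarball and unpack it directly into INSTALL_DIR.
install_hadoop() {
  local tarball="hadoop-${HADOOP_VERSION}.tar.gz"
  wget "$(hadoop_tarball_url "$HADOOP_VERSION")"
  sudo mkdir -p "$INSTALL_DIR"
  # --strip-components=1 drops the top-level hadoop-<version>/ directory
  sudo tar -xzf "$tarball" -C "$INSTALL_DIR" --strip-components=1
}

# Run explicitly when ready:
#   install_hadoop
```

Keeping the version in one variable makes it easy to switch to a newer release later.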

Instead of running the two start commands separately (start-dfs.sh and start-yarn.sh), you can also use start-all.sh, but it is now deprecated, so it is not recommended.

Track/Monitor/Verify

Verify Hadoop daemons:

hduser@pingax:~$ jps
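To make the check less error-prone, the jps output can be compared against the daemons expected on a single node. The sketch below assumes both HDFS and YARN were started, so it looks for NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager; the function name missing_daemons is an illustrative helper.

```shell
#!/usr/bin/env bash
# Assumption: a single-node setup with both HDFS and YARN started.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Hypothetical helper: read jps output on stdin and print every expected
# daemon that does not appear in it.
missing_daemons() {
  local jps_output daemon
  jps_output="$(cat)"
  for daemon in $EXPECTED_DAEMONS; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # NameNode does not match SecondaryNameNode
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Usage (run explicitly):
#   jps | missing_daemons
```

An empty result means all five daemons are running; any name printed is a daemon that did not start and whose log file is worth checking.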

Monitor Hadoop ResourceManager and Hadoop NameNode

If you wish to track Hadoop MapReduce as well as HDFS, you can explore the Hadoop web views of the ResourceManager and the NameNode, which are commonly used by Hadoop administrators. Open your default browser and visit their web interfaces.
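In Hadoop 2.x the NameNode web view listens on port 50070 and the ResourceManager on port 8088 by default. The sketch below probes both from the command line; it assumes the ports were not changed in the site configuration files and that curl is installed, and the function name check_ui is an illustrative helper.

```shell
#!/usr/bin/env bash
# Default Hadoop 2.x web UI addresses (assumption: ports not overridden).
NAMENODE_UI="http://localhost:50070"
RESOURCEMANAGER_UI="http://localhost:8088"

# Hypothetical helper: report whether a web UI answers.
# curl -f fails on HTTP errors, -s hides the progress output.
check_ui() {
  local name="$1" url="$2"
  if curl -fs -o /dev/null --max-time 5 "$url"; then
    echo "$name is up at $url"
  else
    echo "$name is not reachable at $url"
  fi
}

check_ui "NameNode" "$NAMENODE_UI"
check_ui "ResourceManager" "$RESOURCEMANAGER_UI"
```

The same two addresses can be opened directly in the browser.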

If you get the output shown in the snapshot above, congratulations! You have successfully installed Apache Hadoop on your Ubuntu machine. If not, post your error messages in the comments and we will be happy to help. Happy Hadooping!