We are sharing our experience of installing Apache Hadoop on Linux-based machines in a multi-node setup. We will also share our troubleshooting experience here and update this article in the future.

User creation and other configuration steps -

We start by adding a dedicated Hadoop system user on each cluster node.

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Next we configure SSH (Secure Shell) on all the cluster nodes to enable secure data communication.
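If SSH is not already present on a node, it can be installed on Ubuntu (which our setup assumes, as the adduser commands above also imply) with:

$ sudo apt-get install openssh-server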

user@node1:~$ su - hduser
hduser@node1:~$ ssh-keygen -t rsa -P ""

The output will be something like the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
.....

Next we need to enable SSH access to the local machine with this newly created key:
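The standard way to do this is to append the public key to the authorized_keys file of hduser (the file paths here assume the defaults used by ssh-keygen above):

hduser@node1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys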

Repeat the above steps on all the cluster nodes and test by executing the following statement:

hduser@node1:~$ ssh localhost

This step is also needed to save the local machine's host key fingerprint to the hduser user's known_hosts file.

Next we need to edit the /etc/hosts file, in which we put the IP address and name of each system in the cluster.

In our scenario we have one master (with IP 192.168.0.100) and one slave (with IP 192.168.0.101)

$ sudo vi /etc/hosts

and we put the values into the hosts file as key-value pairs.

192.168.0.100 master
192.168.0.101 slave

Providing the SSH Access

The hduser user on the master node must be able to connect:

a. to its own user account on the master via ssh master (in this context, not just ssh localhost), and

b. to the hduser account of the slave(s) via a password-less SSH login.

So we distribute the SSH public key of hduser@master to all its slaves. (In our case we have only one slave; if you have more, execute the following statement for each one, changing the machine name, i.e. slave, slave1, slave2.)

hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave

Try connecting from master to master and from master to the slave(s) and check that everything is fine.
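For example (the hostnames here are the ones we defined in /etc/hosts; adjust them to your own cluster):

hduser@master:~$ ssh master
hduser@master:~$ ssh slave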

Configuring Hadoop

Let us edit the conf/masters file (only on the master node)

and we enter master into the file.

Doing this we have told Hadoop to start the SecondaryNameNode daemon on this machine in our multi-node cluster. (Despite its name, conf/masters only controls where secondary NameNodes run.)

The primary NameNode and the JobTracker will always run on the machine on which we run bin/start-dfs.sh and bin/start-mapred.sh, respectively.
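For example, to start the cluster later, we run the standard Hadoop 1.x scripts from the Hadoop installation directory on the master (the path /usr/local/hadoop is an assumption; use wherever you installed Hadoop):

hduser@master:/usr/local/hadoop$ bin/start-dfs.sh
hduser@master:/usr/local/hadoop$ bin/start-mapred.sh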

Let us now edit the conf/slaves file (only on the master node) with

master
slave

This means that we also run a DataNode process on the master machine, where the NameNode is running. We can relieve the master from acting as a slave if we have more machines at our disposal to serve as DataNodes.

If we have more slaves, then add one host per line, like the following:

master
slave
slave2
slave3

and so on.

Let's now edit two important files (on all the nodes in our cluster):

a. conf/core-site.xml
b. conf/hdfs-site.xml

a) conf/core-site.xml

We have to change the fs.default.name parameter, which specifies the NameNode host and port. (In our case this is the master machine.)
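The relevant property in conf/core-site.xml would look like the following (the port 54310 is an assumption on our part; use whatever port you have chosen for the NameNode):

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system.</description>
</property>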

b) conf/hdfs-site.xml

We have to change the dfs.replication parameter, which specifies the default block replication. It defines how many machines a single file should be replicated to before it becomes available. If we set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), we will start seeing a lot of "(Zero targets found, forbidden1.size=1)" type errors in the log files.

The default value of dfs.replication is 3. However, as we have only two nodes available in our scenario, we set dfs.replication to 2.
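A minimal sketch of the corresponding property in conf/hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.</description>
</property>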

The working material for this article was primarily gathered by Debopom Mitra, a J2EE programmer associated with us since 2012. Piyas De helped him learn Hadoop and the troubleshooting involved in setting up Hadoop on multiple nodes, and did the final edit of the article content.
