Saturday, December 15, 2012

After working on mobile technologies for quite some time, I have spent the past few days on Big Data and have been trying out quite a few related frameworks. It has been a while since I blogged, so I thought of writing this step-by-step tutorial on installing Hadoop on Windows.

If you are taking Big Data seriously, I definitely suggest setting up Hadoop on Linux rather than Windows (especially if you are looking for a production-like environment).

Apache Hadoop is an open-source framework for distributed processing: it performs computations over large data sets on clusters by distributing the work to each of the nodes. The framework mainly comes with a Hadoop kernel, the ability to run distributed MapReduce jobs, and a filesystem, HDFS.
There are many tutorials that help you install Hadoop on Windows, but most of them have some issues. After referring to a few of them, I am writing this one to cover what the others miss.
Since this tutorial installs Hadoop on Windows, and Hadoop contains a lot of shell scripts that have to be executed, we need a *nix shell for Windows. Cygwin is one of them, and the best as well. Download it from here.
Run setup.exe as an administrator and, after selecting the mirror for download, remember to select the ssh package (openssh) for installation. Refer to the image below:

Once done, open the Cygwin terminal as an administrator.

Step 1: configure ssh using the command ssh-host-config

Here I have configured a username and password.

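Once you have given the password, you will receive the confirmation shown in the image above.
Now that ssh is configured, you can test it using the command ssh localhost (if the connection is refused, the sshd service is probably not running yet; ssh-host-config suggests starting it with net start sshd).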

Now generate a key so that ssh can authenticate you without asking for a password every time you invoke it, using the command ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Once done, append the public key to the authorized keys using the command cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys as shown above.
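The key setup above can be collected into one small script. This is only a sketch: the function name setup_passwordless_ssh and its optional directory argument are made up for illustration, and it generates an RSA key instead of the DSA key used above, on the assumption that your OpenSSH is recent enough to have deprecated DSA. On an older install you can keep -t dsa and id_dsa.

```shell
#!/bin/sh
# Sketch: create a passphrase-less key pair and authorize it for logins
# to this same machine. The function name and the optional directory
# argument are illustrative, not part of ssh or Hadoop.
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"   # assumption: RSA instead of the DSA key used above

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Generate the key only if it does not exist yet (-P '' = empty passphrase).
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -P '' -f "$key" || return 1
    fi

    # Append the public key unless it is already authorized.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF -e "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling setup_passwordless_ssh with no argument works on ~/.ssh. After that, ssh localhost should log you in without a password prompt.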

Now it is time to download the Hadoop framework. I used this mirror and selected the hadoop-1.1.1 release. From the link http://mirror.catn.com/pub/apache/hadoop/common/hadoop-1.1.1/ download hadoop-1.1.1-bin.tar.gz

(For a list of other mirrors you can always check http://www.apache.org/dyn/closer.cgi/hadoop/common/ )

Now extract the hadoop-1.1.1-bin.tar.gz file to c:\cygwin\usr\local and rename the c:\cygwin\usr\local\hadoop-1.1.1 folder to c:\cygwin\usr\local\hadoop
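If you prefer to do the extraction from the Cygwin terminal instead of a Windows archive tool, something like the following should work. It is a sketch under assumptions: the function name install_hadoop and its arguments are made up for illustration, and it relies on c:\cygwin\usr\local being /usr/local inside Cygwin, which is the case when Cygwin is installed under c:\cygwin.

```shell
#!/bin/sh
# Sketch: unpack the Hadoop tarball and rename the versioned folder.
# install_hadoop and its arguments are illustrative names, not a Hadoop tool.
install_hadoop() {
    tarball="$1"                      # e.g. hadoop-1.1.1-bin.tar.gz
    dest="${2:-/usr/local}"           # c:\cygwin\usr\local as seen from Cygwin
    version_dir="${3:-hadoop-1.1.1}"  # top-level folder inside the tarball

    # Refuse to clobber an existing installation.
    if [ -e "$dest/hadoop" ]; then
        echo "$dest/hadoop already exists, not overwriting" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    mv "$dest/$version_dir" "$dest/hadoop"
}
```

For the download above, the call would be install_hadoop hadoop-1.1.1-bin.tar.gz, run from the folder that holds the tarball.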
Now go to the path C:\cygwin\usr\local\hadoop\conf and open hadoop-env.sh
Go to line 9 and you will find an entry for export JAVA_HOME=something
Change it to export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.6.0_11
and do not forget to uncomment the line (remove the # from the beginning of the line).

If you want to get rid of the hassle of escaping the space in "Program Files", you can always install the JDK in c:\java\jre or something similar. Otherwise use /cygdrive/c/Program\ Files/Java/jdk1.6.0_11 with the escaped space. It worked for me!
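Since a wrong JAVA_HOME is the easiest way for this step to go wrong, it can help to check the path before starting Hadoop. A minimal sketch: check_java_home is a helper name made up for this post, and the only thing it tests is that an executable bin/java exists under the given folder.

```shell
#!/bin/sh
# Sketch: verify that a JAVA_HOME candidate really contains a Java binary.
# check_java_home is an illustrative helper, not part of Hadoop or the JDK.
check_java_home() {
    java_home="$1"
    # Quoting "$java_home" keeps a path with spaces (Program Files) in one piece.
    if [ -x "$java_home/bin/java" ] || [ -x "$java_home/bin/java.exe" ]; then
        echo "OK: found java under $java_home"
        return 0
    fi
    echo "ERROR: no bin/java under $java_home" >&2
    return 1
}
```

Note that inside double quotes the space needs no backslash, so the call is check_java_home "/cygdrive/c/Program Files/Java/jdk1.6.0_11", while the unquoted value in hadoop-env.sh keeps the Program\ Files form.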

Below are a few snapshots of errors which you might get if you don't configure your JAVA_HOME properly.

( JAVA_HOME errors )

Once your JAVA_HOME is configured, open C:\cygwin\usr\local\hadoop\conf\hdfs-site.xml to configure HDFS. Add the property tags between the configuration tags so that your file looks like the one below.

Then open C:\cygwin\usr\local\hadoop\conf\mapred-site.xml to configure the MapReduce service:
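The screenshots of the two files are not reproduced here, so the following is only a sketch of what they typically contain on a single-node setup. The property names and values (dfs.replication set to 1, mapred.job.tracker set to localhost:9001) are the ones from the Hadoop 1.x single-node setup guide and may differ from the screenshots. The write_site_xml helper is a made-up name, and each call overwrites the target file with a single property.

```shell
#!/bin/sh
# Sketch: write a Hadoop *-site.xml file that holds a single property.
# write_site_xml is an illustrative helper; it overwrites the target file.
write_site_xml() {
    file="$1"; name="$2"; value="$3"
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Assumed install location from the steps above.
conf_dir="${HADOOP_CONF_DIR:-/usr/local/hadoop/conf}"

if [ -d "$conf_dir" ]; then
    # Values taken from the Hadoop 1.x single-node guide (assumption).
    write_site_xml "$conf_dir/hdfs-site.xml"   dfs.replication    1
    write_site_xml "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
fi
```

The same guide also sets fs.default.name to hdfs://localhost:9000 in conf/core-site.xml, and the helper can write that file in the same way.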

Once this is done, we can format the HDFS filesystem using the command bin/hadoop namenode -format

and start the DFS subsystems using the command bin/start-dfs.sh
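Both commands are relative paths, so they have to be run from inside the Hadoop folder. Here is a sketch of the two steps as one function. start_hdfs is a made-up name, and the default path is the install location assumed in this tutorial.

```shell
#!/bin/sh
# Sketch: format HDFS and start the DFS daemons from the Hadoop folder.
# start_hdfs is an illustrative wrapper, not a script shipped with Hadoop.
start_hdfs() {
    hadoop_home="${1:-/usr/local/hadoop}"

    if [ ! -x "$hadoop_home/bin/hadoop" ]; then
        echo "bin/hadoop not found under $hadoop_home" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "$hadoop_home" || exit 1
        # Formatting wipes existing HDFS metadata; only do it on a fresh install.
        bin/hadoop namenode -format || exit 1
        bin/start-dfs.sh
    )
}
```

With the layout used here, calling start_hdfs without arguments is the same as doing cd /usr/local/hadoop and then running the two commands by hand.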

You can then see Hadoop running at http://localhost:yourportnumber/dfshealth.jsp

Now we have successfully installed Hadoop on Windows. I will post more as and when I learn and explore.