
Monday, 10 February 2014

Solutions for issues faced during hadoop configuration

Starting out as a beginner with Hadoop is not straightforward. There are
many steps and issues that one has to overcome. I have been through them,
so I am putting this up for everyone to refer to. A step-by-step guide for
Hadoop configuration is also available here - STEP 1, STEP 2, STEP 3 and
STEP 4.


1. Which Hadoop to fetch:
There are two flavors of Hadoop - 1.x and 2.x.
1.x is the original line, while 2.x was developed in parallel and introduced the YARN resource-management layer. So, go for a Hadoop 2.x version.
You can find more about Hadoop here
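To confirm that the Hadoop you unpacked really belongs to the 2.x line, a small check like the one below can help. It is only a sketch: it assumes the `hadoop` launcher is on your PATH and that the first line of `hadoop version` has the form `Hadoop 2.2.0`. The helper name `hadoop_major` is hypothetical, and 2.2.0 in the comments is just an example 2.x release.

```shell
#!/bin/sh
# Sketch: check that the installed Hadoop belongs to the 2.x line.
# Example download (2.2.0 is only an illustrative 2.x release):
#   wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
#   tar -xzf hadoop-2.2.0.tar.gz

# Hypothetical helper: print the major version from `hadoop version` output,
# assuming its first line looks like "Hadoop <major>.<minor>.<patch>".
hadoop_major() {
  printf '%s\n' "$1" | sed -n '1s/^Hadoop \([0-9][0-9]*\)\..*/\1/p'
}

if command -v hadoop >/dev/null 2>&1; then
  major=$(hadoop_major "$(hadoop version 2>/dev/null)")
  if [ "$major" = "2" ]; then
    echo "Hadoop 2.x found (includes YARN)"
  else
    echo "Hadoop major version is '${major:-unknown}'; consider a 2.x release"
  fi
fi
```

If `hadoop` is not on the PATH yet, run `bin/hadoop version` from the unpacked directory and pass its output to `hadoop_major` by hand.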

2. Which machine to use:
The initial options are Windows and Linux. Since SSH is used
extensively, prefer a Linux distribution for Hadoop. It also eliminates
the need to license each instance/node that you create.
Prefer Ubuntu if you are an extensive Windows user, since you will not
feel completely lost in the Unix-like environment. There is also a lot of
online help for Ubuntu.
Use this guide for downloading Ubuntu and installing it on a VM

3. Actual machines or virtual machines:
This one is easy to decide: virtual machines, of course. You will still
need at least one physical machine with a recent configuration and at
least 4GB of RAM to run the VMs.

4. Which virtualization environment:
There are many options, but the most popular are VirtualBox by Oracle
and VMware. VirtualBox is free and open source, and online support for
it is good enough, so prefer VirtualBox.
You can find how to set up the box here

5. Which Java to use:
The most common Java for Linux-based systems is OpenJDK, and Oracle JDK
is always available as well. Choose a Java version as per the Hadoop
documentation. It is best to go for Oracle JDK, but an older, well-tested
version rather than the latest one.
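To see which Java a machine is actually using, the output of `java -version` is enough. The sketch below (POSIX sh) assumes that Oracle JDK identifies itself with `Java(TM)` and OpenJDK with `OpenJDK` in that output, as the Java 6/7 builds of this period do; the helper names `java_vendor` and `java_version` are hypothetical.

```shell
# Hypothetical helpers; pass them the text printed by `java -version`.
java_vendor() {
  case "$1" in
    *OpenJDK*) echo OpenJDK ;;
    *"Java(TM)"*) echo Oracle ;;
    *) echo unknown ;;
  esac
}

# Print the quoted version, e.g. 1.7.0_51 from: java version "1.7.0_51"
# (searches for the "version" line, so a leading "Picked up ..." line is skipped).
java_version() {
  printf '%s\n' "$1" | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# `java -version` writes to stderr, hence 2>&1.
if command -v java >/dev/null 2>&1; then
  info=$(java -version 2>&1)
  echo "Vendor:  $(java_vendor "$info")"
  echo "Version: $(java_version "$info")"
fi
```

Compare the reported version against the versions listed in the Hadoop documentation before going further.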

1. Java and Ubuntu - 32 bit or 64 bit
If your machine is a recent 64-bit one, you may be tempted to go for a
64-bit version of both the OS and Java. But don't go for it yet.

The native libraries shipped with the Hadoop download are compiled for 32 bit, so if you are using a 64-bit OS you may run into problems and errors such as:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
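Before digging into that warning, it helps to check whether the OS, the JVM and Hadoop's native library agree on 32 vs 64 bit. The sketch below assumes `HADOOP_HOME` points at the unpacked Hadoop directory and that the library sits at `lib/native/libhadoop.so`, as in the 2.x binary tarball. The helper names are hypothetical, and treating a JVM banner without `64-Bit` as 32-bit is a heuristic.

```shell
# Hypothetical helpers that map tool output to "32", "64" or "unknown".

# OS: `uname -m` prints e.g. x86_64 (64-bit) or i686 (32-bit).
os_bits() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64|aarch64) echo 64 ;;
    i386|i486|i586|i686) echo 32 ;;
    *) echo unknown ;;
  esac
}

# JVM: 64-bit HotSpot/OpenJDK builds print "64-Bit ... VM" in `java -version`;
# a VM line without "64-Bit" is assumed (heuristically) to be 32-bit.
jvm_bits() {
  case "$1" in
    *64-Bit*) echo 64 ;;
    *" VM "*) echo 32 ;;
    *) echo unknown ;;
  esac
}

# Native library: `file` reports "ELF 32-bit" or "ELF 64-bit".
elf_bits() {
  case "$1" in
    *"ELF 64-bit"*) echo 64 ;;
    *"ELF 32-bit"*) echo 32 ;;
    *) echo unknown ;;
  esac
}

if [ -n "${HADOOP_HOME:-}" ] && command -v java >/dev/null 2>&1 \
   && command -v file >/dev/null 2>&1; then
  lib="$HADOOP_HOME/lib/native/libhadoop.so"
  echo "OS:        $(os_bits)-bit"
  echo "JVM:       $(jvm_bits "$(java -version 2>&1)")-bit"
  # -L follows the libhadoop.so symlink to the real library file
  echo "libhadoop: $(elf_bits "$(file -L "$lib")")-bit"
fi
```

If the OS or JVM reports 64 while libhadoop reports 32, that mismatch is the likely cause of the NativeCodeLoader warning; as the message says, Hadoop then falls back to its built-in Java classes.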

2. VirtualBox - low graphics mode
With 32-bit Ubuntu, VirtualBox may show an error saying the system is
running in low-graphics mode. This is due to the missing Guest Additions,
which come with VirtualBox.
You will have to insert the Linux Guest Additions CD image (Devices >
Insert Guest Additions CD image in the VirtualBox menu) and run it. The
first step is to install DKMS so that Ubuntu can build the Guest
Additions kernel modules:

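# build-essential and the kernel headers are usually needed too, to compile the modules
sudo apt-get install dkms build-essential linux-headers-$(uname -r)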

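then mount the Guest Additions CD and run the installer:

# /cdrom normally exists on Ubuntu already; mkdir -p does nothing in that case
sudo mkdir -p /cdrom
sudo mount /dev/cdrom /cdrom
sudo sh /cdrom/VBoxLinuxAdditions.run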
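After the installer finishes and the VM has been rebooted, one way to confirm that the Guest Additions kernel modules are in place is to look for them in `lsmod`. This is a sketch: the helper name `has_module` is hypothetical, and `vboxguest`/`vboxsf` are the module names assumed here for the Guest Additions, so adjust them if your version differs.

```shell
# Hypothetical helper: succeed if module $1 appears in lsmod-style text $2
# (defaults to the live `lsmod` output). Column 1 of lsmod is the module name.
has_module() {
  printf '%s\n' "${2:-$(lsmod)}" | awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Module names assumed for the VirtualBox Guest Additions.
if command -v lsmod >/dev/null 2>&1; then
  for m in vboxguest vboxsf; do
    if has_module "$m"; then
      echo "$m: loaded"
    else
      echo "$m: not loaded (re-run VBoxLinuxAdditions.run and reboot)"
    fi
  done
fi
```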

3. VirtualBox - mouse pointer appears a little above the actual point
This is due to a missing patch. You can have a look at it here.

To fix this, download the VBoxGuest-linux.c.patch file from the link above. Then run these commands on your Ubuntu virtual machine: