Posting this in case it helps someone.


A simple way to install Hadoop and HBase

I was never able to install Hadoop without spending at least two days on it and reinstalling the OS a few times. That may not be everyone's experience, but all I wanted was to run a few MR jobs and access HBase and HDFS. The instructions available online for Hadoop and HBase installation were mostly about installing on a production or clustered environment. I never found a guide that set up all these components just for testing. I was using Cloudera on Ubuntu; its instructions may be useful for a production environment, but they were never easy for me. So I decided to download Hadoop and HBase myself and configure them. Here is what I have done.

Note: This is an old article, copied manually when moving to a new hosting provider.

1. Decide where you want to keep your Hadoop installation

I have a user named hdtest. For testing purposes, I am keeping all the Hadoop components in /home/hdtest/installs/. I also have to set up passwordless ssh to localhost, which the Hadoop start scripts need. Connect as user hdtest (or the user you want to use for Hadoop) and run the following commands.

$ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$chmod 600 ~/.ssh/authorized_keys
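The later steps also assume the hadoop commands are on the PATH. A minimal sketch of the environment setup for hdtest's ~/.bashrc, assuming the JDK lives at /usr/lib/jvm/java-7-openjdk-amd64 (adjust both paths to your machine):

```shell
# Assumed locations - adjust JAVA_HOME to your JDK and HADOOP_HOME
# to the folder you extracted Hadoop into
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/home/hdtest/installs/hadoop
# bin holds the hadoop/hdfs commands, sbin holds start-dfs.sh and start-yarn.sh
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin
```

Run `source ~/.bashrc` (or log in again) for the changes to take effect.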

Edit the configuration files for Hadoop. As this is a basic setup, we just need to edit the file core-site.xml found in $HADOOP_HOME/etc/hadoop. Add the following content between the configuration tags.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdtest/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation.</description>
</property>

4. Update /etc/hosts file

Sometimes Hadoop does not work because of an unexpected entry in the /etc/hosts file. Check if you have the following entry in /etc/hosts:

127.0.1.1 myhostname

If you see such a line, change the first part from 127.0.1.1 to 127.0.0.1.
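The change can also be made with sed. The pipeline below just demonstrates the substitution on a sample line; to edit the real file, run the same sed expression with `sudo sed -i` on /etc/hosts:

```shell
# Demonstrate the fix on a sample line; apply the same expression
# with 'sudo sed -i' to /etc/hosts to edit the real file
echo "127.0.1.1 myhostname" | sed 's/^127\.0\.1\.1/127.0.0.1/'
# prints: 127.0.0.1 myhostname
```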

Next, we need to create the folder /home/hdtest/hadoop/tmp/ that we configured as hadoop.tmp.dir above.

$mkdir /home/hdtest/hadoop/tmp/

Next step is to run the format command.

$hadoop namenode -format

5. Start the hadoop processes

Run the following command.

$start-yarn.sh

Once the command runs successfully, let us verify that everything is working fine. Run the following command and check the result.

$jps

Expected output

9818 ResourceManager
10116 Jps
9933 NodeManager

The numbers on the left side may be different. If you see an error message saying jps is not found, it means you don't have your PATH set to access your JDK properly; jps is an executable in the JDK installation.

The next step is to run the command start-dfs.sh, which will start the namenode, datanode and secondary namenode.

$start-dfs.sh

Once everything is running, check the status using the jps command again. In addition to ResourceManager and NodeManager, you should now see NameNode, DataNode and SecondaryNameNode entries.

Hadoop will not be started automatically on reboot, so you have to run the commands start-yarn.sh and start-dfs.sh on every reboot.

We will create two folders in HDFS; they will be used in the HBase configuration. To create folders on HDFS, Hadoop provides commands that follow regular file system commands:

$hadoop fs -mkdir /sample
$hadoop fs -mkdir /zookeeper

You can also verify the above settings by opening the Hadoop URL http://localhost:50070/ in your browser, where you can also see the files in the HDFS file system. Click the Browse File System link to see the folders sample and zookeeper created above.

6. Extract and setup HBase

Similar to what you have done for Hadoop, extract HBase and rename the folder. Run the following command from the location where you have saved the HBase files.

Note: If you are using HBase 0.96
HBase 0.96 ships with an older set of Hadoop libraries. They are not compatible with the Hadoop 2.2 we have downloaded; it uses beta versions of the hadoop-common jar files. If you continue using them, you will get errors like

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown out of band call #xxxxxx

in the HBase log files. To fix the problem, we need to remove all the beta jar files from HBase and use the correct set of files. To start with, we remove all the incompatible files:

$cd /home/hdtest/hbase/lib
$rm -rf hadoop*.jar

Once the files are removed, copy the matching jar files from your Hadoop installation (in Hadoop 2.2 they live under $HADOOP_HOME/share/hadoop/) into the hbase/lib folder.

Update the HBase configuration file at hbase/conf/hbase-site.xml. Add the following content between the configuration tags.

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://127.0.0.1:54310/sample</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://127.0.0.1:54310/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
  The port at which the clients will connect.</description>
</property>

If you want to use a local file system instead of HDFS, replace the URL with file:///your/preferred/path/. Now let's start the HBase instance by running the following commands.

$cd /home/hdtest/hbase/bin
$./start-hbase.sh

HBase will not be started automatically in this configuration either; you have to run this command again on your next reboot. Once the command is executed and you are back at the shell prompt, you can check the log files if something is wrong. You can find them under the /home/hdtest/hbase/logs folder. If you don't see any issues, let's try to use HBase. Open the HBase shell and run a simple list command.

$./hbase shell
hbase(main):001:0> list