How to Set Up a Single Node Hadoop Cluster Using Docker

In this article, I will show you how to set up a single node Hadoop cluster using Docker. Before we start with the setup, let me briefly remind you what Docker and Hadoop are.

Docker is a software containerization platform that lets you package your application together with all of its libraries, dependencies, and environment settings into a single unit called a Docker container. With Docker, you can build, ship, and run an application on the fly.

For example, if you want to test an application on an Ubuntu system, you do not need to install a complete operating system on your laptop/desktop or start a virtual machine with Ubuntu OS. That would take a lot of time and disk space. Instead, you can simply start an Ubuntu Docker container that already has the environment and libraries you need, and test your application on the fly.

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers. These days it is one of the most important technologies in the industry. To use Hadoop to store and analyse huge amounts of data, you need to set up a Hadoop cluster. If you have set up a Hadoop cluster before, you know it is not an easy task.

What if I told you that setting up a Hadoop cluster is hardly a 5-10 minute job? Would you believe me? I guess not!

This is where Docker comes into the picture: using Docker, you can set up a Hadoop cluster in no time.

Benefits of using Docker for setting up a Hadoop cluster

Installs and runs Hadoop in no time.

Uses resources as needed, so there is no wastage of resources.

Easily scalable; best suited for testing environments for a Hadoop cluster.

No need to worry about Hadoop dependencies, libraries, etc.; Docker takes care of them.

Set Up a Single Node Hadoop Cluster Using Docker

So let us now see how to set up a single node Hadoop cluster using Docker. I am using an Ubuntu 16.04 system, and Docker is already installed and configured on it.

Before I set up the Hadoop cluster, let me run a simple example to check that Docker is working correctly on my system.
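The standard smoke test is Docker's official hello-world image, which exercises the client, the daemon, and the registry in one go:

```shell
# Pull and run the official hello-world image; if the client and
# daemon can talk to each other, Docker prints a confirmation message.
docker run hello-world
```

If everything is working, you should see output like the message below.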

This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker Hub account:
 https://hub.docker.com

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

So now you know that Docker is working properly. Let us go ahead and install Hadoop in a Docker container. To do so, we need a Hadoop Docker image. The command below will get me a hadoop-2.7.1 Docker image.
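The exact image is not shown here; a widely used prebuilt Hadoop 2.7.1 image from that era is sequenceiq/hadoop-docker:2.7.1 (an assumption on my part), which can be pulled and started like this:

```shell
# Pull a prebuilt Hadoop 2.7.1 image (assumed: sequenceiq/hadoop-docker)
docker pull sequenceiq/hadoop-docker:2.7.1

# Start a container: /etc/bootstrap.sh starts the Hadoop daemons,
# and the -bash flag drops you into an interactive shell inside it
docker run -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
```

The first pull downloads several gigabytes of layers, so it takes a few minutes; every later run reuses the cached image and starts in seconds.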

Go back to your Docker container terminal, and run the below command to get the IP address of the container.

bash-4.1# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6803 (6.6 KiB)  TX bytes:2298 (2.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:28648 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28648 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:4079499 (3.8 MiB)  TX bytes:4079499 (3.8 MiB)
bash-4.1#
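The jps check mentioned next is a one-liner inside the container; the daemon list in the comment assumes a standard single-node Hadoop 2.x setup:

```shell
# List the running Java processes; on a healthy single-node cluster
# you should see NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager (the PIDs will differ per run)
jps
```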

Running the jps command showed that all the services were up, so let us now check the NameNode UI in the browser. Go to 172.17.0.2:50070 in the browser, and there you go: the NameNode UI of a Hadoop cluster running in a Docker container.

Just to make sure that the Hadoop cluster is working fine, let us run a Hadoop MapReduce example in the Docker container.
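The exact job run here is not shown; a typical choice is the pi estimator bundled in the hadoop-mapreduce-examples jar. The paths below assume the Hadoop 2.7.1 layout under $HADOOP_PREFIX, which is how many prebuilt images are laid out:

```shell
# Run the bundled pi estimator: 5 map tasks with 100 samples each.
# $HADOOP_PREFIX points at the Hadoop install directory in many
# prebuilt images; adjust the path if your layout differs.
cd "$HADOOP_PREFIX"
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 5 100
```

If the job completes, an estimated value of Pi is printed at the end, which confirms that both YARN and HDFS are functioning.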

Conclusion

We successfully ran a single node Hadoop cluster using Docker. As you saw, we had to do very little to set up the Hadoop cluster, and in no time we had it up and running. As mentioned before, Docker is mostly used for testing environments, so if you want to test a Hadoop application, setting up a Hadoop cluster in a Docker container and testing the application there is the easiest and fastest way.
