Create a Cloudera + Hadoop Cluster in Skytap Cloud

As you’ve likely heard, Skytap Cloud now offers a set of pre-configured templates that support Cloudera® and Apache™ Hadoop®. Anyone who wants to try out a cluster, from small to large, can now do so easily in Skytap Cloud.

The goal of this tutorial is to spin up a 10-node cluster in Skytap Cloud. To begin, let’s talk about the two new templates. The first is the “CDH4 cluster” template, which supports Hadoop and contains 2 nodes plus a management server. The second is called the “Cloudera CDH4 Hadoop” host template. It is not intended to run by itself in a configuration; rather, it contains a host VM that is ready to become another Hadoop node in a configuration based on the CDH4 cluster template.

Once you create a configuration from the CDH4 cluster template and all the VMs start (in about 90 seconds), you’ll have a working cluster with all the normal services (HDFS, Apache HBase™, Hue, Hadoop MapReduce, Apache Oozie, and Apache ZooKeeper™). This is a 2-node cluster (host 1 and host 2) with a management server (manager). While a 2-node cluster is enough to get going with Cloudera and Hadoop, it’s possible in Skytap Cloud to ratchet this up to a cluster of any size. For this blog post, we’ll expand it into a 10-node cluster.

The Cloudera Manager is hosted on the manager VM on port 7180. However, none of the VMs in this configuration have a web browser, so we need a way to interact with Cloudera Manager. This can be accomplished in a few different ways: 1) Use a Skytap published service, 2) Use ICNR (inter-configuration network routing) with a configuration that has a graphical web browser, 3) Use a public IP, or 4) Use Skytap VPN to connect your local network to this configuration. For production use, VPN is probably your best bet, but for this blog post I’m going to use a published service. To add the published service, do the following:

1. Click Settings.
2. Click VM Settings.
3. Select manager in the Select a VM menu.
4. Under Network Adapters, choose Add Published Service.
5. In the dropdown, select By Port.
6. Enter 7180 in the text box.
7. Click Add Published Service.
8. Expand the Show Published Services link and note the URL and port number. Example: services.cloud.skytap.com:25693

Now you can put that URL into your local web browser to reach the Cloudera Manager (Free Edition) login page. Log in with the username ‘admin’ and the password listed for the ‘admin’ account in the credentials tab of the manager VM settings.
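If you’d like to turn the published service endpoint into a browser URL, or confirm it is answering before you open a browser, here is a minimal Python sketch. It assumes Cloudera Manager is served over plain HTTP on port 7180 and uses the example endpoint from above. The function names are my own, for illustration only, and are not part of any Skytap tooling.

```python
import socket


def published_service_url(endpoint: str, scheme: str = "http") -> str:
    """Turn a 'host:port' published service endpoint into a browser URL.

    The scheme defaults to http, which is an assumption about how
    Cloudera Manager is served on port 7180.
    """
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {endpoint!r}")
    return f"{scheme}://{host}:{port}/"


def is_reachable(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Example endpoint from this post; yours will have a different port.
    endpoint = "services.cloud.skytap.com:25693"
    print(published_service_url(endpoint))
```

Replace the endpoint with the one shown for your own configuration, then call is_reachable(endpoint) to check that the port accepts connections.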

Now that everything is running, the Cloudera Manager is accessible, and I’m logged in, it’s time to expand our cluster from 2 nodes to 10 nodes. To do that:

1. Click Back to configuration to return to the 2-node configuration.
2. Click Add VMs.
3. In the search box, type hadoop.
4. Select Cloudera CDH4 Hadoop host.
5. Click Add.
6. Redo steps 2-5 another 7 times (to take our host count up to 10).
   Notice that although the titles for all of these new nodes are shown as ‘host-n’, their network names have been automatically incremented.
   Optionally, to make the configuration easier to read, rename each new VM from host-n to the host number that matches its network name.
7. Click Run.

After about 90 seconds, everything will start up and we’ll have all the hosts we need for our 10-node cluster. It’s now time to go back to Cloudera Manager to finish setting up the nodes.

1. Go back into Cloudera Manager. (Note: You may need to log in again.)
2. Click Hosts at the top of the web page.
3. Click Add Hosts.
4. Click Continue.
5. In the search form, type host-[3-10].hadoop.local
   This uses DNS to find all the new host nodes and pings them.
6. Leave all hosts selected and click Install CDH on selected hosts.
7. Keep all defaults on the next page, then click Continue.
8. Leave the radio button defaults and use the root password found in the credentials tab of any of the host-n VMs. (Note: They all have the same password.)
9. Click Start Installation.
10. Wait for all of the nodes to finish installing. (Note: It could take 10-15 minutes for everything to install.)
    If for any reason the web page times out, or something just doesn’t seem right, you can redo steps 2-9 to validate that all the software was installed properly.
11. When the installations are done, click Continue.
    The UI will now inspect all hosts.
    All hosts should resolve as green. (Note: It is OK if you have one yellow relating to mismatched versions.)
12. If all looks good, click Continue.
    If not, run steps 2-11 again.
13. Click Continue again to finish the wizard.
    It should forward you to the hosts page, where all your hosts (1 through 10 and manager) should show up in good health.
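To see which hostnames the host-[3-10].hadoop.local search pattern covers, here is a small Python sketch. The expansion logic is my own reading of the bracketed range, not Cloudera Manager’s actual parser, and the function names are hypothetical. The optional DNS check assumes you run it from a machine that can resolve the hadoop.local names, such as a VM inside the configuration.

```python
import re
import socket

# Matches a single numeric range such as [3-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_pattern(pattern: str) -> list[str]:
    """Expand one bracketed numeric range into a list of hostnames."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range start {start} is greater than end {end}")
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


def unresolved_hosts(hostnames: list[str]) -> list[str]:
    """Return the hostnames that do not resolve in DNS."""
    missing = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for host in expand_host_pattern("host-[3-10].hadoop.local"):
        print(host)
```

The pattern expands to eight names, host-3 through host-10, which matches the eight VMs added earlier. If unresolved_hosts returns anything, those VMs are probably not running yet or their network names were changed.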

At this point, we have a 10-node Cloudera and Hadoop cluster, but we want to put these new nodes to work just like nodes 1 and 2. So, to accomplish that:

1. Click Cloudera Manager (Free Edition) at the top left of the web UI. This will bring you back to the services page.
2. Click the upside-down triangle next to a service, then click Instances.
3. Click Add.
4. In the Add Role Instances view, check the same boxes for hosts 3-10 that are checked for hosts 1 and 2.
   In the case of HDFS, this would be the ‘DataNode’ column; for HBase, it would be the ‘RegionServer’ column.
5. Click Continue.
6. Click Accept.
   Wait for the commands to complete.
7. Repeat steps 1-6 for all the different services.

Note: Some services may not utilize nodes 1 and 2, in which case you can safely leave out nodes 3-10 as well. For example, the Hue service is only hosted on the manager VM and there are no settings for nodes 1 and 2. If you would like to make manager fault tolerant, you will want to follow all the steps in this blog post to create a second manager node that is identical to the existing manager node.
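The rule of thumb for role assignment (give hosts 3-10 whatever hosts 1 and 2 already have, and skip anything that lives elsewhere) can be sketched in Python as follows. The role-to-host mapping here is made-up example data, not read from Cloudera Manager, and roles_to_add is a hypothetical helper name.

```python
def roles_to_add(assignments, reference_hosts, new_hosts):
    """Work out which new hosts need each role.

    A role is mirrored only if every reference host already has it,
    so roles that live elsewhere (for example on manager) are skipped.
    """
    plan = {}
    for role, hosts in assignments.items():
        if all(ref in hosts for ref in reference_hosts):
            missing = [h for h in new_hosts if h not in hosts]
            if missing:
                plan[role] = missing
    return plan


if __name__ == "__main__":
    # Hypothetical current state: an assumption for illustration only.
    current = {
        "DataNode": {"host-1", "host-2"},
        "Hue Server": {"manager"},
    }
    new_hosts = [f"host-{n}" for n in range(3, 11)]
    for role, hosts in roles_to_add(current, ["host-1", "host-2"], new_hosts).items():
        print(role, "->", ", ".join(hosts))
```

With this example data, only the DataNode role is planned for hosts 3-10. The Hue role is left alone because neither host 1 nor host 2 carries it.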

And there you have it—a 10-node CDH4 cluster.

As always, if you have any questions, don’t hesitate to leave comments below, or get in touch with Skytap Support or your Skytap representative.