Automated install of CDH5 Hadoop on your laptop with Ansible

Installing CDH5 from the tarball distribution is not really difficult, but getting the pseudo-distributed configuration right is anything but straightforward. And since there are a few bugs that need fixing and some configuration that needs to be done, I automated it.

Automating the steps

All I needed to do was write some Ansible configuration scripts to perform these steps. For now I automated the steps to download and install CDH5, Spark, Hive, Pig and Mahout. Any extra packages are left as an exercise to the reader. I welcome your pull requests.

Configuration

Ansible needs some information from the user about the directory to install the software into. I first tried to use Ansible's vars_prompt option. This kind of works, but the scope of the variable is limited to the same yml file, and I need it to be a global variable. After testing several of Ansible's ways to provide variables, I decided to use a bash script to get the user's input and pass that information to Ansible through the --extra-vars command line option.
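As a rough sketch of that approach (not the actual script from the repository), a wrapper could prompt for the directory and turn the answer into a key=value pair for --extra-vars. The variable name install_dir, the default directory and the playbook name site.yml are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: install_dir and site.yml are hypothetical names,
# not taken from the real install scripts.

# Ask for the install directory on stdin; fall back to the given
# default when the user just presses Enter.
ask_install_dir() {
  local default_dir="$1" answer
  printf 'Install directory [%s]: ' "$default_dir" >&2
  IFS= read -r answer
  printf '%s\n' "${answer:-$default_dir}"
}

# Build the key=value string that is handed to --extra-vars.
extra_vars_for() {
  printf 'install_dir=%s\n' "$1"
}

# Example wiring (commented out so the functions can be sourced safely):
# dir="$(ask_install_dir "$HOME/hadoop")"
# ansible-playbook site.yml --extra-vars "$(extra_vars_for "$dir")"
```

Because the variable arrives on the command line, it is visible to every play and included file, which is the global scope that vars_prompt did not give me. A path containing spaces would likely need extra quoting or the JSON form of --extra-vars.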

In addition, we want to use Ansible to run a playbook, which means the ansible-playbook command needs to be available. We assume ansible-playbook is on the PATH and works.
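If you would rather verify that assumption than trust it, a small guard along these lines could run before anything else. The function name require_command is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: require_command is a hypothetical helper name.

# Return 0 when the named command can be found on the PATH,
# otherwise print an error to stderr and return 1.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: $1 not found on PATH" >&2
    return 1
  fi
}

# Example use at the top of a wrapper script:
# require_command ansible-playbook || exit 1
```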

Getting the install scripts

Getting the install scripts is done by issuing a git clone command:

$ git clone git@github.com:krisgeus/ansible_local_cdh_hadoop.git

Install

Installing the software has become a single-line command:

$ start-playbook.sh

The script will ask the user for a directory to install the software into. It then downloads the packages into the $HOME/.ansible-downloads directory and unpacks them into the install directory the user provided.

In the install directory the script will create a bash_profile add-on that sets the correct aliases. Source it to load them into the current shell:

$ source ${INSTALL_DIR}/.bash_profile_hadoop
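Sourcing by hand only affects the current shell. If you want the aliases in every new shell, one option is to add the source line to your own profile. The helper below is a sketch of that idea and is not part of the install scripts; the function name add_source_line is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: add_source_line is a hypothetical helper and is not
# something the install scripts provide.

# Append a "source" line for the add-on file to a profile file,
# but only when that exact line is not already present.
add_source_line() {
  local profile="$1" addon="$2"
  local line="source \"${addon}\""
  touch "$profile"
  if ! grep -qxF -- "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example:
# add_source_line "$HOME/.bash_profile" "${INSTALL_DIR}/.bash_profile_hadoop"
```

Running it twice leaves a single line in the profile, so it is safe to call after every reinstall.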

Testing Hadoop in local mode

$ switch_local_cdh5

Now all the familiar hadoop commands should work. In local mode there is no notion of HDFS other than your local filesystem, so the hadoop fs -ls / command will list the same directory contents as ls /.

The current version of the Ansible scripts is set to install the CDH version 5.0.2 packages. When a new version becomes available, this is easily changed by updating the vars/common.yml YAML file.

If you have created Ansible files to add other packages, I welcome you to send me a pull request.