How to install Bigtop 0.8.0 Hadoop on CentOS 6 with Puppet

This page assumes that you have already provisioned a cluster of CentOS 6.x machines on which you want to deploy the 0.8.0 release of the Bigtop Hadoop distribution using the Puppet recipes from bigtop-deploy.

You can use this url when you configure the Puppet config file (as shown in detail below), or you can go to Bigtop Jenkins and download all the packages to set up your own local repository. Another option is to build the packages yourself, but that topic is not covered on this page.

The Puppet deployment scripts are not packaged, so you can get them from either:

the source tarball, so that you use the Puppet recipes that are part of the 0.8.0 release, or

the Puppet recipes from the current master branch of the Bigtop repository

While in theory it's safer to use the recipes from the release, most people use the Puppet scripts from git. The reasoning is that although the current recipes have changed a lot, they can still deploy Bigtop Hadoop from the 0.8.0 release, so it makes sense to learn and use the newest version from the start.

Note: the only 0.8.0 Bigtop Hadoop component that can't be deployed with the scripts from git is Sqoop.

Installation of requirements

There are basically only two requirements:

install Puppet on all machines of the cluster:

version 2.7+ if you are using the recipes from the release

version 3.x if you are using the scripts from git

have Java 1.6 packages available in your distribution's repositories

For the CentOS 6.x machines, this means:

OpenJDK 1.6, which is shipped with the distribution

the Puppet package (which one depends on the recipes you are going to use):

either Puppet 2.7 from EPEL (so you need to enable EPEL first)

or Puppet 3.x from upstream
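Because the required Puppet version depends on where the recipes come from, it can be handy to check each node before deploying. The following is a minimal sketch, assuming Python 3 is available on the node. It parses the output of `puppet --version` and applies the rules above: 2.7 or newer for the release recipes, 3.x for the git recipes. The function names and the "release"/"git" labels are hypothetical, chosen only for this example.

```python
import re
import shutil
import subprocess


def parse_puppet_version(text):
    """Return (major, minor) from `puppet --version` output such as '3.7.3'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised puppet version: %r" % text)
    return int(match.group(1)), int(match.group(2))


def puppet_version_ok(text, recipes):
    """Check a Puppet version against the recipe source (hypothetical labels).

    recipes is "release" (recipes from the 0.8.0 tarball, needs Puppet 2.7+)
    or "git" (recipes from the master branch, needs Puppet 3.x).
    """
    major, minor = parse_puppet_version(text)
    if recipes == "release":
        return (major, minor) >= (2, 7)
    if recipes == "git":
        return major == 3
    raise ValueError("unknown recipe source: %r" % recipes)


if __name__ == "__main__":
    # Only query Puppet if it is actually on the PATH of this machine.
    if shutil.which("puppet"):
        out = subprocess.check_output(["puppet", "--version"], universal_newlines=True)
        print(out.strip(), "usable with git recipes:", puppet_version_ok(out, "git"))
    else:
        print("puppet is not installed on this machine")
```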

Note that:

You don't have to set up the Bigtop yum repository or install Java: all these steps and more are automated via Puppet.

The only catch is that if Java is already installed on the machines, you need to check that the correct Java version is activated via alternatives, because Puppet won't notice that another Java version is active. See the details below.
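One way to see which Java alternatives has activated is to follow the `/usr/bin/java` symlink chain to the real binary. The sketch below assumes Python 3. It also assumes that the JDK directory name contains the version string, as in a path like `/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java`. That naming is an assumption, so adjust the marker for your packages. The helper names are hypothetical. On CentOS, `alternatives --config java` switches the active version.

```python
import os


def active_java(link="/usr/bin/java"):
    """Follow the symlink chain managed by alternatives
    (/usr/bin/java -> /etc/alternatives/java -> ...) to the real binary."""
    return os.path.realpath(link)


def is_expected_java(resolved_path, marker="1.6.0"):
    """Heuristic check: assumes the JDK directory name contains the version."""
    return marker in resolved_path


if __name__ == "__main__":
    path = active_java()
    if is_expected_java(path):
        print("active java looks right:", path)
    else:
        print("active java is", path, "; switch it with: alternatives --config java")
```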

The deployment

Make sure that the contents of the Bigtop 0.8.0 release tarball or the Bigtop repository (as discussed above) are available in /opt/bigtop on all machines. You may want to use an NFS share for this.
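A quick sanity check on each node can catch a missing or half-mounted /opt/bigtop before Puppet runs. This is a minimal sketch, assuming Python 3. It also assumes the Puppet recipes live under `bigtop-deploy/puppet` with a `manifests/site.pp` entry point and a `modules` directory. That layout is an assumption, so adjust `REQUIRED_PATHS` to match your copy of the tree.

```python
import os

# Assumed layout of the Bigtop tree (release tarball or git checkout);
# adjust these relative paths if your copy differs.
REQUIRED_PATHS = [
    "bigtop-deploy/puppet/manifests/site.pp",
    "bigtop-deploy/puppet/modules",
]


def missing_paths(bigtop_home="/opt/bigtop"):
    """Return the required relative paths that do not exist under bigtop_home."""
    return [rel for rel in REQUIRED_PATHS
            if not os.path.exists(os.path.join(bigtop_home, rel))]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        for rel in missing:
            print("missing:", rel)
    else:
        print("/opt/bigtop looks complete")
```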

Now it's time to configure the Puppet site.csv file (the only Puppet file you need to touch). It defines the cluster roles, the Hadoop components to install, and other details:

hadoop_head_node: the master node (it runs, e.g., the YARN ResourceManager). You need to specify its FQDN here; otherwise no node will be configured as the master.

bigtop_yumrepo_uri: the URL of the Bigtop repository you will use; Puppet will create a yum repo file for it. I'm using a local mirror here, but it should be possible to use the repository hosted on S3 (as shown in the previous section).

components: the list of Hadoop components to install. The example shows the minimal list of components needed to execute MapReduce jobs. See the Puppet recipes for the list of all components.

jdk_package_name: the name of the Java package, which Puppet will install.
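Since site.csv is plain text, you can generate it from one place instead of editing it by hand on each node. This is a minimal sketch, assuming Python 3. It assumes the file format is one line per setting, with the key followed by its comma-separated values, so the components list becomes several fields on one line. It does no quoting, so values must not contain commas. The FQDN, the mirror URL, and the component names in `SETTINGS` are hypothetical placeholders. The package name `java-1.6.0-openjdk-devel` is an assumption for CentOS 6. Check all of them against your cluster and the recipes.

```python
def render_site_csv(settings):
    """Render one line per key: the key followed by its comma-separated values.

    List values (such as components) become several comma-separated fields.
    No quoting is done, so values must not contain commas themselves.
    """
    lines = []
    for key, value in settings.items():
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        lines.append(",".join([key] + values))
    return "\n".join(lines) + "\n"


# Hypothetical placeholder values: replace the FQDN, repository URL and
# component list with your own (component names here are assumptions).
SETTINGS = {
    "hadoop_head_node": "head.example.com",
    "bigtop_yumrepo_uri": "http://mirror.example.com/bigtop/0.8.0/",
    "jdk_package_name": "java-1.6.0-openjdk-devel",
    "components": ["hadoop", "yarn"],
}

if __name__ == "__main__":
    # Print to stdout; redirect the output into site.csv.
    print(render_site_csv(SETTINGS), end="")
```

Keeping the settings in a dict (ordered in Python 3.7+) makes the generated file easy to diff between runs.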

Bigtop uses Puppet in masterless mode, so you need to distribute the new version of site.csv to all machines and then run Puppet locally on each of them.
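The masterless workflow boils down to two commands per node: copy site.csv, then run `puppet apply` locally. The sketch below only builds and prints those commands (a dry run). It assumes Python 3 and root SSH access to the nodes. It also assumes two paths: site.csv is read from `/etc/puppet/config/site.csv`, and the entry point is `/opt/bigtop/bigtop-deploy/puppet/manifests/site.pp`, with the Bigtop modules under `bigtop-deploy/puppet/modules`. Both are assumptions, so verify them against the recipes you actually use. The host names are hypothetical.

```python
import shlex

BIGTOP_HOME = "/opt/bigtop"
# Assumption: where the recipes read site.csv from; check your copy of site.pp.
SITE_CSV_DEST = "/etc/puppet/config/site.csv"


def puppet_apply_cmd(bigtop_home=BIGTOP_HOME):
    """Masterless run: apply site.pp locally with Bigtop's modules on the module path."""
    puppet_dir = bigtop_home + "/bigtop-deploy/puppet"
    return ["puppet", "apply", "-d",
            "--modulepath=" + puppet_dir + "/modules",
            puppet_dir + "/manifests/site.pp"]


def deployment_plan(hosts, local_csv="site.csv"):
    """For each host: one command that copies site.csv, one that runs Puppet there."""
    remote = " ".join(shlex.quote(arg) for arg in puppet_apply_cmd())
    plan = []
    for host in hosts:
        plan.append(["scp", local_csv, "%s:%s" % (host, SITE_CSV_DEST)])
        plan.append(["ssh", host, remote])
    return plan


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in deployment_plan(["node1.example.com", "node2.example.com"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing instead of executing lets you review the commands first, then pipe them into a shell or replace `print` with `subprocess.check_call(cmd)` once the paths are confirmed.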