About The Author

Sebastien Goasguen is an Apache CloudStack committer and a member of the CloudStack Project Management Committee (PMC). His day job is Senior Open Source Solutions Architect in the Open Source Business Office at Citrix. He will never call himself an expert or a developer but is a decent Python programmer. He is currently active in the Apache Libcloud and SaltStack salt-cloud projects, bringing better support for CloudStack. He blogs regularly about cloud technologies and spends lots of time testing and writing about his experiences. Prior to working actively on CloudStack he had a life as an academic: he authored over seventy international publications on grid computing, high performance computing, electromagnetics, nanoelectronics and, of course, cloud computing. He also taught courses on distributed computing, network programming, ethical hacking and cloud computing.

Introduction

Clients and high-level wrappers are critical to the ease of use of any API, even more so for cloud APIs. In this book we present the basics of the CloudStack API and introduce some low-level clients before diving into more advanced wrappers.
The first chapter is dedicated to clients and the second chapter to wrappers, or what I consider to be high-level tools built on top of a CloudStack client.

In the first chapter, we start by illustrating how to sign requests with the native API, for the sake of completeness and
because it is a very nice exercise for beginners. We then introduce CloudMonkey, the CloudStack CLI and shell, which boasts 100% coverage of
the API. Then jclouds is discussed. While jclouds is a Java library, it can also be used as a CLI or interactive shell; we present jclouds-cli to contrast it with
CloudMonkey and introduce jclouds. Apache libcloud is a Python module that provides a common API on top of many cloud providers' APIs. Once installed, a developer can use libcloud to talk to multiple cloud providers and cloud APIs; it serves a similar role to jclouds but in Python. Finally, we present Boto, the well-known Python Amazon Web Services interface, and show how it can be used with a CloudStack cloud running the AWS interface.

In the second chapter we introduce several high-level wrappers for configuration management and automated provisioning.
The presentation of these wrappers aims to answer the question "I have a cloud, now what?". Starting and stopping virtual machines is the core functionality of a cloud,
but it empowers users to do much more. Automation is the key to today's IT infrastructure. The wrappers presented here show how you can automate configuration management and automate provisioning of the infrastructure that lies within your cloud. We introduce salt-cloud for SaltStack, a Python alternative to the well-known Chef and Puppet systems. We then introduce the knife CloudStack plugin for Chef and show you how easy it is to deploy machines in a cloud and configure them. We finish with another Apache project based on jclouds: Whirr. Apache Whirr simplifies the on-demand provisioning of clusters of virtual machine instances, so it allows you to easily provision big data infrastructure on demand, whether you need a Hadoop cluster, an Elasticsearch cluster or even a Cassandra cluster.

The CloudStack API

All functionalities of the CloudStack data center orchestrator are exposed
via an API server. GitHub currently has over twenty clients for this
API, in various languages. In this section we introduce this API and the
signing mechanism. The follow-on sections will introduce clients that
already contain a signing method; the signing process is only
highlighted for completeness.

Basics of the API

The CloudStack API is a query-based API over HTTP that returns results in XML or JSON. It is used to implement the default web UI. This API is not a standard like OGF OCCI or DMTF CIMI but is easy to learn. A mapping exists between the AWS API and the CloudStack API, as will be seen in the next section. Recently a Google Compute Engine interface was also developed that maps the GCE REST API to the CloudStack API described here. The API docs are a good place to start learning the extent of the API. Multiple clients exist on GitHub for this API; you should be able to find one in your favourite language. The reference documentation for the API, and changes that might occur from version to version, is available on-line. This short section aims to provide a quick summary and give you a base understanding of how to use the API. As a quick start, a good way to explore the API is to navigate the dashboard with a Firebug console (or similar developer console) and study the queries.

Succinctly, the CloudStack query API can be used via HTTP GET requests made against your cloud endpoint (e.g., http://localhost:8080/client/api). The API name is passed using the command key, and the various parameters for the call are passed as key/value pairs. The request is signed using the secret key of the user making the call. Some calls are synchronous while some are asynchronous; this is documented in the API docs. Asynchronous calls return a jobid; the status and result of a job can be queried with the queryAsyncJobResult call. Let's get started with an example of calling the listUsers API in Python.

First you will need to generate keys to make requests. In the dashboard, go under Accounts, select the appropriate account, then click on Show Users, select the intended user and generate keys using the Generate Keys icon. You will see an API Key and Secret Key field being generated. The keys will be of the form:
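API Key:    XzBW2tLd3qgyS3mOMTVmjUVgXjlWlnfaUJ9GAbBbf9EdMkAYMmAiLqzzq1E
Secret Key: VDaACYb0LV9eNjTetIOElcVQkvJckJQljXFcHRj87ZKiy0z0ty0ZsYBko

The two values above are made up for illustration; yours will be different long URL-safe strings generated by CloudStack.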

Open a Python shell and import the basic modules necessary to make the request. Do note that this request could be made in many different ways; this is just a low-level example. The urllib* modules are used to make the HTTP request and do URL encoding. The hashlib module gives us the sha1 hash function, which the hmac module uses to generate the keyed hash (Keyed-Hashing for Message Authentication) from the secretkey. The result is encoded using the base64 module.
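A sketch in a Python 2 shell, matching the era this was written (on Python 3 the equivalents live in urllib.request and urllib.parse):

$ python
>>> import urllib2
>>> import urllib
>>> import hashlib
>>> import hmac
>>> import base64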

Define the endpoint of the cloud, the command that you want to execute, the type of the response (i.e., XML or JSON) and the keys of the user. Note that we do not put the secretkey in our request dictionary, because it is only used to compute the hmac.
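Continuing the shell session, with placeholder keys, we build the request dictionary and the base request string:

>>> baseurl = 'http://localhost:8080/client/api?'
>>> request = {}
>>> request['command'] = 'listUsers'
>>> request['response'] = 'json'
>>> request['apikey'] = '<your API key>'
>>> secretkey = '<your secret key>'
>>> # base request string: key=value pairs joined with &
>>> request_str = '&'.join(['='.join([k, urllib.quote_plus(request[k])]) for k in request.keys()])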

Compute the signature with hmac, base64 encode the digest and URL encode the result. The string used for the signature is similar to the base request string shown above, but the keys/values are lowercased and joined in sorted order:
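>>> # lowercase, sorted version of the request, used only for signing
>>> sig_str = '&'.join(['='.join([k.lower(), urllib.quote_plus(request[k].lower().replace('+', '%20'))]) for k in sorted(request.iterkeys())])
>>> sig = hmac.new(secretkey, sig_str, hashlib.sha1)
>>> sig = base64.encodestring(sig.digest()).strip()
>>> sig = urllib.quote_plus(sig)
>>> # append the signature to the base request string and make the call
>>> req = baseurl + request_str + '&signature=' + sig
>>> res = urllib2.urlopen(req)
>>> res.read()

This continues the shell session started above; quote_plus, encodestring and iterkeys are the Python 2-era calls.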

All the clients that you will find on GitHub implement this signature technique; you should not have to do it by hand. Now that you have explored the API through the UI and understand how to make low-level calls, pick your favourite client or use CloudMonkey. CloudMonkey is a sub-project of Apache CloudStack and gives operators/developers the ability to use any of the API methods. It has nice auto-completion, history and help features, as well as an API discovery mechanism since 4.2.

CloudMonkey

CloudMonkey is the CloudStack Command Line Interface (CLI). It is written
in Python. CloudMonkey can be used both as an interactive shell and as a
command line tool which simplifies CloudStack configuration and management.
It can be used with CloudStack 4.0-incubating and above.

Installing CloudMonkey

CloudMonkey depends on readline, pygments and prettytable; when
installing from source you will need to resolve those dependencies yourself.
When installing from the cheese shop, the dependencies are installed automatically.

There are two ways to get CloudMonkey: via the official CloudStack source
releases or via a community-maintained distribution at the cheese
shop. CloudMonkey now lives in its own repository but used to be part of the CloudStack release; developers could get
it directly from the CloudStack git repository in tools/cli/. Now, it is better to use the CloudMonkey-specific repository.
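Installing from the cheese shop is a one-liner:

$ pip install cloudmonkey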

Configuration

To configure CloudMonkey you can edit the ~/.cloudmonkey/config file in
the user's home directory as shown below. The values can also be set
interactively at the cloudmonkey prompt. Logs are kept in
~/.cloudmonkey/log, and history is stored in ~/.cloudmonkey/history.
Discovered APIs are listed in ~/.cloudmonkey/cache. Only the log and
history files can be given custom paths, by setting the
appropriate file paths in ~/.cloudmonkey/config.
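A sketch of a 4.x-era config with placeholder keys (section and key names can vary slightly between CloudMonkey versions):

[core]
log_file = ~/.cloudmonkey/log
history_file = ~/.cloudmonkey/history
asyncblock = true

[ui]
color = true
prettytable = true

[user]
apikey = <your API key>
secretkey = <your secret key>

[server]
host = localhost
port = 8080
path = /client/api
protocol = http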

You can use CloudMonkey to interact with a local cloud, and even with a
remote public cloud. You just need to set the host value properly and
obtain the keys from the cloud administrator.

API Discovery

Note

In CloudStack 4.0.* releases, the list of available API calls is
pre-cached, while starting with the CloudStack 4.1 release an API
discovery service is enabled and CloudMonkey automatically discovers
the API calls available on the management server. The sync command in
CloudMonkey pulls the list of APIs that are accessible to your user
role. This allows CloudMonkey to adapt to
changes in the management server: if the sysadmin enables a plugin such
as Nicira NVP for your user role, you get access to the new calls.

To discover the APIs available do:

> sync
324 APIs discovered and cached

Tabular Output

The number of key/value pairs returned by the API calls can be large,
resulting in very long output. To enable easier viewing of the output,
tabular formatting can be set up. You may enable tabular listing and
even choose the set of column fields to display, using the filter
parameter, which takes a comma-separated list of fields. If
an argument contains a space, put it in double quotes. The resulting table
will present the fields in the same sequence as the filters provided.
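For example (older CloudMonkey versions toggle the table view with set tabularize true rather than set display table; the fields listed here are illustrative):

> set display table
> list users filter=account,domain,state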

To find the required parameter values, a debugger console on
the CloudStack UI can be very useful. For instance, using Firebug on
Firefox, you can navigate the UI and check the parameter values for
each call being made as you go.

Starting a Virtual Machine instance with CloudMonkey

To start a virtual machine instance we will use the deploy
virtualmachine call.

The ids that you will use will differ from this example. Make sure
you use the ones that correspond to your CloudStack cloud.
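A minimal sketch; the three uuids are placeholders taken from list serviceofferings, list templates and list zones:

> deploy virtualmachine serviceofferingid=<uuid> templateid=<uuid> zoneid=<uuid>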

Scripting with CloudMonkey

All previous examples use CloudMonkey via the interactive shell; however,
it can be used as a straightforward CLI, passing the commands to the
cloudmonkey command as shown below.

$ cloudmonkey list users

As such it can be used in shell scripts; it can receive commands via
stdin, and its output can be parsed like that of any other Unix command, as
mentioned before.
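For instance, both of these forms work in a pipeline (the filters shown are illustrative):

$ echo "list users filter=account,state" | cloudmonkey
$ cloudmonkey list virtualmachines filter=name,state | grep -i running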

jClouds CLI

jclouds is a Java wrapper for many cloud providers' APIs; it is used in a
large number of cloud applications to access providers that do not offer
a standard API. jclouds-cli is the command line interface to jclouds
and, in CloudStack terminology, can be seen as an equivalent to
CloudMonkey.

I edited the output of jclouds-cli to save some space; there are a lot
more providers available.

Using jclouds CLI

The CloudStack API driver is not installed by default. Install it with:

jclouds> features:install jclouds-api-cloudstack

For now we will only test the virtual machine management functionality.
Pretty basic, but that's what we want to do to get a feel for
jclouds-cli. If you have set your endpoint and keys properly, you should
be able to list the location of your cloud like so:

We need to define the name of a group and give the number of instances
that we want to start, plus the hardware and image ids. In terms of
hardware, we are going to use the smallest possible hardware, and for the image we give a uuid from the previous list.

With this short intro, you are well on your way to using jclouds-cli.
Check out the interactive shell, the blobstore and the chef facility to automate VM configuration. Remember that jclouds is also, and actually foremost, a Java library that you can use to write other applications.

Apache Libcloud

There are many tools available to interface with the CloudStack API; we just saw jclouds. Apache
Libcloud is another one, but this time Python based. In this section we provide a basic example of
how to use Libcloud with CloudStack. It assumes that you have access to a
CloudStack endpoint and that you have the API access key and secret key of
a user.

Installation

To install Libcloud refer to the libcloud
website. If you are familiar with PyPI,
simply do:
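$ pip install apache-libcloud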

Then, using your keys and endpoint, create a connection object. Note
that this is a local test and thus not secured. If you use a CloudStack
public cloud, make sure to use SSL properly (i.e., secure=True).
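A sketch with placeholder keys and a local endpoint:

>>> from libcloud.compute.types import Provider
>>> from libcloud.compute.providers import get_driver
>>> apikey = '<your API key>'
>>> secretkey = '<your secret key>'
>>> Driver = get_driver(Provider.CLOUDSTACK)
>>> conn = Driver(key=apikey, secret=secretkey, secure=False,
...               host='localhost', path='/client/api')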

The create_node method takes an instance name, a template and an
instance type as arguments. It returns an instance of a
CloudStackNode class that has additional extension methods, such as
ex_stop and ex_start.
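For example, picking the first template and offering returned, purely for illustration:

>>> images = conn.list_images()
>>> sizes = conn.list_sizes()
>>> node = conn.create_node(name='toto', image=images[0], size=sizes[0])
>>> node.ex_stop()
>>> node.ex_start()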

Management of security groups was also added. Below we show how to list,
create and delete security groups, as well as add an ingress rule to open
port 22 to the world. Both keypairs and security groups are key for
access to a CloudStack Basic zone like Exoscale.
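A sketch of those calls; the group and keypair names are placeholders, and the keyword arguments may differ slightly between libcloud versions:

>>> conn.ex_list_security_groups()
>>> sg = conn.ex_create_security_group(name='libcloud')
>>> conn.ex_authorize_security_group_ingress(securitygroupname='libcloud',
...                                          protocol='TCP',
...                                          cidrlist='0.0.0.0/0',
...                                          startport=22)
>>> conn.ex_delete_security_group('libcloud')
>>> conn.ex_create_keypair('foobar')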

Development of the CloudStack driver in Libcloud is very active; there is also support for advanced zones via calls for SourceNAT and StaticNAT.

Multiple Clouds

One of the interesting use cases of Libcloud is that you can use
multiple cloud providers, such as AWS, Rackspace, OpenNebula, vCloud and
so on. You can then create driver instances for each of these clouds and
create your own multi-cloud application. In the example below we
instantiate two libcloud CloudStack drivers, one on
Exoscale and the other one on
Ikoula.
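A sketch of that setup; the environment variable names are my own, and the iKoula endpoint shown is illustrative:

>>> import os
>>> import libcloud.security as security
>>> security.VERIFY_SSL_CERT = False  # see the SSL note below
>>> from libcloud.compute.types import Provider
>>> from libcloud.compute.providers import get_driver
>>> Driver = get_driver(Provider.CLOUDSTACK)
>>> exoscale = Driver(key=os.getenv('EXOSCALE_API_KEY'),
...                   secret=os.getenv('EXOSCALE_SECRET_KEY'),
...                   host='api.exoscale.ch', path='/compute')
>>> ikoula = Driver(key=os.getenv('IKOULA_API_KEY'),
...                 secret=os.getenv('IKOULA_SECRET_KEY'),
...                 host='cloudstack.ikoula.com', path='/client/api')
>>> exoscale.list_nodes()
>>> ikoula.list_nodes()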

In the example above, I set my access and secret keys, as well as the
endpoints, as environment variables. Also note the libcloud security
module and the VERIFY_SSL_CERT flag. In the case of iKoula, the SSL
certificate used was not verifiable with the certificates that libcloud checks.
Especially if you use a self-signed SSL certificate for testing, you
might have to disable this check as well.

From this basic setup you can imagine how you would write an application
that manages instances in different cloud providers, providing more
resiliency to your overall infrastructure.

Python Boto

There are many tools available to interface with an AWS-compatible API.
In this section we provide a short example that users of CloudStack can
build upon, using the AWS interface to CloudStack.

Boto Examples

Boto is one of them. It is a Python package available at
https://github.com/boto/boto. In this section we provide two examples of
Python scripts that use Boto and have been tested with the CloudStack AWS
API Interface.

First is an EC2 example. Replace the Access and Secret Keys with your
own and update the endpoint.
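A sketch against the CloudStack AWS bridge, which in this era listened on port 7080 under /awsapi; the keys are placeholders, and the api_version may need to match your CloudStack release:

import boto
from boto.ec2.regioninfo import RegionInfo

accesskey = '<your API key>'
secretkey = '<your secret key>'

# point boto at the CloudStack AWS API endpoint instead of Amazon
region = RegionInfo(name='ROOT', endpoint='localhost')
conn = boto.connect_ec2(aws_access_key_id=accesskey,
                        aws_secret_access_key=secretkey,
                        is_secure=False,
                        region=region,
                        port=7080,
                        path='/awsapi',
                        api_version='2012-08-15')

# list the templates exposed as AMIs
images = conn.get_all_images()
print images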

With boto you can also interact with other AWS services like S3. CloudStack has an S3 tech preview, but it
is backed by a standard NFS server and therefore is not a truly scalable distributed object store. To provide an S3
service in your cloud, I recommend using other software like RiakCS, Ceph radosgw or the GlusterFS S3 interface. These
systems handle large scale, chunking and replication.

Wrappers

In this chapter we introduce several CloudStack wrappers. These tools
use the client libraries presented in the previous chapter (or their own built-in request mechanisms) and add
functionality that involves some high-level orchestration. For
instance, knife-cloudstack uses the power of
Chef, the configuration management system, to
seamlessly bootstrap instances running in a CloudStack cloud. Apache
Whirr uses
jclouds to bootstrap
Hadoop clusters in the cloud, and SaltStack does configuration management in the cloud using Apache libcloud.

Knife CloudStack

Knife is a command line utility for Chef, the configuration management system from OpsCode.

Install, Configure and Feel

The Knife family of tools are drivers that automate the provisioning and
configuration of machines in the cloud. Knife-cloudstack is a CloudStack
plugin for knife. Written in Ruby, it is used by the Chef community. To
install knife-cloudstack you can simply install the gem or get it from
github:

gem install knife-cloudstack

If successful the knife command should now be in your path. Issue
knife at the prompt and see the various options and sub-commands
available.

If you want to use the version on github simply clone it:

git clone https://github.com/CloudStack-extras/knife-cloudstack.git

If you clone the git repo and make changes to the code, you will want to
build and install a new gem. For example, in the directory where you
cloned the knife-cloudstack repo, do:
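$ gem build knife-cloudstack.gemspec
$ gem install ./knife-cloudstack-<version>.gem

The version in the gem filename will match the one declared in the gemspec.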

With the endpoint and credentials configured and knife-cloudstack
installed, you should be able to issue your first command. Remember that
this is simply sending a CloudStack API call to your CloudStack based cloud
provider. Later in the section we will see how to do more advanced
things with knife-cloudstack. For example, to list the service offerings
(i.e., instance types) available on the iKoula cloud, do:
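$ knife cs service list

For reference, the endpoint and credentials live in your knife.rb; a minimal sketch with placeholder values, using the key names from the knife-cloudstack README:

knife[:cloudstack_url]        = "http://yourcloud.example.com/client/api"
knife[:cloudstack_api_key]    = "<your API key>"
knife[:cloudstack_secret_key] = "<your secret key>"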

If you only have user privileges on the cloud you are using, as
opposed to admin privileges, do note that some commands won't be
available to you. For instance, on the cloud I am using, where I am a
standard user, I cannot access any of the infrastructure-type commands,
like:

$ knife cs pod list
Error 432: Your account does not have the right to execute this command or the command does not exist.

Similarly to CloudMonkey, you can pass a list of fields to output. To
find the potential fields, add the --fieldlist option at the end of
the command. You can then pick the fields that you want to output by
passing a comma-separated list to the --fields option, like so:
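$ knife cs service list --fieldlist
$ knife cs service list --fields name,memory,cpunumber

The field names passed to --fields above are illustrative; use whatever --fieldlist reports on your cloud.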

Knife will automatically allocate a public IP address and associate it
with your running instance. If you additionally pass some port forwarding
rules and firewall rules, it will set those up. You need to specify an
instance type, from the list returned by knife cs service list, as well
as a template, from the list returned by knife cs template list. The
--no-bootstrap option will tell knife not to install chef on the
deployed instance. The syntax for the port forwarding and firewall rules is
explained on the knife
cloudstack
website. Here is an example on the iKoula cloud
in France:
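$ knife cs server create foobar \
    --service "<offering name>" \
    --template "<template name>" \
    --no-bootstrap

This is a sketch: the offering and template names are placeholders taken from the two list commands above, and the flag names follow the 2013-era plugin.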

Bootstrapping Instances with Hosted-Chef

Knife reaches its full potential when used to bootstrap Chef and use
it for configuration management of the instances. To get started with
Chef, the easiest way is to use Hosted
Chef. There is some great
documentation on
how to do it. The
basic concept is that you download or create cookbooks locally and
publish them to your own hosted Chef server.

Using Knife with Hosted-Chef

With your hosted Chef account created and your local chef-repo
set up, you can start instances on your cloud and specify the cookbooks
to use to configure those instances. The bootstrapping process will fetch
those cookbooks and configure the node. Below is an example that does
so. It uses the exoscale cloud, which runs on
CloudStack. This cloud is enabled as a Basic zone and uses ssh keypairs
and security groups for access.
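A sketch of such a bootstrap; the offering, template, keypair and security group names are placeholders, and the flag names follow the 2013-era plugin:

$ knife cs server create webserver \
    --service Tiny \
    --template "Linux Ubuntu 12.04 LTS 64-bit" \
    --ssh-user root \
    --identity-file ~/.ssh/id_rsa \
    --run-list "recipe[apache2]" \
    --keypair exoscale \
    --security-group www \
    --no-public-ip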

Chef will then configure the machine based on the cookbook passed in the
--run-list option; here I set up a simple web server. Note the keypair
that I used and the security group. I also specify --no-public-ip,
which disables the IP address allocation and association. This is
specific to the setup of exoscale, which automatically assigns a public IP
address to the instances.

Note

The latest version of knife-cloudstack allows you to manage keypairs
and securitygroups. For instance, listing, creating and deleting
keypairs is possible, as well as listing securitygroups:
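$ knife cs keypair list
$ knife cs keypair create mykey
$ knife cs keypair delete mykey
$ knife cs securitygroup list

The subcommand names above follow the plugin's README of that era; check knife --help on your version.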

When using a CloudStack based cloud in an Advanced zone setting, knife
can automatically allocate and associate an IP address. To illustrate
this slightly different example I use iKoula, a
French cloud provider which uses CloudStack. I edit my knife.rb file to
set up a different endpoint and the different API and secret keys. I
remove the keypair, security group and public ip options, and I do not
specify an identity file, as I will retrieve the ssh password with the
--cloudstack-password option. The example is as follows:
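$ knife cs server create webserver \
    --service "<offering name>" \
    --template "<template name>" \
    --ssh-user root \
    --cloudstack-password \
    --run-list "recipe[apache2]"

Again a sketch: the offering and template names are placeholders, and --cloudstack-password retrieves the generated root password for the bootstrap.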

You will want to review the security implications of doing the
bootstrap as root and using the default password to do so.

In Advanced Zone, your cloud provider may also have decided to block
all egress traffic to the public internet, which means that contacting
the hosted Chef server would fail. To configure the egress rules
properly, CloudMonkey can be used. List the networks to find the id of
your guest network, then create an egress firewall rule. Review the
CloudMonkey section to find the proper API calls and their arguments.
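A sketch of those two calls in CloudMonkey; the network uuid is a placeholder taken from the first command:

> list networks filter=id,name
> create egressfirewallrule networkid=<uuid> protocol=all cidrlist=0.0.0.0/0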

Salt

Salt is a configuration management system
written in Python. It can be seen as an alternative to Chef and Puppet.
Its concept is similar, with a master node holding states, called Salt
states (SLS), and minions that get their configuration from the master.
A nice difference from Chef and Puppet is that Salt is also a remote
execution engine and can be used to execute commands on the minions by
specifying a set of targets. In this section we dive straight
into SaltCloud, an open source software to
provision Salt masters and minions in the cloud. SaltCloud can be
looked at as an alternative to knife-cs, though certainly with less
functionality. In this short walkthrough we intend to bootstrap a Salt master (equivalent to a Chef server) in the cloud and then add minions that will get their configuration from the master.

SaltCloud installation and usage

To install Saltcloud one simply clones the git repository. To develop
Saltcloud, just fork it on github, clone your fork, then commit
patches and submit a pull request. SaltCloud depends on libcloud,
therefore you will need libcloud installed as well; see the previous
chapter to set up libcloud. With Saltcloud installed and in your path,
you need to define a cloud provider in ~/.saltcloud/cloud. For
example:
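exoscale:
  provider: cloudstack
  apikey: <your API key>
  secretkey: <your secret key>
  host: api.exoscale.ch
  path: /compute
  securitygroup: default
  user: root
  private_key: ~/.ssh/id_rsa

This is a sketch of a provider definition for a basic zone cloud; the keys match the parameters described below, the values are placeholders, and the host/path shown are exoscale's.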

The apikey, secretkey, host, path and provider keys are mandatory. The
securitygroup key specifies which security group to use when starting
the instances in that cloud. The user is the username used to
connect to the instances via ssh, and private_key is the ssh key to
use. Note that the optional parameters are specific to the cloud that
this was tested on. Clouds in advanced zones especially will need a
different setup.

Warning

Saltcloud uses libcloud. Support for advanced zones in libcloud is
still experimental, therefore using SaltCloud in an advanced zone will
likely need some development of libcloud.

Once a provider is defined, we can start using saltcloud to list the
zones, the service offerings and the templates available on that cloud
provider. So far, nothing more than what libcloud provides. For example:
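$ salt-cloud --list-locations exoscale
$ salt-cloud --list-sizes exoscale
$ salt-cloud --list-images exoscale

The provider name matches the definition above; --list-locations, --list-sizes and --list-images are standard salt-cloud listing options.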

To start creating instances and configuring them with Salt, we need to
define node profiles in ~/.saltcloud/config. To illustrate two
different profiles, we show a Salt Master and a Minion. The Master
needs a specific template (image:uuid) and a service offering or instance
type (size:uuid). In a basic zone with keypair access and security
groups, one also needs to specify which keypair to use, where to
listen for ssh connections, and of course the
provider (e.g., exoscale in our case, defined above). Below are example
node profiles for a Salt Master and a Minion deployed in the cloud:
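salt-master:
  provider: exoscale
  image: <template uuid>
  size: <service offering uuid>
  keypair: exoscale
  make_master: True

salt-minion:
  provider: exoscale
  image: <template uuid>
  size: <service offering uuid>
  keypair: exoscale
  minion:
    master: W.X.Y.Z

This is a sketch: the uuids are placeholders, and the key names follow the salt-cloud docs of that era (make_master installs a Salt master on the instance, and the minion block points minions at it).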

The W.X.Y.Z IP address above should be the IP address of the master that was deployed previously. On the master you will need ports 4505 and 4506 open; in a basic zone this is best done using security groups. Once this security group is properly set up, the minions will be able to contact the master. You will then accept the keys from the minions and be able to talk to them from your Salt master.
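Deployment is then one command per profile, followed by key acceptance and a test from the master (a sketch; the instance names are arbitrary):

$ salt-cloud -p salt-master mymaster
$ salt-cloud -p salt-minion myminion

# on the master: accept the minion keys, then ping the minions
$ salt-key -A
$ salt '*' test.ping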

Apache Whirr

Apache Whirr is a set of libraries to run
cloud services. Internally it uses
jclouds, which we introduced
earlier via the jclouds-cli interface to CloudStack. It is Java based and
of interest for provisioning clusters of virtual machines on cloud
providers. Historically it started as a set of scripts to deploy
Hadoop clusters on Amazon EC2. We introduce
Whirr as a potential CloudStack tool to provision Hadoop clusters on
CloudStack based clouds.

Installing Apache Whirr

To install Whirr you can follow the Quick Start
Guide,
download a tarball or clone the git repository. In the spirit of this
document we clone the repo:

git clone git://git.apache.org/whirr.git

And build the source with maven that we now know and love...:

mvn install

The whirr binary will be available in the bin directory, which we can
add to our path:

export PATH=$PATH:/Users/sebgoa/Documents/whirr/bin

If all went well you should now be able to get the usage of whirr:

$ whirr --help
Unrecognized command '--help'
Usage: whirr COMMAND [ARGS]
where COMMAND may be one of:
launch-cluster Launch a new cluster running a service.
start-services Start the cluster services.
stop-services Stop the cluster services.
restart-services Restart the cluster services.
destroy-cluster Terminate and cleanup resources for a running cluster.
destroy-instance Terminate and cleanup resources for a single instance.
list-cluster List the nodes in a cluster.
list-providers Show a list of the supported providers
run-script Run a script on a specific instance or a group of instances matching a role name
version Print the version number and exit.
help Show help about an action
Available roles for instances:
cassandra
elasticsearch
ganglia-metad
ganglia-monitor
hadoop-datanode
...

From the look of the usage you clearly see that whirr is about more
than just hadoop, and that it can be used to configure elasticsearch
clusters and cassandra databases, as well as the entire hadoop ecosystem
with mahout, pig, hbase, hama, mapreduce and yarn.

Using Apache Whirr

To get started with Whirr you need to set up the credentials and endpoint
of the CloudStack based cloud that you will be using. Edit the
~/.whirr/credentials file to include a PROVIDER, IDENTITY, CREDENTIAL
and ENDPOINT. The PROVIDER needs to be set to cloudstack, the IDENTITY
is your API key, the CREDENTIAL is your secret key and the ENDPOINT is
the endpoint url. For instance:
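PROVIDER=cloudstack
IDENTITY=<your API key>
CREDENTIAL=<your secret key>
ENDPOINT=http://yourcloud.example.com/client/api

The values above are placeholders for your own keys and endpoint url.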

With the credentials and endpoint defined, you can create a properties
file that describes the cluster you want to launch on your cloud. The
file contains information such as the cluster name, the number of
instances and their type, the distribution of hadoop you want to use,
the service offering id and the template id of the instances. It also
defines the ssh keys to be used for accessing the virtual machines. In
the case of a cloud that uses security groups, you may also need to
specify it. A tricky point is the handling of DNS name resolution: you
might have to use the whirr.store-cluster-in-etc-hosts key to bypass
any DNS issues. For a full description of the whirr property keys, see
the
documentation.
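A sketch of such a properties file for a CDH Hadoop cluster in a basic zone; the uuids and key paths are placeholders, and the property names are taken from the Whirr recipes:

whirr.cluster-name=hadoop
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=cloudstack
whirr.endpoint=http://yourcloud.example.com/client/api
whirr.hardware-id=<service offering uuid>
whirr.image-id=<template uuid>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
whirr.store-cluster-in-etc-hosts=true
whirr.env.repo=cdh4
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop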

The example shown above is specific to a CloudStack
cloud set up as a basic zone. This cloud uses
security groups for isolation between instances. The proper rules had
to be set up by hand. Also note the use of
whirr.store-cluster-in-etc-hosts. If set to true, whirr will edit the
/etc/hosts file of the nodes and enter the IP addresses. This is
handy in cases where DNS resolution is problematic.

Note

To use the Cloudera Hadoop distribution (CDH) like in the example
above, you will need to copy the
services/cdh/src/main/resources/functions directory to the root of
your Whirr source. In this directory you will find the bash scripts
used to bootstrap the instances. It may be handy to edit those
scripts.

After the bootstrapping process finishes, you should be able to log in to
your instances and use hadoop, or, if you are running a proxy on your
machine, you will be able to access your hadoop cluster locally. Testing
of Whirr for CloudStack is still under
investigation and the
subject of a Google Summer of Code 2013 project. We have currently identified
issues with the use of security groups. Moreover, this was tested on a
basic zone; complete testing on an advanced zone is future work.

Running Map-Reduce jobs on Hadoop

Whirr gives you the ssh command to connect to the instances of your
hadoop cluster. Log in to the namenode and browse the hadoop file system
that was created:
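$ ssh -i ~/.ssh/id_rsa <user>@<namenode-ip>
$ hadoop fs -ls /

The user and IP placeholders come from the ssh command that Whirr prints at the end of launch-cluster; hadoop fs -ls then lists the cluster's file system.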

Conclusions

The CloudStack API is very rich and easy to use. You can write your own client by following the section on how to sign requests, or you can use an existing client in the language of your choice. Well-known libraries developed by the community work well with CloudStack, such as Apache libcloud and Apache jclouds. Configuration management systems also have plugins to work transparently with CloudStack; in this little book we presented SaltStack and knife-cs. Finally, going a bit beyond simple clients, we presented Apache Whirr, which allows you to create Hadoop clusters on demand (Elasticsearch and Cassandra clusters also work). Take your pick and write your applications on top of CloudStack using one of these tools. Based on these tools you will be able to deploy infrastructure easily, quickly and in a reproducible manner. Lately CloudStack has seen the number of tools grow: just today I learned about a Fluentd plugin, and last week a Cloud Foundry BOSH interface was released. I also committed a straightforward dynamic inventory script for Ansible, and a tweet just flew by about a vagrant-cloudstack plugin. The list goes on; pick what suits you and answers your need, then have fun.