Posts Tagged ‘ensemble’

I've seen Ensemble evolve from a series of design-level conversations (Brussels, May 2010), through a year of fast-paced Canonical-style development, and participated in Ensemble sprints (Cape Town, March 2011, and Dublin, June 2011). I observed Ensemble at first as an outsider, then provided feedback as a stakeholder, and have now contributed code to Ensemble as a developer and authored Formulas.

Think about bzr or git circa 2004/2005, or apt circa 1998/1999, or even dpkg circa 1993/1994... That's where we are today with Ensemble circa 2011.

Ensemble is a radical, outside-of-the-box approach to a problem that the Cloud ecosystem is just starting to grok: Service Orchestration. I'm quite confident that in a few years, we're going to look back at 2011 and the work we're doing with Ensemble and Ubuntu and see a clear inflection point in the efficiency of workload management in The Cloud.

From my perspective as the leader of Canonical's Systems Integration Team, Ensemble is now the most important tool in our software tool belt when building complex cloud solutions.

Period.

Juan, Marc, Brian, and I are using Ensemble to build modern solutions for deploying new services to the cloud. We have contributed many formulas already to Ensemble's collection, and continue to do so every day.

There's a number of novel ideas and unique approaches in Ensemble. You can deep dive into the technical details here. For me, there's one broad concept in Ensemble that just rocks my world... Ensemble deals in individual service units, with the ability to replicate, associate, and scale those units quite dynamically. Service units in practice are cloud instances (or if you're using Orchestra + Ensemble, bare metal systems!). Service units are federated together to deliver a (perhaps large and complicated) user facing service.

Okay, that's a lot of words, and at a very high level. Let me try to break that down into something a bit more digestible...

I've been around Red Hat and Debian packaging for many years now. Debian packaging is particularly amazing at defining prerequisite packages and pre- and post-installation procedures, and is just phenomenal at rolling upgrades. I've worked with hundreds (thousands?) of packages at this point, including some mind-bogglingly complex ones!

It's truly impressive how much can be accomplished within traditional Debian packaging. But it has its limits. These limits really start to bare their teeth when you need to install packages on multiple separate systems, and then federate those services together. It's one thing if you need to install a web application on a single, local system: depend on Apache, depend on MySQL, install, configure, restart the services...

sudo apt-get install your-web-app
...

Profit!

That's great. But what if you need to install MySQL on two different nodes, set them up in a replicating configuration, install your web app and Apache on a third node, and put a caching reverse proxy on a fourth? Oh, and maybe you want to do that a few times over. And then scale them out. Ummmm.....

sudo apt-get errrrrrr....yeah, not gonna work :-(

But these are exactly the type(s) of problems that Ensemble solves! And quite elegantly in fact.
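With formulas in place for each service, the whole four-node scenario above collapses into a handful of commands. A sketch of what that looks like ( formula names here are illustrative, and the MySQL replication wiring assumes the formula supports it ):

```shell
ensemble bootstrap                        # stand up the environment
ensemble deploy mysql                     # first database node
ensemble add-unit mysql                   # second node; the formula sets up replication
ensemble deploy your-web-app              # web app (plus Apache) on a third node
ensemble deploy haproxy                   # caching reverse proxy on a fourth
ensemble add-relation your-web-app mysql
ensemble add-relation haproxy your-web-app
```

Scaling out later is just more add-unit calls.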

We are quite literally at the edge of something amazing here, and we welcome your contributions! All of Ensemble and our Formula Repository are entirely free software, building on years of best practice open source development on Ubuntu at Canonical. Drop into the #ubuntu-ensemble channel in irc.freenode.net, introduce yourself, and catch one of the earliest waves of something big. Really, really big.

Juju is a next generation service orchestration framework. It has been likened to APT for the cloud. With juju, different authors are able to create service charms independently, and make those services coordinate their communication through a simple protocol. Users can then take the product of different authors and very comfortably deploy those services in an environment. The result is multiple machines and components transparently collaborating towards providing the requested service.

HPCC (High Performance Computing Cluster) is a massive parallel-processing computing platform that solves Big Data problems. The platform is now Open Source!

Now that we are all caught up, let's delve right into it. I will be discussing the details of my newly created hpcc juju charm.

The hpcc charm has been one of the trickiest to date to get working properly, so I want to take some time to explain some of the challenges that I encountered.

hpcc seems to use ssh keys for authentication and a single XML file to hold its configuration. All nodes that are part of the cluster should have identical keys and an identical XML configuration file.

The ssh keys are pretty easy to do ( there is even a script that will do it all for you, located at /opt/HPCCSystems/sbin/keygen.sh ). You can just run:

ssh-keygen -f path_where_to_save_keys/id_rsa -N "" -q

The configuration file, environment.xml, is a lot trickier to configure, so I will use cheetah templates to help make a template out of this enormous file.

According to their website:

Cheetah is an open source template engine and code generation tool, written in Python. It can be used standalone or combined with other tools and frameworks. Web development is its principle use, but Cheetah is very flexible and is also being used to generate C++ game code, Java, sql, form emails and even Python code.

With cheetah, I can create self-contained templates that can be rendered into their intended file by just calling cheetah. This is because we can embed python code inside the template itself, making the template ( environment.tmpl in our case ) more or less a python program that generates a fully functional environment.xml file ready for hpcc to use.

Another very important reason to use a template engine is the ability to create identical configuration files on each node without having to pass them around. In other words, each node can create its own configuration file and, since all nodes use the same methods and data to create the file, they will all be exactly the same.

The hpcc configuration file is huge, so I'll just talk about some of the interesting bits of it here:

The above piece of code is what I am currently using to populate a list with the FQDN and address of each cluster member sorted by install time. This puts the "master" of the cluster at the top of the list which will become useful when populating certain parts of the configuration file.

As we can see by the code above, the main piece of information that we use in this template is the node list. Here is a sample of how we use it in the environment.tmpl template file:

#for $netAddress, $name in $nodes:
<Computer computerType="linuxmachine"
          domain="localdomain"
          name="$name"
          netAddress="$netAddress"/>
#end for
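To see what that loop expands to, here is a rough shell equivalent run against a made-up node list ( in the charm, the real values come from relation data, sorted master-first as described above ):

```shell
# Sorted "address name" pairs, master first -- sample data only
nodes="10.0.0.1 node01
10.0.0.2 node02"

# Emit one <Computer .../> element per node, just like the #for loop
xml=$(echo "$nodes" | while read netAddress name; do
  printf '<Computer computerType="linuxmachine" domain="localdomain" name="%s" netAddress="%s"/>\n' \
    "$name" "$netAddress"
done)
echo "$xml"
```

Every node runs the same expansion over the same sorted list, which is why the resulting files match byte for byte.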

I encourage you to download the charm here and examine the environment.tmpl file in the templates directory.

Here is the complete environment.tmpl file... I know it renders pretty small, and you can just download the charm and read the file at your leisure, but I wanted to give you an idea of the size and complexity of hpcc's configuration file.

Even just scrolling past the file takes a while! This behemoth of a file was tamed thanks to cheetah; I highly encourage you to read up on it.

This charm may require some changes to your environments.yaml file in ~/.juju, as hpcc will only run on 64-bit instances. Make sure that your juju environment has been properly shut down before you edit this file ( juju destroy-environment ). Here is my environments.yaml file, where I show you the important part to check:

juju: environments

environments:
  sample:
    type: ec2
    access-key: ( removed ... get your own :) )
    secret-key: ( removed ... get your own :) )
    control-bucket: juju-fbb790f292e14a0394353bb4b63a3403
    admin-secret: 604d18a77fd24e3f91e1df398fcbe9f2

The emphasized parts are the important ones. You can just copy them from here and paste them into your ~/.juju/environments.yaml file.

Now, let's take a look at the charm starting with the metadata.yaml file:

name: hpcc
revision: 1
summary: HPCC (High Performance Computing Cluster)
description: |
  HPCC (High Performance Computing Cluster) is a massive
  parallel-processing computing platform that solves Big Data problems.
provides:
  hpcc:
    interface: hpcc
requires:
  hpcc-thor:
    interface: hpcc-thor
  hpcc-roxie:
    interface: hpcc-roxie
peers:
  hpcc-cluster:
    interface: hpcc-cluster

There are various provides and requires interfaces in this metadata.yaml file, but for now only the peers interface is being used. I'll work on the other ones as the charm matures.

Let's look at the hpcc-cluster interface, more specifically the hpcc-cluster-relation-changed hook, where the new configuration is created:
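The shape of that hook is: collect every peer's address, regenerate environment.xml from the cheetah template, and restart hpcc. A rough sketch follows ( paths, service name and the exact hook-tool invocations are assumptions for illustration, not the actual charm code ):

```shell
#!/bin/bash
set -eux

# Gather the address of every peer in the hpcc-cluster relation.
# relation-list and relation-get are hook tools provided at hook runtime.
for member in $(relation-list); do
    relation-get private-address "$member"
done > /tmp/cluster-nodes

# Re-render the configuration from the cheetah template, then restart
# hpcc so this node picks up the (identical-everywhere) environment.xml.
cheetah fill --oext xml templates/environment.tmpl
cp templates/environment.xml /etc/HPCCSystems/environment.xml
service hpcc-init restart
```

Because every unit runs this same hook against the same peer data, each node regenerates the same file independently.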

You can access the web interface of your node by pointing your browser to http://<FQDN>:8010, where FQDN is the Fully Qualified Domain Name or Public IP Address of your hpcc instance. On the left side there should be a menu; explore the items in the Topology section. The Target Clusters section should look something similar to this:

To experience the true power of hpcc, you should probably throw some more nodes at it. Let's do just that with:

juju add-unit hpcc

A lot of work is going into making sure Ensemble is more secure and enterprise-ready. As part of that, all deployed services are now firewalled by default: for a formula-deployed service to be publicly accessible, the formula author has to specify which ports are open and when, and the operator needs to signal that they want those ports opened. All formulas that expose ports should use open-port (and optionally close-port) diligently. Here’s what you need to know.

Updating formulas for the new expose functionality

This is the only change necessary for the WordPress/MySQL example, in example/wordpress/hooks/db-relation-changed:

# Make it publicly visible, once the wordpress service is exposed
open-port 80/tcp

It is important that formulas open ports only when ready. So in the WordPress example, you wouldn’t want to do this port opening until Apache has been successfully configured and restarted. Otherwise, there’s a chance that users might see “It works!” before the desired page is available.

Firewall changes also are a two-step process. The hooks for a service unit need to open ports (and they can also close ports), but the Ensemble administrator must also expose the service. For the WordPress example, you can expose it any time after the service has been deployed with the following:

ensemble expose wordpress

Just expose the services you’re interested in, possibly immediately after deployment. Again, it’s the formula author’s responsibility to ensure that port opening is done at the right time.

The service can be subsequently unexposed with

ensemble unexpose wordpress

You can see if a service is exposed with ensemble status. This would result in output similar to the following:

This work is only part of the effort to ensure Ensemble uses secure mechanisms in its operations. Recent work also made sure all state information shared between cloud nodes is properly access-controlled, to avoid leaking any confidential data. Ensemble is rapidly progressing, and now is a great time to start playing with the technology, and to start writing your own formulas!

Interested? Join the friendly Ensemble community at #ubuntu-ensemble on irc.freenode.net, drop in, say hi, and grab me (kim0) with any questions.

MongoDB is such a great piece of open-source technology. It supports some very interesting features such as sharding and replica-sets. I have seen demos of MongoDB where the speaker happily calls creating the replica-set cluster a “one hour thing”! I decided to sprinkle some Ensemble magic on this problem; using Juan’s formulas, it basically becomes a “10 second thing”! Spinning up a Mongo replica-set cluster could not be easier! Check this video out
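If you'd rather not watch, the gist of it is just a couple of commands, roughly like the following ( assuming the mongodb formula is available in your formula repository ):

```shell
ensemble bootstrap
ensemble deploy mongodb
ensemble add-unit mongodb    # each new unit joins the replica set automatically
```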

Yep that’s how simple it is! If you want to create more read-slaves, you only need to ask Ensemble to do it for you:

$ ensemble add-unit mongodb

If you’re interested in learning more about exactly how this “magic” works, check out this in-depth guide dissecting how the Mongo Ensemble formulas work, by Juan Negron, the formula author.

So was this useful? Will you be deploying your next mongodb servers with Ensemble?
Leave me a comment, let me know your thoughts! Also let me know what you’d like to see deployed next with Ensemble. Be sure to drop in to #ubuntu-ensemble on freenode irc and say hi

I am by no means an expert on Cassandra, but I have done some medium-sized deployments on Amazon's cloud, so I wanted to translate my knowledge of Cassandra "rings" and develop an Ensemble formula that could use the peers interface to expand and contract the ring as needed.

Don't try to solve all deployment scenarios; just concentrate on the above ones for now.

Let's start with the stand-alone deployment first, and we'll add the other functionality a bit later.

Before we go into creating the directories and files, I should probably mention Principia Tools. Principia Tools is ( as the name implies ) a set of tools that facilitates the creation of formulas for Ensemble.

You can get principia-tools on most supported releases of Ubuntu from the Ensemble PPA:

After installing principia-tools, go to the directory where you will be creating your formulas and type the following to get started:

principia formulate mongodb

The above command will look in your cache for a package called mongodb and create a "skeleton" structure, with the metadata.yaml, hooks and descriptions already done for you, in a directory called ( you guessed it ) mongodb.

##############################################################################
# Set some variables that we'll need for later
##############################################################################

##############################################################################
# Change the default mongodb configuration to reflect that we are a master
##############################################################################

##############################################################################
# Reconfigure the upstart script to include the replica-set option.
# We'll need this so, when we add nodes, they can all talk to each other.
# Replica sets can only talk to each other if they all belong to the same
# set. In our case, we have defaulted to "myset".
##############################################################################
sed -i -e "s/ -- / -- --replSet ${DEFAULT_REPLSET_NAME} /" /etc/init/mongodb.conf

##############################################################################
# Stop then start ( *** not restart *** ) mongodb so we can finish the
# configuration
##############################################################################
service mongodb stop
# There is a bug in the upstart script that leaves a lock file orphaned...
# Let's wipe that file out
rm -f /var/lib/mongodb/mongod.lock
service mongodb start

#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30

service mongodb stop
rm -f /var/lib/mongodb/mongod.lock
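The sed one-liner above is worth a closer look: it splices the --replSet option into the upstart exec line. You can see the effect on a scratch copy of the file ( the sample exec line below just mimics /etc/init/mongodb.conf; it is not the file's verbatim contents ):

```shell
DEFAULT_REPLSET_NAME="myset"
conf=$(mktemp)
echo 'exec start-stop-daemon --start --quiet --exec /usr/bin/mongod -- --config /etc/mongodb.conf' > "$conf"

# Same substitution as in the install hook: insert --replSet after the
# first " -- " separator, i.e. into the daemon's own argument list.
sed -i -e "s/ -- / -- --replSet ${DEFAULT_REPLSET_NAME} /" "$conf"
cat "$conf"
# -> exec start-stop-daemon --start --quiet --exec /usr/bin/mongod -- --replSet myset --config /etc/mongodb.conf
```

Note that " --start" and " --quiet" don't match the pattern ( no space after the dashes ), so only the bare " -- " separator is touched.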

This is the script that Ensemble calls when it needs to stop a service.

hooks/relation-name-relation-[joined|changed|broken|departed]

These files are templates for the relationships ( provides, requires, peers, etc. ) declared in the metadata.yaml file. Here is a look at the ones that I have for mongodb:

Per the metadata.yaml, we need to define the following relationships:

database

replica-set

Based on that information, here are the files that I created for this formula:

database-relation-joined

#!/bin/bash
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
# This script should be idempotent.
set -ux
relation-set hostname=`hostname -f` replset=`facter replset-name`
echo $ENSEMBLE_REMOTE_UNIT joined

replica-set-relation-joined

#!/bin/bash
# This must be renamed to the name of the relation. The goal here is to

And that's all that is needed to add a new mongodb node that will automatically create a replica set with the existing node. You can continue to "add-unit" to add more nodes to the replica set. Notice that all of the configuration is taken care of by the replica-set-relation-joined and replica-set-relation-changed hook scripts that we wrote above.

The beauty of this formula is that the user doesn't really have to know exactly what is needed to get a replica set cluster up and running. Ensemble formulas are self-contained and idempotent. This means portability.

A while back I started experimenting with Ensemble and was intrigued by the notion of services instead of machines.

A bit of background on Ensemble from their website:

Ensemble is a next generation service orchestration framework. It has been likened to APT for the cloud. With Ensemble, different authors are able to create service formulas independently, and make those services coordinate their communication through a simple protocol. Users can then take the product of different authors and very comfortably deploy those services in an environment. The result is multiple machines and components transparently collaborating towards providing the requested service.

I come from a DevOps background and know firsthand the trials and tribulations of deploying production services, webapps, etc. One that's particularly "thorny" is hadoop.

To deploy a hadoop cluster, we would need to download the dependencies ( java, etc. ), download hadoop, configure it and deploy it. This process varies somewhat depending on the type of node that you're deploying ( i.e. namenode, job-tracker, etc. ). It is a multi-step process that requires too much human intervention, and one that is difficult to automate and reproduce. Imagine a 10-, 20- or 50-node cluster deployed this way. It can get frustrating quickly, and it is prone to mistakes.

With this experience in mind ( and a lot of reading ), I set out to deploy a hadoop cluster using an Ensemble formula.

First things first, let's install Ensemble. Follow the Getting Started documentation on the Ensemble site here.

According to the Ensemble documentation, we just need to follow some file naming conventions for what they call "hooks" ( executable scripts in your language of choice that perform certain actions ). These "hooks" control the installation, relationships, start, stop, etc. of your formula. We also need to summarize the description of the formula in a file called metadata.yaml. The metadata.yaml file describes the formula, its interfaces, and what it requires and provides, among other things. More on this file later when I show you the ones for hadoop-master and hadoop-slave.

Armed with a bit of knowledge and a desire for simplicity, I decided to split the hadoop cluster in two:

hadoop-master (namenode and jobtracker )

hadoop-slave ( datanode and tasktracker )

I know this is not an all-encompassing list, but this will take care of a good portion of deployments, and the Ensemble formulas are easy enough to modify that you can work your changes into them.

One of my colleagues, Brian Thomason, did a lot of packaging for these formulas, so my job is now easier. The configuration for the packages has been distilled down to three questions:

Due to the magic of Ubuntu packaging, we can even "preseed" the answers to those questions to avoid being asked about them ( and stopping the otherwise automatic process ). We'll use the utility debconf-set-selections for this. Here is a piece of the code that I use to preseed the values in my formula:
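Here's roughly what that preseeding looks like. Note that the question names and values below are illustrative placeholders for this sketch, not the packages' real debconf questions:

```shell
#!/bin/bash
# Feed the answers to debconf before the packages are installed, so that
# apt-get never stops to prompt. Question names are made up for this sketch.
debconf-set-selections <<EOF
hadoop-master hadoop/namenode-host string $(hostname -f)
hadoop-master hadoop/jobtracker-host string $(hostname -f)
hadoop-master hadoop/hdfs-dir string /var/lib/hadoop/hdfs
EOF
```

With the answers seeded, the subsequent apt-get install runs completely unattended.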

The Hadoop Distributed Filesystem (HDFS) requires one unique server, the namenode, which manages the block locations of files on the filesystem. The jobtracker is a central service which is responsible for managing the tasktracker services running on all nodes in a Hadoop Cluster. The jobtracker allocates work to the tasktracker nearest to the data with an available work slot.

provides:
  hadoop-master:
    interface: hadoop-master

Every Ensemble formula has an install script ( in our case: hadoop-master/hooks/install ). This is an executable file in your language of choice that Ensemble will run when it's time to install your formula. Anything and everything that needs to happen for your formula to install needs to be inside of that file.
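In outline, the install hook boils down to preseeding those debconf answers and then pulling in the namenode and jobtracker packages. This is a sketch, not the formula's actual install hook ( the package names are taken from the services the stop script manages ):

```shell
#!/bin/bash
# Sketch of hadoop-master/hooks/install -- preseed first, then install.
set -eux
ensemble-log "installing hadoop-master"
apt-get -y install hadoop-0.20-namenode hadoop-0.20-jobtracker
```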

Here is the stop hook ( hadoop-master/hooks/stop ):

# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30

set -x
ensemble-log "stop script"
service hadoop-0.20-namenode stop
service hadoop-0.20-jobtracker stop

Let's go back to the metadata.yaml file and examine it in more detail:

ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem. The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster. The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master

The emphasized section ( provides ) tells Ensemble that this formula provides an interface named hadoop-master that can be used in relationships with other formulas ( in our case, we'll be using it to connect the hadoop-master with the hadoop-slave formula that we'll be writing a bit later ). For this relationship to work, we need to let Ensemble know what to do ( more detailed information about relationships in formulas can be found here ).

Per the Ensemble documentation, we need to name our relationship hook hadoop-master-relation-joined, and it should also be an executable script in your language of choice. Let's see what that file looks like:

#!/bin/sh
# This must be renamed to the name of the relation. The goal here is to
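For reference, deploying the pair comes down to something like this ( the --repository flag assumes the formulas live in a local directory; exact flags vary between Ensemble versions ):

```shell
ensemble bootstrap
ensemble deploy --repository=. hadoop-master
ensemble deploy --repository=. hadoop-slave
ensemble add-relation hadoop-master hadoop-slave
```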

As you can see, once you have the formula written and tested, deploying the cluster is really a matter of a few commands. The above example gives you one hadoop-master ( namenode, jobtracker ) and one hadoop-slave ( datanode, tasktracker ).

To add another node to this existing hadoop cluster, we add:

ensemble add-unit hadoop-slave # ( this adds one more slave )

Run the above command multiple times to continue to add hadoop-slave nodes to your cluster.

Ensemble allows you to catalog the steps needed to get your service/application installed, configured and running properly. Once your knowledge has been captured in an Ensemble formula, it can be re-used by you or others without much knowledge of what's needed to get the application/service running.

In the DevOps world, this code re-usability can save time, effort and money by providing self-contained formulas that provide a service or application.

So you wanted to play with hadoop to crunch some big-data problems, except that, well, getting a hadoop cluster up and running is not exactly a one minute thing! Let me show you how to make it “a one minute thing” using Ensemble, since Ensemble now has formulas for creating hadoop master and slave nodes, thanks to the great work of Juan Negron. Spinning up a hadoop cluster could not be easier! Check this video out

Yep that’s how simple it is! If you want to scale-out the cluster, you only need to ask Ensemble to do it for you:
$ ensemble add-unit hadoop-slave

If you’re interested in learning more about exactly how this “magic” works, check out this in-depth guide dissecting how the hadoop Ensemble formulas work, by none other than Juan Negron, the formula author.

So is this easier than configuring a hadoop cluster manually? Leave me a comment, let me know your thoughts! Also let me know what you’d like to see deployed next with Ensemble. Be sure to drop in to #ubuntu-ensemble on freenode irc and say hi