
As promised, here is a tutorial on configuring and running, say, a 20-node CUDA 5 multi-GPU cluster on Amazon’s AWS cloud infrastructure. The secret is to avoid paying the full $2.10*20=$42/hour by using Spot Instances together with the awesome StarCluster Python package, which takes the pain out of creating clusters on AWS. For the purpose of this post we will stick to just 2 nodes, and I will point out the place where you can easily add more nodes, all the way up to 20. So let's get started!
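To make the pricing arithmetic concrete, here is a minimal Python sketch. The helper name `hourly_cost` is made up for illustration; the only figure taken from this post is the $2.10 per node-hour price. To estimate a spot cluster, pass your own bid as `price_per_node`.

```python
from decimal import Decimal

# USD per node-hour, the full price quoted in this post.
FULL_PRICE = Decimal("2.10")


def hourly_cost(price_per_node, nodes):
    """Total hourly cost in USD for a cluster of `nodes` identical instances."""
    return Decimal(price_per_node) * nodes


# Full-price cost of the 20-node cluster and of the 2-node cluster used here.
print(hourly_cost(FULL_PRICE, 20))
print(hourly_cost(FULL_PRICE, 2))
```

Decimal is used instead of float so that the dollar amounts are not subject to binary rounding.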

Prerequisites

The first thing we need is to install StarCluster and configure our Amazon AWS credentials and keys. On my 64-bit Mac OS X machine, I had to install pycrypto first with the following commands (you may need to sudo):
➜ export ARCHFLAGS='-arch x86_64'
➜ easy_install pycrypto
...
➜ easy_install starcluster
...

Once it is installed, we need to run it with the help command and press 2 to create the config file:
➜ starcluster help
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

Now would be a good time to create a key via:
➜ starcluster createkey cuda -o ~/.ssh/cuda.rsa
...
>>> keypair written to /Users/kashif/.ssh/cuda.rsa

and add its location to ~/.starcluster/config under the [key cuda] section.
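As a rough illustration of what that section might look like, the Python sketch below parses an example [key cuda] section and checks that the key file it points to exists. It assumes the option is named KEY_LOCATION, as in StarCluster's config template, and reuses the path passed to createkey above; adjust both to match your own config.

```python
import configparser
import os

# Assumed shape of the section: KEY_LOCATION is the option name from
# StarCluster's config template; the path is the one given to createkey.
KEY_SECTION = """
[key cuda]
KEY_LOCATION = ~/.ssh/cuda.rsa
"""

parser = configparser.ConfigParser()
parser.read_string(KEY_SECTION)

key_location = parser["key cuda"]["KEY_LOCATION"]
key_path = os.path.expanduser(key_location)  # resolve "~" to the home directory

# Report whether the private key is actually where the config says it is.
status = "found" if os.path.isfile(key_path) else "missing"
print("%s: %s" % (key_path, status))
```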

It's also a good idea to create a ~/.aws-credentials-master file and fill it in with the same information, so that we can use the Amazon command line tools as well:
➜ cat ~/.aws-credentials-master
# Enter the AWS Keys without the < or >
# You can either use the AWS Accounts access keys and they can be found at
# http://aws.amazon.com under Account->Security Credentials
# or you can use the access keys of a user created with IAM
AWSAccessKeyId=blahblah
AWSSecretKey=blahblahblah

Basic Idea

The plan is to take an official StarCluster HVM AMI, update it, and create an EBS-backed AMI from it. Then we will use this new AMI to run the cluster. The updated AMI will hopefully have the latest CUDA 5 as well as other goodies.

Customizing an Image Host

We first launch a new single-node cluster called imagehost as a spot instance, based on an existing StarCluster AMI, on a GPU-enabled instance. We need to choose an AMI (machine image) which supports HVM so that we have access to the GPU. We can list all the StarCluster AMIs via:
➜ starcluster listpublic
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

The last thing we need to do is to ensure that the device files /dev/nvidia* exist and have the correct file permissions. This can be done by creating a startup script e.g.:
$ cat /etc/init.d/nvidia
#!/bin/bash
PATH=/sbin:/bin:/usr/bin:$PATH

Cluster Template

Now we can set up the cluster template in the StarCluster config file, using the AMI we just created. Here ami-9f6ed8f6 is that AMI, and we use it in a small cluster template:
...
[cluster smallcluster]
KEYNAME = cuda
CLUSTER_SIZE = 2
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-9f6ed8f6
NODE_INSTANCE_TYPE = cg1.4xlarge
SPOT_BID = x.xx

It's important to set SPOT_BID = x.xx (your maximum bid), or else regular instances will be launched and the full price charged, which is not what we want. Also, to run a bigger cluster, just replace CLUSTER_SIZE = 2 with the number of nodes you need.

Finally, in the [global] section of the config file, we need to tell StarCluster to use this template:
[global]
DEFAULT_TEMPLATE=smallcluster
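
Before launching anything, it can help to sanity-check the pieces of the config that have to line up. The Python sketch below is not part of StarCluster; `check_template` is a hypothetical helper that parses a config like the one above with the standard configparser module and reports mismatches. The sample text mirrors the settings from this post (the KEY_LOCATION line is assumed), and SPOT_BID is deliberately left as the placeholder x.xx so that the check flags it.

```python
import configparser

# Sample mirroring the template in this post. SPOT_BID is left as the
# placeholder x.xx on purpose so that the check below reports it.
SAMPLE_CONFIG = """
[global]
DEFAULT_TEMPLATE = smallcluster

[key cuda]
KEY_LOCATION = ~/.ssh/cuda.rsa

[cluster smallcluster]
KEYNAME = cuda
CLUSTER_SIZE = 2
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-9f6ed8f6
NODE_INSTANCE_TYPE = cg1.4xlarge
SPOT_BID = x.xx
"""


def check_template(config_text):
    """Return a list of human-readable problems found in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    problems = []

    # The default template must name an existing [cluster ...] section.
    template = parser.get("global", "DEFAULT_TEMPLATE", fallback=None)
    section = "cluster %s" % template
    if template is None or not parser.has_section(section):
        return ["DEFAULT_TEMPLATE does not name an existing [cluster ...] section"]

    cluster = parser[section]

    # KEYNAME must refer to a [key ...] section defined in the same file.
    keyname = cluster.get("KEYNAME")
    if keyname is None or not parser.has_section("key %s" % keyname):
        problems.append("KEYNAME does not match any [key ...] section")

    # Without a numeric, positive SPOT_BID no spot request can be made.
    bid = cluster.get("SPOT_BID")
    try:
        if bid is None or float(bid) <= 0:
            problems.append("SPOT_BID is missing or not positive")
    except ValueError:
        problems.append("SPOT_BID is not a number: %r" % bid)

    if cluster.getint("CLUSTER_SIZE", fallback=0) < 1:
        problems.append("CLUSTER_SIZE must be at least 1")

    return problems


problems = check_template(SAMPLE_CONFIG)
for problem in problems:
    print(problem)
```

To check your real config, read the text of ~/.starcluster/config and pass it to `check_template`; an empty list means these particular checks passed, not that the cluster is guaranteed to start.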