Friday, February 14, 2014

Chemistry on the Amazon EC2

We are trying out the Amazon EC2 compute cloud for running computations in the Jensen Group. This is a note on how things are going so far.

It was actually extremely easy to set up. Within minutes of creating the Amazon Web Services (AWS) account, I had a free instance of Ubuntu 12.04.3 LTS up and running and was able to SSH into it.
You get one free virtual machine and 750 free hours per month for the first year, so it costs nothing to get started. My free instance had an Intel processor of some kind, 0.5 GB RAM and 8 GB disk space (I think the specs change from time to time).

I copied the PHAISTOS binaries (PHAISTOS is the program we want to run) to the instance. They ran successfully, and things pretty much went off without a hitch.
After trying out the free instance, I saved it as an image (you can do that via the web interface). I started every later instance from that same image, so no configuration was needed after the first time.
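If you would rather script this than click through the web interface, the save-and-relaunch workflow can be sketched with the boto3 EC2 client. This post does not use boto3, and the helper names `save_image` and `launch_from_image` and all IDs are hypothetical. The sketch assumes a configured instance already exists and AWS credentials are set up. The client is passed in as an argument, so the code itself does not import boto3.

```python
from typing import Any, List


def save_image(ec2: Any, instance_id: str, name: str) -> str:
    """Create an AMI from a configured instance and return its ImageId.

    Note: by default EC2 reboots the instance while imaging it
    (pass NoReboot=True to create_image to avoid that).
    """
    response = ec2.create_image(InstanceId=instance_id, Name=name)
    return response["ImageId"]


def launch_from_image(ec2: Any, image_id: str, instance_type: str, count: int = 1) -> List[str]:
    """Launch `count` instances from a saved AMI and return their instance IDs."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    return [inst["InstanceId"] for inst in response["Instances"]]
```

Usage, assuming boto3 is installed:

1. Create a client with `ec2 = boto3.client("ec2", region_name="us-east-1")`. us-east-1 is the N. Virginia region.
2. Save the image with `ami = save_image(ec2, "i-0123456789abcdef0", "phaistos-ubuntu")`. The instance ID is a placeholder.
3. Wait until the image is ready with `ec2.get_waiter("image_available").wait(ImageIds=[ami])`.
4. Launch from it with `launch_from_image(ec2, ami, "c3.2xlarge")`.

Because the client is injected, the functions can also be exercised with a fake object, without touching AWS.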
I mounted a folder on the university server via SSHFS and have the instance write its output data directly to our server. This way I don't lose data if the instance is terminated, and I don't have to log in to the instance to check output or log files.
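The SSHFS mount can be scripted along these lines. This is a sketch, not necessarily what was done here. It assumes `sshfs` is installed on the instance and that key-based SSH login to the university server works. The `reconnect` and ssh keep-alive options are one reasonable choice rather than a requirement. The user name, host and paths are hypothetical placeholders.

```python
import os
import subprocess
from typing import List


def sshfs_command(user: str, host: str, remote_dir: str, mountpoint: str) -> List[str]:
    """Build an sshfs command that mounts host:remote_dir at mountpoint."""
    return [
        "sshfs",
        f"{user}@{host}:{remote_dir}",
        mountpoint,
        # Try to re-establish the connection if the SSH link drops.
        "-o", "reconnect",
        # Passed through to ssh: send keep-alives so an idle mount stays up.
        "-o", "ServerAliveInterval=15,ServerAliveCountMax=3",
    ]


def mount(user: str, host: str, remote_dir: str, mountpoint: str) -> None:
    """Create the local mountpoint if needed, then run sshfs (raises on failure)."""
    os.makedirs(mountpoint, exist_ok=True)
    subprocess.run(sshfs_command(user, host, remote_dir, mountpoint), check=True)
```

Example call with placeholder values: `mount("jdoe", "server.example.edu", "/home/jdoe/phaistos_output", "/home/ubuntu/output")`. Unmount afterwards with `fusermount -u /home/ubuntu/output`.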

The biggest problem for me was the vast number of instance types. You can select everything from memory-optimized to CPU-, storage-, interconnect- or GPU-optimized instances, and each of these comes in several different types. This takes a bit of research, and there is a lot of fine print. For example, Amazon doesn't specify the physical core count but rather a "vCPU" count, which may or may not include hyperthreading (i.e. the vCPU number may be twice the number of physical cores you actually get!).
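The vCPU figure alone doesn't tell you how many physical cores you get. One way to check from inside a running Linux instance is to count the distinct (physical id, core id) pairs in /proc/cpuinfo, because hyperthread siblings share the same pair. This is a sketch and not the method used in this post. The helper name `count_cores` is hypothetical. Under virtualization the topology the guest sees may not faithfully reflect the host hardware, so treat the result as a hint.

```python
from typing import Dict, Set, Tuple


def count_cores(cpuinfo: str) -> Tuple[int, int]:
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text."""
    logical = 0
    cores: Set[Tuple[str, str]] = set()
    # /proc/cpuinfo lists one block per logical CPU, separated by blank lines.
    for block in cpuinfo.strip().split("\n\n"):
        fields: Dict[str, str] = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyperthread siblings share (physical id, core id); if the fields are
        # missing, fall back to counting each logical CPU as its own core.
        cores.add((fields.get("physical id", "0"),
                   fields.get("core id", fields["processor"])))
    return logical, len(cores)
```

On the instance, call `count_cores(open("/proc/cpuinfo").read())`. If hyperthreading is enabled, the first number should be roughly twice the second.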
The price also varies depending on where the data center in which you spawn your instances is located. I picked the N. Virginia data center, which was the cheapest; I don't see why I would pick one of their other data centers. The closest to me is in Ireland, but it is about 15% more expensive, and Asia seems to be even more expensive.

Managing payment is also surprisingly easy. I used my own free account in the beginning. Jan Jensen created an account using the university billing account number, and from there we used the Consolidated Billing option to link my account so that its bill is sent to Jan's account.

Our current project is pretty much only CPU-intensive and barely requires any storage or memory, so naturally I wanted to benchmark the CPU-oriented instance types.

I tested the largest (by CPU count) instances I could launch in the General Purpose (m3), Compute Optimized (c3) and Compute Optimized, previous generation (c1) tiers. These are the m3.2xlarge, c3.2xlarge and c1.xlarge instances.

From what I could gather, the c1.xlarge doesn't support hyperthreading. The m3.2xlarge is more expensive because it has faster disks and more RAM. Initially I thought the m3.2xlarge had 8 physical cores, but it turns out I was merely fooled by the "vCPU" number and several pages of fine print in the pricing list.

As a test, I launched a Metropolis-Hastings simulation in PHAISTOS. It started from the native structure of Protein G, used the PROFASI force field at 300 K, and used the same seed (666) in all tests. I recorded the iteration speed as a function of the number of cores.

The maximum total number of iterations per day (summed over all threads) was comparable for the three instances, maxing out at around 500-600 million/day.
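To put the instances on the same scale, you can convert per-thread iteration speeds into total iterations per day. This is a minimal sketch. It assumes the speeds are measured in iterations per second. The function name and the example rates are hypothetical, not measured values from this benchmark.

```python
from typing import Iterable

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def iterations_per_day(per_thread_rates: Iterable[float]) -> float:
    """Total iterations per day, summed over all threads.

    per_thread_rates: iteration speed of each thread in iterations/second.
    """
    return sum(per_thread_rates) * SECONDS_PER_DAY
```

For example, 8 threads at a hypothetical 800 iterations/s each give `iterations_per_day([800] * 8)` = 552,960,000, roughly 550 million iterations/day. Read the other way, 500-600 million/day corresponds to about 5,800-6,900 iterations/s in total.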

There is a slight win for the quad-core c3.2xlarge instance when it runs 8 threads with hyperthreading.

There is no real benefit to spawning more than 8 concurrent threads either.

What is probably more important is the throughput per USD spent. Again, the c3.2xlarge wins (when running 8 threads with hyperthreading) and is the cheapest option for our purpose.
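Throughput per dollar is iterations per day divided by the daily cost of the instance. This is a minimal sketch. It assumes on-demand hourly billing. The instance labels and prices in the example are hypothetical placeholders, not Amazon's actual rates, so plug in the current figures for your region.

```python
from typing import Dict, Tuple


def iterations_per_usd(iterations_per_day: float, usd_per_hour: float) -> float:
    """Iterations obtained per US dollar for an instance billed hourly."""
    usd_per_day = usd_per_hour * 24
    return iterations_per_day / usd_per_day


def best_value(results: Dict[str, Tuple[float, float]]) -> str:
    """Return the instance name with the most iterations per USD.

    results maps an instance name to (iterations_per_day, usd_per_hour).
    """
    return max(results, key=lambda name: iterations_per_usd(*results[name]))
```

For example, `best_value({"A": (600e6, 0.50), "B": (550e6, 0.40)})` returns `"B"`. Instance A gives 50 million iterations/USD and instance B about 57 million. A slightly slower but cheaper instance can therefore still win on cost.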
