So yesterday my Windows Server VMs running in Windows Azure (VM Role) were automatically shut down and then later restarted. I assume this occurred due to an update to the host server and/or environment. I have my servers deployed in pairs where each pair is in the same availability set. The idea here is that only one VM per availability set will be taken offline at any one time. As servers are added into an availability set, each one is placed in a different rack/fault domain from the other members. The theory is this should push your SLA from 99.9% to 99.95% (I assume the last .05% is to account for certificate expiration).
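To put those two SLA figures in perspective, here is a minimal sketch of the arithmetic. The 30-day month is my assumption for illustration, and `allowed_downtime_minutes` is a hypothetical helper name, not anything from Azure.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime an SLA percentage permits over `days` days.

    Assumes a 30-day month by default (an illustrative choice).
    """
    total_minutes = days * 24 * 60
    return round(total_minutes * (100 - sla_percent) / 100, 2)


if __name__ == "__main__":
    for sla in (99.9, 99.95):
        print(f"{sla}% allows about {allowed_downtime_minutes(sla)} minutes per 30 days")
```

Under that assumption, moving from 99.9% to 99.95% halves the permitted downtime.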

When determining how many machines to add into an availability set, you need to ensure the load can be satisfied with x-1 machines, where x represents the number of machines in the availability set. So in my case, for this pair, my x was 2 with the idea that a single server could handle the load. Of course you would likely want to configure more to ensure there is no single point of failure. It is fairly trivial to add 4, 5, or even more servers into an availability set using PowerShell.
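The x-1 rule is easy to check with a few lines of code. This is only a rough sketch: the load and capacity numbers are made up, the function names are hypothetical, and it assumes load spreads evenly across identical machines.

```python
import math


def machines_needed(peak_load: float, per_machine_capacity: float,
                    tolerated_offline: int = 1) -> int:
    """Smallest x such that (x - tolerated_offline) machines can carry peak_load."""
    if peak_load < 0 or per_machine_capacity <= 0:
        raise ValueError("load must be >= 0 and capacity must be > 0")
    # At least one machine must remain in service, even for a zero load.
    serving = max(1, math.ceil(peak_load / per_machine_capacity))
    return serving + tolerated_offline


def survives_one_offline(x: int, peak_load: float,
                         per_machine_capacity: float) -> bool:
    """True if x - 1 machines can still satisfy the load."""
    return (x - 1) * per_machine_capacity >= peak_load


if __name__ == "__main__":
    # A pair where a single server can handle everything (my case, x = 2).
    print(machines_needed(peak_load=100, per_machine_capacity=100))
    print(survives_one_offline(2, peak_load=100, per_machine_capacity=100))
```

If the load outgrows what one server can carry, a pair no longer survives losing a member and x has to grow.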

With the promise from Azure that only one server will ever be down at any one time, the next question you may have is: so how did my Azure-invoked outage yesterday fare?

Server    Offline Date/Time    Online Date/Time    Time offline
SRV-01    11:04:28am CST       11:22:50am CST      ~18 minutes
SRV-02    12:02:38pm CST       12:22:12pm CST      ~20 minutes

So as you can see Windows Azure’s Availability Sets worked as advertised!
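If you want to double check the offline and online times reported above rather than eyeball them, a small script can compute each outage and confirm the two windows never overlap. This is just a sketch; the timestamps are copied from my numbers above and everything else (names, structure) is illustrative.

```python
from datetime import datetime

FMT = "%I:%M:%S%p"  # e.g. 11:04:28AM

# (offline, online) times as reported above, all CST on the same day.
WINDOWS = {
    "SRV-01": ("11:04:28am", "11:22:50am"),
    "SRV-02": ("12:02:38pm", "12:22:12pm"),
}


def parse(t: str) -> datetime:
    return datetime.strptime(t.upper(), FMT)


def offline_seconds(windows: dict) -> dict:
    """Seconds each server spent offline."""
    return {
        name: int((parse(end) - parse(start)).total_seconds())
        for name, (start, end) in windows.items()
    }


def any_overlap(windows: dict) -> bool:
    """True if any two outage windows overlap in time."""
    spans = sorted((parse(s), parse(e)) for s, e in windows.values())
    return any(nxt[0] < prev[1] for prev, nxt in zip(spans, spans[1:]))


if __name__ == "__main__":
    for name, secs in offline_seconds(WINDOWS).items():
        print(f"{name}: {secs // 60}m {secs % 60}s offline")
    print("overlap:", any_overlap(WINDOWS))
```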

As an unplanned follow-up to my previous post, I wanted to reply to some of the feedback I received and take another run at this little IO test. The feedback was generally around what I tested rather than the why and how. I had no doubt the why was super clear, and I was not interested in debating the how because, as I said before, this is a very informal test, so I am glad I did not receive either of those remarks. As for the what, the feedback boiled down to:

You get what you pay for at $9 a month for Azure and free for EC2 what did you expect?

Try testing on a more realistic platform, one that someone may actually expect decent IO.

How about a newer Cloud Ready OS bro?

Hey buddy, we are friends and I work for Rackspace, so why didn’t you include them in the mix?

All are fair comments, so let's take another stab at this and see what paying a bit more money can get us.

Rationale

For those who did not read the previous post, the reason I am doing this testing is that the general feeling from a few of us using VMs running in the cloud is that the IO seems or feels pretty slow. While Amazon, Windows Azure, and RS give you options when it comes to the number of CPUs, network speed, disk space, and RAM, it seems when it comes to disk IO you get what you get. While Amazon EC2 does give you designators such as low or high IO for some of their instances, there is no real indication of what that actually means or how it compares to other providers.

On an email distribution list yesterday someone commented on the disappointing IO performance they received while running a SharePoint Farm with VM roles in Windows Azure. As an Azure user I too had noticed the IO did feel a bit sluggish, but with a super fast SSD in my laptop just about any VM these days feels that way. Just a few days earlier I was checking out Amazon's EC2 pricing, and the cost to run a VM in EC2 vs. Azure appears to be about the same for about the same configuration. So naturally the next question is, of the two cloud services, which offers better IO?

I get asked all the time about a good SharePoint reading list. I have always found the SharePoint MCM reading lists a really good start. Some of these items are blogs and papers with additional links, so a reader who follows them all will find it can take a good amount of time to traverse the entire list. There are quite a few SharePoint books out there too, but these lists do not include any books. Here are the links to the SharePoint 2010 and SharePoint 2007 Pre-Reading List for the SharePoint MCM program.

One of the really cool things about working on and having the community use your tools is sometimes you get really great feedback and suggestions. Recently Heiko Hatzfeld from Microsoft PFE suggested we include method parameter or argument details into the output details produced by SNAP when it takes a snap of a process.

I have a service which leverages Azure’s Service Bus Queues. Clients post messages with a session into a request queue and then block waiting for the response to come back via a response queue. I have a number of Azure Persistent VMs, each running a Windows service that monitors the request queue; once a service picks up work, it takes about 3 seconds to process the request and queue a response to the response queue for the waiting client.

I have been running with a 256GB Crucial C300 in my Lenovo W520 for a while now and it has been great. I love the performance and I love to see applications pop (vs poop) open. Someone, I forget who, tweeted the other day about a great deal on Amazon on a Crucial M4 for $399 so I hit Amazon, did the 2 day Amazon Prime ship and here I sit with a new drive just populated with Acronis True Image Home 2011.