Blog Post

Who’s the best cloud storage provider? Microsoft, says Nasuni; but it still likes Amazon

So here’s an interesting tidbit. Nasuni, which manages cloud storage for businesses, ran a set of exhaustive tests to assess the performance, availability and scalability of five major cloud storage providers. And the winner? Microsoft(s msft) Windows Azure. Yup. Not Amazon(s amzn) S3, but Azure Blob storage.

“We ran uptime tests and other tests and the long and short of it is that all the vendors got better, but Azure just leapfrogged. It was the fastest, had the best availability and uptime and was the only provider to never register an error,” Connor Fee, VP of marketing for Natick, Mass.-based Nasuni, said in an interview.

This according to Nasuni’s new State of Cloud Storage 2013 Industry Report, which also evaluated Google(s goog), Hewlett-Packard(s hpq), and Rackspace(s rax) storage. Nasuni is a pretty good judge of cloud storage provider performance since it assesses the best of the services to use for its customers’ data. It views the various cloud storage players much as EMC(s emc) or NetApp(s ntap) views hard drives — a piece of its overall service.

Azure’s no.1, but Nasuni still dubs Amazon S3 as its primary backend

Now before we get all wrapped around our axles about this glowing Azure endorsement, it’s important to note that Nasuni still counts on Amazon as its primary storage supplier and will continue to use Azure as a secondary supplier in some cases.

So if Azure is so great, why stick with Amazon? “One major thing we evaluate is maturity and experience in the market and Amazon still clearly has the most experience and is the most mature player in this space,” Fee said.

So how to explain Microsoft’s vast improvement? Fee points to this Microsoft blog post, which outlines a major upgrade of Azure’s storage layer, as a possible reason. Basically, Microsoft moved its storage layer from a 1-gig to a 10-gig network and from a hierarchical to a flat network. That means it’s faster at handling myriad small files.

Microsoft honed performance on handling lots of itty-bitty files

Think of it this way: Every time you want to store something to the cloud, you have to alert the cloud that you’re about to write to it; then you write to it; then it acknowledges receipt of what you’ve written. “There’s a lot of back-and-forth there,” said Fee. “With very big files, if you have a very fast network connection that’s usually enough. But with small files, all of that chatter matters, so whatever Azure did, they got really, really good at handling small I/O.”
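To put rough numbers on that back-and-forth, here is a minimal sketch of the arithmetic. The three exchanges mirror the steps Fee describes, but treating each one as a full network round trip, the 50 ms round-trip time and the 100 Mbit/s link are all assumptions made for illustration; none of them come from Nasuni’s report.

```python
# Back-of-the-envelope model: time to store one object is a fixed amount of
# protocol chatter plus the time needed to push the payload down the wire.
# All three constants are illustrative assumptions, not measured values.
ROUND_TRIPS = 3                      # announce the write, send it, get the acknowledgement
RTT_SECONDS = 0.05                   # assumed 50 ms round-trip time
BANDWIDTH_BYTES_PER_S = 12_500_000   # assumed 100 Mbit/s link


def chatter_seconds(round_trips=ROUND_TRIPS, rtt=RTT_SECONDS):
    """Fixed per-object cost of the back-and-forth, independent of file size."""
    return round_trips * rtt


def payload_seconds(size_bytes, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Time spent actually moving the bytes."""
    return size_bytes / bandwidth


def chatter_share(size_bytes):
    """Fraction of the total time that is chatter rather than data transfer."""
    chatter = chatter_seconds()
    return chatter / (chatter + payload_seconds(size_bytes))


for label, size in [("100 KB", 100_000), ("10 MB", 10_000_000), ("1 GB", 1_000_000_000)]:
    print(f"{label:>6}: {chatter_share(size):.1%} of the time is protocol chatter")
```

Under these assumed figures the chatter accounts for most of the time spent on a 100 KB object and almost none of it on a 1 GB object, which is the effect Fee is describing.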

It’s also important to note that this year’s report differs from Nasuni’s 2012 testing, so year-over-year comparisons aren’t all that useful, although Nasuni was impressed with Azure even then.

To be fair, Azure’s year-over-year improvement is impressive, as indicated by the performance on small files, and the 0-error rate. Amazon’s error rate was tiny, but non-zero.

But I really do think Azure needs to handle larger sustained I/O better. It smells like they rely on caching rather than providing big pipes (think of a sponge: it’s OK when you only get little bursts, but when you leave the water on, the sponge can’t handle the load).

No, the story that “Azure is the leader” is incomplete at best, propaganda at worst. Don’t worry Lucy, I’ll splain (with specifics)

BACKUP (DISASTER RECOVERY)
How can they say Azure leads in Cloud storage, when Amazon Glacier provides backup of S3, and Azure doesn’t offer a D/R solution? With Amazon, I can recover all data in ~4hours, … no matter how much I have.
Replication, as we know, is not a backup because errors (like file deletion or overwrites) propagate to the backups.

CAPACITY
Azure storage is limited to a certain amount (currently 200TB) per storage account. Sure, you can hack an app to use multiple accounts (20 of them by default).
Even if you have one Azure account, good luck with the billing complexity. This post, http://www.nasuni.com/blog/177-azure_fair_and_balanced, by a Nasuni employee says even Azure billing isn’t reliable.

On the other hand Amazon capacity is not limited and can handle individual blobs of up to 5TB. This was also not mentioned. Interesting.
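For what it’s worth, the multi-account workaround for Azure mentioned above might look roughly like this. It is only a sketch: the account names are made up, the hash-based placement is my own choice and not anything Azure or Nasuni prescribes, and the two limits are simply the figures quoted above.

```python
import hashlib

ACCOUNT_LIMIT_TB = 200   # per-account cap quoted above
DEFAULT_ACCOUNTS = 20    # default number of accounts quoted above

# Hypothetical account names, for illustration only.
accounts = [f"storageacct{i:02d}" for i in range(DEFAULT_ACCOUNTS)]


def pick_account(blob_name, account_names):
    """Map a blob name to one account, deterministically, via a hash."""
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(account_names)
    return account_names[index]


total_tb = ACCOUNT_LIMIT_TB * DEFAULT_ACCOUNTS
print(f"Ceiling across all accounts: {total_tb} TB")  # 200 TB x 20 = 4000 TB
print(pick_account("backups/2013/02/db.bak", accounts))
```

Note that this only spreads blobs by name. It does nothing to balance by size, and the app still has to track and bill 20 accounts, which is the complexity I’m complaining about.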

AZURE INTERNAL I/O IS SLOW
I know from my own tests that I/O within Azure VM hosts flattens out at ~4MB/s after ~30 seconds of writing, while Amazon VM hosts stay steady at ~35MB/s without ever dropping off. Test it yourself: download your own tool (e.g. Iometer or Sqlio). Sure, those are VMs, but it’s the closest an outsider can get to testing Azure’s internal (shared) infrastructure. In any case, that performance is so bad that it made me suspicious enough to read the report details…
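If you don’t have Iometer or Sqlio handy, a rough probe along the following lines shows the same kind of curve. This is a sketch and not the tool I used: the function name, block size and intervals are arbitrary choices, and a Python loop with fsync is a much cruder instrument than a real benchmark.

```python
import os
import time


def measure_sustained_write(path, duration_s=60.0, interval_s=5.0,
                            block_size=1024 * 1024):
    """Write blocks to `path` for about `duration_s` seconds.

    Returns a list with the observed throughput (MB/s) of each interval,
    so a drop-off after an initial burst shows up as falling numbers.
    """
    block = os.urandom(block_size)  # random data, so it cannot be compressed away
    rates = []
    with open(path, "wb") as f:
        start = time.monotonic()
        while time.monotonic() - start < duration_s:
            interval_start = time.monotonic()
            written = 0
            while time.monotonic() - interval_start < interval_s:
                f.write(block)
                f.flush()
                os.fsync(f.fileno())  # push the data past the OS page cache
                written += block_size
            elapsed = time.monotonic() - interval_start
            rates.append(written / elapsed / 1e6)
    return rates
```

Calling measure_sustained_write("probe.bin") on the VM’s data disk and printing the returned list gives one MB/s figure per five-second interval. Keep in mind that the file grows for the whole run, so a minute at tens of MB/s needs a couple of gigabytes of free space.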

LET’S LOOK AT THE REPORT http://ht.ly/hR301
Check out the time to write 10MB… Rackspace (of the disparaged OpenStack) had the best time to write 10MB. Azure was in 4th place, behind Google and Amazon

Also, Amazon and Azure both lag greatly in their “availability” metric. Why is that not a factor? Interesting.

So, I read the *actual numbers* in the report, and guess what: Azure’s bandwidth/performance drops off precipitously with larger blob/file sizes.
In fact, ***THEY OMIT THE PERFORMANCE NUMBERS FOR AZURE BLOBS GREATER THAN 10MB, BUT INCLUDE THEM FOR ALL OTHER CLOUDS ***
Why? I can only guess.

What I know is that:
Azure can handle blobs greater than 10MB, but metrics for bigger files were not included
Azure internal IO is horrible with sustained IO (see above), and that might mean the performance for files larger than 10MB is poor.
The report does show that performance numbers for Azure dropped off very quickly (faster than the other clouds) as the blob sizes got bigger.
If I wanted to make Azure look good, I’d focus on the small blobs (100KB) and do as they did and omit metrics on blobs larger than 10MB.
It’s possible that most of the files they care about are indeed smaller than 10MB.

Not sure if Azure shares the same architecture as SkyDrive, but I have a 25GB SkyDrive account. I love the look and the features it provides, but I can’t stomach how slowly it syncs data between SkyDrive and my local machine, or how many resources it eats up. 500MB would take days to sync. I’ve heard other people have had better experiences with it, but mine was just unusable. And yeah, I tested Google Drive, Dropbox, Bitcasa and Sugar. They all sync quickly.