Introduction

Our application uses the disk rather more than a typical cloud-based service might. The whole purpose of SafeDrop is to exchange files (large and small) securely, so we use fairly large amounts of storage and need fast, low-latency access to that data.

The performance of our data storage is vital to the overall responsiveness and utility of our app. Before migrating our service to Amazon's cloud, we therefore thought it prudent to run some raw benchmark comparisons between the two most economical instance types available in EC2, m1.small and c1.medium, to determine which of them would make the better basic cluster node. Amazon characterises m1.small as having "One Amazon compute unit" and c1.medium as having "Five Amazon compute units", but at significantly increased cost. Both instances have the same amount of RAM.

The storage arrangements tested were: "local disk" (volatile, so mostly used for comparison), single volume EBS (Elastic Block Store) and two volume EBS arranged in RAID0 (a configuration recommended for its ability to increase overall throughput).
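As a rough illustration of why RAID0 can increase throughput, the sketch below shows how striping maps a logical byte offset onto one of the member volumes, so that consecutive chunks land on alternating volumes and can be serviced in parallel. The function name and the 64 KiB chunk size are assumptions made for the example; they are not the settings used in our tests.

```python
def raid0_locate(offset, chunk_size=64 * 1024, volumes=2):
    """Map a logical byte offset to (volume index, byte offset on that volume).

    chunk_size and volumes are illustrative assumptions, not measured settings.
    """
    # Which chunk the offset falls in, and how far into that chunk it is.
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Chunks are dealt out to the volumes in round-robin order.
    stripe_row, volume = divmod(chunk_index, volumes)
    return volume, stripe_row * chunk_size + within_chunk


if __name__ == "__main__":
    for logical in (0, 64 * 1024, 128 * 1024):
        print(logical, raid0_locate(logical))
```

With two volumes, a large sequential read touches both volumes in turn, which is where the extra throughput is expected to come from.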

Benchmarks

The first benchmark ("Raw IO") was intended to test the low-level raw performance of the basic OS interface. This should be the theoretical maximum throughput that would be possible were it not for filesystem implementation specifics and other constraints such as network traffic limitations.

The results above appear consistent across all configurations because the tests were carried out using the same OS version (CentOS 5.4), drivers, platform (32 bit) etc. The main purpose of this particular benchmark was to show that there was nothing fundamentally differentiating about the platforms being tested and we were testing "like with like".

The next benchmark ("Aggregate bandwidth test") is an averaged measure of actual data throughput for a simulated real-world application that performs stressful, simultaneous reads and writes of non-sequential disk data. In other words, these are realistic numbers that we might expect to see on a server under consistent use as our database or file storage platform.
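For readers who want to try something similar, here is a minimal sketch of a fio job file for a mixed random read/write workload, generated from Python. The helper name, the target directory and every parameter value (block size, read/write mix, file size, runtime, job count) are assumptions chosen for illustration; they are not the settings behind the results reported here.

```python
import configparser
import io


def build_fio_job(directory, size="1g", runtime_s=60, jobs=4):
    """Return the text of a fio job file for mixed random reads and writes.

    All values are illustrative assumptions, not our benchmark settings.
    """
    job = configparser.ConfigParser(allow_no_value=True)
    job["global"] = {
        "directory": directory,      # where fio creates its test files
        "size": size,                # size of each test file
        "runtime": str(runtime_s),   # seconds to run
        "time_based": None,          # keep going until runtime has elapsed
        "direct": "1",               # bypass the page cache
        "group_reporting": None,     # report the jobs as one group
    }
    job["mixed-random"] = {
        "rw": "randrw",              # random reads and writes together
        "rwmixread": "50",           # half reads, half writes
        "bs": "4k",                  # block size per IO
        "numjobs": str(jobs),        # simultaneous workers
    }
    out = io.StringIO()
    job.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(build_fio_job("/mnt/ebs-test"))
```

The output can be saved to a file and passed to fio as its job file argument.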

Bigger numbers are better, but note that an overall bigger bar does not mean that the result is superior; in each test, only bars of the same colour should be compared with one another. They are presented side by side this way simply for readability.

The surprise here was that m1.small appeared to be the best overall performer in terms of the amount of data it was able to read and write on the disk under the simulated normal activity. Both its single EBS volume and its two-volume RAID0 configuration were better than the same configurations running on a c1.medium instance. This is remarkable, and the reasons for it are still unclear. What we have been able to gather from reading around forums is that the EC2 platform experiences natural variations in performance depending on the time of day and the availability zone used (eu-west-1 in this case). We take this to mean that, because the platform is so heavily distributed and fault tolerant, it is impossible to guarantee completely consistent access to resources all the time.

The next benchmark ("Minimum bandwidth") determined the lowest values recorded for real data throughput. Bigger numbers are better.

Again m1.small was better, even at the low-end, with c1.medium a close second.

There isn’t much to separate m1.small and c1.medium here, but m1.small does look like its high ceiling is slightly lower than c1.medium.
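To make the relationship between the bandwidth figures concrete, the sketch below derives an average, a minimum and a maximum from a list of per-interval bandwidth samples. The function name and the dictionary keys are hypothetical, and the sample values in the usage lines are placeholders rather than measurements from our tests.

```python
from statistics import mean


def summarise_bandwidth(samples_kib_s):
    """Reduce per-interval bandwidth samples (KiB/s) to three summary figures."""
    if not samples_kib_s:
        raise ValueError("need at least one bandwidth sample")
    return {
        "aggregate": mean(samples_kib_s),  # averaged throughput
        "minimum": min(samples_kib_s),     # worst interval seen
        "maximum": max(samples_kib_s),     # best interval seen
    }


if __name__ == "__main__":
    # Placeholder samples for illustration only, not real results.
    print(summarise_bandwidth([100, 200, 300]))
```

Looking at the minimum alongside the average matters for a service like ours, because a configuration with a good average can still stall badly in its worst intervals.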

Conclusion

We would ideally like to repeat these benchmarks at several different times of the day; performance can vary enough that the general superiority of m1.small might be an anomaly.
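One crude way to judge whether such a difference is real once repeated runs are available is to compare the gap between the two averages with the run-to-run spread. The sketch below does this with a deliberately simple rule of thumb (the gap must exceed the larger standard deviation); the function name and the rule itself are our own assumptions, not an established statistical test, and the numbers in the usage lines are placeholders.

```python
from statistics import mean, stdev


def difference_exceeds_spread(runs_a, runs_b):
    """Return True if the gap between the averages is larger than the
    run-to-run spread of either set of repeated benchmark runs.

    A rough heuristic only; each list needs at least two runs.
    """
    gap = abs(mean(runs_a) - mean(runs_b))
    spread = max(stdev(runs_a), stdev(runs_b))
    return gap > spread


if __name__ == "__main__":
    # Placeholder figures for illustration only, not real results.
    print(difference_exceeds_spread([100, 102, 98], [50, 52, 48]))
    print(difference_exceeds_spread([100, 60, 140], [90, 130, 50]))
```

If the runs taken at different times of day overlap heavily, the apparent lead of one instance type should be treated as noise until more data is collected.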

Local (in reality NAS) disk should not be dallied with, although we already “knew” this. The data shows that it’s the loser in every single benchmark - it is clearly too slow to access and does not have sufficient throughput to be used for real production applications. Coupled with its volatility - it goes away in the event of server shutdown or crash - this makes it good for nothing but storing the boot OS, even though there is usually some 300GiB available to us for free. It was clearly never Amazon’s intention for local disk to be used for anything but the most basic of purposes.

The inadequacy of local storage should be foremost in our minds when planning where to store temporary files (during SafeDrop file upload, for example). The same goes for log files, although we would most likely want to keep these anyway, so they would already be on an EBS volume.

The Baseline IO performance and bandwidth figures seem to indicate that multiple EBS volumes, not necessarily joined into RAID but rather mounted on different filesystem mount points, should be used where possible.

A significant point is that EBS volumes are already being mirrored behind the scenes, and this overhead probably explains the less-than-stellar results from the RAID1 mirroring benchmarks - perversely, a RAID mode usually associated with improved random access performance!

The immediate benefit of running a c1.medium instance rather than an m1.small is not apparent, at least as far as non-CPU-intensive applications are concerned. The clear exception to this is in the results of the minimum latency tests, which show that the c1.medium instance has much quicker random access to data stored in the EBS RAID0 volumes connected to it.

The software used to carry out all of the tests was 'fio', which is available on freshmeat.