Tuesday, May 20, 2014

Cloudy I/O Performance - Deciphering IOPS in IaaS (Part 1 of 2)

Foreword

Disk performance scaling options in the public cloud seem limited
(particularly in Azure as of this writing), but there are ways to increase
your IOPS in IaaS solutions. To add to the performance problem, repeatedly
testing with your real application is not only time consuming but, since
storage is billed per transaction, potentially expensive. To tune your
storage performance reliably you will need a fast, consistent way to test
different configurations. This article covers that methodology and leads
into a results/guidance article on IaaS storage performance in Azure
(though the approach applies to other platforms).

We'll be doing this testing on Windows, but you could just as easily do
this on Linux, and the results I'll be sharing are just as applicable
there. To accomplish this testing we'll be using IOMeter to generate and
measure I/O, with Perfmon to characterize the workload we want to model.

Execution

Assumptions

If you plan on emulating my tests you'll need to have access to the
following:

Microsoft Windows Azure account (note this methodology will work with
EC2 or any other platform, including standard hardware/on-prem VMs)

A configured IaaS VM. A medium size is recommended for testing 4 disks
or fewer, to limit the available memory. More on that below.

Administrator access to your VM.

Your workload is in fact disk I/O bound. If you're not sure of that
you may want to start with this
article.

Awareness that you will incur additional storage transaction costs by
running these tests.

Analysis/Create Workload

Note: If you're just trying to get a general sense of your VM's
I/O performance capability, you don't need to collect data for a custom
access specification. IOMeter includes several tests you can use, so skip
to the "Install IOMeter..." section below.

The first thing we need to do is create our workload. By using IOMeter we
can develop custom access patterns that model common workloads and have
the tool and workloads installed and configured in minutes on any machine.
There is nearly endless information on this topic, so I won't attempt to
create a definitive source here; plenty of videos and articles cover how
to configure and use IOMeter in detail.

To create an accurate workload you will need a good understanding
of the access pattern of your application. If you don't have that
information you can use a tool like Perfmon to do analysis on a fully
configured platform. Counters such as Disk Reads/sec, Disk Writes/sec,
Avg. Disk Bytes/Read, and Avg. Disk Bytes/Write will be of interest when
creating your access specification.

By collecting this data during the access pattern you wish to emulate you
can accurately estimate (with one caveat) the information needed to create
the IOMeter access specification. That caveat is determining the
sequential vs. random access pattern of the platform since Perfmon
analysis will reveal the rest. To determine that, you'll need an
understanding of how the platform stores and accesses/writes data. In my
case I'm tuning my VM for Splunk,
which uses MapReduce functionality with a highly sequential
read/write pattern. If you are unsure of your access pattern then err on
the side of configuring for mostly random access (90% or so) since it is
generally more common and demanding of the underlying storage
subsystem.
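If you're collecting those counters with Perfmon, translating the averages into an IOMeter access specification is simple arithmetic. A quick sketch (the counter values below are hypothetical examples, not figures from my testing):

```python
# Derive access-spec parameters from average Perfmon disk counter values.
# The numbers here are hypothetical examples, not measured data.
reads_per_sec = 180.0         # "Disk Reads/sec"
writes_per_sec = 20.0         # "Disk Writes/sec"
avg_bytes_per_read = 65536    # "Avg. Disk Bytes/Read"
avg_bytes_per_write = 8192    # "Avg. Disk Bytes/Write"

total_iops = reads_per_sec + writes_per_sec
read_pct = 100.0 * reads_per_sec / total_iops

# Weighted average transfer size -> IOMeter's "Transfer Request Size".
avg_io_bytes = (reads_per_sec * avg_bytes_per_read +
                writes_per_sec * avg_bytes_per_write) / total_iops

print(f"{read_pct:.0f}% read, avg transfer {avg_io_bytes / 1024:.1f} KiB")
```

The remaining knob, the random vs. sequential split, still has to come from your knowledge of the platform, as described above.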

Install IOMeter and Config Access Specification

The following actions can be done on your target testing platform or a
different machine to stage settings. We'll be saving our settings for
quick use later.

Download and install IOMeter on your server. There are a number of
ways to stage files on any VM, but if you're looking for a quick way in
the Microsoft ecosystem check out my Onedrive/Azure
post.

Open IOMeter as administrator.

Under "Topology" configure your workers. Each worker represents one
thread generating I/O. By default it will create one per CPU thread
available, but in most cases you will only want one worker per process
you are emulating. In my case I'm assuming one large query at a time
(and we'll scale from there), so I'll be testing with one worker. If you
are unsure stick to one worker and you can move up from there when you
become more familiar.

Under "Disk Targets" select the disk you wish to test. This can change
in later runs so if the disk you want to test isn't present here select
a placeholder.

Under "Disk Targets" configure your "Maximum Disk Size". This
configures the size of your test file in sectors, which are considered
to be 512 bytes
each. To lessen the impact of OS caching you need to ensure this value
exceeds the amount of RAM present on the machine to be tested. In my
case I'll be testing on a machine with 6GB of RAM using an approximately
7.5GB file, so I've configured it for 15,000,000 sectors (15,000,000
sectors * 512 bytes per sector = 7,680,000,000 bytes). To do this
quickly, take your total desired size (in bytes!) and divide it by 512.
(If you aren't certain you got it right, check the size of the iobw.tst
file created at the root of your target drive after the first test
completes.)
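That conversion is easy to fumble by a digit, so here it is as a quick sketch (the 6GB RAM / ~7.5GB file figures mirror my example):

```python
# IOMeter's "Maximum Disk Size" is expressed in 512-byte sectors.
SECTOR_BYTES = 512

# Pick a test file comfortably larger than RAM (6GB here) to defeat caching.
desired_bytes = 7_680_000_000          # ~7.5GB

sectors = desired_bytes // SECTOR_BYTES
print(sectors)                          # 15000000
```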

Under "Test Setup" configure your "Ramp Up Time" and "Run Time". Ramp-up
need only be about 20 seconds for most scenarios, and run time is best
kept between 1 and 10 minutes. My results are based on 5-minute tests
(many per configuration).

Under "Access Specification" select your access spec. There is far too
much to get into here; either select one or many existing access
specifications that suit your needs ("4k 75% read" is a good start if you
don't care) or create your own based on your findings from the
Analysis/Create workload section. For the purposes of my test I made a
"_Splunk" access spec with the following characteristics ascertained
from my earlier performance testing:

Add your access specification to the list of queued tests if you
haven't done so already (removing all others).

Click the disk icon to save the settings to an ICF file. This file
will save all your settings including custom access specifications if
applicable. Since this file is what you'll use to shortcut future
testing, save it somewhere easy to transfer to other VMs such as
OneDrive, Dropbox, SpiderOak, etc.

Run the Test

After setting up or loading your test settings, all you need do is click
the green flag to start the test and then select where you would like to
save the results. Make sure you don't overwrite any previous results and
give the file a meaningful name so you remember what this test represents
later, e.g.
"results_3disk_1_StorAcct_Striped_32k_sectors_noCache_run1.csv" or
similar.

The test will run for the configured time and then you will be able to run
additional tests or analyze results. Since the output is in CSV format, the
natural place to look at this data is Excel. When IOMeter starts for the
first time on a given disk it needs to create the test file. This will take
quite a while in both Amazon EC2 and Azure (15 minutes for my 7.5GB file,
for example); I believe this is due to the way space is allocated on the
backend storage. Once the file is created, however, you can run subsequent
tests on the same volume without waiting for it to be recreated. Once the run
is done I recommend running several more to ensure your tests aren't subject
to wild performance swings. More on analysis in part 2 of this
article.
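A quick way to check for those swings is to compare the spread across your repeated runs. A sketch using Python's statistics module (the IOPS figures are invented for illustration; substitute the totals from your own result CSVs):

```python
import statistics

# Total IOPS from several identical 5-minute runs (invented numbers;
# substitute the figures from your own IOMeter result files).
runs = [512.4, 498.7, 505.1, 489.9, 510.2]

mean = statistics.mean(runs)
cv_pct = statistics.pstdev(runs) / mean * 100   # coefficient of variation

print(f"mean {mean:.1f} IOPS, run-to-run variation {cv_pct:.1f}%")
```

If the variation runs more than a few percent, keep adding runs (or investigate outliers) before drawing conclusions.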

How Much Will This Cost?

Since you're charged per transaction, I'm sure you're wondering how
much this will cost. Let's break down the cost you'll incur above your
baseline (system running) usage in Azure:

IOPS are currently capped at 500
for standard tier machines (300 for basic). Storage transactions are
currently $0.01 per 100,000 (halved
on 3/14/14). For every 5-minute test, each disk you access will
then execute a maximum of 150,000 transactions (500 IOPS * 300 seconds).
As a one-time per-configuration cost, you will also need to build the
test file, which takes roughly (test file size / I/O size) transactions.
For example, a 7.5GB test file will be approximately 1,875,000
transactions assuming a default 4KB write size. (7,500,000,000 / 4,000)

Test transactions + creation transactions = 2 million or so transactions,
or $0.20 @ $0.01 per 100,000. So... not much. The amount is generally
trivial on
Amazon EC2 as well. While this methodology will save you some in
transaction costs, the main savings will be in time & labor. (which is
usually our real cost anyhow!)

Further Optimization

Once you are comfortable with this process I would advise the following
refinements. After doing so you may be able to automate the whole
routine!

Create standard Perfmon counter sets for disk access and save/import
them as
a template.

Create or download IOMeter templates for common access routines and
include them with your set.

Script the installation and running of IOMeter, including multiple
runs and uploading results to a common location. This is easy to do with
PowerShell; refer to the IOMeter
manual for command line options (page 75 or so).
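As a starting sketch of that automation (Python here for consistency with the other examples; the install path and file names are assumptions, while /c and /r are the batch-mode switches described in the IOMeter manual):

```python
import subprocess
from pathlib import Path

# Assumed install location; adjust for your VM.
IOMETER_EXE = r"C:\Program Files (x86)\Iometer\IOmeter.exe"

def iometer_cmd(icf, result_csv, exe=IOMETER_EXE):
    """Batch-mode command line: /c loads a saved .icf configuration,
    /r names the results file (see the IOMeter manual, ~page 75)."""
    return [exe, "/c", str(icf), "/r", str(result_csv)]

def run_series(icf, out_dir, runs=3):
    # Run the same saved configuration several times, numbering the
    # result files so earlier runs are never overwritten.
    for i in range(1, runs + 1):
        result = Path(out_dir) / f"results_run{i}.csv"
        subprocess.run(iometer_cmd(icf, result), check=True)

print(iometer_cmd("_Splunk.icf", "results_run1.csv"))
```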

Package up all your assets with a custom installer and put it in an
easy to get location. (mmmm... Chocolatey)

If you want angry followers and think digital bits are out there to be
wasted, auto tweet your results! (maybe not this)

In Closing

I/O testing in the cloud is certainly feasible but requires a little
extra discipline. With several access specifications in your toolkit you
can conquer most performance problems quickly. What to do if your cloud
platform doesn't provide your desired IOPS? Coming up in part 2!