Measuring the Performance of Your Cloud

As our customers continue their aggressive migration to shared
resource environments in the form of Private Clouds, they are asking
questions about measuring performance.

During IMPACT we took a first step toward characterizing a
performance benchmark for Private Clouds. The "Twitter demo" that
Andrew Spyker created and Jason McGee walked through is one such
activity. If you click on the video accompanying this article, you
can watch Jason's demonstration of the "Twitter demo" and the IBM
Workload Deployer from IMPACT.

Since IMPACT, we've continued our effort to benchmark the performance
of a private, and perhaps even public, cloud. This article represents
our first outline of how to characterize cloud performance. We decided
to start with the 10 Cloud Attributes that I described in my previous
blog article on this subject. Some of these attributes don't relate
directly to performance (e.g., Standards Based), so we focused on the
ones that do. We ended up with the following attributes: Time to
Deploy, Density, Elastic Scale, Resiliency (including
Security/Isolation), and Runtime Performance. We also added a sixth
attribute targeted solely at cloud providers: Time to Genesis.

Cloud performance is in the eye of the beholder. If you are a cloud
consumer, you simply expect your "service level agreement" to be met.
This includes overall performance, elasticity, security and
resiliency. The tricky job goes to the cloud provider. The provider
has to make the cloud's resources appear instantaneous, always on,
and infinitely abundant. Hence, this article is predominantly focused
on performance from the cloud provider's perspective.

In the IBM cloud world, virtualization and automation are the
technological keys to the illusion of "instant, always on, and
infinite".
Hence, many of these areas are
focused on the performance and impact of virtualization, and the
automated provisioning system behind it. The ability to express service
level agreements (SLAs) against your virtual environment is another key
ingredient to measuring performance, allowing the cloud to service the
most important workloads first, and degrade others gracefully.

Here is a quick introduction to these six areas of cloud performance.
For each area, I try to provide a simple definition, give some insight
as to why the area is important, and provide high-level thoughts on
how one might set up experiments to quantify performance. I am boldly
assuming that the cloud is mostly used to run enterprise web
applications, so the examples follow the behavior of this workload
style. I am hoping in the future to have a more scholarly perspective,
but until then, here is the high-level overview…

Time to Deploy is the elapsed time to deploy a virtual environment for
a tenant of the cloud. The environment is considered running when it
can handle a request from the tenant. When measuring deploy time, we
feel it's important to start simple and increment to more complex
environments. For example, start with a simple baseline experiment
that times the start of an operating system instance running an
application, for example a Java Virtual Machine. Once you have the
baseline, time more interesting increments, such as starting an
application server running an application like Apache's Day Trader.
Continue incrementing by adding a clustered application server, and
then a clustered application server communicating with a database
server. Ramping up from simple to more complex virtual environments
can help illustrate, for the cloud provider, the effectiveness of the
provisioning software. Specifically, it will exercise how well it
handles images, including cloning and caching. It will also exercise
the automation system's ability to initialize resources such as disk,
network and application middleware.

There are "performance tricks" that can be played by the cloud
provider, such as using statistics and heuristics to determine which
virtual environments are in demand and pre-provisioning them in a
cache. When a tenant asks for one of these environments, it is
immediately available. While this approach works fine for simple
applications, it is nearly impossible to anticipate the environment
for a complex enterprise application; hence the pre-provisioning trick
would be less effective in that case.

This brings me to my last point on this topic. Are you measuring the
deployment of infrastructure or of an application (which runs on the
infrastructure)? Measuring deployment times of infrastructure using an
Infrastructure-as-a-Service will be different from deploying an
application within a Platform-as-a-Service. (More on this some other
time.)
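
To make this concrete, here is a minimal Python sketch of a
deploy-time harness, assuming your provisioning system can be driven
programmatically. The topology names and the deploy_environment call
are hypothetical placeholders for whatever your cloud exposes; the
clock starts when the provisioning request is issued and stops when
the environment answers its first HTTP request.

    import time
    import urllib.error
    import urllib.request

    # Hypothetical topology names, from simplest to most complex.
    TOPOLOGIES = [
        "os-plus-jvm",                  # baseline: OS instance running a JVM
        "appserver-daytrader",          # application server running Day Trader
        "clustered-appserver",          # clustered application servers
        "clustered-appserver-with-db",  # cluster talking to a database server
    ]

    def deploy_environment(topology):
        """Placeholder: ask your provisioning system to deploy the topology
        and return a health-check URL for the new environment."""
        raise NotImplementedError("wire this to your cloud's provisioning API")

    def seconds_until_serving(url, timeout_s=1800):
        """Poll the URL until it answers an HTTP request; return elapsed seconds."""
        start = time.monotonic()
        while time.monotonic() - start < timeout_s:
            try:
                with urllib.request.urlopen(url, timeout=5):
                    return time.monotonic() - start
            except (urllib.error.URLError, OSError):
                time.sleep(2)  # not serving yet; keep polling
        raise TimeoutError("environment never became ready: " + url)

    if __name__ == "__main__":
        for topology in TOPOLOGIES:
            started = time.monotonic()
            health_url = deploy_environment(topology)   # clock starts at the request
            seconds_until_serving(health_url)           # clock stops at the first response
            print("%s deployed in %.1f seconds" % (topology, time.monotonic() - started))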

Density is a measurement that relates to the number of tenants that
can be packed into your cloud. A tenant can equate to a virtual
environment and/or an application. To measure the effectiveness of a
dense cloud, you have to understand how well your system can share or
over-commit resources. This includes "hard" resources like CPU,
memory, storage and network, and "soft" middleware resources like
operating systems, Java Virtual Machines, Application Servers and
Databases.

One way to calculate density involves picking one of the virtual
environments from your "deploy time" experiments and repeating the
process of deployment. Drive a modest number of requests to the Day
Trader application, using a load driver tool. Once the application
hits steady state (after a minute or two), capture the vital signs of
the cloud (CPU utilization, free memory, disk space, network traffic).
Note how resources are consumed and shared as new virtual environments
are created. You might naively expect resource consumption to grow
linearly; however, aggressive resource sharing and over-commitment
within the cloud should allow the total to be significantly less than
the sum of the sizes of the individual environments.

Multi-tenancy is a key factor in enabling sharing and achieving higher
density. Multi-tenant hardware (hypervisor-based environments)
combined with shared middleware (e.g., database as a service) will
yield very aggressive density statistics. Again, measuring density of
infrastructure using an Infrastructure-as-a-Service will be different
from deploying an application within a Platform-as-a-Service. IaaS
focuses on sharing hardware. However, the biggest density gains come
with PaaS, which builds on IaaS and additionally focuses on shared
middleware.

Not all tenants should be created equal. SLAs dictate sharing
possibilities and placement options. Does a tenant get their own
virtual environment with fully dedicated resources (e.g., dedicated
CPUs or even servers), or do they share resources with others? An
optimized cloud will play tricks, based on SLAs, to give less resource
to applications or environments that are either under-utilized (or
under-SLA'ed) to ensure in-demand applications get what they need.
(I.e., rob from the poor and give to the rich.)
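
A minimal sketch of such a density experiment might look like the
following Python. The deploy_tenant, drive_load and capture_vitals
helpers are hypothetical placeholders for your provisioning system,
load driver and monitoring tooling; the point is the shape of the loop
and the sub-linear growth you hope to observe.

    # Hypothetical helpers; wire these to your provisioning system, load
    # driver and monitoring system respectively.
    def deploy_tenant():
        raise NotImplementedError("deploy one more copy of the Day Trader environment")

    def drive_load(tenant, users):
        raise NotImplementedError("send a modest, steady request load to the tenant")

    def capture_vitals():
        raise NotImplementedError("return {'cpu_pct': ..., 'memory_gb': ..., 'disk_gb': ...}")

    MAX_TENANTS = 20

    def run_density_experiment():
        baseline = capture_vitals()                  # cloud at rest, zero tenants
        for count in range(1, MAX_TENANTS + 1):
            tenant = deploy_tenant()
            drive_load(tenant, users=50)             # modest load; wait for steady state
            vitals = capture_vitals()
            memory_used = vitals["memory_gb"] - baseline["memory_gb"]
            # With aggressive sharing and over-commitment, the per-tenant
            # increment should shrink as the count grows (sub-linear growth).
            print("%2d tenants: %.1f GB total, %.2f GB per tenant, CPU %.0f%%"
                  % (count, memory_used, memory_used / count, vitals["cpu_pct"]))

    if __name__ == "__main__":
        run_density_experiment()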

Elastic Scaling is the ability of a virtual environment to respond to
fluctuations in demand based on a service level agreement made up of
operational policies. Consider this example. An application is running
(on an app server) in a virtual environment and has an operational
policy to maintain a 2 second response time. When this policy is
breached, the cloud reacts by spinning up another instance of an
application server within the virtual environment in an attempt to
reduce the response time by expanding processing capacity. So, from
this perspective, measuring elastic scale involves measuring the
"reaction time" of the automatic provisioning capability of the cloud.
(Of course, this assumes your cloud supports such scaling policies.)

One measurement for elastic scale is how steady user response time
holds as load is increased. Hence an experiment might look something
like this. Using Apache Day Trader, ramp up user requests until the
virtual environment becomes saturated and response times grow beyond a
targeted response time, say 2 seconds. Measure how long it takes the
cloud to react to this breach and return the response time back to 2
seconds or better.

We almost always associate elastic scale with growth; however, it is
also important for a cloud to shrink when the demand is no longer
there. By "garbage-collecting" under-utilized environments, your cloud
ensures it's ready for another application's peak. Elastic data
(memory/disk) is another aspect of elastic scale and should be
considered when measuring cloud performance. For a web application
running in a virtual environment, measuring the management of HTTP
sessions can be one way to experiment with aspects of elastic data in
your cloud. Such an experiment might increase the number of users
logging into Apache Day Trader to the point where the number of users
exhausts the memory of the virtual environment. If the cloud has
elastic data capabilities, it should utilize an elastic data grid or
disk storage to enable the number of concurrent users to scale well
beyond the memory allocated to that virtual environment.
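
Here is a sketch of the reaction-time measurement in Python, assuming
your load driver can be scripted and can report a live average
response time; both set_user_load and sample_response_time are
hypothetical placeholders.

    import time

    TARGET_SECONDS = 2.0   # the operational policy from the example above

    def set_user_load(users):
        raise NotImplementedError("point your load driver at Day Trader with this many users")

    def sample_response_time():
        raise NotImplementedError("return the current average response time, in seconds")

    def measure_reaction_time(start_users=50, step=50):
        users = start_users
        set_user_load(users)
        # Phase 1: ramp load until the 2-second policy is breached.
        while sample_response_time() <= TARGET_SECONDS:
            users += step
            set_user_load(users)
            time.sleep(30)                     # let the new load level reach steady state
        breached_at = time.monotonic()
        # Phase 2: hold the load and wait for the cloud to scale out.
        while sample_response_time() > TARGET_SECONDS:
            time.sleep(5)
        return time.monotonic() - breached_at  # the cloud's reaction time, in seconds

    if __name__ == "__main__":
        print("reaction time: %.0f seconds" % measure_reaction_time())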

The metrics around Resiliency all relate to keeping your cloud running
under adverse conditions. Denial-of-service attacks, runaway processes
and failed hardware resources are examples of Security, Isolation and
Resiliency concerns, respectively. A cloud should be able to quickly
react to issues within the "hard" or "soft" aspects of the environment
by moving workloads to working areas of the cloud and quickly failing
over to another virtual environment. A robust enterprise cloud should
also support disaster recovery features, allowing your cloud to be
linked to another cloud in an active/passive or active/active setup.

One can imagine a performance experiment to measure resiliency being
similar to the elasticity tests. However, instead of the cloud
reacting to a breach in SLA, the cloud must now react to a system
failure. For example, unplug the "blade" running the Apache Day Trader
workload to simulate a hardware error. Measure how long it takes the
cloud to react to this failure and return the response time back to 2
seconds or better.

Similarly, your cloud must support isolation such that if one tenant's
virtual system is "running amuck", another tenant will not be
disturbed. To test this scenario, we create a runaway process that
continually allocates memory or disk space. While this is happening,
we measure the performance of a second tenant to see if we notice any
ill effects from its neighboring tenant. Also watch the system's vital
signs to verify that the runaway tenant is "capped".

Your cloud must still perform while under a denial-of-service attack.
Hence, another test involves setting up a denial-of-service attack by
opening up Port 80 and sending bogus HTTP traffic. The cloud should
employ an application firewall that filters Port 80, looks for
ill-formed HTTP requests and denies them access to the cloud's
network.
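
A rough Python sketch of the noisy-neighbor isolation test follows.
The runaway allocator is real enough to run inside tenant A's virtual
environment; sample_neighbor_response_time is a hypothetical
placeholder for whatever load driver you point at the second tenant.

    import time

    def sample_neighbor_response_time():
        raise NotImplementedError("return tenant B's current average response time, in seconds")

    def runaway_allocate(chunk_mb=64, max_chunks=10000):
        """Run this inside tenant A: continually grab memory until the cloud caps us."""
        hoard = []
        for _ in range(max_chunks):
            hoard.append(bytearray(chunk_mb * 1024 * 1024))  # allocate and keep a reference
            time.sleep(0.1)

    if __name__ == "__main__":
        # Start runaway_allocate() inside tenant A (e.g., over SSH), then run
        # this monitoring loop from outside the cloud against tenant B.
        for _ in range(60):
            response_time = sample_neighbor_response_time()
            status = "OK" if response_time <= 2.0 else "DEGRADED"
            print("tenant B response time: %.2f seconds [%s]" % (response_time, status))
            time.sleep(5)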

That brings us to Runtime Performance. The nature of a cloud dictates
that you shouldn't just look at the "performance of the one". Given
that consolidation and efficiency are key business drivers of a
(private) cloud, looking at the "performance of the many" is likely
more important. For the "performance of the many", you need to bank on
the diversity of workloads and usage (i.e., the law of large numbers)
within your cloud and focus on ensuring that your cloud can manage and
meet SLAs across a variety of workloads.

The topic of SLAs has come up throughout these attributes, and they
are important here as well. There are two aspects of SLA management
that are key to balancing the performance of your cloud. The first is
Service Classes; for example, service classes that represent and
categorize tenants/applications as high, medium or low importance.
Having a diverse set of service classes allows the cloud management
system to make trade-offs and gracefully degrade performance according
to a discrete methodology. Hence, applications running under a Gold
service class will be more likely to maintain their SLAs than Silver,
and so on. Testing how well the cloud manages to its SLAs is one way
to measure the "performance of the many".

While in a shared environment the "performance of the many" is most
critical, many cloud providers gauge the effectiveness of their cloud
by measuring the "performance of the one", that is, by measuring the
cost of virtualization. To measure the cost of virtualization, run Day
Trader on a non-virtualized system (of similar class) and compare the
performance to the same application running in a virtualized system.
We typically see the cost of virtualization being anywhere from 2%-10%
additional overhead in throughput and response time. Advances in
virtualization allow tricks to be played with "prefer-local"
configurations, such that if a virtual environment has its software
components co-located (on the same hardware), the hypervisor can all
but skip the layer of abstraction (in, for example, I/O processing)
and have minimal (1%-2%) impact on run-time performance.
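
The cost-of-virtualization comparison reduces to a simple calculation
once you have the two sets of measurements. A small Python sketch is
below; the numbers in it are placeholders, not results, and should be
replaced with your own Day Trader measurements.

    def overhead_pct(bare_metal, virtualized, higher_is_better):
        """Percentage degradation of the virtualized run relative to bare metal."""
        if higher_is_better:                                      # e.g., throughput (requests/second)
            return (bare_metal - virtualized) / bare_metal * 100.0
        return (virtualized - bare_metal) / bare_metal * 100.0    # e.g., response time (seconds)

    if __name__ == "__main__":
        # Placeholder numbers; substitute your own measurements.
        print("throughput overhead:    %.1f%%" % overhead_pct(1200.0, 1130.0, True))
        print("response-time overhead: %.1f%%" % overhead_pct(0.80, 0.86, False))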

Time to Genesis is the time it takes to "stand up" your shared cloud
environment. The question is, when do you press "start" on your stop
watch? Similarly, when do you press "stop"? Most customers don't even
try to measure this, because the units of measurement tend to be in
the months-or-years range. Clearly there are vendors of cloud
infrastructure, like us at IBM, that are working on the notion of "a
cloud in a box", enabling a customer to stand up a cloud from nothing
to a self-service cloud portal in less than a couple of hours.

As the pervasiveness of cloud computing continues, measuring the
performance of your cloud will become less of an art and more of a
science (like measuring the performance of a web application is
today).