Mike Kelly builds on Scott Barber's work to show how you can combine performance-degradation curves and complex performance scenarios to help determine "good enough" quality for an application in terms of performance.

A year or so ago I had the pleasure of attending a conference at which Scott
Barber gave two presentations on performance testing. The first presentation was
on the effective presentation of performance test data; the second was on the
modeling of application user communities. After watching both presentations and
talking with Scott, I was able to draw a couple of insights. First, we often
focus too much on the problems that are easy to identify, rather than taking
the time to determine where the real problems may be hiding. Second, we tend to
performance test for the sake of performance testing, rather than taking the
time to understand the usage of the application and the business drivers within
which it operates. As a result, we don't know what to test, and we don't
understand how much performance testing is enough for the application.

In this article, I'll build on some of Scott's work to show how we
can combine performance-degradation curves and complex performance scenarios to
help determine "good enough" quality for an application in terms of
performance. Throughout the article, I'll refer to Scott's work by giving a
quick summary, borrowing an example for illustration, and then moving on to the
next topic. I leave it to you to do the research necessary to fully understand
the summarized content. This article is intended for the experienced
performance tester or test manager.

Performance-Degradation Curves

Before we jump into the guts of this article, it might be good to establish
some working definitions and concepts. Let's start with
performance-degradation curves. In his article on
creating a performance degradation curve,
Scott Barber outlines a basic response-time degradation curve. If you're
not familiar with this work, take a minute to read that article first; it sets
the stage for what we're about to cover.

Figure 1 is an example of a response-time degradation curve. Degradation
curves are common among performance testers; they go by various names, so
forgive me if you know this curve by another name. A response-time degradation
curve plots the response time experienced by the user against the user load.
It's worth pointing out that the various user loads represented on one of
these plots all use the same user-community model (explained in more detail
later in this article). Later on, I'll discuss how to compare loads based
on different models. This example shows the response times for two web pages
(the home page and page 1) under differing loads (from 1 to 200 users). Curves
like the one in Figure 1 are good for comparing specific page-response times
across multiple tests that use the same model, and for showing graphically
where performance starts to decline and where it becomes unacceptable.
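
To make the idea concrete, here is a minimal sketch of how the points on such a
curve might be assembled from raw test results. The sample values, the page
names, and the degradation_curves helper are all hypothetical; they are not
taken from Figure 1, and a real test tool would supply its own data format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw samples: (page, concurrent users, response time in seconds).
# Every load level is assumed to come from the same user-community model.
samples = [
    ("home", 1, 0.5), ("home", 1, 1.5),
    ("home", 100, 2.0), ("home", 100, 3.0),
    ("home", 200, 8.0), ("home", 200, 10.0),
    ("page1", 1, 1.0), ("page1", 100, 4.0), ("page1", 200, 12.0),
]


def degradation_curves(samples):
    """Return {page: [(users, mean response time), ...]} sorted by user load."""
    grouped = defaultdict(list)
    for page, users, seconds in samples:
        grouped[(page, users)].append(seconds)

    curves = defaultdict(list)
    for (page, users), times in sorted(grouped.items()):
        curves[page].append((users, mean(times)))
    return dict(curves)


curves = degradation_curves(samples)
for page, points in curves.items():
    print(page, points)
```

Each list of (users, seconds) pairs is one line on the chart: user load on the
horizontal axis, response time on the vertical axis. The mean is only one
possible summary; a percentile may suit your reporting better.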

The shape of a typical response-time degradation curve can be broken down
into four regions (see Figure 2):

The single-user region is just that: the response time for a
single user on the system. This is useful for establishing a point of
reference.

The performance plateau shows the best performance you can expect
under the specific conditions of that particular test without further
performance tuning. This area represents good candidates for baselines and/or
benchmarks.

The stress region is where the application "degrades
gracefully." Typically, the max recommended user load is the beginning of
the stress region.

The knee in performance is the point where performance
"degrades ungracefully."

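The four regions can be approximated from the plotted points with a simple
heuristic. The sketch below is one possible approach, not a method taken from
Scott's article: the label_regions function, the sample curve, and both
thresholds (plateau_tolerance and knee_slope) are assumptions chosen for
illustration, and suitable values depend on the application under test.

```python
def label_regions(curve, plateau_tolerance=1.25, knee_slope=0.05):
    """Label each (users, seconds) point with one of the four regions.

    Assumes the curve is sorted by user load and that its first point is the
    single-user measurement. Both thresholds are illustrative assumptions:
      plateau_tolerance: how far above the single-user time still counts
                         as the plateau (1.25 means within 25 percent)
      knee_slope:        seconds added per additional user beyond which
                         degradation is treated as ungraceful
    """
    baseline = curve[0][1]
    labels = [(curve[0][0], "single-user")]
    past_knee = False
    for (prev_users, prev_seconds), (users, seconds) in zip(curve, curve[1:]):
        slope = (seconds - prev_seconds) / (users - prev_users)
        if past_knee or slope > knee_slope:
            past_knee = True  # everything beyond the knee stays labeled "knee"
            region = "knee"
        elif seconds <= baseline * plateau_tolerance:
            region = "plateau"
        else:
            region = "stress"
        labels.append((users, region))
    return labels


# Hypothetical home-page curve: (concurrent users, response time in seconds).
home_curve = [(1, 1.0), (25, 1.0), (50, 1.0), (100, 2.0),
              (150, 3.0), (175, 6.0), (200, 12.0)]

labels = label_regions(home_curve)
# The max recommended load is typically the beginning of the stress region.
max_recommended = next((u for u, region in labels if region == "stress"), None)
print(labels)
print("max recommended user load:", max_recommended)
```

With these made-up numbers the plateau runs through 50 users, the stress region
begins at 100 users, and the knee appears at 175 users.
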
These regions are typically used by testers to help them determine where
performance starts to degrade for any given portion of the application. It has
been my experience that these charts are used primarily for two purposes:

The effective display of performance information, in an effort to
show "good enough" performance or poor performance in relation to some
stated requirement. For example, if I had a requirement stating that
the home page must load in under six seconds with 100 concurrent users, I could
confirm that requirement using the chart in Figure 2. If the requirement were
for 200 users, I could use the same graph to show that more work needs to be
done to meet the requirement.

A tool for determining the knee in performance while performance
tuning. The knee marks the absolute maximum load you ever want your
application/system to encounter. Data collected after the knee is the load
data that exploits your critical bottleneck; this data is then used to research
and correct performance bottlenecks. This is often an iterative process aimed
at pushing the knee in performance further to the right (that is, to a higher
load).
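
The first of these uses, checking a curve against a stated requirement, can be
sketched in a few lines. The curve values below are hypothetical and are not
read from Figure 2; the meets_requirement helper is likewise an assumption made
for illustration. The requirement itself (home page under six seconds) comes
from the example above.

```python
def meets_requirement(curve, users, max_seconds):
    """Return True if the measured response time at `users` is under the limit.

    `curve` is a list of (users, seconds) points from one degradation curve.
    Only measured load levels are checked; this sketch does not interpolate.
    """
    measured = dict(curve)
    if users not in measured:
        raise ValueError(f"no measurement at {users} users")
    return measured[users] < max_seconds


# Hypothetical home-page curve: (concurrent users, response time in seconds).
home_curve = [(1, 1.0), (50, 1.0), (100, 2.0), (150, 3.0), (200, 12.0)]

ok_at_100 = meets_requirement(home_curve, users=100, max_seconds=6.0)
ok_at_200 = meets_requirement(home_curve, users=200, max_seconds=6.0)
print("100 users:", ok_at_100)
print("200 users:", ok_at_200)
```

With these made-up numbers the requirement holds at 100 users and fails at 200
users, which mirrors the two cases described in the first item above.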

While many testers are interested in the stress region and the knee in
performance, in this article we'll take a slightly different view of this
data.