Identifying Capacity Limitations: an Exercise

Often a capacity limitation manifests itself as a performance issue.
To differentiate between the two, performance might be defined
as “how fast the system is going,” while capacity is “the
maximum performance of the system or an individual component.”

If your CPU utilization is very low (at or around 10%), try to determine
whether the disk controllers are fully loaded and whether input/output is the
cause. To determine whether your problem is disk related, use the iostat tool
as follows:

# iostat -xnMCz -T d 10

For example, consider a directory that is available on the internet. Its
customers submit searches from multiple sites, and the Service Level Agreement
(SLA) allows no more than 5% of requests to have response times over 3 seconds.
Currently 15% of requests take more than 3 seconds, which puts the business in
a penalty situation. The system is a 6800 with 12x900MHz CPUs.
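
The SLA test above is simple arithmetic, and a minimal sketch of it might look
like the following. The function names and the sample response times are
hypothetical; only the 3 second limit and the 5% allowance come from the text.

```python
def sla_violation_rate(response_times_s, limit_s=3.0):
    """Return the fraction of requests slower than limit_s seconds."""
    if not response_times_s:
        return 0.0
    slow = sum(1 for t in response_times_s if t > limit_s)
    return slow / len(response_times_s)


def meets_sla(response_times_s, limit_s=3.0, max_slow_fraction=0.05):
    """True when no more than max_slow_fraction of requests exceed limit_s."""
    return sla_violation_rate(response_times_s, limit_s) <= max_slow_fraction


# Hypothetical sample: 17 fast requests and 3 slow ones, so 15% are slow.
sample = [0.8] * 17 + [4.2] * 3
print(sla_violation_rate(sample))  # 0.15
print(meets_sla(sample))           # False
```

With 15% of requests over the limit, the check fails in the same way the
business in the example is failing its SLA.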

We look at the right 3 columns of the vmstat output, us=user, sy=system and
id=idle, which show that over 50% of the CPU is idle, so a shortage of CPU is
not behind the performance problem. One way to detect a
memory problem is to look at the sr, or scan rate, column
of the vmstat output. If the page scanner ever starts running,
that is, if the scan rate gets over 0, then we need to look more closely at the
memory system. The odd part of this display is that the blocked queue on the
left of the display has 18 or 19 processes in it but there are no processes in
the run queue. This suggests that the processes are blocking somewhere in
Solaris without using all of the available CPU.
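
The reasoning in the paragraph above can be sketched as a small triage
function. This is an illustration only: the function name, the dictionary
layout, the 10% idle cutoff and the sample values for sr and id are
assumptions, while the column names r, b, sr and id are the ones vmstat uses.

```python
def triage_vmstat(sample):
    """Suggest where to look next from one vmstat sample.

    sample is a dict keyed by vmstat column name:
    r (run queue), b (blocked queue), sr (scan rate), id (idle %).
    """
    findings = []
    if sample["sr"] > 0:
        findings.append("memory: page scanner is running")
    if sample["b"] > 0 and sample["r"] == 0 and sample["id"] >= 50:
        findings.append("blocked: processes waiting while CPU is idle, check I/O")
    if sample["r"] > 0 and sample["id"] < 10:  # 10% cutoff is an assumption
        findings.append("cpu: runnable processes with little idle CPU")
    return findings or ["no obvious bottleneck in this sample"]


# b=18 and r=0 follow the text; sr=0 and id=55 are assumed values.
print(triage_vmstat({"r": 0, "b": 18, "sr": 0, "id": 55}))
```

For the sample shown, only the blocked-queue rule fires, which matches the
decision to look at the I/O subsystem next.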

Next, we look at the I/O subsystem. The iostat command
has a switch, -C, which will aggregate I/Os at the controller
level. We run the iostat command with this switch, as in the invocation shown
earlier.

On controller 1 we are doing 396 reads per second and on controller
3 we are doing 400 reads per second. On the right side of the data, we see
that the output shows the controller is almost 200% busy. So the individual
disks are doing almost 200 reads per second, and the output shows the disks
as 100% busy. That brings us to a rule of thumb: individual disks perform
at approximately 150 I/Os per second. Our disks, at almost 200 reads per
second, are past that figure. The rule does not apply to LUNs or LDEVs
from the big disk arrays. So our examination of the numbers leads us to suggest
adding 2 disks to each controller and laying out the data again.
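
The arithmetic behind that suggestion can be checked with a short sketch. The
150 I/Os per second figure and the read rates come from the text; the count of
2 disks per controller is inferred from the roughly 200 reads per second per
disk, and the names c1 and c3 and the helper functions are hypothetical.

```python
import math

RULE_OF_THUMB_IOPS = 150  # per-disk figure quoted in the text


def per_disk_iops(controller_iops, disks):
    """Average I/Os per second each disk handles if load is spread evenly."""
    return controller_iops / disks


def disks_needed(controller_iops, per_disk_limit=RULE_OF_THUMB_IOPS):
    """Smallest disk count that keeps every disk at or under the limit."""
    return math.ceil(controller_iops / per_disk_limit)


for name, reads in {"c1": 396, "c3": 400}.items():
    current = per_disk_iops(reads, 2)    # 2 disks per controller is inferred
    after = per_disk_iops(reads, 2 + 2)  # after adding 2 disks
    print(name, current, after, disks_needed(reads))
```

Under these assumptions each disk drops from about 200 reads per second to
about 100 once 2 disks are added, which is under the rule-of-thumb figure,
provided the data is spread evenly across all the disks.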

In this exercise we looked at all the numbers and attempted to locate
the precise nature of the problem. Do not assume adding CPUs and memory will
fix all performance problems. In this case, the search programs were exceeding
the capacity of the disk drives, which manifested itself as a performance
problem: transactions with extreme response times. All those CPUs were waiting
on the disk drives.