Sunday Sep 30, 2007

From an article I wrote last month, published in the September 2007 issue of Sun's Technocrat, this examination of System Latency picks up where we left off with the last discussion, "What is Performance? ... in the Real World". That discussion identified the following list of key attributes and metrics that most in the IT world associate with optimal system performance:

Failover / Recovery Time (HA clustered DataServices, Disaster Recovery of a geographic Service, ..) : the time to recover a failed Service (including the recovery and/or startup time of restoring the failed Service)

etc ...

Each of the attributes and perceived gauges of performance listed above has its own intrinsic relationships and dependencies to specific subsystems and components... in turn reflecting a type of "latency" (delay in response). It is these latencies that are investigated and examined for root cause and correlation as the basis for most Performance Analysis activities.
How do you define Latency ?
How do you define Latency ?

In the past, the most commonly used term relating to latency within the field of Computer Science had been "Rotational Latency". This was due to the huge discrepancy between the responsiveness of an operation requiring mechanical movement vs. the flow of electrons between components, where the discrepancy was astronomical (nanoseconds vs. milliseconds). Although the most common bottlenecks do still typically relate to physical disk-based I/O latency, the paradigm of latency is shifting. With today's built-in HW caching controllers and memory-resident DBs (along with other optimizations at the HW, media, driver, and protocol levels...), the gap has narrowed. Realize that in 1 nanosecond (1 billionth of a second), electricity can travel approximately one foot down a wire (approaching the speed of light).
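To put those orders of magnitude side by side, here is a minimal sketch. The latency figures are illustrative assumptions (typical published ballpark numbers for the era, not measurements from any particular system):

```python
# Illustrative latency comparison (order-of-magnitude figures, not measurements).
# All values below are assumptions chosen to mirror the ratios discussed above.

LATENCIES_NS = {
    "CPU register":      0.3,          # sub-nanosecond
    "L2 cache":          5.0,
    "Main memory":       100.0,
    "Disk (rotational)": 8_000_000.0,  # ~8 ms of seek + rotational latency
}

def ratio_to_disk(component: str) -> float:
    """How many times faster a component is than one rotational disk access."""
    return LATENCIES_NS["Disk (rotational)"] / LATENCIES_NS[component]

for name, ns in LATENCIES_NS.items():
    print(f"{name:20s} {ns:>14,.1f} ns  (disk is {ratio_to_disk(name):,.0f}x slower)")
```

Even with the gap narrowing, a single rotational disk access still costs on the order of tens of thousands of main-memory accesses.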

However, given the industry's latest CPUs running multiple cores at clock speeds upwards of multiple gigahertz (with >= 1 thread per core, each theoretically executing over a billion instructions per second...), many bottlenecks can now easily be realized within memory, where densities have increased dramatically, the distances across huge supercomputer buses (and grids) have expanded dramatically, and most significantly... the latency of memory has not decreased at the same rate as CPU speeds have increased.

In order to best investigate system latency, we first need to define it and fully understand what we're dealing with.
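The CPU/memory gap is easy to quantify with back-of-the-envelope arithmetic. The figures here (a 3 GHz core, a ~100 ns memory access) are assumptions for illustration:

```python
# Back-of-the-envelope cost of a main-memory stall, using assumed figures:
# a 3 GHz core (~0.33 ns per cycle) waiting on a ~100 ns memory access.

def stall_cycles(mem_latency_ns: float, clock_ghz: float) -> float:
    """Clock cycles a core spends idle during one memory access."""
    return mem_latency_ns * clock_ghz

cycles = stall_cycles(mem_latency_ns=100.0, clock_ghz=3.0)
print(f"One cache miss costs roughly {cycles:.0f} cycles of potential work")
```

In other words, every trip to main memory forfeits hundreds of instruction slots, which is exactly why memory latency now rivals disk I/O as a bottleneck.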

LATENCY :

noun. The delay, or time that it takes, before a function, operation, and/or transaction occurs. (my own definition)

adj. (Latent) Present or potential but not evident or active.

BOTTLENECK :

noun. A place or stage in a process at which progress is impeded.

THROUGHPUT :

noun. Output relative to input; the amount of data passing through a system from input to output.

BANDWIDTH :

noun. The amount of data that can be passed along a communications channel in a given period of time.
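The throughput/bandwidth distinction is worth a quick worked example. The numbers are hypothetical: a channel rated at 1 Gbit/s (bandwidth) that actually moved 75 MB in one second (throughput):

```python
# Throughput vs. bandwidth, with hypothetical numbers: a link rated at
# 1 Gbit/s (bandwidth) that actually moved 75 MB in one second (throughput).

BANDWIDTH_BITS_PER_SEC = 1_000_000_000   # rated capacity of the channel

def utilization(bytes_moved: int, seconds: float) -> float:
    """Fraction of rated bandwidth actually achieved (throughput / bandwidth)."""
    throughput_bits_per_sec = (bytes_moved * 8) / seconds
    return throughput_bits_per_sec / BANDWIDTH_BITS_PER_SEC

u = utilization(bytes_moved=75_000_000, seconds=1.0)
print(f"Throughput used {u:.0%} of available bandwidth")
```

Bandwidth is the ceiling; throughput is what you actually observe, and the difference between the two is where latency and bottlenecks hide.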

Once again, the all-inclusive entity that we need to realize and examine in its entirety is the "Application Environment" and its standard subsystems :

OS / Kernel (System processing)

Processors / CPU's

Memory

Storage related I/O

Network related I/O

Application (User) SW

The "Critical Path" of (End-to-End) System Performance :

Although system performance might frequently be associated with one (or a few) system metrics, we must take ten steps back and realize that overall system performance is one long inter-related sequence of events (both parallel and sequential). Depending on the type of workload and services running within an Application Environment, the Critical Path might vary, as each system has its own performance profile and related "personality".
Using the typical OLTP RDBMS environment as an example, the Critical Path would include everything (and ALL latencies incurred) between :

(NOTE: MANY sub-system components / interactions are left out of this example of a transaction and response between a client and DB Server.)

Categories of Latency :

Latency, in and of itself, simply refers to a delay of sorts. In the realm of Performance Analysis and Workload Characterization, an association can generally be made between certain types of latency and a specific sub-system "bottleneck". However, in many cases the underlying "root causes" of bottlenecks are the result of several overlapping conditions, none of which individually causes performance degradation, but which together can result in a bottleneck. It is for this reason that performance analysis is typically an iterative exercise, where the removal of one bottleneck can easily result in the creation of another "hot spot" elsewhere, requiring further investigation and/or correlation once a bottleneck has been removed.

Anyone who has worked in the field with end-users has likely experienced scenarios where users attribute a change in application behavior to a performance issue, in many cases incorrectly. The following is a short list of the top reasons for a gap between user perception and actual system performance :

Deceptive expectations based upon marketed "PEAK" throughput and/or CPU clock-speed numbers and promised increases in performance. (High clock speeds do NOT always equate to higher throughput or better overall performance, especially if ANY bottlenecks are present.)

PEAK throughput numbers can only be achieved if there is NO bottleneck or related latency along the critical path as described above. The saturation of ANY sub-system will degrade performance until that bottleneck is removed.

The PEAK performance of a system will be dictated by the performance of its most latent and/or contentious components (or sub-systems) along the critical path of system performance. (e.g. The PEAK bandwidth of a system is no greater than that of its slowest components along the path of a transaction and all its interactions.)
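That "slowest component wins" rule can be sketched in a few lines. The per-component bandwidths below are assumptions invented for the example, not figures from any real system:

```python
# The end-to-end ceiling is set by the slowest hop: a sketch with assumed
# per-component bandwidths (MB/s) along one transaction's critical path.

path_bandwidth_mb_s = {
    "client NIC":  125,    # ~1 Gbit/s
    "switch":      1250,   # ~10 Gbit/s
    "server NIC":  125,
    "storage HBA": 400,
    "disk array":  80,     # the slowest hop in this hypothetical path
}

def peak_end_to_end(path: dict) -> tuple:
    """Peak sustainable bandwidth and the component that limits it."""
    limiter = min(path, key=path.get)
    return path[limiter], limiter

peak, limiter = peak_end_to_end(path_bandwidth_mb_s)
print(f"Peak end-to-end bandwidth: {peak} MB/s, limited by the {limiter}")
```

Upgrading any component other than the limiter leaves the end-to-end ceiling unchanged, which is why the iterative, critical-path view matters.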

As the holy grail of system performance (along with Capacity Planning... and ROI) dictates, a system that spends as close to 100% of CPU time as possible doing useful processing (vs. WAIT events that pause processing) is what every IT Architect and System Administrator strives for. This is where systems using CMT (multiple cores per CPU, each with multiple threads per core) shine, allowing processing to continue even when many threads are waiting on I/O.
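The effect CMT exploits can be illustrated at the software-thread level. This is purely a sketch: four simulated I/O waits of 0.1 s each overlap, so the wall-clock time stays near 0.1 s rather than 0.4 s (real CMT hardware interleaves hardware threads per core, but the latency-hiding principle is the same):

```python
# Latency hiding via overlapped waits: four simulated 0.1 s I/O operations
# run concurrently, so total wall time stays near 0.1 s instead of 0.4 s.

import time
from concurrent.futures import ThreadPoolExecutor

def simulated_io(req_id: int) -> int:
    time.sleep(0.1)          # stand-in for a blocking disk/network wait
    return req_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulated_io, range(4)))
elapsed = time.perf_counter() - start

print(f"4 overlapped 0.1 s waits finished in {elapsed:.2f} s")
```

The CPU (here, the interpreter) is free to service other threads while each request sleeps, which is exactly the WAIT time a CMT design reclaims.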

The Application Environment and its Sub-Systems ... where the bottlenecks can be found

Within Computing, or more broadly, Information Technology, "latency" and its underlying causes can be tied to one or more specific "sub-systems". The following list reflects the first level of "sub-systems" that you will find in any Application Environment :

Subsystem / Components | Attributes and key Characteristics | Related Metrics, Measurements, and/or Interactions

System "Bus" / Backplane | Backplane / centerplane, I/O bus, etc. (many types of connectivity and media are possible, all with individual response times and bandwidth properties). | ...

Other considerations regarding system latency that are often overlooked include the following, which offer us a more holistic vantage point on system performance and items that might work against "PEAK" system capabilities :

Systems that are mis-tuned are accidents waiting to happen. Only tune kernel/driver parameters if you KNOW what you are doing, or have been instructed by support to do so (and have FIRST tested on a NON-production system). I can't tell you how many performance issues I have encountered that were due to administrator "tweaks" to kernel tunables (to the point of taking down entire LAN segments !). The defaults are generally the BEST starting point unless a world-class benchmarking effort is under way.

The "Iterative"
nature of Performance Analysis and System Tuning

No matter what the root causes are found to be, in the realm of Performance Analysis and System Tuning, once you remove one bottleneck, the system processing characteristics will change, resulting in a new performance profile and new "hot spots" that require further data collection and analysis. The process is iterative, and requires a methodical approach to remediation.

Make certain that ONLY ONE (1) change is made at a time; otherwise, the effects (+ or -) cannot be quantified.

Hopefully at some point in the future we'll be operating at latencies measured in attoseconds (10^-18, or 1 quintillionth of a second), but until then .... Happy tuning :)

About

This blog does not reflect the viewpoints or opinions of Oracle or Sun Microsystems. All comments are the personal reflections and responsibility of Todd A. Jobson, and are copyrighted from the posted year to the current year.