The Solaris 9 Operating System contains a feature that enables the use of larger memory page sizes for the heap and stack segments of a program. This article explains how to use this feature to deliver significant performance gains for a large range of applications. It is intended for readers with an intermediate to advanced knowledge level.

One of my favorite features of the Solaris™ 9 Operating System (Solaris
OS) is multiple page size support (MPSS). Why? Because it's one of the
easiest ways to achieve a significant performance gain for a large range of
applications.

Memory intensive applications that have a large working set often perform
suboptimally on the Solaris OS without a little tuning. This is because they
make inefficient use of the microprocessor's translation lookaside buffer
(TLB). MPSS lets you exploit larger page sizes in the microprocessor's
memory management unit (MMU, or "Emu"), which allows more
efficient use of the TLB, ultimately resulting in improved application
performance.

Applications most likely to benefit from MPSS typically have working sets
greater than a few hundred megabytes, and are memory intensive. Because the TLB
can only hold a few hundred translations at a time, these applications typically
overflow the microprocessor's TLB. The Solaris kernel services overflows from
the UltraSPARC™ TLB, which can result in a significant amount of
system-software time.
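The arithmetic behind this limit, sometimes called the "TLB reach," can be sketched quickly. The entry count and page sizes below are illustrative assumptions for the sake of the calculation, not measurements of any particular UltraSPARC part:

```shell
# Assumed, illustrative values: a 512-entry data TLB, an 8-Kbyte
# base page, and a 4-Mbyte large page.
TLB_ENTRIES=512
SMALL_PAGE=$((8 * 1024))
LARGE_PAGE=$((4 * 1024 * 1024))

# TLB reach: bytes the TLB can map without taking a miss.
small_reach=$((TLB_ENTRIES * SMALL_PAGE))
large_reach=$((TLB_ENTRIES * LARGE_PAGE))

echo "8K pages: $((small_reach / 1024 / 1024)) Mbytes of TLB reach"
echo "4M pages: $((large_reach / 1024 / 1024)) Mbytes of TLB reach"
```

With these assumed numbers, 8-Kbyte pages give only a few megabytes of reach, so a working set of hundreds of megabytes misses constantly; 4-Mbyte pages raise the reach into the gigabytes.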

There is a catch, however. Regular performance tools like mpstat,
sar, and vmstat do not report the time spent processing TLB
overflows (we refer to them as TLB misses) as system time. Instead, they report
an application's TLB misses as user time. This can be quite misleading
because it can appear that the CPU is spending all of its time running an
application when, in fact, it is really spending a large fraction of time in the
kernel.

This Sun BluePrints™ OnLine article explains how to engage the MPSS
feature on the Solaris OS and how to analyze its effect on performance. It
briefly explains the hardware feature being exploited, how to measure the usage
of this hardware feature with standard Solaris OS tools, and the ways by which
users and programmers can invoke the feature. The article doesn't explain
the underlying theory in great detail, but provides working examples and
references to help you locate additional information on the subject.

Catching Your Emu With trapstat

To help you determine how frequently an application overflows the TLB, the
Solaris 9 OS introduces a new tool called trapstat. This tool provides
an easy way to measure the time spent in the kernel servicing TLB misses. Using
the -t option, trapstat reports how many TLB misses occur and
what percentage of the total CPU time is spent processing TLB misses.

The -t option provides first-level summary statistics. Time spent
servicing TLB misses is summarized in the lower right corner of the report. As
shown in the following example, 46.2 percent of the total execution time is
spent servicing TLB misses. The TLB miss detail is broken down to show TLB
misses incurred in the data portion of the address space (dTLB) and for the
instruction portion of the address space (iTLB). Data are also provided for user
misses (u) and kernel-mode misses (k). We are primarily interested in the
user-mode misses because our application likely runs in user mode.
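On a Solaris 9 system, a report of this kind could be produced with an invocation along the following lines. The 5-second interval and single report count are illustrative choices, and the block falls through harmlessly on systems without trapstat:

```shell
# Hedged sketch: sample trap activity once over a 5-second interval.
# The -t option breaks out TLB-miss counts and the percentage of CPU
# time spent servicing them. trapstat is a Solaris 9 tool; guard for
# systems that lack it.
if command -v trapstat >/dev/null 2>&1; then
    trapstat -t 5 1
    status="ran trapstat"
else
    status="trapstat not available (Solaris 9 only)"
    echo "$status"
fi
```

Note that trapstat requires sufficient privilege to read the kernel's trap statistics, so it is typically run as root.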

We can conclude from this output that the application could potentially run
almost twice as fast if we could eliminate the majority of the TLB misses. Our
objective in using the mechanisms discussed in the following paragraphs is to
minimize the user-mode data TLB misses (dTLB) by instructing the application to
use larger pages for its data segments. Typically, data misses are incurred in
the program's heap or stack segments. We can use the Solaris 9 OS MPSS
commands to direct the application to use 4-megabyte pages for its heap, stack,
or anonymous memory mappings.
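One way to request larger pages from the command line is the ppgsz(1) utility that ships with the Solaris 9 OS. The sketch below asks for 4-Mbyte pages for both the heap and the stack; ./myapp is a placeholder for your own binary, and the block falls through harmlessly where ppgsz does not exist:

```shell
# Hedged sketch: launch a program with a preferred 4-Mbyte page size
# for its heap and stack segments. ppgsz is Solaris-only, so guard
# the invocation; ./myapp is a placeholder application.
if command -v ppgsz >/dev/null 2>&1; then
    ppgsz -o heap=4M,stack=4M ./myapp
    status="ran under ppgsz"
else
    status="ppgsz not available (Solaris-only)"
    echo "$status"
fi
```

The page size is a hint, not a guarantee: the kernel falls back to smaller pages when it cannot satisfy the request, so re-running trapstat -t afterward is the way to confirm the misses actually went away.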