trapstat

Synopsis

Description

The trapstat utility gathers and displays run-time trap statistics on UltraSPARC-based systems.
The default output is a table of trap types and CPU IDs,
with each row of the table denoting a trap type and each
column of the table denoting a CPU. If standard output is a terminal,
the table contains as many columns of data as can fit within
the terminal width; if standard output is not a terminal, the table
contains at most six columns of data. By default, data is gathered
and displayed for all CPUs; if the data cannot fit in a
single table, it is printed across multiple tables. The set of CPUs
for which data is gathered and displayed can be optionally specified with
the -c or -C option.

Unless the -r option or the -a option is specified, the value
displayed in each entry of the table corresponds to the number of
traps per second. If the -r option is specified, the value corresponds
to the number of traps over the interval implied by the specified sampling
rate; if the -a option is specified, the value corresponds to the
accumulated number of traps since the invocation of trapstat.
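
The relationship between the three display modes can be sketched in a few lines of Python. This is an illustrative calculation only, not part of trapstat; the function names are hypothetical, and it assumes that a value shown with -r covers exactly one sampling interval of 1/rate seconds.

```python
def traps_per_second(per_interval_count, rate):
    """Rescale a per-interval value (as shown with -r) to traps per second.

    Each sampling interval lasts 1/rate seconds, so multiplying the
    per-interval count by rate gives the equivalent one-second figure.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return per_interval_count * rate


def rate_from_accumulated(prev_total, curr_total, elapsed_seconds):
    """Derive an average per-second rate from two accumulated (-a) readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_total - prev_total) / elapsed_seconds
```

For instance, 3 traps per interval at a rate of 1000 samples per second corresponds to 3000 traps per second.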

By default, trapstat displays data once per second, and runs indefinitely; both
of these behaviors can be optionally controlled with the interval and count
parameters, respectively. The interval is specified in seconds; the count indicates the number
of intervals to be executed before exiting. Alternatively, command can be specified,
in which case trapstat executes the provided command and continues to run
until the command exits. A positive integer is assumed to be an interval;
if the desired command cannot be distinguished from an integer, the full
path of command must be specified.

UltraSPARC I (obsolete), II, and III handle translation lookaside buffer (TLB) misses
by trapping to the operating system. TLB miss traps can be a
significant component of overall system performance for some workloads; the -t option
provides in-depth information on these traps. When run with this option, trapstat displays
both the rate of TLB miss traps and the percentage of
time spent processing those traps. Additionally, TLB misses that hit in the
translation storage buffer (TSB) are differentiated from TLB misses that further miss in
the TSB. (The TSB is a software structure used as a translation
entry cache to allow the TLB to be quickly filled; it is
discussed in detail in the UltraSPARC II User's Manual.) The TLB and TSB miss information
is further broken down into user- and kernel-mode misses.

Workloads with working sets that exceed the TLB reach may spend a
significant amount of time missing in the TLB. To accommodate such workloads,
the operating system supports multiple page sizes: larger page sizes increase the
effective TLB reach and thereby reduce the number of TLB misses. To
provide insight into the relationship between page size and TLB miss rate, trapstat
optionally provides in-depth TLB miss information broken down by page size using
the -T option. The information provided by the -T option is a
superset of that provided by the -t option; only one of -t
and -T can be specified.

Options

The following options are supported:

-a

Displays the number of traps as accumulating, monotonically increasing values instead of per-second or per-interval rates.

-c cpulist

Enables trapstat only on the CPUs specified by cpulist.

cpulist can be a single processor ID (for example, 4), a range of processor IDs (for example, 4-6), or a comma-separated list of processor IDs or processor ID ranges (for example, 4,5,6 or 4,6-8).
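
As an illustration of the cpulist syntax, the following Python sketch expands a list into individual processor IDs. It is not trapstat's own parser; the function name parse_cpulist is hypothetical, and the handling of malformed input (raising ValueError) is an assumption.

```python
def parse_cpulist(cpulist):
    """Expand a cpulist such as "4,6-8" into a sorted list of processor IDs."""
    cpus = set()
    for part in cpulist.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "6-8" includes both endpoints.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(low, high + 1))
        else:
            # A single processor ID; int() rejects empty or non-numeric text.
            cpus.add(int(part))
    return sorted(cpus)
```

With this sketch, parse_cpulist("4,6-8") yields the IDs 4, 6, 7, and 8.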

-C processor_set_id

Enables trapstat only on the CPUs in the processor set specified by processor_set_id.

trapstat modifies its output to always reflect the CPUs in the specified processor set. If a CPU is added to the set, trapstat modifies its output to include the added CPU; if a CPU is removed from the set, trapstat modifies its output to exclude the removed CPU. At most one processor set can be specified.

-e entrylist

Enables trapstat only for the trap table entry or entries specified by entrylist. A trap table entry can be specified by trap number or by trap name (for example, the level-10 trap can be specified as 74, 0x4A, 0x4a, or level-10).

entrylist can be a single trap table entry or a comma-separated list of trap table entries. If the specified trap table entry is not valid, trapstat prints a table of all valid trap table entries and values. A list of valid trap table entries is also found in The SPARC Architecture Manual, Version 9 and the Sun Microelectronics UltraSPARC II User's Manual. If the parsable option (-P) is specified in addition to the -e option, the format of the data is as follows:

Field    Contents
1        Timestamp (nanoseconds since start)
2        CPU ID
3        Trap number (in hexadecimal)
4        Trap name
5        Trap rate per interval

Each field is separated with whitespace. If the format is modified, it will be modified by adding potentially new fields beginning with field 6; extant fields will remain unchanged.
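
The equivalent spellings of a trap table entry accepted by -e (decimal, hexadecimal in either case, or name) can be normalized to a single number when post-processing. The Python sketch below is illustrative only: the function name is hypothetical, and the name table holds just the one mapping given above, whereas the real set of names comes from trapstat -l.

```python
# Hypothetical name table containing only the mapping stated in the text;
# a real table would be built from the output of "trapstat -l".
TRAP_NAMES = {"level-10": 0x4A}


def trap_number(spec):
    """Return the trap number for a decimal, hexadecimal, or named entry."""
    try:
        # Base 0 lets int() accept both "74" and "0x4A"/"0x4a".
        return int(spec, 0)
    except ValueError:
        # Not numeric, so treat it as a trap name (KeyError if unknown).
        return TRAP_NAMES[spec]
```

All four spellings from the example (74, 0x4A, 0x4a, and level-10) map to the same trap number, 74.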

-l

Lists trap table entries. By default, a table is displayed containing all valid trap numbers, their names, and a brief description of each. The trap name is used in both the default output and in the entrylist parameter for the -e option. If the parsable option (-P) is specified in addition to the -l option, the format of the data is as follows:

Field        Contents
1            Trap number in hexadecimal
2            Trap number in decimal
3            Trap name
Remaining    Trap description
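
Because the description occupies all remaining fields and may itself contain whitespace, a consumer of the -l -P output should split only the first three fields. The following Python sketch shows one way to do that; the function name and the sample line in the usage note are invented for illustration and are not real trapstat output.

```python
def parse_list_line(line):
    """Split one line of "trapstat -l -P" output into its documented fields."""
    parts = line.split(None, 3)  # at most 3 splits; the rest is the description
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 fields: {line!r}")
    return {
        # int(..., 16) accepts the value with or without a "0x" prefix.
        "number_hex": int(parts[0], 16),
        "number_dec": int(parts[1]),
        "name": parts[2],
        "description": parts[3].strip() if len(parts) > 3 else "",
    }
```

Given a made-up line such as "0x4a 74 level-10 illustrative description text", the description field keeps its internal spaces intact.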

-P

Generates parsable output. When run without other options that modify data gathering (that is, -e, -t, or -T), trapstat's parsable output has the following format:

Field    Contents
1        Timestamp (nanoseconds since start)
2        CPU ID
3        Trap number (in hexadecimal)
4        Trap name
5        Trap rate per interval

Each field is separated with whitespace. If the format is modified, it will be modified by adding potentially new fields beginning with field 6; extant fields will remain unchanged.
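
A consumer of this format can stay compatible with later additions by reading only the first five fields and ignoring anything after them. The Python sketch below illustrates the idea; the names TrapSample and parse_trap_line are hypothetical, and treating field 5 as an integer is an assumption.

```python
from typing import NamedTuple


class TrapSample(NamedTuple):
    timestamp_ns: int
    cpu_id: int
    trap_number: int
    trap_name: str
    count: int


def parse_trap_line(line):
    """Parse one whitespace-separated line of default -P output."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    # Fields beyond the fifth are ignored so that future additions
    # (starting at field 6) do not break this parser.
    timestamp, cpu, trap, name, count = fields[:5]
    return TrapSample(
        timestamp_ns=int(timestamp),
        cpu_id=int(cpu),
        trap_number=int(trap, 16),  # hexadecimal, with or without "0x"
        trap_name=name,
        count=int(count),           # assumed to be an integer count
    )
```

Any sample line used to exercise this sketch should be treated as made up; the actual column widths and number formatting are whatever trapstat prints on the system at hand.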

-r rate

Explicitly sets the sampling rate to rate samples per second. If this option is specified, trapstat's output changes from traps per second to traps per sampling interval.

-t

Enables TLB statistics.

A table is displayed with four principal columns of data: itlb-miss, itsb-miss, dtlb-miss, and dtsb-miss. The columns contain both the rate of the corresponding event and the percentage of CPU time spent processing the event. The percentage of CPU time is given only in terms of a single CPU. The rows of the table correspond to CPUs, with each CPU consuming two rows: one row for user-mode events (denoted with u) and one row for kernel-mode events (denoted with k). For each row, the percentage of CPU time is totalled and displayed in the rightmost column. The CPUs are delineated with a solid line. If the parsable option (-P) is specified in addition to the -t option, the format of the data is as follows:

Field    Contents
1        Timestamp (nanoseconds since start)
2        CPU ID
3        Mode (k denotes kernel, u denotes user)
4        I-TLB misses
5        Percentage of time in I-TLB miss handler
6        I-TSB misses
7        Percentage of time in I-TSB miss handler
8        D-TLB misses
9        Percentage of time in D-TLB miss handler
10       D-TSB misses
11       Percentage of time in D-TSB miss handler

Each field is separated with whitespace. If the format is modified, it will be modified by adding potentially new fields beginning with field 12; extant fields will remain unchanged.
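
The eleven fields pair each miss count with its percentage of time, so they can be read in a loop. The following Python sketch parses one line and recomputes the per-row total that the tabular output shows in its rightmost column. The function name and dictionary keys are hypothetical, and it assumes that miss counts are integers and percentages are decimal numbers.

```python
EVENTS = ("itlb", "itsb", "dtlb", "dtsb")


def parse_tlb_line(line):
    """Parse one line of "trapstat -t -P" output into a dictionary."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields: {line!r}")
    record = {
        "timestamp_ns": int(fields[0]),
        "cpu_id": int(fields[1]),
        "mode": fields[2],  # "k" for kernel, "u" for user
    }
    for i, event in enumerate(EVENTS):
        # Fields 4-11 alternate: miss count, then percentage of time.
        record[event + "_miss"] = int(fields[3 + 2 * i])
        record[event + "_pct"] = float(fields[4 + 2 * i])
    # Sum of the four percentages for this row.
    record["total_pct"] = sum(record[e + "_pct"] for e in EVENTS)
    return record
```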

-T

Enables TLB statistics, with page size information. As with the -t option, a table is displayed with four principal columns of data: itlb-miss, itsb-miss, dtlb-miss, and dtsb-miss. The columns contain both the absolute number of the corresponding event, and the percentage of CPU time spent processing the event. The percentage of CPU time is given only in terms of a single CPU. The rows of the table correspond to CPUs, with each CPU consuming two sets of rows: one set for user-level events (denoted with u) and one set for kernel-level events (denoted with k). Each set, in turn, contains as many rows as there are page sizes supported (see getpagesizes(3C)). For each row, the percentage of CPU time is totalled and displayed in the right-most column. The two sets are delineated with a dashed line; CPUs are delineated with a solid line. If the parsable option (-P) is specified in addition to the -T option, the format of the data is as follows:

Field    Contents
1        Timestamp (nanoseconds since start)
2        CPU ID
3        Mode (k denotes kernel, u denotes user)
4        Page size, in decimal
5        I-TLB misses
6        Percentage of time in I-TLB miss handler
7        I-TSB misses
8        Percentage of time in I-TSB miss handler
9        D-TLB misses
10       Percentage of time in D-TLB miss handler
11       D-TSB misses
12       Percentage of time in D-TSB miss handler

Each field is separated with whitespace. If the format is modified, it will be modified by adding potentially new fields beginning with field 13; extant fields will remain unchanged.
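
To see which page sizes account for the most TLB miss handling time, the parsable -T output can be grouped by CPU, mode, and page size. The Python sketch below is one illustrative way to do this; the function name is hypothetical, and it assumes the input lines all belong to a single sampling interval, since adding percentages from different intervals would not be meaningful.

```python
def tlb_time_by_page_size(lines):
    """Total the four percentage fields per (cpu_id, mode, page_size)."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        key = (int(fields[1]), fields[2], int(fields[3]))
        # Fields 6, 8, 10, and 12 (indexes 5, 7, 9, 11) are the percentages.
        pct = sum(float(fields[i]) for i in (5, 7, 9, 11))
        totals[key] = totals.get(key, 0.0) + pct
    return totals
```

The keys of the returned dictionary are tuples such as (0, "u", 8192), so the result can be sorted by value to find the page size that dominates on a given CPU.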

Examples

Example 1 Using trapstat Without Options

When run without options, trapstat displays a table of trap types and
CPUs. At most six columns can fit in the default terminal width;
if (as in this example) there are more than six CPUs, multiple
tables are displayed:

The -t option displays in-depth TLB statistics, including the amount of time
spent performing TLB miss processing. The following example shows that the machine
is spending 14.1 percent of its time just handling D-TLB misses:

By specifying the -T option, trapstat shows TLB misses broken down by
page size. In this example, CPU 0 is spending 7.9 percent of
its time handling user-mode TLB misses on 8K pages, and another 2.3
percent of its time handling user-mode TLB misses on 64K pages.

By specifying the -e option, trapstat displays statistics for only specific trap
types. Using this option minimizes the probe effect when seeking specific data.
This example yields statistics for only the dtlb-prot and syscall-32 traps on
CPUs 12 through 15:

The following example uses the -r option to specify a sampling rate
of 1000 samples per second, and filters for only the level-10 trap.
Additionally, specifying the -P option yields parsable output.

Notice the timestamp difference between the level-10 events: 9,998,000 nanoseconds and 10,007,000
nanoseconds. These level-10 events correspond to the system clock, which by default
ticks at 100 hertz (that is, every 10,000,000 nanoseconds).
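
The arithmetic behind this observation can be checked with a short Python sketch. It is illustrative only: the names are hypothetical, and the two observed values are the timestamp differences quoted above.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tick_period_ns(hertz):
    """Return the period of a clock ticking at the given frequency."""
    return NANOSECONDS_PER_SECOND // hertz


expected_ns = tick_period_ns(100)        # period of a 100 hertz clock
observed_ns = [9_998_000, 10_007_000]    # differences quoted in the text
# Deviation of each observed interval from the nominal tick period.
deviations_ns = [delta - expected_ns for delta in observed_ns]
```

Both observed intervals fall within a few microseconds of the nominal 10,000,000 nanosecond period.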

Notes

When enabled, trapstat induces a varying probe effect, depending on the type
of information collected. While the precise probe effect depends upon the specifics
of the hardware, the following table can be used as a rough
guide:

Option     Approximate probe effect
default    3-5% per trap
-e         3-5% per specified trap
-t, -T     40-45% per TLB miss trap hitting in the TSB,
           25-30% per TLB miss trap missing in the TSB

These probe effects are per trap, not for the system as a whole.
For example, running trapstat with the default options on a system that
spends 7% of total time handling traps induces a performance degradation of
less than one half of one percent; running trapstat with the -t
or -T option on a system spending 5% of total time processing
TLB misses induces a performance degradation of no more than 2.5%.
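
The two figures above follow from multiplying the fraction of time spent in traps by the per-trap probe effect. The Python sketch below works through both cases using the upper end of each range from the table; the function name is hypothetical, and the simple product is a rough estimate rather than an exact model.

```python
def system_degradation(time_in_traps, per_trap_probe_effect):
    """Estimate whole-system slowdown as a fraction of total time."""
    return time_in_traps * per_trap_probe_effect


# Default options: 7% of time in traps, up to 5% probe effect per trap.
default_case = system_degradation(0.07, 0.05)

# -t or -T: 5% of time in TLB misses, up to 45% probe effect per trap.
tlb_case = system_degradation(0.05, 0.45)
```

The first product is about 0.35%, which is under one half of one percent; the second is about 2.25%, which is within the stated 2.5% bound.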

When run with the -t or -T option, trapstat accounts for its
probe effect when calculating the %tim fields. This ensures that the %tim
fields are a reasonably accurate indicator of the time a given workload is
spending handling TLB misses, regardless of the perturbing presence of trapstat.

While the %tim fields include the explicit cost of executing the TLB
miss handler, they do not include the implicit costs of TLB miss
traps (for example, pipeline effects and cache pollution). These implicit costs become
more significant as the trap rate grows; if high %tim values are reported
(greater than 50%), you can accurately infer that much of the balance
of time is being spent on the implicit costs of the TLB
miss traps.

Due to the potential system-wide degradation induced, only the super-user can
run trapstat.

Due to the limitation of the underlying statistics gathering methodology, only one
instance of trapstat can run at a time.