Performance Monitoring Tools for Linux

Mr. Gavin provides tools for system data collection and display and discusses what information is needed and why.

Collecting the Data

The file /proc/stat contains current counters for most of the
data I wanted, and it is in a readable text format. To keep the
collector script as quick and simple as possible, I saved the data
in that same readable form rather than converting it to binary.

Breaking down and reorganizing the data for storage was a
good job for awk, which can write each record out to a different
file depending on the type of data. The /proc files are formatted
nicely for this; each record has an identifying name in the first
field. Here's a sample of /proc/stat from my 486 system:

I dug into the kernel source for the /proc file system to
figure out what the various fields were, as the man pages seem to
date back to the 1.x kernels.
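Because every /proc/stat record starts with its name, splitting each line on whitespace is enough to separate the records. The sketch below does this in Python rather than with the awk used in the article; the function name `parse_stat` is hypothetical, and it assumes every field after the record name is an integer, as on the kernels described here.

```python
def parse_stat(text):
    """Map each /proc/stat record name (the first field) to its integer fields."""
    records = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        name, values = fields[0], fields[1:]
        records[name] = [int(v) for v in values]
    return records


if __name__ == "__main__":
    # Made-up sample in the style of /proc/stat; on a live system,
    # read the text with open("/proc/stat").read() instead.
    sample = "cpu  100000 500 40000 800000\nctxt 123456\nprocesses 4321\n"
    stats = parse_stat(sample)
    print(stats["cpu"])        # [100000, 500, 40000, 800000]
    print(stats["processes"])  # [4321]
```

Keying on the first field is what makes it easy to route each record type to its own output file later.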

cpu: jiffies (1/100 of a second) spent in the user, nice,
system and idle states. I wasn't too concerned about the actual
unit of measurement, as I was only planning to look at each state
as a percentage of the total.

disk: summarizes all I/O to each
of the four disks, while disk_rio,
disk_wio, disk_rblk and
disk_wblk break down the total into read, write,
blocks read and blocks written.

page: counts of pages paged in
and out.

swap: counts of pages swapped in
and out. The swap data in /proc/meminfo gives the total, used and
free swap space; combine both sets of data to get a clear picture
of swap activity.

intr: total interrupts since
boot time, followed by counts for each interrupt.

ctxt: the number of context
switches since boot time. This counts the number of times one
process was “put to sleep” and another was “awakened”.

btime: I haven't found much use
for this; it is the time the system was booted, expressed as the
number of seconds since January 1, 1970.

processes: the most recent
process identification number. This is a good way to see how many
processes have been spawned since the last check: subtract the old
value from the current one and divide by the time difference (in
seconds) between the two observations. The result, new processes
per second, is a useful measure of how busy the system is.
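The calculations described in the list above (each cpu state as a share of the total, counters turned into per-second rates, btime turned into a date) can be sketched in Python as follows. The function names are hypothetical and the jiffy counts are made up; `cpu_percentages` assumes the four-field user/nice/system/idle cpu line discussed here. `per_second` applies equally to the processes, ctxt and intr totals, and element by element to the per-disk counters.

```python
from datetime import datetime, timezone

# The cpu states in the order the "cpu" line of /proc/stat lists them.
CPU_STATES = ("user", "nice", "system", "idle")


def cpu_percentages(prev, curr):
    """Share of the elapsed jiffies spent in each state between two samples.

    Only the first four fields are used; later kernels append more states,
    which this sketch ignores.
    """
    deltas = [c - p for p, c in zip(prev[:4], curr[:4])]
    total = sum(deltas)
    if total == 0:
        return {state: 0.0 for state in CPU_STATES}
    return {state: 100.0 * d / total for state, d in zip(CPU_STATES, deltas)}


def per_second(old_value, new_value, elapsed_seconds):
    """Rate of a counter that only grows, e.g. processes, ctxt or the intr total."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (new_value - old_value) / elapsed_seconds


def boot_time(btime):
    """Turn btime (seconds since January 1, 1970) into a UTC datetime."""
    return datetime.fromtimestamp(btime, tz=timezone.utc)


if __name__ == "__main__":
    # Made-up samples taken 300 seconds (30000 jiffies at 100 Hz) apart.
    before = [100000, 500, 40000, 800000]
    after = [109000, 500, 43000, 818000]
    print(cpu_percentages(before, after))
    # {'user': 30.0, 'nice': 0.0, 'system': 10.0, 'idle': 60.0}
    print(per_second(1200, 1500, 300))  # 1.0 new process per second
    print(boot_time(900000000))
```

Working from differences between two samples is what makes the unit of a jiffy irrelevant: only the ratios and the elapsed wall-clock time matter.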

Network activity counters are found in the /proc/net/dev file; an
example of this file is shown in Table
1.

The lines we want here are the
ethx and
pppx records (eth0, ppp0 and so on). In the
collector script, the data is written out to a file named after the
full interface name, so the script works for almost any
configuration.
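A minimal Python sketch of that selection, assuming the /proc/net/dev layout of two header lines followed by one `name: counters` line per interface. The name is taken from before the first colon, because on some kernels a large first counter runs straight into it, so splitting on whitespace alone is not safe. Only names starting with eth or ppp are kept; `interface_records` is a hypothetical name.

```python
def interface_records(net_dev_text, prefixes=("eth", "ppp")):
    """Return {interface name: list of counter strings} for the wanted interfaces."""
    records = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        name = name.strip()
        if name.startswith(prefixes):
            records[name] = counters.split()
    return records


if __name__ == "__main__":
    # Made-up sample; a live system would use open("/proc/net/dev").read().
    sample = (
        "Inter-|   Receive                  |  Transmit\n"
        " face |packets errs drop fifo frame|packets errs drop fifo colls carrier\n"
        "    lo:     10    0    0    0    0       10    0    0    0    0    0\n"
        "  eth0:12345678    0    0    0    0     5678    0    0    0    0    0\n"
        "  ppp0:     42    0    0    0    0       37    0    0    0    0    0\n"
    )
    for name, counters in interface_records(sample).items():
        print(name, counters[0])  # eth0 12345678, then ppp0 42
```

Each entry can then be appended to a file named after its interface, such as eth0.MonDD.YY, so a new interface needs no change to the script.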

Memory utilization can be tracked in the /proc/meminfo file
as shown in Table 2.

The memory counters are expressed twice in this file, so we
need to save only the Mem: and
Swap: records to get the whole picture. The
script matches the keywords at the start of each line and writes
the data out to individual files rather than to one large database,
which allows more flexibility as new fields or data types are added.
This makes for a cluttered directory but simpler scripts.
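The selection and one-file-per-record layout can be sketched in Python as below. It assumes the older /proc/meminfo layout, where the `Mem:` and `Swap:` lines begin with total, used and free; the function names, the HHMM `stamp` argument and the Type.MonDD.YY naming follow the description in this article but are otherwise hypothetical. `swap_used_percent` shows one way to use the Swap: fields alongside the swap counters from /proc/stat.

```python
import os


def select_records(meminfo_text, stamp, keys=("Mem:", "Swap:")):
    """Return {"Mem": "HHMM fields...", "Swap": ...} for the wanted meminfo lines."""
    records = {}
    for line in meminfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] in keys:
            records[fields[0].rstrip(":")] = " ".join([stamp] + fields[1:])
    return records


def append_records(records, date_suffix, directory="."):
    """Append each record to its own file, e.g. Mem.Jan01.98 and Swap.Jan01.98."""
    for kind, line in records.items():
        with open(os.path.join(directory, f"{kind}.{date_suffix}"), "a") as out:
            out.write(line + "\n")


def swap_used_percent(swap_record):
    """Percent of swap in use, from a 'HHMM total used free' Swap record."""
    total, used = (int(v) for v in swap_record.split()[1:3])
    return 100.0 * used / total if total else 0.0
```

Appending rather than overwriting is what lets each file grow into a day's worth of timestamped samples.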

The script that collects the data is shown in
Listing 1. Here is what happens in a few of its key parts:

Line 13: cd to the directory where the data is to be
stored.

Line 14: get the timestamp for the data records in
format HHMM.

Line 15: get the date for the output data file
names in format MonDD.YY.

Lines 19 - 25: select the memory and swap counter
lines from /proc/meminfo and write the timestamp and data portion
of the record to Mem.MonDD.YY and Swap.MonDD.YY.

Lines 29 - 36: extract the counters for any network
interfaces from /proc/net/dev and write them out to files named
after each interface, including its number; e.g., eth0 data is
written out to eth0.MonDD.YY.
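Lines 14 and 15 can be mimicked in Python with `time.strftime`. The sketch assumes the listing's MonDD.YY corresponds to the `%b%d.%y` format (abbreviated month name in the C locale, day of month, two-digit year); the helpers `stamps` and `data_file` are hypothetical.

```python
import time


def stamps(tm):
    """Return (HHMM record timestamp, MonDD.YY file-name suffix) for a struct_time."""
    return time.strftime("%H%M", tm), time.strftime("%b%d.%y", tm)


def data_file(kind, tm):
    """Name of the day's file for one record type, e.g. Mem, Swap or eth0."""
    return f"{kind}.{stamps(tm)[1]}"


if __name__ == "__main__":
    now = time.localtime()
    print(stamps(now), data_file("eth0", now))
```

Taking a `struct_time` argument instead of reading the clock inside the helpers keeps them easy to test with a fixed date.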

The data accumulates over the course of the day to provide the data
points for analysis. A cleanup script invoked by the second line
removes each file after two weeks to keep disk space requirements
down. One possible enhancement would be to compress each file once
it is complete, but space hasn't been much of an issue yet.
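The cleanup script itself is not shown here, so the following is only a Python sketch of the same two-week policy: it deletes regular files in the data directory whose modification time is more than two weeks old. Judging age by modification time, rather than parsing the date out of the MonDD.YY suffix, is an assumption, and `remove_old_files` is a hypothetical name.

```python
import os
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # seconds


def remove_old_files(directory, now=None, max_age=TWO_WEEKS):
    """Delete regular files in directory not modified within max_age seconds.

    Returns the sorted names of the files removed.
    """
    if now is None:
        now = time.time()
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

Run once a day against the data directory only, since it removes every old file it finds there. Because a day's file stops changing after midnight, its modification time is a reasonable proxy for its date.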

Comments



The sarChart.cgi script has a bug: it reads the tstamp column in each table incorrectly. To calculate the time, it uses substr to extract the hour and minute, but the offset parameter is off by 2 in both cases. This problem is probably due to changing the length of the year from 2 to 4 digits.

Description of the columns in the CPU output is incorrect:
0000 4690259 69915 661038 7937582
Column 1: time-stamp of observation (HHMM)
Column 2: seconds in system state since last booted
Column 3: seconds in nice state since last booted
Column 4: seconds in user state since last booted
Column 5: seconds in idle state since last booted