This chapter is from the book

Successful systems administration relies on clearly and thoroughly understanding how a system uses its resources—its current hardware and software components—and understanding their interaction with, and dependencies upon, user workflows and running services. This chapter discusses the tools and knowledge required to effectively evaluate several key aspects of an existing infrastructure, including current utilization of bandwidth, services, hardware, and storage. This evaluation enables you to plan and implement changes to an existing infrastructure that minimize interruption to the system’s operation and maximize the user’s productivity.

Determining Current Utilization

It’s important to monitor utilization after a new system has been set up, or if an existing system has been serving users for a while. This information is critical to know for new systems to ensure that they run with the best performance. For existing systems, changes in usage (such as a group of video users that have changed to high-definition files) should cause you to reevaluate the setup to make sure that the system is still adequately meeting needs.

Chapter 1, “Planning Systems,” defines utilization as the ratio of usage to capacity. In that chapter, you planned for future capacity; here, your concern is watching a current, running system. Fortunately, Mac OS X includes many utilities that can help you determine system utilization.

Computing Network Bandwidth Utilization

How do you determine utilization—ratio of usage to capacity—for network bandwidth? Chapter 1, “Planning Systems,” explains the concept, but you need a way to obtain the values used in the computation. No single command neatly lays this out, but a series of commands enables you to assemble all the information.

The first piece of information comes from the netstat command, which displays network status information, including total bytes received and transmitted by a particular interface. To interrogate a network interface (represented here as en0), use the following command:

The -I switch specifies the interface, and the -b switch asks netstat to display bytes in and bytes out of that interface. These values are taken from the time that the system boots.

You can also figure out how long the system has been running since boot time, with the uptime command:

$ uptime
8:16 up 16:41, 10 users, load averages: 0.08 0.15 0.20

That’s great output for a person, but not great for a computer. A running variable stores boot time in seconds since the UNIX epoch (the UNIX epoch is the time 00:00:00 UTC on January 1, 1970), accessible by sysctl:

sysctl enables you to get or set kernel variables. To obtain a full list of variables, use the -A switch.

You can also retrieve the current date in terms of seconds with the date command:

$ date +%s
1208175680

That gives you enough information to compute the average utilization of a given interface. Because you’ll want to assess this value from time to time, you can automate this entire routine.

This script is pretty straightforward math, with basic definitions of bits, bytes, and megabytes (automation and scripting will be introduced in Chapter 9, “Automating Systems”). The script uses line numbers for easier reference:

The definitions on lines 4 and 5 are hard-coded into this script; update as necessary. Line 8 performs the same sysctl call presented previously, but then cleans up the output to retrieve only the boot time timestamp. Similarly, lines 11 and 12 reduce the output of netstat to only the “bytes in” and “bytes out” of an interface. Also of note in the script is the use of bc to perform floating-point calculations, which the Bash shell cannot do alone (lines 27 through 31). The math here is rudimentary:

Read an interface’s activity in bytes (lines 11 and 12).

Convert the results to bits by multiplying by 8—8 bits to a byte, remember? (lines 13 and 14).

Convert bytes to megabits (Mbit)—unused in this script, but a good exercise (lines 15 and 16).

Gather total seconds of uptime by subtracting boot time in seconds since the UNIX epoch from current date in seconds from the UNIX epoch (line 22).

Compute average bandwidth in bits per second by dividing total bits on an interface by seconds of uptime (line 25 and 26).

Convert bit/s to Mbit/s by dividing by 1,000,000 (10^6) (lines 27 and 28).

Compute utilization by dividing used bandwidth per second by the interface’s capacity.

For this script to work properly, you need to set the appropriate definitions (interface name and capacity) at the top of the script. Chapter 9, “Automating Systems,” shows ways to refine this script.

To single out current utilization statistics—network throughput and currently connected users—for the Apple Filing Protocol (AFP) and Server Message Block (SMB) file-sharing services on Mac OS X Server, you can use the serveradmin command with the fullstatus verb. Each service displays its current throughput in bytes per second. For example, to display the statistics for AFP, use the following command with root-level access:

The afp:currentThroughput key contains the value of current AFP throughput. To single out throughput, pass the output through the grep command. For example, to single out the current throughput for the SMB service, use the following command:

To list currently connected users and information on each user, serveradmin allows commands to be specified. The command for AFP and SMB is getConnectedUsers. For example, on a server with one user connected via SMB, the command and output would look like this:

To gather information on currently connected AFP users, use the corresponding afp command: afp:command = getConnectedUsers.

Determining Services and Hardware Utilization

It’s important for an administrator to understand the resources that individual programs consume on a given piece of hardware, as the two are intrinsically linked. Running services use hardware resources. Is the use of resources effective? Overwhelming? Can certain services be paired with other services? Service utilization refers to the impact of a single service, and hardware utilization refers to considering the hardware as a whole (for example, looking at memory utilization).

The Server Admin framework is specific to Mac OS X Server and can report on unique information. The GUI-based Server Admin.app can display graphs of CPU utilization over an adjustable range of time. To view these graphs, launch Server Admin.app, authenticate when prompted, select the server in question, and choose the Graphs button in the toolbar.

The default view displays CPU usage for the past hour. You can expand the time range to the past seven days using the pop-up menu in the bottom right corner of the window.

Server Admin can also list services being provided by a server. Currently running services are indicated by a green ball next to their name in the servers list at the left of the Server Admin window. Additionally, services configured and running appear on the Overview page of Server Admin, along with high-level graphs of system utilization for CPU percentage, network bandwidth, and disk storage.

The information provided by the Server Admin framework is valuable, but may not tell a full story. It does not report service status for installed third-party software. Also, the Server Admin tools are specific to Mac OS X Server; there needs to be a way to assess workstation usage as well.

The most straightforward tool is ps, or process status. Typically, executing ps on its own, with no switches, is of little value. By itself, ps simply shows running processes that are owned by the calling ID and attached to a terminal. Of more interest is a list of all processes, owned by any user, with or without a controlling terminal. You can easily achieve such a list with the following command, run with an admin-level account:

The a switch, when combined with the x switch, causes ps to display all processes, from any user, with or without a controlling terminal. However, this does not tell the entire story. Each process in that ps list uses resources—but how much?

You can determine CPU percentage, load, and idle percentage with the top command, which is covered extensively in “top, CPU%, and Load Averages” in Chapter 8, “Monitoring Systems.” You can also find the load average statistic in other places. The uptime command displays load average along with the machine uptime:

Additionally, you can fetch the load average directly from a sysctl variable, vm.loadavg:

$ sysctl vm.loadavg
vm.loadavg: { 0.54 0.68 0.51 }

You can also find CPU percentage and load averages with the iostat command, covered in the next section, “Determining Storage Utilization.”

Each process places load on the CPU by asking it to do work, in the form of making system call requests and placing an instruction to execute in the run queue. To determine which process currently is making the most system call requests, DTrace and Instruments utilities also are very helpful. Both utilities are covered in “Instruments and DTrace” in Chapter 8, “Monitoring Systems.”

You can also find virtual memory statistics with the top command, and view them in more detail using vm_stat. Most of the vm_stat columns are the same columns that you can view with the top command: free, active, inac (inactive), wire (wired), pageins, and pageout. If you do not specify an interval, vm_stat prints only a total and exits. If you add a numeric value after vm_stat and run it, it prints statistics repeatedly at the interval specified in seconds (to stop the listing, press Control-C):

Unlike the earlier exercise of writing a script to determine network bandwidth, you can use vm_stat to report on total statistics gathered since bootup. If you run vm_stat with a repeat interval, you should not be surprised by the first set of statistics printed under each banner: a lifetime-accumulated total (since bootup).

The columns have the following significance:

faults—Number of times the memory manager faults a page.

copy—Pages copied due to copy-on-write (COW). COW is a memory-management technique that initially allows multiple applications to point to the same page in memory as long as it is read-only. However, if any of those applications needs to write to that memory location, it cannot without changing COW for every other application pointing to that location. If an application tries to write to a shared memory location, it instead gets a copy; the original is left intact. The pages copied statistic shows how many times an application tries to write to a shared memory location. It’s an interesting statistic for administrators in some ways, but they can do little about it, short of choosing not to run certain applications that cause the behavior.

zerofill—Number of times a page has been zero-fill faulted on demand: A previously unused page marked “zero fill on demand” was touched for the first time. Again, there’s not much an administrator can do about this particular value.

reactive—Not what it sounds like; the number of times a page has been reactivated (or, moved from the inactive list to the active list).

See the vm_stat man page for further information.

Determining Storage Utilization

In addition to memory using resources, bandwidth and capacity can also use up storage resources. Systems administrators have several tools they can use to determine input/output activity, disk capacity use, and disk usage for a given part of the disk hierarchy, as well as to pinpoint details about file and disk activity. These tools include iostat, df, system_profiler, du, and Instruments and dtrace, respectively.

iostat displays I/O statistics for terminals and storage devices (disks). Similar to vm_stat, iostat can report on total statistics since bootup, or at a given interval. Running iostat solely with an interval is useful for displaying disk transactions, CPU statistics, and load average; to stop the listing, press Control-C. The -w switch specifies the wait interval between refreshing statistics:

Often, the reason to use iostat is to focus solely on the disk statistics. To drop the CPU and load information—the same information available from the top utility—use the -d switch. To further focus on a specific disk or disks, you can add the device node name or names to the command:

The -T switch limits the display to file systems of a certain type, in this case, hierarchical file system (HFS) (which also implies HFS Plus, the default file system for Mac OS X Leopard). The -h switch causes df to display capacities in “human-readable” format (output uses byte, kilobyte, megabyte, gigabyte, terabyte, and petabyte suffixes, as necessary, rather than blocks).

system_profiler is a versatile Mac OS X–specific utility. It excels at querying Macintosh hardware. Along with the other command-line utilities presented here, system_profiler can also report on the total capacity and available space on a storage device. For example, to display detailed information on all Serial Advanced Technology Attachment (ATA)–connected disks, use the SPSerialATADataType command:

While df is perfect for quickly determining the overall use of a mounted storage device, you often need more detail. The du (“disk usage”) command answers the questions “Where is storage being allocated on a given file system?” and “Where is all the space going?”

Running du with no options will, for the current directory, list each file and directory along with the number of blocks occupied by the given object:

The -h switch generates “human-readable” output, as seen in the df command described previously in this section. The -d switch causes du to output entries only at the given depth, with the current directory being 0, immediate subdirectories being 1, and so on. Use the -c switch to print a final, grand-total line. Also, instead of simply summing up the current directory, you can name the directory path, which in this case is named /Users.

Finally, Instruments.app is an ideal way to examine file activity and impact on a storage system for one or more processes. If df and du do not provide the information that you need, Instruments, with its capability to finely detail file and disk activity, and dtrace offer the necessary power and depth to provide that information. For more information on Instruments and dtrace, see the section “Instruments and DTrace” in Chapter 8, “Monitoring Systems.”