Examples of using SAR command for system monitoring in Linux

Submitted by Sarath Pillai on Mon, 11/26/2012 - 03:42

System Activity Reporter (SAR) is an important tool that gives system administrators an overview of a server, showing the status of key metrics at different points in time.

Suppose you are having an issue with the system right now, for example some of your customers are unable to list data from the database. The first thing most Linux system administrators do is recall when the same issue occurred before. If you remember the day of the previous occurrence, you can easily compare the system statistics from that day with the current statistics.

SAR is very helpful in doing exactly that.

The first thing we need to do is confirm that the SAR utility is installed on the machine. This can be checked by listing all installed RPMs and searching for the package.
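
A minimal sketch of that check, assuming an RPM-based system. The function name sysstat_status is made up for illustration; it only looks for the sar binary on the PATH, and the rpm query afterwards runs only where rpm exists.

```shell
# Hypothetical helper: report whether sar (from the sysstat package) is available.
sysstat_status() {
    if command -v sar >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

sysstat_status

# On RPM-based systems the package database can be searched directly.
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | grep -i sysstat || echo "no sysstat package found in the rpm database"
fi
```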

SAR is one of the utilities in the sysstat package. You can easily install it on your machine through YUM. (But don't worry, most distributions come with sysstat prepackaged.)

[root@myvm1 ~]# yum install sysstat

Make sure you have the EPEL or RPMforge repository enabled for the installation. Otherwise, your distribution DVD is a good place to look for the package.

SAR (System Activity Reporter) gives information about the following:

System Buffer activity

Information about system calls

Block device information

Overall paging information

Semaphore and memory allocation information

CPU utilization and process report

The main thing to understand about SAR is that everything is done through cron. By default, many Linux distributions ship a file named /etc/cron.d/sysstat.

Let's see how SAR really works.

If we think about system monitoring, the tool must have data about every aspect of the system and must cover all time intervals. In other words, a monitoring system must be able to provide the machine's statistics for any given time.

There is no way to do this other than collecting all the metrics and statistics of the machine at a fixed time interval. Reducing the collection interval increases the level of detail we have (because we will have more data points about the system).

SAR does exactly that: it collects statistics on different aspects of the machine at a fixed time interval. That is why SAR runs through cron.
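
To see what cron actually runs, you can print the sysstat cron file. The sketch below uses a made-up helper name, show_sysstat_cron, and assumes the file lives at /etc/cron.d/sysstat. The two entries in the trailing comment are what a 64-bit RHEL/CentOS system typically ships, not a copy of any particular machine's file.

```shell
# Hypothetical helper: print the sysstat cron file if this system has one.
show_sysstat_cron() {
    cron_file="${1:-/etc/cron.d/sysstat}"
    if [ -r "$cron_file" ]; then
        cat "$cron_file"
    else
        echo "no readable cron file at $cron_file"
    fi
}

show_sysstat_cron

# Typical entries on a 64-bit RHEL/CentOS system (paths and times may differ
# on your distribution):
#   */10 * * * * root /usr/lib64/sa/sa1 1 1
#   53 23 * * * root /usr/lib64/sa/sa2 -A
```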

The cron file for SAR runs the "sa1" script located in "/usr/lib64/sa/" every 10 minutes.

It also runs the script /usr/lib64/sa/sa2 at the end of the day, at around 23:53.

So the first cron entry for SAR (/usr/lib64/sa/sa1) runs every 10 minutes, which in turn calls the sadc utility to collect system stats and store them in a binary file (one file per day).

The second cron entry writes a daily report from that binary file to a text file and purges data older than a set number of days, normally 7 by default (the number is set in the sysstat configuration file).
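
As a rough sketch, the retention period can be read from the configuration with a small filter. The helper name sysstat_history_days is invented here, and the path /etc/sysconfig/sysstat with its HISTORY variable is an assumption that holds on RHEL/CentOS style systems; other distributions keep the setting elsewhere.

```shell
# Hypothetical helper: read the HISTORY setting (days of data to keep)
# from sysstat-style configuration text on stdin.
sysstat_history_days() {
    awk -F= '$1 == "HISTORY" { gsub(/"/, "", $2); print $2 }'
}

# Assumed location on RHEL/CentOS; adjust for your distribution.
conf=/etc/sysconfig/sysstat
if [ -r "$conf" ]; then
    echo "sa2 keeps $(sysstat_history_days < "$conf") days of data"
fi
```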

So although the system statistics are collected every 10 minutes through cron (change the cron entry to run every minute for finer-grained information), you still need to run the sar command yourself to see the stats.
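
A hedged sketch of viewing the stats: sar -u prints the CPU report for today, and sar -f reads the binary file of an earlier day, which is how you compare today with the day an issue previously occurred. The helper sa_file_for_day is hypothetical, and the /var/log/sa directory with saDD file names is an assumption based on RHEL/CentOS defaults.

```shell
# Hypothetical helper: build the path of the binary data file for a given
# day of the month. /var/log/sa is an assumed default directory.
sa_file_for_day() {
    # ${1#0} drops a leading zero so that "08" is not parsed as octal
    printf '%s/sa%02d\n' "${2:-/var/log/sa}" "${1#0}"
}

if command -v sar >/dev/null 2>&1; then
    # CPU utilisation collected so far today
    sar -u || echo "sar has no data for today yet"
    # CPU utilisation for the 5th of the month, if that file still exists
    f=$(sa_file_for_day 5)
    if [ -r "$f" ]; then
        sar -u -f "$f"
    fi
fi
```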

The output on my machine reports the collected stats for every minute (which means I have my cron at a 1-minute interval) and shows the details for the whole day (or up to the time you typed the command).

Understanding the output of SAR command

%user: This shows the percentage of time the processor spends running user-level processes (applications).

%system: This shows the percentage of time the processor spends on operating system (kernel) tasks (whereas %user above covers user-level processes).

%iowait: As the name suggests, this is the percentage of time the processor was idle while waiting for input/output on devices to complete.
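
To make these percentages concrete, here is a simplified sketch that derives them from two samples of the first line of /proc/stat. The function name cpu_percentages is made up for illustration, and the arithmetic is an approximation of what sar reports, not its actual implementation. Interrupt time is counted under %system, as the sar manual describes for the default CPU report.

```shell
# Hypothetical helper: given two "cpu ..." lines from /proc/stat taken some
# time apart, print the share of elapsed CPU ticks spent in each state.
cpu_percentages() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - before[i]; total += d[i] }
            if (total <= 0) { print "no CPU ticks elapsed between samples"; exit }
            # fields: 2=user 3=nice 4=system 5=idle 6=iowait 7=irq 8=softirq 9=steal
            printf "%%user=%.2f %%system=%.2f %%iowait=%.2f %%idle=%.2f\n",
                100 * d[2] / total,
                100 * (d[4] + d[7] + d[8]) / total,
                100 * d[6] / total,
                100 * d[5] / total
        }'
}

# Live usage: sample the aggregate CPU line twice, one second apart.
if [ -r /proc/stat ]; then
    s1=$(head -n 1 /proc/stat)
    sleep 1
    s2=$(head -n 1 /proc/stat)
    cpu_percentages "$s1" "$s2"
fi
```

Between the two samples in the test data below, 100 of 1000 ticks were spent in user mode, so %user comes out at 10.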

The proc file /proc/net/dev shows the total number of packets and bytes received by a network interface (it is a running total, hence a large value). If you look at the packets and bytes columns in /proc/net/dev, the values keep increasing each time you run cat /proc/net/dev (it is closer to real-time data).

However, the output of the sar -n DEV command is a calculated per-second statistic. If you want to compare the two (the values in sar -n DEV and in /proc/net/dev), I would suggest calculating the difference by subtracting the values from successive runs of cat /proc/net/dev.

The difference between the values from two consecutive runs of cat /proc/net/dev, taken one second apart, will be close to the per-second values shown by sar -n DEV.
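
The subtraction can be sketched as follows. The helper names iface_counters and per_second_rates are invented for this example, eth0 is just the interface name used in this discussion, and the output labels mimic the column names of recent sar versions (older versions report bytes rather than kB).

```shell
# Hypothetical helper: read /proc/net/dev-formatted text on stdin and print
# "rx_bytes rx_packets tx_bytes tx_packets" for the interface named in $1.
iface_counters() {
    tr ':' ' ' | awk -v ifc="$1" '$1 == ifc { print $2, $3, $10, $11 }'
}

# Hypothetical helper: $1 = earlier counters, $2 = later counters,
# $3 = seconds between the two samples.
per_second_rates() {
    echo "$1 $2 $3" | awk '{
        printf "rxpck/s=%.2f txpck/s=%.2f rxkB/s=%.2f txkB/s=%.2f\n",
            ($6 - $2) / $9, ($8 - $4) / $9,
            ($5 - $1) / $9 / 1024, ($7 - $3) / $9 / 1024
    }'
}

# Live usage: two samples one second apart for an assumed interface name.
ifc=eth0
if [ -r /proc/net/dev ]; then
    before=$(iface_counters "$ifc" < /proc/net/dev)
    sleep 1
    after=$(iface_counters "$ifc" < /proc/net/dev)
    if [ -n "$before" ] && [ -n "$after" ]; then
        per_second_rates "$before" "$after" 1
    fi
fi
```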

So the %CPU shown in the top output for each process is the percentage of CPU that process actually used during the last refresh interval (say 3 seconds), whereas the %CPU shown by the "ps -eo pid,%cpu" command is the CPU time the process has used divided by the time it has been running, i.e. an average over the lifetime of the process.

To be more accurate, both the ps output and the top output show the percentage of CPU used by a process, but over different time windows, so the two values only agree when the process's usage has been steady.

However, the top command is much better because it gives you more real-time statistics than the ps command.

You can even manage to get up-to-date statistics using the ps command, but for that you need to rerun the same ps command over and over again. For more convenient monitoring you can use the watch command, which repeats the command passed to it at the interval, in seconds, that you specify. For example,

watch -n 1 'ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -40'

The watch command shown above reruns the command every second and keeps updating the output on the screen (this is the default behaviour of the top command).

I would like to know about the network status from the sar output. How exactly do I tell from the "sar -n DEV" output that my network traffic is bad? What is the threshold value above which we can say that the network traffic is really bad?

For example, I ran the tcpdump command on one terminal and captured the sar output on another. Please see below and let me know whether the values shown are fine and my eth0 is normal. If yes, how can you tell?