Using the Linux top command in batch mode

In this article I'll show you how to use the popular Linux top command in batch mode on your VPS (Virtual Private Server) or dedicated server to track down potentially problematic script executions.

Often just a few scripts on your server can quickly use a lot of CPU and cause your server's load average to spike. If you've already read my previous articles on either advanced server load monitoring or on how to create a server load monitoring script, you might have noticed that your server's usage has been spiking. Being able to hop on your server and look for long-running script executions is a good troubleshooting tactic for seeing where this extra usage might be coming from.

The top command can be used to view dynamic, real-time information about the processes and scripts running on your server. This information refreshes quickly, and it can be difficult to track down exactly what was happening on the server while watching the output from top fly by. So in this article I'm going to discuss how to use the batch mode of the top command so that we can log the activity going on and then go back and review it.

To follow along with the steps below, you'll need to already have root access to either your VPS or dedicated server so that you have full access to all of the processes running on your server.

Run top in batch mode and log activity

Following the steps below I'll show you how to run top in batch mode and output that activity to a log file.

Run the top command with the idle-process toggle set (-i, which on a default configuration hides tasks that have used no CPU since the last refresh), full command lines shown (-c), batch mode on (-b), and the delay set to .1 seconds (-d .1) so that it refreshes quickly.

egrep -v "top|Tasks|Cpu|Mem|Swap|PID|top icbd|^$|tee -a"

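Use the egrep command with the -v flag to hide any lines that include top, Tasks, Cpu, Mem, Swap, PID, top icbd, or tee -a, as well as any line matching ^$, which is a blank line. This way we only see lines from top that have process information. Keep in mind that the match applies to the whole line, so a process whose command line happens to contain one of these strings (for example, a script named stop.sh, which contains top) is hidden as well.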
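To see what the filter keeps, here is a small sketch that runs it against a made-up refresh of top output instead of a live server. The sample text and its values are invented for illustration, and grep -E is used in place of egrep; the two behave the same, and newer GNU grep releases print a deprecation warning for egrep.

```shell
# Hypothetical sample of one top batch-mode refresh (all values invented).
sample='top - 10:15:01 up 12 days,  3:02,  1 user,  load average: 1.20, 0.80, 0.50
Tasks: 120 total,   2 running, 118 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.0%us,  3.0%sy,  0.0%ni, 85.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2048000k total,  1024000k used,  1024000k free,   128000k buffers
Swap:  1048576k total,        0k used,  1048576k free,   512000k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12345 userna5   20   0  300m  40m 8000 R 45.0  2.0   0:01.20 /usr/bin/php /home/userna5/public_html/index.php'

# -v inverts the match: every summary line, the column header and the
# blank line are dropped, leaving only the process line.
filtered=$(printf '%s\n' "$sample" | grep -E -v "top|Tasks|Cpu|Mem|Swap|PID|top icbd|^$|tee -a")
printf '%s\n' "$filtered"
```

Only the line for process ID 12345 should be printed.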

tee -a TOP_LOG

Finally, use the tee command with the -a (append) flag to print the data from the top command to the screen and, at the same time, append it to a file called TOP_LOG.

After you've let this run for some time to gather data, you can go ahead and hit Ctrl-C to stop the top command from gathering more data.
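Chained together with pipes, the three pieces look roughly like the sketch below. It assumes the top flags are -i, -c, -b and -d .1, matching the options described above and the top icbd string in the filter. The -n 20 option is an addition for this sketch only, so that top exits by itself after 20 refreshes; drop it to keep logging until you hit Ctrl-C. As before, grep -E stands in for egrep.

```shell
# Assumed reconstruction of the capture pipeline:
#   -i    toggle idle processes (hides them on a default configuration)
#   -c    show full command lines
#   -b    batch mode, suitable for piping to other commands
#   -d .1 refresh every 0.1 seconds
#   -n 20 stop after 20 refreshes (added for this sketch, see above)
top -icbd .1 -n 20 \
  | grep -E -v "top|Tasks|Cpu|Mem|Swap|PID|top icbd|^$|tee -a" \
  | tee -a TOP_LOG
```

Because tee appends, running this several times keeps adding to the same TOP_LOG file in the current directory.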

Parse top batch mode log

Run the following command to parse the data from our TOP_LOG file and sort the processes by the longest amount of CPU time used:

Start a bash for loop that sets the variable PID to each value produced from the TOP_LOG file: first use the sort -nk1 command to sort the lines numerically by process ID, then use the awk command to print only the 1st column ($1), and finally use the uniq command to keep only unique process IDs.

do grep $PID TOP_LOG

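Use the grep command to look for the current $PID from our loop in the TOP_LOG file. Note that grep matches the number anywhere on the line, so a short process ID such as 123 would also match the lines for process ID 1234; anchoring the pattern to the start of the line, as in grep "^ *$PID ", avoids that.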

sed -e 's#[ ]*$##' -e 's#\([0-9]*\):\([0-9]*\)\.\([0-9]*\)#\1 \2 \3#'

Use the sed command to first strip any trailing spaces from each line with the -e 's#[ ]*$##' part, then take the CPU time values that look like 0:00.20 (minutes:seconds.hundredths) and break them up with spaces like 0 00 20 with the -e 's#\([0-9]*\):\([0-9]*\)\.\([0-9]*\)#\1 \2 \3#' part.
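Here is a quick way to check what the sed step does, using a single made-up process line with a few trailing spaces. The line and its values are invented for illustration.

```shell
# Hypothetical process line ending in three spaces (values invented).
line='12345 userna5   20   0  300m  40m 8000 R 45.0  2.0   0:01.20 /usr/bin/php index.php   '

# First expression: remove trailing spaces.
# Second expression: split the first minutes:seconds.hundredths value
# on the line into three space-separated fields.
cleaned=$(printf '%s\n' "$line" |
  sed -e 's#[ ]*$##' -e 's#\([0-9]*\):\([0-9]*\)\.\([0-9]*\)#\1 \2 \3#')

# Brackets make it easy to see that the trailing spaces are gone.
printf '[%s]\n' "$cleaned"
```

The 0:01.20 value becomes 0 01 20, which turns the CPU time into columns 11, 12 and 13 of the line.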

sort -nk1 -nk11 -nk12 -nk13 -k15

Sort our data numerically by the 1st column, which is the process ID, then by the 11th, which is the CPU minutes, followed by the 12th, which is the CPU seconds, then by the 13th, which is the hundredths of a second, and finally by the 15th column, which is the command run.
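The reason for sorting each time column numerically shows up once a process passes 9 minutes. The sketch below uses three made-up lines for one process ID, already split by the sed step; the values are invented for illustration.

```shell
# Hypothetical lines for one PID, already split by the sed step (values invented).
split_lines='4242 userna5 20 0 300m 40m 8000 R 45.0 2.0 10 00 01 /usr/bin/php /home/userna5/public_html/index.php
4242 userna5 20 0 300m 40m 8000 R 45.0 2.0 9 59 99 /usr/bin/php /home/userna5/public_html/index.php
4242 userna5 20 0 300m 40m 8000 R 45.0 2.0 9 05 10 /usr/bin/php /home/userna5/public_html/index.php'

# Numeric keys rank 10 minutes above 9 minutes; a plain text sort would not.
sorted=$(printf '%s\n' "$split_lines" | sort -nk1 -nk11 -nk12 -nk13 -k15)
printf '%s\n' "$sorted"

# The last line is the one with the most accumulated CPU time.
longest=$(printf '%s\n' "$sorted" | tail -1)
```

The lines come out ordered 9 05 10, then 9 59 99, then 10 00 01, so the final line carries the highest CPU time.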

tail -1; done

Use the tail -1 command so that we only get back one entry per process ID. Because of the way we sorted them, that entry is the line from the log file with the highest recorded CPU time for that process. Then use the done keyword to complete our bash for loop.

sort -nk11 -nk12 -nk13

Finally, sort all of the remaining lines numerically by columns 11, 12 and 13 so that the output is ordered by CPU time used, with the longest-running process last.
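Put together, the pieces above form a single command. The sketch below joins them as described and runs them in a scratch directory against a made-up TOP_LOG, so a real log file is not touched. The process lines are invented for illustration and assume top's default column layout.

```shell
# Work in a scratch directory so a real TOP_LOG is not overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical log contents (values invented for illustration).
cat > TOP_LOG <<'EOF'
12345 userna5   20   0  300m  40m 8000 R 45.0  2.0   0:01.20 /usr/bin/php /home/userna5/public_html/index.php
12345 userna5   20   0  300m  40m 8000 R 47.0  2.0   0:03.85 /usr/bin/php /home/userna5/public_html/index.php
23456 userna5   20   0  310m  52m 8200 R 98.0  2.6   1:14.90 /usr/bin/php /home/userna5/public_html/wp-cron.php
23456 userna5   20   0  310m  52m 8200 R 99.0  2.6   1:15.23 /usr/bin/php /home/userna5/public_html/wp-cron.php
EOF

# For each unique PID: collect its lines, split the CPU time into columns,
# keep the line with the highest time, then order the survivors by time.
result=$(
  for PID in $(sort -nk1 TOP_LOG | awk '{print $1}' | uniq); do
    grep "$PID" TOP_LOG |
      sed -e 's#[ ]*$##' -e 's#\([0-9]*\):\([0-9]*\)\.\([0-9]*\)#\1 \2 \3#' |
      sort -nk1 -nk11 -nk12 -nk13 -k15 |
      tail -1
  done | sort -nk11 -nk12 -nk13
)
printf '%s\n' "$result"
```

With this sample, two lines are printed, one per process ID, and the last one is the wp-cron.php entry with 1 15 23 in columns 11 to 13.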

So in this case we can see that the /home/userna5/public_html/wp-cron.php script was the longest-running script during the time we logged to our TOP_LOG file, having used 1 minute and 15.23 seconds of CPU time. If you have lots of scripts or processes with minute-long execution times, this can cause server load spikes.

Once you are done reviewing the data in our TOP_LOG file you can run the following command to remove this file:

rm -rf ./TOP_LOG

You should now understand how you can use the top command in batch mode to help troubleshoot long script executions.