Note: the load used to be 50%, and the VIRT column is usually 0. However, once in a while the system gets stuck in a mode where 90% of the CPU is used by the kernel. I don't know what the hell the kernel is doing.

Very little CPU is used by user processes. No wonder the load is extremely high. But why? What is the kernel doing?

I'd be more worried about the load if I were you. Also, it's a common misconception that VM equals swap.
– schaiba Feb 16 '13 at 12:47

For virt and res, it is explained in that man page you link to. What are your scripts doing? (Interesting username.)
– Mat Feb 16 '13 at 13:34

You have 12 zombies and very high load which are things to worry about.
– Naai Sekar Feb 16 '13 at 14:17

A zombie is a process that has finished while its parent is still running, but the parent has not yet asked for the exit status of the dead child. Sloppy programming, a bad sign for sure (but nothing urgent to worry about).
– vonbrand Feb 16 '13 at 14:33
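
To check a single process for the zombie state described above, one option is to read the state field from `/proc/<pid>/stat`. The following is a minimal sketch, assuming a Linux `/proc` layout; the function names `parse_stat_state` and `is_zombie` are made up for illustration.

```python
def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one line of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    # The first field after the command name is the one-letter state.
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def is_zombie(stat_line):
    """True if the process state is 'Z' (finished but not yet reaped)."""
    return parse_stat_state(stat_line)[2] == "Z"
```

A zombie disappears once its parent calls one of the `wait` functions (or the parent itself exits), so the real fix belongs in whichever script spawns the children.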

What do 92m, 31m, etc. mean? Why so much CPU usage in the kernel?
– Jim Thio Feb 16 '13 at 15:09

2 Answers

I'm copying this from a man page I wrote for plog, since I was trying to make it clear there:

It is important to understand the difference between virtual address
space and physical memory in interpreting some of the above
statistics. As the name implies, virtual address space is not real;
it’s basically a map of all the memory currently allocated to a
process. The limit on the size of this map is the same for each
process (generally, 2-4 GB), and it is not accumulated (ie, you may
have dozens or hundreds of processes, each with its own 2-4 GB virtual
address space, on a system that only actually has 512 MB of physical
memory).

Data cannot actually be stored or retrieved from virtual address
space; real data requires real, physical memory. It is the kernel’s
job to manage one in relation to another. Virtual space stats
(VirtualSz, Data+Stack, and Priv&Write) are useful for considering the
structure of a process and the relationship to physical memory use,
but with regard to amount of RAM actually used, the physical memory
stats (ResidentSz, Share, and Proportion) are what counts.

Top doesn't quite have all those metrics, but the VIRT score is virtual address space, RES refers to physical memory as does SHR. If you are concerned about relative memory usage (ie, one process compared to another), the RES score is more relevant.
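
To see the same two numbers outside of top, the kernel exposes them per process in `/proc/<pid>/status`. This is a minimal sketch, assuming that top's VIRT and RES correspond to the `VmSize` and `VmRSS` lines there (both reported in kB); the function name `memory_summary` is hypothetical.

```python
def memory_summary(status_text):
    """Extract VIRT and RES (in kB) from the text of /proc/<pid>/status."""
    # Assumed mapping from /proc field names to top's column names.
    wanted = {"VmSize": "VIRT", "VmRSS": "RES"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:     31744 kB"; take the number.
            result[wanted[key]] = int(rest.split()[0])
    return result
```

On a Linux system, passing the contents of `/proc/self/status` to `memory_summary` should show how much larger the virtual figure usually is than the resident one.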

Certain parts of VIRT are relevant relative to other processes; virtualization systems such as OpenVZ limit containers based on the total amount of private writable address space, not RSS. Top doesn't report this, but pmap and plog do (see the plog manpage for "Priv&Write"; this was actually part of my motive when writing it).

Note that RES is the part in memory of the virtual address space, and lots of that will be shared among processes.
– vonbrand Feb 16 '13 at 14:30

@vonbrand It's not documented in man proc, but the (Linux) kernel (as of 2.6.something?) reports a Pss figure in /proc/<pid>/smaps. This is like Rss (top's "RES") except that it is the unshared space plus each shared region divided by the number of processes sharing it, which is pretty handy. Unfortunately top doesn't report that (but plog does ;)). Simply subtracting SHR from RES isn't very accurate, because SHR is real. If there are only two processes accessing a library, that space is "shared", but not much.
– goldilocks Feb 16 '13 at 15:01
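
A process's total Pss can be obtained by adding up the `Pss:` line of every mapping in `/proc/<pid>/smaps`. The sketch below assumes the usual `Field:   N kB` layout of that file; `sum_smaps_field` is a name invented for this example.

```python
def sum_smaps_field(smaps_text, field):
    """Sum one per-mapping field (e.g. "Pss" or "Rss") over smaps text, in kB."""
    prefix = field + ":"
    total = 0
    for line in smaps_text.splitlines():
        # Matching on "Pss:" with the colon skips related fields whose
        # names merely begin with the same letters.
        if line.startswith(prefix):
            total += int(line.split()[1])
    return total
```

Comparing `sum_smaps_field(text, "Rss")` with `sum_smaps_field(text, "Pss")` for the same process gives a rough idea of how much of its resident memory is shared with others. Reading another user's `smaps` file generally requires elevated privileges.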

@Thio: No idea. The kernel is spinning its wheels over something. Try ps -A -o state,pid,comm | grep "^D". If you see anything, then you may have disk corruption, which will cause a system I/O busy loop (google "uninterruptible sleep"). That's all I can think of, not knowing anything else about the system itself. It's unlikely tho, as I think those D processes would be evident with high CPU usage in top.
– goldilocks Feb 16 '13 at 16:54
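
The same filter as the `ps ... | grep "^D"` pipeline can be written as a small function over the text that `ps -A -o state,pid,comm` prints. This is only a sketch; the function name `uninterruptible` is hypothetical, and it assumes the three-column output shown in the command above.

```python
def uninterruptible(ps_output):
    """Return (pid, comm) pairs for rows whose state starts with 'D'."""
    rows = []
    for line in ps_output.splitlines():
        # Split into at most three fields so command names keep their spaces.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].startswith("D"):
            rows.append((int(parts[1]), parts[2].strip()))
    return rows
```

The input could come from `subprocess.run(["ps", "-A", "-o", "state,pid,comm"], capture_output=True, text=True).stdout`. An empty result means no process was in uninterruptible sleep at the moment of the snapshot, so it is worth sampling a few times while the system is in its stuck mode.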