Here's a scenario. A Linux system is reported as being slow and not quite working as you would expect. A
preliminary examination shows nothing out of the ordinary. You do your due diligence and run the routine bunch of
commands, which only leads to a shrug of gentle frustration. Nothing seems to be amiss. Hmm,
perhaps the memory usage is a little high. But why? The plot thickens.

Today, you are going to learn how to cope with seemingly crazy problems that defy simple mathematics and
your logic as a system administrator, or perhaps a highly enthusiastic user keen on fixing a wonky box.
After me.

Problem

Our problem, in more detail. Let's say you have this misbehaving Linux box that is churning memory and swap
like a pro. Now, you find this highly suspicious, because the earlier, preliminary examination of the system
memory usage revealed no reason why your box ought to behave the way it does. But let us be even more
precise.

The system has 48GB RAM. If you consult top, it reveals a single heavy hitter with an 18GB resident set, but other
than that, there do not seem to be many processes with high memory consumption, and the memory usage of all the
processes combined does not add up to the reported total. Looking at the system cache and
buffers, again, the total there is only 2.2GB.

If you recall my older system hacking guides and howtos, then you will remember that Linux memory usage is a
rough approximation. Commands like free and top
will report nice sums, but they are not 100% accurate. For one thing, the free memory field is the most
misleading one, because it may lead you to believe that the rest of the memory is gone for good. However, you also
need to count the buffers and cached fields as free memory, since they are readily available to new
processes. For instance:
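The free -m output below is illustrative, with numbers made up to match the 128GB example we discuss next; run free -m on your own box for the real picture:

             total       used       free     shared    buffers     cached
Mem:        131072     129434       1638          0       1996     104096
-/+ buffers/cache:      23342     107730
Swap:        16383          0      16383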

Here, you may assume this 128GB system has only 1.6GB free. But this is not correct. More than 100GB is
sitting safely in the page cache. This means that you should treat the system usage as:

Used = Total - Free - Buffers - Cached
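As a quick sanity check, here is a minimal one-liner that applies this formula, assuming the classic procps free -m layout shown above; newer procps-ng versions merge buffers and cache into a single buff/cache column, so the field numbers would differ there:

free -m | awk '/^Mem:/ {print $2 - $4 - $6 - $7 " MB actually used"}'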

In other words, if you want to know how much memory is free, you can safely add the cached and buffers values
to the free count. Furthermore, for the most accurate count, you might want to sum the resident set size (RES in
top, RSS in ps) across all processes in the process table. This can be done using ps with the BSD-style
options, which give you the VSZ and RSS values:

ps aux | awk '{print $6}'

Then you can replace the newline characters with plus signs (using tr) and
pipe the resulting expression to a calculator like bc to get the total, as sketched below. Of course, you can
also always use the system reporting tools, like top and free and others.
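Here is that pipeline spelled out, a rough sketch that sums the RSS column (in KB) while skipping the header line; the trailing 0 keeps the final plus sign from upsetting bc:

echo "$(ps aux | awk 'NR > 1 {print $6}' | tr '\n' '+')0" | bc

Or, if you prefer, let awk do all the work:

ps aux | awk 'NR > 1 {sum += $6} END {printf "%.1f MB\n", sum / 1024}'

Mind that shared pages are counted once per process here, so the result will somewhat overestimate the real usage.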

Now, in our particular example, there's a problem. Namely, if we sum the memory usage of all processes in our
system, the total amounts to about 22GB. According to what the top command reports, this means we have roughly
26GB missing. Well, minus the buffers and cache, but that still leaves roughly 24GB seemingly unaccounted for.

Slabinfo

At this point, we need to take a peek into the kernel space and try to figure out what gives. Luckily, you will
not have to write your own kernel module. The /proc pseudo-filesystem already provides a human-readable view
into the kernel memory usage via slabinfo. If you issue the cat command against /proc/slabinfo, you will dump
its contents as a table displaying all sorts of useful data.

First, let us define this slab thingie. Quoting a bit from the encyclopedia material and such, slab allocation is a memory management mechanism intended for the
efficient memory allocation of kernel objects which displays the desirable property of eliminating
fragmentation caused by allocations and deallocations. The technique is used to retain allocated memory that
contains a data object of a certain type for reuse upon subsequent allocations of objects of the same type.

Now, objects of the same type are organized into slab pools, which form the next hierarchy level in memory
management. And slabinfo gives you information about memory usage at the slab level. Bingo. The output of
slabinfo looks roughly as follows; the excerpt below is illustrative, and your slab names and counts will differ:
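slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry            10938564 10938564    192   21    1 : tunables    0    0    0 : slabdata 520884 520884      0
ext4_inode_cache   9210968  9210968   1048   31    8 : tunables    0    0    0 : slabdata 297128 297128      0
buffer_head         531804   531804    104   39    1 : tunables    0    0    0 : slabdata  13636  13636      0

Note that reading /proc/slabinfo may require root privileges on newer kernels.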

First, you get the slab name, the number of active objects and the total number of objects of the particular
type, the size of each object, and so forth. This is indeed what we are looking for. Multiply the number of
objects by their size, sum across all the slab types, and you will get the total slab usage. In our case, can
you guess what the total count will be? Yes, roughly 24GB, the missing memory. Bob's your uncle. Indeed.
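If you want to do the multiplication yourself, a minimal sketch along these lines will do; it skips the two header lines, multiplies the total object count (column three) by the object size in bytes (column four), and sums it all up:

sudo awk 'NR > 2 {sum += $3 * $4} END {printf "%.1f GB\n", sum / 1024 / 1024 / 1024}' /proc/slabinfo

The result slightly underestimates the true footprint, since it ignores per-slab padding, but it is close enough for our detective work.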

You can also use the slabtop command, which will parse the slabinfo data and display a
top-like view of the busiest slabs. This can be quite useful for debugging problems in real time, plus it can
save you time digging manually through the /proc/slabinfo data. Finally, if you consult /proc/meminfo, you will
also get a total summary of the slab usage:
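grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo

Slab:         25165824 kB
SReclaimable: 24903680 kB
SUnreclaim:     262144 kB

The values above are illustrative, matching our roughly 24GB scenario; SReclaimable is the portion the kernel can shrink under memory pressure, SUnreclaim is the portion it cannot.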

Conclusion

You may ask, why did the slab memory not show up as cached or buffered objects? And that is a very good
question, but it goes beyond your immediate problem, which is figuring out how to account for all the system
memory, regardless of the accounting methods used. Now, a much bigger challenge awaits you, and that is to sort
out the program's memory usage, understand whether there might be a bug in the system memory reporting, and so forth.

However, for today's lesson, we have accomplished our mission. We wanted to know how to sort out the missing
memory phenomenon, and we've done it. Living la vida kernel. While the black magic of Linux memory
management may never be fully unraveled, you have gained some valuable knowledge in this tutorial: how to use
various system tools to check and interpret memory usage reports, and most importantly, how to check the kernel
slab allocation. Your geekness level has just notched up, almost equaling your WoW skills. Peace.