Bruning Questions: Where's My Memory Going?

March 15, 2013 - by Mr. Max Bruning

While teaching a combined SDC, SmartOS Internals, ZFS Internals, and DTrace course over the last two weeks, a student told me he thought he was seeing a memory leak while running imgadm import dataset_uuid. He was doing this on his laptop on top of VirtualBox. The command eventually failed. So the question he wanted to answer was, "What is using up the memory?" We'll also take a look at how the command failed (it was not because the system was out of space).

I was able to reproduce the behavior, and if you like, you can follow along by doing the same steps on your SmartOS system; you can get the latest version here. Here are good instructions for getting SmartOS running on VMware Fusion, and a link for doing the same on VirtualBox. If you prefer to run it on bare metal, you can get the USB image from https://download.joyent.com/pub/iso/latest-USB.img.bz2, copy the img to a USB key, and boot your system off of the USB key. There are also links for running on VMware, VirtualBox, or from the USB key on the download SmartOS wiki page.

Assuming you've logged into the global zone as root using ssh, we'll do the following:

From the above output (ignoring the first line, which in most of the *stat commands reports averages since boot), the free memory on the machine is steadily dropping.
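One way to put a number on "steadily dropping" is to compute the average change in free memory between samples. The following is a minimal Python sketch, assuming the free-memory values (in KB) have already been pulled out of the *stat output by hand. The function name and the sample values are hypothetical and are not taken from the original session.

```python
def free_memory_trend(free_kb_samples):
    """Return the average change in free memory (KB) per sample interval.

    The first sample is dropped, mirroring the advice to ignore the
    first line of *stat output.
    """
    samples = free_kb_samples[1:]
    if len(samples) < 2:
        raise ValueError("need at least two samples after the first line")
    return (samples[-1] - samples[0]) / (len(samples) - 1)


# Hypothetical free-memory samples in KB; the first is the line to ignore.
samples = [900000, 500000, 480000, 455000, 440000]
print(free_memory_trend(samples))  # a negative value means free memory is falling
```

A consistently negative result over many intervals says only that free memory is shrinking. It does not say who is using it, which is what the rest of the investigation is about.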

Let's see if prstat(1M) tells us anything. Basically, the memory is being used by one or more processes, by files in tmpfs (i.e., files in /tmp), or by the kernel itself (very possibly by the ZFS ARC).

From the above output, process id 3553 (imgadm) is by far the largest process, and its RSS (Resident Set Size, i.e., amount of physical memory in use) is growing. A closer look at this process shows that it is the node.js engine:

So the node::Buffer::Replace method is calling malloc(), which in turn decided it needed more memory and called brk(2). The brk(2) system call is used to grow the virtual size of the heap. Again, node is large, but it is most likely not leaking memory.

The kernel is growing, but it is not using an excessive amount of memory. Applications are growing too, but not enough to cause problems, and the same can be said for /tmp files. The conclusion is that there is no memory leak.
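To make "growing, but not enough to cause problems" concrete, one could compare the RSS column from two prstat(1M) snapshots taken some time apart. The following is a minimal Python sketch, assuming the RSS values are strings with K, M, or G suffixes as prstat prints them. The helper names and all sample values are hypothetical; only the PID 3553 is taken from the text above.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def size_to_bytes(text):
    """Convert a size string such as '4672K' or '142M' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(text)


def rss_growth(before, after):
    """Map each PID present in both snapshots to its RSS change in bytes.

    Each snapshot is a dict of {pid: rss_string}.
    """
    return {
        pid: size_to_bytes(after[pid]) - size_to_bytes(before[pid])
        for pid in before.keys() & after.keys()
    }


# Hypothetical RSS values for two snapshots.
before = {3553: "120M", 2999: "4672K"}
after = {3553: "142M", 2999: "4672K"}

growth = rss_growth(before, after)
# Print the fastest-growing processes first.
for pid, delta in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(pid, delta // 1024, "KB")
```

Growth on its own is not a leak. A process that is buffering a large download is expected to grow, so the delta needs to be judged against what the process is doing and how much memory the machine has.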

My student thought a memory leak was causing the imgadm import to fail. Here is the failure message:

This occurred on a system where network access was slow and not very reliable. Most likely, the failure is due not to a memory shortage, but to corruption of data received over the network. We may come back to this in a future blog.

If the kernel is fine, there are no large (and growing) tmpfs files, and there are no large and growing processes, you probably don't have a leak.

Note that files that have been unlinked but are still open can cause memory to be used up if these files are being written to and are in a tmpfs file system. How to deal with this will be covered in a future blog.
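The unlinked-but-open case is easy to reproduce. The following is a minimal Python sketch, assuming a POSIX system. The function name is hypothetical, and the directory returned by tempfile.gettempdir() is only backed by tmpfs (and therefore by memory) on systems configured that way, as /tmp is on SmartOS.

```python
import os
import tempfile


def write_to_unlinked_file(directory, nbytes):
    """Create a file, unlink it while it is still open, then write to it.

    Returns (link_count, size_in_bytes) as reported by fstat() on the
    open descriptor after the write.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)  # the name is gone, so ls and du no longer see the file
        data = b"x" * nbytes
        written = 0
        while written < nbytes:  # os.write() may write fewer bytes than asked
            written += os.write(fd, data[written:])
        info = os.fstat(fd)
        return info.st_nlink, info.st_size
    finally:
        os.close(fd)  # the storage is released only once the file is closed


links, size = write_to_unlinked_file(tempfile.gettempdir(), 1024 * 1024)
print(links, size)
```

While the descriptor is open, the file has no directory entry but still occupies the bytes written to it, which is why a directory listing of /tmp can look small while tmpfs usage keeps climbing.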