For more than a year now I've been successfully monitoring SmartOS smartmachines with Nagios. To monitor the memory usage, I am using "check_mem" (https://github.com/Voxer/nagios-plugins/blob/master/check_mem), which works very well and allows me to create graphs (the perfdata code of this plugin was actually added by me).

Here is an example of the graph:

While this works on smartmachines (the zones), the plugin does not work on physical servers.
To get the currently used memory value, the plugin uses the kstat command. If I launch the command on a physical SmartOS server (the global zone), all zones are shown:

There are some downsides to this command, though: it takes nearly 4 seconds to produce its output (I can live with that), and I am not sure whether the percentages are correct. Sure, they sum up to 100% and I know that ZFS uses a lot of memory, but 36% of the whole system? But at least this is a working alternative.

Another way I found is to use prstat which in combination with -Z shows a summary of the zones. With -z a zone id can be used to retrieve the data for a specific zone:

The interesting part comes after the process list. The column RSS is the amount of memory used by the global zone.

As prstat is an interactive command (like top on Linux), you have to play around with it a little to be able to save the output into a file:

prstat -z 0 -Z 1 1 > output.txt
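Once the output is in a file, the zone summary can be picked out with awk. The following is only a minimal sketch: the function name zone_rss is made up for illustration, and it assumes that the summary block starts with a header line beginning with "ZONEID", that RSS is the fourth column, and that the zone name is the last column. Check this against the real output on your system before relying on it.

```shell
# zone_rss ZONENAME: read prstat -Z output on stdin and print the RSS
# field of the named zone from the zone summary block.
# Assumption: the summary header starts with "ZONEID" and the columns are
# ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE.
zone_rss() {
    awk -v zone="$1" '
        $1 == "ZONEID" { in_summary = 1; next }   # summary block starts here
        $1 == "Total:" { in_summary = 0 }         # trailing totals line ends it
        in_summary && $NF == zone { print $4 }    # 4th column is RSS
    '
}
```

It could then be called as: zone_rss global < output.txt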

I now have several options to patch the "check_mem" plugin for SmartOS:

Use the same kstat command as already used in the plugin, but add the RSS values of each zone found to a total RSS size. The issue here: the global zone itself uses memory, too, and this value is missing in kstat.
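A rough sketch of the summing step for this first option, under the assumption that kstat is called with parseable output (-p) and prints one line per statistic in the form module:instance:name:rss followed by the value in bytes. The function name sum_zone_rss is hypothetical, and the result would still lack the global zone's own memory, as noted above.

```shell
# sum_zone_rss: read "kstat -p" style lines on stdin and print the sum of
# all values whose statistic name is "rss".
# Assumption: each line looks like module:instance:name:rss<whitespace>bytes.
sum_zone_rss() {
    awk '
        {
            n = split($1, key, ":")            # key[n] is the statistic name
            if (key[n] == "rss") total += $2   # ignore every other statistic
        }
        END { printf "%.0f\n", total }         # %.0f avoids integer overflow
    '
}
```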

Use mdb output and use the third column (MB) to calculate the current usage. The good part here is that I don't need another command to get the total physical memory value. I just have to watch out that I don't count "ZFS File Data" as used memory but rather as cached memory.
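The calculation for the mdb option could look roughly like the sketch below. It assumes that every data row ends with a percentage, that the MB value is the field right before it, and that the "Total" row has no percentage column. The function name memstat_summary and the decision to count only the "Free" rows as free are my own assumptions for illustration.

```shell
# memstat_summary: read memstat-style output on stdin and print
# "used_mb cached_mb free_mb total_mb".
# Assumptions: data rows end in a percentage, the MB value is the field
# just before it, and the Total row carries no percentage.
memstat_summary() {
    awk '
        /^Total/         { total = $NF; next }
        !/%[ \t]*$/      { next }                  # skip header, rule, blank lines
        /^ZFS File Data/ { cached += $(NF-1); next } # counted as cache, not used
        /^Free/          { free += $(NF-1); next }
                         { used += $(NF-1) }         # everything else is used
        END { printf "%d %d %d %d\n", used, cached, free, total }
    '
}
```

The four numbers map directly onto the used, cached, free and total values a Nagios check would want for its thresholds and perfdata.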

Use the prstat output, but without restricting it to the global zone (-z 0), so I get the current RSS value for all active zones (including the global zone). This is basically the same logic as using kstat, except that prstat also contains the RSS value for the global zone.
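For the prstat option, the per-zone RSS values carry a unit suffix, so they have to be normalized before they can be added up. A minimal sketch, assuming the summary header starts with "ZONEID", RSS is the fourth column, and sizes end in K, M, G or T as binary multiples (the function name total_zone_rss_kb is made up):

```shell
# total_zone_rss_kb: read prstat -Z output on stdin, convert each zone's
# RSS to KB and print the sum over all zones.
# Assumptions: summary header starts with "ZONEID", RSS is column 4, and
# sizes carry a K, M, G or T suffix (binary multiples).
total_zone_rss_kb() {
    awk '
        $1 == "ZONEID" { in_summary = 1; next }
        $1 == "Total:" { in_summary = 0 }
        in_summary && NF >= 8 {
            unit = substr($4, length($4))               # last character
            num  = substr($4, 1, length($4) - 1) + 0    # numeric part
            if      (unit == "K") kb = num
            else if (unit == "M") kb = num * 1024
            else if (unit == "G") kb = num * 1024 * 1024
            else if (unit == "T") kb = num * 1024 * 1024 * 1024
            else                  kb = $4 / 1024        # no suffix: bytes
            total += kb
        }
        END { printf "%.0f\n", total }
    '
}
```

A call such as prstat -Z 1 1 | total_zone_rss_kb would then give one number to compare against the total physical memory.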

Comments (newest first)

Hi UX-Admin. Maybe from your point of view. But if you sell zones to your customers (and that's the case in this setup), you need to know their memory usage in order to generate correct invoices.
EDIT: Oh, I misunderstood your comment. Yes, on the global zone it might be wrong; however, we needed at least some graph in our monitoring to understand the memory consumption on a physical level. We never got to the point where this was 100% accurate, but it gave us an idea.
Luckily we ditched SmartOS after a while (it was too much of a headache with customer requests).

UX-admin wrote on Feb 24th, 2016:

The global zone keeps track of the entire system's memory usage, and all zones share that same memory. Therefore, keeping track of each individual zone's aggregate memory usage is pointless, not to mention incorrect. If, however, you have zones whose processes are leaking memory, then the individual process which is leaking memory should be tracked down and debugged. What you are attempting to do is really unnecessary.