Installing nrpe on OS X

If you don't know what nrpe is, you probably don't want it. What nrpe
does is let your nagios server contact a computer over the network and
query its state, using scripts to check on things such as cpu load,
drive space, and pretty much anything else you can script together.
I'm walking through the steps of this process to give a reasonable
accounting of what I did, but I've also scripted pretty much all of it,
and you can download the whole schmear. There's a link to the script at
the bottom of this article.

Create a user for the nrpe service to use

The nrpe service, like everything on the Mac, has to run under a user
account, and it really shouldn't run under your account, since we
wouldn't want it to be able to reach out and hurt your files. There are
two ways to create an account: via the GUI using System Preferences, or
via the command line. I use the latter, but we'll cover the former as
well, since I'm not entirely sure of my kung fu creating accounts. I
also deviate somewhat from standard unix practice--normally, you'd run
nrpe under an account named nagios, but in this case I'm going to create
a general account that does some housekeeping tasks in addition to
running the nrpe service. Think of this as a maid or butler, with
limited power, to whom you can assign household tasks.

Creating the account in the GUI

In the GUI, create the account, and give it a password. Do not make it
an administrator. For the purposes of this doc, it will be called susi.

Creating the account via command line

I found a good script online for creating accounts, and modified that,
but the meat of it is:
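A minimal sketch of what such a script boils down to, assuming the stock dscl tool. The helper name, the UID and GID, the shell and the home directory are all placeholders for illustration. The function only prints the commands so they can be reviewed before anything runs as root:

```shell
#!/bin/sh
# Sketch only: print the dscl commands that would create a non-admin
# account. Nothing is executed here. print_account_cmds, the UID/GID,
# the shell and the home directory are illustrative assumptions.
print_account_cmds() {
    user=$1; uid=$2; gid=$3
    rec="/Users/$user"
    echo "dscl . -create $rec"
    echo "dscl . -create $rec UserShell /bin/bash"
    echo "dscl . -create $rec RealName $user"
    echo "dscl . -create $rec UniqueID $uid"
    echo "dscl . -create $rec PrimaryGroupID $gid"
    echo "dscl . -create $rec NFSHomeDirectory /Users/$user"
}

print_account_cmds susi 504 20
```

Once the output looks right it can be piped to sudo sh. It is worth checking first that the UID is not already taken, for example with dscl . -list /Users UniqueID. The password still has to be set separately.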

Now we have to make a minor change to the configure file. Using vi or
your favorite text editor, open the configure file, find the line that
tests for the libssl.so file, comment it out, and add a test for
libssl.dylib instead:
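As a hedged sketch, the same change can be made non-interactively. This assumes the test in your copy of configure contains the text "$dir/libssl.so" in double quotes, which may not match every nrpe version, so check the file first. Instead of commenting the old line out, it rewrites the test in a copy of the file. The second helper flags curly quotes, which the shell does not treat as quotes. Both helper names are made up:

```shell
#!/bin/sh
# Sketch only. Assumes configure tests for the OpenSSL library with a
# line containing "$dir/libssl.so" (an assumption about the nrpe
# version in use; check your copy before relying on this).

# Write a patched copy of configure that looks for libssl.dylib instead.
patch_ssl_test() {
    sed 's|/libssl\.so"|/libssl.dylib"|' "$1" > "$2"
}

# Print any lines containing curly double quotes. Returns non-zero
# when none are found.
find_smart_quotes() {
    left=$(printf '\342\200\234')    # U+201C as UTF-8 bytes
    right=$(printf '\342\200\235')   # U+201D as UTF-8 bytes
    grep -a -n -e "$left" -e "$right" "$1"
}

# Example usage (run from the unpacked nrpe source directory):
#   patch_ssl_test configure configure.patched
#   find_smart_quotes configure.patched || echo "no smart quotes found"
```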

If the configure fails complaining about not being able to find the ssl
libraries, double check the configure file--I got held up for a while
before noticing that my browser and editor had "helped" me by using
smart quotes instead of regular double quotes. Now run a make and
install the parts we want.

make all
sudo make install-plugin
sudo make install-daemon
sudo make install-daemon-config

Ok, if everything went well, great. Next we edit nrpe.cfg.

nrpe.cfg

The nrpe config file is at /usr/local/nagios/etc/nrpe.cfg. Let's take a
quick look:

less /usr/local/nagios/etc/nrpe.cfg

In the hardcoded commands section, note the structure:

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

In this case, if the nagios server contacts this machine asking the
nrpe service to run the command check_users, the nrpe service (running
as susi) will run:

/usr/local/nagios/libexec/check_users -w 5 -c 10

and pass the results and error codes back to the nagios server. Try
running the command now as yourself and see if it works ok. Next, let's
try starting up an instance of the daemon to make sure it works:

sudo /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
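Since the daemon only answers for commands that are defined in its config file, it can help to list what a given nrpe.cfg actually defines. A minimal sketch, assuming the command[name]=command line layout shown above; list_nrpe_commands is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: list the command names defined in an nrpe.cfg and the
# command line each one maps to. Commented-out definitions are skipped
# because the pattern is anchored at the start of the line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=\(.*\)$/\1 -> \2/p' "$1"
}

# Example usage:
#   list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```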

What nagios uses on the server end to check the nrpe service on the
machines being monitored is itself a plugin, check_nrpe. That plugin
was installed here too, so we can check that the nrpe service is
working by calling it with check_nrpe thusly:

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users

Assuming this works, you should get something back like:

USERS OK - 3 users currently logged in |users=3;5;10;0
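The part after the | is performance data. In the usual Nagios plugin format it reads label=value;warn;crit;min, so the 5 and 10 here are the -w and -c thresholds from the command definition. A minimal sketch of pulling those fields apart in the shell, assuming a single metric; split_perfdata is a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: split the performance data of a single-metric plugin
# result into its fields. Assumes one label=value;warn;crit;min item.
split_perfdata() {
    perf=${1#*\|}             # drop the human-readable part before |
    label=${perf%%=*}
    rest=${perf#*=}
    old_ifs=$IFS; IFS=';'
    set -- $rest              # split value;warn;crit;min on semicolons
    IFS=$old_ifs
    echo "label=$label value=$1 warn=$2 crit=$3 min=$4"
}

split_perfdata 'USERS OK - 3 users currently logged in |users=3;5;10;0'
# prints: label=users value=3 warn=5 crit=10 min=0
```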

Now, what we've just done is verify that we have a working version of
nrpe that can be started on this machine, and can be queried using the
check_nrpe plugin from nagios. What we need to do now is get this
working so that our nagios server can query this machine using that
check_nrpe over the network. Next, kill the daemon, and we'll make some
changes in the basic configuration. Run:
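Stopping the daemon can be done with something like sudo killall nrpe. For the configuration changes, here is a hedged sketch that assumes the allowed_hosts, nrpe_user and nrpe_group directives are present in nrpe.cfg. The helper name and the example values are placeholders. It prints a modified copy and does not edit in place:

```shell
#!/bin/sh
# Sketch only: print a copy of nrpe.cfg with the account and the
# allowed nagios server filled in. set_nrpe_basics is a hypothetical
# helper; allowed_hosts, nrpe_user and nrpe_group are the directive
# names this sketch assumes are present in the file.
set_nrpe_basics() {
    cfg=$1; user=$2; group=$3; server=$4
    sed -e "s/^nrpe_user=.*/nrpe_user=$user/" \
        -e "s/^nrpe_group=.*/nrpe_group=$group/" \
        -e "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,$server/" \
        "$cfg"
}

# Example usage (x.x.x.x stands for the nagios server's address, and
# staff is only an example group):
#   set_nrpe_basics nrpe.cfg susi staff x.x.x.x > nrpe.cfg.new
```

Keeping 127.0.0.1 in allowed_hosts means the local check_nrpe test keeps working after the change.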

If you write a script in bash or whatever, and you follow the guidelines
on how the check_nrpe plugin works, you can make custom queries. See for
example the check_temper plugin. Restart the daemon:

sudo /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
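For the custom queries mentioned above, here is a minimal sketch of what such a script can look like. It assumes the usual Nagios plugin conventions: one line of status text, optional performance data after a |, and an exit code of 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The check name, its thresholds and the file-count metric are invented for illustration:

```shell
#!/bin/sh
# Sketch only: a made-up check that counts entries in a directory and
# compares the count with warning and critical thresholds.
check_file_count() {
    dir=$1; warn=$2; crit=$3
    if [ ! -d "$dir" ]; then
        echo "FILES UNKNOWN - $dir is not a directory"
        return 3
    fi
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    perf="files=$count;$warn;$crit;0"
    if [ "$count" -ge "$crit" ]; then
        echo "FILES CRITICAL - $count entries in $dir |$perf"; return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "FILES WARNING - $count entries in $dir |$perf"; return 1
    fi
    echo "FILES OK - $count entries in $dir |$perf"
    return 0
}

# When saved as a standalone plugin, the script would end with:
#   check_file_count "$@"; exit $?
```

A script like this would be hooked up with its own command[...] line in nrpe.cfg, in the same form as the check_users entry.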

Now log in to your nagios server and call the check_nrpe command with
check_users against the workstation you've just set up, using the host
name or ip number of the machine you're running nrpe on:

/usr/lib/nagios/plugins/check_nrpe -H myhost.mydomain -c check_users

You should get a response similar to what you get when you run the
check_nrpe plugin locally.

Troubleshooting

If you get an error "Connection refused by host" immediately, double
check to make sure that you have the correct ip for the nagios server
in the nrpe.cfg, and that the nrpe service is running normally.

If the connection appears to hang, check your firewall settings to
make sure the firewall will accept connections from the nagios server.

If you get an error that check_users is not defined, make sure you are
starting nrpe with the correct path to the nrpe.cfg file.

Starting nrpe via launchd

For more on this topic, see Starting-nrpe-via-launchd,
but basically, you create a file in /Library/LaunchDaemons/ that
contains the instructions for running the nrpe service. By convention,
the name of this file begins with the reverse of the domain "owning" it,
so in my case the file is named edu.unc.cs.nrpe.plist. Here's a copy of
my version, which is a bit different from the one on the web site above.

you probably have a problem with the account. Double check to make sure
that the nrpe.cfg file is specifying the correct user and group. Also,
if you start over at some point and delete or create an account, be
aware that account data is cached by the system. I got the errors above
after deleting an account via the command line; that account had the
UID 504, but the cache had not cleared. I rebooted and the errors
stopped.
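A quick way to rule out an account problem is to read the user named in nrpe.cfg and ask the system whether it can resolve it. A minimal sketch, assuming the nrpe_user directive and the standard id command; both helper names are made up, and the group is not checked:

```shell
#!/bin/sh
# Sketch only: print the value of a key=value directive in nrpe.cfg.
cfg_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Report whether the account named by nrpe_user can be looked up.
check_nrpe_account() {
    user=$(cfg_value nrpe_user "$1")
    if [ -z "$user" ]; then
        echo "no nrpe_user set in $1"; return 1
    fi
    if id -u "$user" >/dev/null 2>&1; then
        echo "user $user exists with uid $(id -u "$user")"
    else
        echo "user $user cannot be resolved"; return 1
    fi
}

# Example usage:
#   check_nrpe_account /usr/local/nagios/etc/nrpe.cfg
```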

An installer script

I've put together some scripts to do all of this automagically, but use
them at your own risk. There's a 00readme, but the short version is you
download it, unpack it, and then run it, giving it the userid you would
like nrpe to run under and the ip number of your nagios server. You can
download it here.

myworkstation:~ hays$ tar -xzf nrpe_installer.tgz
myworkstation:~ hays$ cd nrpe_installer
myworkstation:nrpe_installer hays$ sudo ./install.sh
Usage: install.sh [OPTIONS]
-c = Configure nrpe packages on this system
-m = Make nrpe packages on this system
-i = Install made verison
-a = Configure, Make, Install all
This should work in all cases
for a fresh installation, and possibly all generic cases
-u = Userid to use for installation, who the installed software
will run as (required)
-s = Ip number of the nagios server (required)
The most common usage would be:
./install.sh -a -u [userid] -s [nagios server ip number]
myworkstation:nrpe_installer hays$ sudo ./install.sh -u susi -s x.x.x.x -a

Thanks Bil! You saved me from what was likely going to be a night of figuring out how to get nrpe working on a Mac OSX box. We only have one and only need one, so it would have been knowledge I would have gained grudgingly!
I really appreciate it.

matt.

iflowfor8hours.info

Comment by Mark Clayton - Monday 20th February 2012 05:47:59 PM

Maybe my post will help someone else out. I upgraded to Lion from SL the other day. After the upgrade my MBP's fan was running full speed and the battery drain was excessive. Turning off the network brought the symptoms back to normal. Eventually, I figured out that launchd was hogging the cpu by looking at 'top -o cpu'. Looking in system.log, I found getpwnam was failing on nrpe startup. Apparently the Lion upgrade removed the nrpe user and group. Once I recreated the user and group, cpu usage, the fan speed and the battery drain returned to normal. This error occurred on 2 different macs.

Mark

Comment by bil - Monday 20th February 2012 06:13:21 PM

Thanks, that's a good tip to know!

Comment by Mahesh - Tuesday 19th June 2012 04:09:33 PM

Thank you so much Bil. Was struggling with installing Nagios on Mac and you saved me now.. .. : )

Comment by John McAdams - Tuesday 03rd July 2012 12:04:36 PM

Thank you for this. I'm still new to Nagios and have a dozen OS X servers to monitor. The macosx-nrpe-agent in NagiosXI doesn't work on Mountain Lion (yet) and this was a great way to get my test machines into Nagios.

Comment by Raouf - Thursday 19th July 2012 07:10:56 PM

Thank you so much for this. It was a great help for me .. .

Comment by JohnO - Sunday 23rd December 2012 01:05:27 AM

Thanks for posting these excellent instructions! They worked very well in OS X 10.8.2 Mountain Lion with two changes:

1) After installing Xcode, you need to install the command line tools from within Xcode so that the makefiles can find the C compiler. This can be found via XCode --> Preferences --> Downloads: Install Command Line Tools.

2) With the current nrpe (nrpe-2.14) you no longer need to modify the ./configure script to deal with the libssl dynamic libraries. That check has been added to the configure script already.

One suggestion: You might wish to highlight the lines in the plist file that are likely to need to be edited. For example, I saved my plist with a name that made sense for me. I also created a different username than susi. Straightforward, I know, but it might save future person a little head scratching since the errors you receive do not clearly point you to what you did wrong when you try to load or start via launchctl.