Month: January 2015

As we have reached part 3 of 4 in our series on monitoring your servers for free, I would like to take a moment to highlight a few gotchas with Nagios before moving into our final section, where we will set up graphical monitoring with Observium. At this point we’ve gone through building a Nagios server, setting up contacts, monitoring printers, and monitoring both Windows and Linux servers. There are certainly a few issues you are bound to encounter on your journey with Nagios monitoring. These issues cost me hours of struggle, digging, and testing before I ultimately came upon a resolution. I would like to pay forward some of this effort in the hope that I can save some poor soul out there the time, effort, and untold amounts of silent cursing and coffee drinking.

Monitoring SQL Express

Chances are if you have been a Windows administrator for very long and have deployed an application server or two, you have had to set up at least one SQL Express database. The challenge of monitoring SQL Express with Nagios is that our friends at Microsoft have created a service name that contains a dollar sign (MSSQL$SQLEXPRESS). Nagios treats $ as the delimiter for its macros, so the $ character must be escaped by doubling it ($$) in order for Nagios to correctly read the service name. Use the following service definition as a reference for making SQL Express play nice with Nagios.

define service{

use generic-service

host_name host.example.com

service_description Service: SQL Server – SQLEXPRESS

check_command check_nt!SERVICESTATE!-d SHOWALL -l MSSQL$$SQLEXPRESS

}
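If you have several service names that contain a dollar sign, the doubling can be scripted rather than typed by hand. The following is only a minimal sketch; the function name escape_nagios_dollar is my own invention and is not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): double every "$" in a Windows
# service name so Nagios reads it literally instead of as a macro delimiter.
escape_nagios_dollar() {
    # In the sed pattern, \$ matches a literal dollar sign; in the
    # replacement, $ has no special meaning, so $$ is inserted as-is.
    printf '%s\n' "$1" | sed 's/\$/$$/g'
}
```

Call it with the name in single quotes so the shell does not expand the dollar sign, for example: escape_nagios_dollar 'MSSQL$SQLEXPRESS'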

Nagios localhost Warnings for HTTP

On your Nagios server you may encounter a yellow warning message regarding the HTTP service. The reason for this is that the HTTP check on the localhost is verifying that Apache is running and that it can serve an index file from the default docroot. With no index file in place, Apache typically answers with a 403 Forbidden, which Nagios reports as a warning. Resolving this issue on your Nagios server can be as simple as creating a blank file called index.html in the /var/www/html directory. A simple way to accomplish this is to use the touch command (ex: touch /var/www/html/index.html).

Once you create this dummy file you will need to restart apache (service httpd restart on RHEL or service apache2 restart on Debian variants) as well as the nagios service (service nagios restart).
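If you would rather not risk touching a docroot that already has a page in it, the step can be wrapped in a small guard. This is a sketch under my own assumptions; the function name ensure_index_file is hypothetical and the default path is simply the docroot mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: create an empty index.html in a docroot only if one
# is not already there, so an existing page is never overwritten.
ensure_index_file() {
    docroot="${1:-/var/www/html}"   # default docroot from the text above
    if [ ! -e "$docroot/index.html" ]; then
        touch "$docroot/index.html"
    fi
}
```

Running ensure_index_file with no argument acts on /var/www/html; pass a different directory if your docroot lives elsewhere.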

Nagios Error When Rescheduling a Service Check (Error: Could not open command file ‘/usr/local/nagios/var/rw/nagios.cmd’ for update!)

This and the SQL Express issue may very well be the most aggravating Nagios issues I encountered in my first deployment. However, through being highly caffeinated and stubborn, I did eventually find a solution. The root of the problem is that the web server user is not allowed to write to the Nagios command file, and the permission on that file gets flipped each time the Nagios service restarts. I have seen two different fixes for this issue, and both work. The first method corrects the problem at its root by correcting the broken permissions. If this doesn’t resolve the issue for you, the alternative is to use a script that resets the permission each time the Nagios service starts. I recommend attempting the first method first, as it is the preferable fix. Please note, terminal commands are prefixed with #.

Method 1:

#usermod -a -G nagios apache

#grep nagios /etc/group (ensure that the result shows that apache is a member of the nagios group)

#service httpd restart (so that Apache picks up the new group membership)
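The group membership can also be checked in a way that is easy to reuse in a script. The following is a rough sketch; user_in_group is a name I made up, and it relies only on the standard id, tr, and grep commands.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if user $1 belongs to group $2.
user_in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one per line so grep -x can match the whole name exactly.
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For example, user_in_group apache nagios && echo "apache is in the nagios group" prints the message only when the membership is in place.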

Method 2:

Alex Nogard’s blog lays out the methodology for creating a script that fixes the permission each time Nagios starts by adding the script into init.d/nagios. Please follow the link below for his instructions:

Although outside the scope of this particular discussion, one way to automate deployment of NRPE across multiple web servers is to use Puppet. I will perhaps visit this topic in later posts regarding Puppet Labs, but I will sum it up quickly before we wrap up. If you have an existing Puppet infrastructure, you can simply have Puppet add the EPEL repo to each LAMP server, use an ensure => installed statement to make sure that NRPE is installed, and finally push out a preconfigured nrpe.cfg (located in /etc/nagios) that contains the correct server information for your Nagios server.

That brings me to the end of part 3 in our series on monitoring your servers for free. In the next installment, we’ll take a look at deploying Observium to give us graphical output for our servers. Til next time, may the coffee be endless and the uptime in your favor!

In Part 1 of our discussion, we covered how to install Nagios Core 4 from source on CentOS 7. Now that we have a Nagios server up and running, it’s time to begin monitoring things. To begin with, let’s cover agent vs agentless Nagios monitoring. If you would like to perform simple checks, such as determining if a switch or printer is responding to ping, you can set up basic ping monitoring without configuring an agent. As long as our Nagios server is able to reach the device and the device has ICMP enabled, everything will be happy. This type of monitoring can also be used to check website access, DNS resolving to expected locations, and SSL (however that is not covered in this post). To monitor uptime, resource utilization (HDD space, CPU, memory, etc.), and specific services, we will need the Nagios agent. The agent is available in both Linux & Windows flavors. The Nagios agent for Linux is called NRPE and can be installed using apt-get or yum (ex: yum install nrpe). On the Windows side, the Nagios agent is NSClient++, which can be downloaded as an executable from http://nsclient.org/nscp/downloads

Nagios Server Side Configuration

Nagios retrieves its monitoring configuration from the nagios.cfg file located in /usr/local/nagios/etc. The nagios.cfg file contains cfg_file entries that point to templates that define what is being monitored. When creating new templates, it is important to remember to go back and add the nagios.cfg entry corresponding to the new template (or to uncomment one of the default templates if you choose to use one).

The other locations of interest are the templates located in /usr/local/nagios/etc/objects. If you choose to modify the existing templates rather than creating new ones, I would recommend making a backup copy of these templates in case you ever need to restore them or refer back to them (ex: cp windows.cfg windows.cfg.bak). One of the first templates we will want to modify is contacts.cfg; this is where you can add the email address (or distribution list) that you want to receive Nagios alerts. For example, if I want to receive Nagios alerts at alerts@richsitblog.com, I can set this within contacts.cfg. The final thing we need to discuss before we begin walking through some actual setups is that the NRPE agent communicates on port 5666. You will need to ensure that your Nagios server is able to reach the servers you are monitoring on port 5666, and that port 5666 is open in the firewall of each monitored server (see example below):

iptables -A INPUT -p tcp -m tcp --dport 5666 -j ACCEPT

Fortunately the NSClient++ agent makes the needed provisions in the Windows firewall on install, however you will still want to be aware of this if hardware firewalls or AWS security rules are between your Nagios server and the infrastructure that it is monitoring.

Monitoring Printers

Let’s go ahead and set up some basic printer monitoring now that we have an overview of the basics. For this example I am going to use the default network printer template provided by Nagios. In my examples I am using nano as the text editor. Depending on your installation of CentOS you may or may not have this editor by default; you can use vi, vim, or any other Linux text editor, or install nano (yum install nano).

cd /usr/local/nagios/etc/objects

cp printer.cfg printer.cfg.bak

nano printer.cfg

At this point we can create our host definitions; my configuration has entries for 2 printers. Customize your host definitions according to your environment. You can simply copy and paste to add more host definitions (customizing them with the appropriate info).
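As a rough illustration of what such host definitions look like, the sketch below prints one stanza per printer. The function name printer_host_def and the 192.0.2.x addresses are placeholders of my choosing. The generic-printer template and network-printers hostgroup are the names I would expect from the sample configuration, so check them against your own printer.cfg and templates.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print a Nagios host definition for one printer.
# $1 = host_name, $2 = alias, $3 = IP address (placeholder values below).
# Assumes the "generic-printer" template and "network-printers" hostgroup
# from the sample configuration; adjust if your files use other names.
printer_host_def() {
    cat <<EOF
define host{
    use         generic-printer
    host_name   $1
    alias       $2
    address     $3
    hostgroups  network-printers
    }

EOF
}

# Example: emit definitions for two printers (addresses are made up).
printer_host_def IT_ColorLaser "IT Color Laser" 192.0.2.10
printer_host_def Lobby_Copier  "Lobby Copier"   192.0.2.11
```

The output is meant to be reviewed and then pasted into printer.cfg in place of the sample host entry.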

Customize the host group if desired. I have left this at the default since I only manage the printers for one site.

In the service definitions section you will want to replace the dummy host_name with the hostnames defined in your host definitions. To list multiple, you can type them on the same line separated by commas (ex: host_name IT_ColorLaser,Lobby_Copier). For my monitoring purpose I only care if the printers are on the network, so I am only monitoring ping. Any unused service definitions must be commented out or deleted.

nano /usr/local/nagios/etc/nagios.cfg

Remove the # in front of the cfg_file path for the printers hostgroup (the line ending in printer.cfg).

At this point we can save and exit nagios.cfg and restart nagios (systemctl restart nagios.service)

If we’ve done everything right at this point the web page should now have an additional host group for network printers that are lit up green
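The uncommenting step can also be done without opening an editor. The following is a minimal sketch, assuming GNU sed (as shipped with CentOS) and a commented line of the form #cfg_file=.../printer.cfg; the function name uncomment_cfg_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the cfg_file line for one object file.
# $1 = path to nagios.cfg, $2 = object file name (e.g. printer.cfg).
# Uses GNU sed's -i (in-place) option, as found on CentOS.
uncomment_cfg_file() {
    # Escape dots so "printer.cfg" is matched literally, not as a regex.
    name="$(printf '%s\n' "$2" | sed 's/\./\\./g')"
    # Strip the leading "#" only from the cfg_file line ending in that name.
    sed -i "s|^#\(cfg_file=.*/${name}\)\$|\1|" "$1"
}
```

For example: uncomment_cfg_file /usr/local/nagios/etc/nagios.cfg printer.cfg. Before restarting, it is worth validating the configuration with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg so that a typo does not stop the service from starting.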

Monitoring Windows Servers

Now that we’ve successfully gone through setting up printer monitoring, let’s get started with Windows Server monitoring. In larger organizations, there will be large groups of servers configured with similar roles, features, and uses. If your IT environment is like most SMB shops, you may only have 1-2 Windows Servers configured the same way, and these are likely to be Domain Controllers. You may choose to set up all of your Windows Servers under a single Nagios template, associating host definitions only with the services you want monitored, or you may choose to create multiple templates based on function, so instead of everything being under windows-servers, you may have windows-domaincontrollers, windows-webservers, etc. I have my production environment set up using the latter method; however, for the sake of simplicity and minimizing config sprawl, we will use a single template here and add a couple of servers into monitoring. The servers I have chosen are a 2012 Domain Controller and a 2008 R2 server configured with IIS.

On each Windows server, run the NSClient++ installer. Proceed through the wizard, entering the IP address of your Nagios server and no password, and check the first 3 boxes.

Now that the client is prepared, we can ssh into our Nagios server and complete the configuration on the Nagios side

cd /usr/local/nagios/etc/objects

cp windows.cfg windows.cfg.bak

nano windows.cfg

Edit the host configuration to contain the information for your servers; I configured 2 servers for this demo.

Add the hostnames defined in the host configuration to the service definitions (note: on services such as W3SVC that apply only to one server, be sure to only include the hostname of the server it is applicable to).

To monitor additional services you will need to create service definitions for them. The easiest way is to copy an existing service definition and customize it with the service you wish to monitor. I have done this in our example by copying the W3SVC service and using it as a template for our DNS Server and AD services. To find service names on your Windows server, go to services.msc, locate the service, right click, open Properties, and make note of the service name and display name.
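To show the shape of such a copied definition, the sketch below prints a check_nt SERVICESTATE stanza for a given host and service. The function name nt_service_def and the host name dc01 are made up for illustration, and DNS and NTDS are the service names I would expect for the DNS Server and Active Directory services; confirm them in services.msc on your own server.

```shell
#!/bin/sh
# Hypothetical helper: print a check_nt SERVICESTATE service definition.
# $1 = host_name(s), $2 = service_description, $3 = Windows service name.
nt_service_def() {
    # Double any "$" in the service name so Nagios does not treat it as a macro.
    svc="$(printf '%s\n' "$3" | sed 's/\$/$$/g')"
    cat <<EOF
define service{
    use                  generic-service
    host_name            $1
    service_description  $2
    check_command        check_nt!SERVICESTATE!-d SHOWALL -l $svc
    }

EOF
}

# Example: host name dc01 and service names DNS/NTDS are assumptions.
nt_service_def dc01 "DNS Server" DNS
nt_service_def dc01 "Active Directory Domain Services" NTDS
```

The output follows the same layout as the W3SVC entry in windows.cfg, so it can be pasted in below the existing service definitions.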

In Server 2008 R2 and later, inbound ICMP ping is blocked by the firewall by default. This will result in Nagios falsely reporting the host as down. To enable ICMP, open cmd or PowerShell, type the following, and press enter: netsh firewall set icmpsetting 8

nano /usr/local/nagios/etc/nagios.cfg and uncomment the cfg_file path for windows.cfg

systemctl restart nagios

At this point if we are successful we should see a host group with each server and its services listed

Monitoring Linux Servers

To monitor a Linux server the process is fairly simple. In the environment I support, our most common use case for Linux is as a web server for our web based SaaS application. Because nearly every Linux server in our environment is configured the exact same way, a single template with multiple hosts fits well. If you are installing on CentOS you will need to enable the EPEL repo or obtain the NRPE plugin through wget. For more info about EPEL visit http://fedoraproject.org/wiki/EPEL.

yum install nrpe

nano /etc/nagios/nrpe.cfg

Locate the allowed_hosts portion of the nrpe.cfg

Add a comma after the localhost address, add a space then type the IP address of your Nagios server (ex: allowed_hosts=127.0.0.1, 192.168.1.37)

save and exit

chkconfig nrpe on

service nrpe start
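When the same change has to be made on several web servers, the allowed_hosts edit can be scripted. This is only a sketch, assuming GNU sed and an existing allowed_hosts= line in nrpe.cfg; the function name add_allowed_host is hypothetical, and I append the address without a space after the comma, which is my own choice here.

```shell
#!/bin/sh
# Hypothetical helper: append an IP address to allowed_hosts in nrpe.cfg.
# $1 = path to nrpe.cfg, $2 = IP address of the Nagios server.
# Uses GNU sed's -i (in-place) option, as found on CentOS.
add_allowed_host() {
    # Do nothing if the address is already listed on the allowed_hosts line.
    if grep '^allowed_hosts=' "$1" | tr ', =' '\n\n\n' | grep -qxF -- "$2"; then
        return 0
    fi
    # & stands for the whole matched line; append ",<ip>" to it.
    sed -i "s/^allowed_hosts=.*/&,$2/" "$1"
}
```

For example: add_allowed_host /etc/nagios/nrpe.cfg 192.168.1.37, followed by a restart of the nrpe service so the change is read.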

Now that we’ve configured our client, let’s hop back over to the Nagios server

cd /usr/local/nagios/etc/objects

cp localhost.cfg linux.cfg

nano linux.cfg

Replace localhost in the host definitions with the name and IP of the web server(s) you are monitoring

Change the hostgroup name to something else (ex: linux-webservers) as well as the alias (ex: Linux Web Servers)

Replace all instances of localhost in the service definitions with the hostname of your web server(s) as defined in the host definitions

Save and exit

nano /usr/local/nagios/etc/nagios.cfg

Copy and paste the definition for localhost and modify the description and path to correspond to the linux.cfg object

systemctl restart nagios

If all has gone well at this point we should see another host group containing our web server(s).
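The nagios.cfg entry for linux.cfg can also be added from the command line. Below is a minimal sketch that appends the entry only when it is not already present; the function name add_cfg_file is my own.

```shell
#!/bin/sh
# Hypothetical helper: add a cfg_file entry to nagios.cfg if it is missing.
# $1 = path to nagios.cfg, $2 = full path of the object file to load.
add_cfg_file() {
    line="cfg_file=$2"
    # -x matches the whole line, -F treats the text literally (no regex).
    if ! grep -qxF -- "$line" "$1"; then
        printf '%s\n' "$line" >> "$1"
    fi
}
```

For example: add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/linux.cfg. Because of the check, running it a second time leaves the file unchanged.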

Thanks for sticking with me, I know this has been a long post. Hopefully this will help you to get your own free alert monitoring going with Nagios in your environment.

We’ve all had the dreaded phone call: “Did you know XYZ server is down?” This is normally followed by a flood of calls and, at some point, questions regarding how to be more proactive in responding to issues. First and foremost, let me be clear: no matter how good of an IT admin you are, there will always be some unexpected downtime (unless you work for the only SMB size company on the planet with clustered and redundant everything). That said, careful planning and monitoring can help reduce downtime and provide a more proactive response to outages. There are countless monitoring solutions out there, such as Opsview, PRTG, Nagios XI, and SolarWinds. Each of these products is certainly a fine solution for monitoring; however, monitoring gets expensive quickly. Perhaps you need to build a proof of concept or just need something simple and free. Enter Part 1 of Monitoring Your Servers For Free!

Nagios Core:

For alert monitoring with granular features, such as monitoring specific services on your server infrastructure or simply running ping checks to make sure your wireless APs are alive and responsive, Nagios Core is hard to beat. If you’re a Linux admin this should be a walk in the park for you; however, even for a Windows admin this is a fairly easy setup with solid instructions and a few gotchas. The link below details the installation steps for both Debian and RHEL flavors of Linux, but there are some undocumented gotchas to be aware of. If you want to access the web page from something other than the local host, you will need to create iptables firewall exceptions (as well as firewalld exceptions for RHEL 7). Below is an example of an iptables entry to allow inbound traffic on port 80; the same can be applied to 443 and any other needed ports.