Troubleshooting Sensu

In this guide, we’ll cover some of the more common issues you may run into when deploying Sensu. For each section, we’ll start with the behavior that’s most commonly observed, and then walk through some possible solutions to that issue.

Have an issue that isn’t listed here? Open an issue with what you think should be added to this guide!

Initial Troubleshooting

Before we dive into topics like troubleshooting connectivity with RabbitMQ or Redis, it’s important to start with some baseline troubleshooting steps you can take whenever you encounter an issue with your Sensu deployment.

Setting Log Levels

Sensu lets you change its log level interactively (by signaling a running process) or by setting a configuration directive in /etc/default/sensu. This is particularly useful when you’re debugging an issue and the current log level doesn’t provide enough information. Let’s take a look at both approaches.

Perhaps the quickest way to change your log level is to send the TRAP signal to the running Sensu process, where $SENSUPID is that process’s PID:

sudo kill -TRAP $SENSUPID

This toggles the debug log level on and off for that Sensu process.
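If you don’t have the PID handy, one option is to read it from the service’s pid file. The sketch below assumes the pid file lives at /var/run/sensu/sensu-client.pid (an assumption; the path depends on your packaging and init system) and that you run it as root or via sudo so the signal is permitted.

```shell
#!/usr/bin/env bash
# Minimal sketch: toggle Sensu debug logging by sending SIGTRAP to the PID
# stored in a pid file.
# ASSUMPTION: the default pid file path below; adjust for your init system.
# Run as root (or via sudo) so the signal is permitted.
set -euo pipefail

toggle_sensu_debug() {
  local pid_file="${1:-/var/run/sensu/sensu-client.pid}"
  local pid
  pid="$(cat "$pid_file")"
  # kill -0 only checks that the process exists and can be signalled.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no running process with PID $pid (from $pid_file)" >&2
    return 1
  fi
  kill -TRAP "$pid"
}

# Example: toggle_sensu_debug /var/run/sensu/sensu-server.pid
```

Calling the function a second time toggles debug logging back off, just like running the kill command twice.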

Alternatively, you can set the log level (for example, to info or debug) with the LOG_LEVEL directive in /etc/default/sensu. Let’s take a look at an example:

sudo cat /etc/default/sensu
LOG_LEVEL=debug

After setting that directive, restart the respective Sensu services:

sudo systemctl restart sensu-{server,api,client}
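The edit-and-restart steps above can be scripted. This minimal sketch rewrites an existing LOG_LEVEL line or appends one, and rejects values outside debug, info, warn, error and fatal; that list is an assumption, so check the configuration reference for your Sensu version. It relies on GNU sed’s in-place editing and should be run as root.

```shell
#!/usr/bin/env bash
# Minimal sketch: set LOG_LEVEL in Sensu's defaults file, replacing any
# existing LOG_LEVEL line or appending one if none exists.
# ASSUMPTION: valid levels are debug, info, warn, error, fatal.
# ASSUMPTION: GNU sed (sed -i without a backup suffix).
set -euo pipefail

set_sensu_log_level() {
  local level="$1" file="${2:-/etc/default/sensu}"
  case "$level" in
    debug|info|warn|error|fatal) ;;
    *) echo "unsupported log level: $level" >&2; return 1 ;;
  esac
  touch "$file"
  if grep -q '^LOG_LEVEL=' "$file"; then
    # Rewrite the existing directive in place.
    sed -i "s/^LOG_LEVEL=.*/LOG_LEVEL=$level/" "$file"
  else
    echo "LOG_LEVEL=$level" >> "$file"
  fi
}

# Example (as root), followed by a restart so the change is read:
# set_sensu_log_level debug && systemctl restart sensu-{server,api,client}
```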

To set log levels back to normal, either run sudo kill -TRAP $SENSUPID again (if you used that method), or revert the change in /etc/default/sensu and restart the Sensu processes for the change to take effect.

NOTE: By default, Sensu’s logging level is set to info. However, there are more log levels available than just info and debug. You can find the full list of available log levels in the configuration reference documentation. It’s worth noting that debug is the most granular log level, while fatal is the least granular.

Printing Configurations

Frequently, Sensu staff or community members may ask you to print your configuration. It’s fairly easy to print the configuration for your Sensu deployment:
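A minimal sketch, assuming Sensu Classic’s omnibus packages (binaries under /opt/sensu/bin) and the default configuration locations /etc/sensu/config.json and /etc/sensu/conf.d. Sensu Classic’s -P/--print_config option prints the compiled configuration and exits; SENSU_BIN_DIR is a hypothetical override added only for this sketch.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the configuration a Sensu Classic service compiles
# from its config file and config directory, then exit.
# ASSUMPTIONS: binaries under /opt/sensu/bin and default config paths.
# SENSU_BIN_DIR is a hypothetical override used only by this sketch.
set -euo pipefail

print_sensu_config() {
  local service="${1:-client}"   # client, server, or api
  local bin_dir="${SENSU_BIN_DIR:-/opt/sensu/bin}"
  "$bin_dir/sensu-$service" --print_config \
    -c /etc/sensu/config.json \
    -d /etc/sensu/conf.d
}
```

Saving the output to a file (for example, print_sensu_config server > compiled-config.json) gives you something concrete to compare against the files on disk.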

This command outputs the entire compiled configuration for your Sensu deployment. This is especially useful when comparing the configuration that Sensu is aware of with the configuration living on disk. If the values from a particular file differ from what you expect, see the next section for how to proceed.

Restarting Services

It’s crucial that you restart your Sensu services after each configuration change so that the changes are read. On most recent Linux distributions (CentOS/RHEL, Debian/Ubuntu), this is done using systemd:

sudo systemctl restart sensu-{server,api,client}

On systems where sysvinit is the service manager, restart each Sensu service with its init script instead.
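A minimal sketch for sysvinit hosts, assuming the Sensu init scripts are installed as /etc/init.d/sensu-server, /etc/init.d/sensu-api and /etc/init.d/sensu-client (SENSU_INIT_DIR is a hypothetical override for this sketch). It skips services that aren’t installed, so the same script works on client-only nodes.

```shell
#!/usr/bin/env bash
# Minimal sketch: restart the Sensu services on a sysvinit host.
# ASSUMPTION: init scripts live in /etc/init.d. SENSU_INIT_DIR is a
# hypothetical override used only by this sketch. Run as root.
set -euo pipefail

restart_sensu_sysv() {
  local init_dir="${SENSU_INIT_DIR:-/etc/init.d}"
  local svc
  for svc in sensu-server sensu-api sensu-client; do
    # Skip services that aren't installed on this host (e.g. client-only nodes).
    if [ -x "$init_dir/$svc" ]; then
      "$init_dir/$svc" restart
    fi
  done
}
```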

Before troubleshooting via the Sensu client’s local socket, make sure the local socket is open (this can be verified with netstat -tnlp | grep 3030 and nc -vz localhost 3030).

Once the prerequisites have been met, we can move on to troubleshooting.

Troubleshooting steps

Consider the following scenario: Sensu has been installed, has been verified to be working correctly (alerts are seen in the dashboard), and is configured to send alerts via the mailer handler. However, mail doesn’t appear to be coming through.
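One way to reproduce this scenario is to push a failing check result into the client socket yourself, mirroring the check named testing that appears in the log excerpt in this section. A minimal sketch, assuming the handler is named mailer and the socket listens on localhost:3030; build_check_result is a hypothetical helper name, and it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Minimal sketch: build a failing check result to push into the Sensu client
# socket. ASSUMPTIONS: a handler named "mailer" and a socket on
# localhost:3030. Values are inserted verbatim (no JSON escaping), so keep
# them free of double quotes and backslashes.
set -euo pipefail

build_check_result() {
  local name="$1" output="$2" status="$3" handler="$4"
  printf '{"name":"%s","output":"%s","status":%d,"handlers":["%s"]}\n' \
    "$name" "$output" "$status" "$handler"
}

# Example: build_check_result testing "THIS IS AN ERROR" 2 mailer | nc localhost 3030
```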

NOTE: Successfully submitting a check result this way is indicated by ok being printed on the next line. This typically appears ahead of the command prompt, so it can easily be missed.

Review the logs on the Sensu server to determine whether the check result is making it through to the server.
Grep for the error, specifically the check’s “name” attribute.

{"timestamp":"2018-10-11T11:02:00.576261-0500","level":"info","message":"processing event","event":{"id":"f4a9453f-ac70-4e91-a601-a97ff31c589a","client":{"name":"sensu.test.local","address":"192.168.156.176","environment":"testing","subscriptions":["dev","linux-hosts","roundrobin:web_probe","client:sensu.test.local"],"version":"1.5.0","timestamp":1539273717},"check":{"name":"testing","output":"THIS IS AN ERROR","status":2,"refresh":10,"executed":1539273720,"issued":1539273720,"type":"standard","history":["2"],"total_state_change":0},"occurrences":1,"occurrences_watermark":1,"last_ok":null,"action":"create","timestamp":1539273720,"last_state_change":1539273720,"silenced":false,"silenced_by":[]}}

It’s also recommended that you note the event ID, as this persists and allows you to track an event throughout its lifecycle.

"event":{"id":"f4a9453f-ac70-4e91-a601-a97ff31c589a"}

Ensure that the event is being handled by the mailer handler (you can do this by searching for the event ID and looking at additional log entries to confirm that the event is handled as expected).
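The log search in these steps can be sketched with grep and sed. This assumes the server logs to /var/log/sensu/sensu-server.log in the compact JSON format shown in the example entry; because it pattern-matches text rather than parsing JSON, treat it as a quick aid rather than a robust tool. trace_sensu_event is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: find the most recent event ID for a check in the Sensu
# server log, then print every log line that mentions that ID.
# ASSUMPTION: compact JSON log lines at /var/log/sensu/sensu-server.log.
set -euo pipefail

trace_sensu_event() {
  local check="$1" log="${2:-/var/log/sensu/sensu-server.log}"
  local id
  # Match the check by its exact "name", keep the latest entry, extract the id.
  id="$(grep -F "\"check\":{\"name\":\"$check\"" "$log" | tail -n 1 |
        sed -n 's/.*"event":{"id":"\([^"]*\)".*/\1/p' || true)"
  if [ -z "$id" ]; then
    echo "no event found for check $check" >&2
    return 1
  fi
  echo "event id: $id"
  # Every entry carrying this ID, e.g. processing and handling lines.
  grep -F "$id" "$log"
}

# Example: trace_sensu_event testing
```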

Most common issues surfaced

Troubleshooting via the local client socket typically surfaces the following types of issues:

Misconfiguration (either of Sensu, or a handler’s/integration’s corresponding service)

Inadvertent filtering (in the case of the community mailer, or handle_when in Sensu Enterprise Classic)

RabbitMQ Connectivity

In this section, we’ll discuss issues faced when connecting to RabbitMQ and how you can go about troubleshooting them.

Authentication Failures

One of the more common issues that you’ll encounter when having RabbitMQ connectivity difficulties is the client and/or server failing to authenticate to RabbitMQ. Let’s take a look at what an example error message might look like from both Sensu and from RabbitMQ:

WARNING: The credentials in this guide shouldn’t be used in any production environment. If you’re curious about how to better secure RabbitMQ, see our Securing RabbitMQ Guide.
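To rule out credentials from the RabbitMQ side, you can check the user Sensu connects as directly on the RabbitMQ host. A minimal sketch, assuming a rabbitmqctl version that provides authenticate_user and a vhost named /sensu (both assumptions; substitute the values from your Sensu RabbitMQ configuration). RABBITMQCTL is a hypothetical override used only by this sketch, and the password briefly appears in the process list while it runs.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm that the RabbitMQ user Sensu connects as can
# authenticate and has permissions on Sensu's vhost. Run on the RabbitMQ host.
# ASSUMPTIONS: rabbitmqctl provides authenticate_user; vhost defaults to /sensu.
# RABBITMQCTL is a hypothetical override used only by this sketch.
set -euo pipefail

check_rabbitmq_user() {
  local user="$1" password="$2" vhost="${3:-/sensu}"
  local ctl="${RABBITMQCTL:-rabbitmqctl}"
  # authenticate_user exits non-zero when the credentials are rejected.
  if ! "$ctl" authenticate_user "$user" "$password" >/dev/null; then
    echo "authentication failed for user $user" >&2
    return 1
  fi
  # list_user_permissions prints one tab-separated row per vhost.
  if ! "$ctl" list_user_permissions "$user" | grep "^$vhost[[:space:]]" >/dev/null; then
    echo "user $user has no permissions on vhost $vhost" >&2
    return 1
  fi
  echo "user $user authenticated and has permissions on $vhost"
}
```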

SSL

SSL issues are among the more difficult issues to troubleshoot in Sensu. Much of this difficulty stems from the way AMQP (the protocol used by RabbitMQ) handles SSL failures: the resulting failure is indistinguishable from an actual authentication issue.

If you’ve already gone through the steps in the previous section to confirm that your Sensu instance is using the correct credentials to connect to your RabbitMQ instance, then you’ll want to proceed through this part of the guide to rule out any issues with SSL.

PRO TIP: For troubleshooting SSL issues, the openssl tool provides a wealth of troubleshooting capabilities. To see what is possible with the tool, take a look at this handy cheat sheet.

Handshake Failures

There are several layers of the proverbial onion when it comes to diagnosing handshake failures. We’ll start by looking at the obvious errors that you’ll see in logs, and dive deeper from there. The assumption here is that you’ve already configured Sensu to use SSL. If not, you’ll want to refer back to our SSL Configuration Reference material before you proceed. Now, on to examining the errors you’ll likely encounter in a handshake failure scenario:

Much like the errors seen in the previous section, the failure to connect to RabbitMQ appears to be one related to credentials. However, we can go a bit deeper by looking at the RabbitMQ logs, which present an error similar to the following:

NOTE: We’ll presume that, if you’ve gone through our SSL guide, you’re using the SSL tool to generate the certificates used in your deployment. If not, that’s not a problem: the commands we’ll use to troubleshoot this scenario are useful no matter how your certificate and key pairs are generated.

Let’s start off by manually verifying our certificate and key pairs. Sensu’s SSL tool will place the certs/keys in the following directory:

Provided that the MD5 sums match and we’re able to connect to RabbitMQ via openssl, we can effectively rule out any issues with the certificates.
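The MD5 comparison can be sketched as follows. It compares digests of the RSA modulus taken from the certificate and from the private key, which match only when the two belong together (this assumes RSA keys). The s_client line at the end is commented out because the host name and file paths are placeholders; 5671 is RabbitMQ’s conventional port for AMQP over TLS.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that an RSA certificate and private key belong
# together by comparing MD5 digests of their moduli.
set -euo pipefail

cert_matches_key() {
  local cert="$1" key="$2"
  local cert_md5 key_md5
  cert_md5="$(openssl x509 -noout -modulus -in "$cert" | openssl md5)"
  key_md5="$(openssl rsa -noout -modulus -in "$key" | openssl md5)"
  echo "cert: $cert_md5"
  echo "key:  $key_md5"
  [ "$cert_md5" = "$key_md5" ]
}

# To test the TLS connection itself (host and paths are placeholders):
# openssl s_client -connect rabbitmq.example.com:5671 \
#   -CAfile /path/to/cacert.pem -cert /path/to/client/cert.pem -key /path/to/client/key.pem
```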

Let’s move on to looking at the certificate (in this case, the server certificate specifically) and see what we find there. You can use the following command to examine the inner details of a certificate:

openssl x509 -in server/cert.pem -text -noout

This will give you quite a bit of output, but the most important thing to note here is a specific extension: X509v3 Extended Key Usage.

Within that extension, we’re specifically interested in the TLS Web Server Authentication value. In a non-working certificate, you will not see this value present. Instead, you’ll see a value that looks similar to an SNMP MIB object identifier.
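A minimal sketch for checking this programmatically, assuming OpenSSL’s text output lists the serverAuth purpose as TLS Web Server Authentication under the X509v3 Extended Key Usage heading. has_server_auth is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed only if a certificate's Extended Key Usage includes
# TLS Web Server Authentication (serverAuth).
set -euo pipefail

has_server_auth() {
  local cert="$1"
  # The EKU values are printed on the line after the extension heading.
  openssl x509 -in "$cert" -noout -text |
    grep -A1 'X509v3 Extended Key Usage' |
    grep 'TLS Web Server Authentication' >/dev/null
}

# Example: has_server_auth server/cert.pem && echo "serverAuth present"
```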

Unknown CA

When configuring SSL/TLS, you may encounter an error in RabbitMQ stating “Unknown CA”. To remedy this issue, ensure that the full certificate chain is present on every system connecting to RabbitMQ (e.g., Sensu clients and Sensu servers).
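One way to confirm that a system has the chain it needs is to verify a certificate against the CA file present on that system. A minimal sketch using openssl verify; verify_chain is a hypothetical helper name and the file names in the example are placeholders. If verification fails with an error such as unable to get local issuer certificate, the CA file on that system is missing part of the chain.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a certificate chains to the CA file a system
# actually has on disk.
set -euo pipefail

verify_chain() {
  local ca_file="$1" cert="$2"
  # ca_file should contain the root CA plus any intermediate certificates.
  # openssl verify exits non-zero when no chain to a CA in ca_file exists.
  openssl verify -CAfile "$ca_file" "$cert"
}

# Example: verify_chain /path/to/cacert.pem /path/to/client/cert.pem
```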

Hopefully you’ve found this useful! If you find any issues or have any questions, feel free to reach out in our Community Slack, or open an issue on GitHub.
