Every operations team has its fair share of monitoring solutions. While you may not have achieved the perfect state of a single pane of glass, you likely have settled on two or three solutions that cover all the hardware and software that supports your business. You even invested considerable time and effort to not just implement these solutions with out-of-the-box settings, but with tailored IT alerting thresholds and alarms that suit your environment’s specific needs. Look at you!

It’s easy to overlook the next step of your monitoring deployment: notifications. Usually, one of two things will happen:

1. You get too many alerts, and subsequently turn off alerts

2. You get too many alerts, and subsequently create an Outlook rule to trash them all

It’s the age-old signal-to-noise problem in IT. How do you fine-tune your notifications, so they alert you to events that deserve your attention while filtering out all of the notifications that are not actionable? Your first thought might be to turn off any performance-related alerts and just receive system or device down notifications. But if that’s all you’re looking to get out of notifications, you should just write a PowerShell script to run a Test-Connection against your server list and Send-MailMessage when a host is down. (That’s mostly sarcasm.)

Instead of throwing the baby out with the bath water, here are some monitoring and alerting best practices for reducing notification overload.

Inventory Your Applications

First things first: no one cares about your servers like you do. You invest countless hours building, installing, patching, backing up, repairing, and generally supporting these virtual beasts. Even if you’re running an automated shop (which you’re not), you still train your attention on the infrastructure. But the business cares about the applications.

So, if you don’t have a list of your apps (which should include URLs in this modern SaaS era), get one together. Without a reliable and accurate inventory, you’ll never know if you’re monitoring all your devices. (On a related note: if you’ve got tips on how to collect and maintain an application inventory, share them in the comments below.)
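As a rough illustration of why the inventory matters, here is a minimal Python sketch that records each application with its URL and the devices it depends on, then reports any devices missing from the monitored set. Everything in it is an assumption made for the example: the `App` fields, the sample applications, the hostnames, and the `MONITORED` set are all hypothetical. In practice you would populate them from your own records and an export from your monitoring tools.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """One inventory entry (hypothetical shape, adjust to your records)."""
    name: str
    url: str
    devices: list[str] = field(default_factory=list)


def unmonitored_devices(inventory, monitored):
    """Return {app name: [devices not in `monitored`]} for apps with gaps."""
    gaps = {}
    for app in inventory:
        missing = [d for d in app.devices if d not in monitored]
        if missing:
            gaps[app.name] = missing
    return gaps


# Hypothetical sample data, for illustration only.
INVENTORY = [
    App("intranet", "https://intranet.example.com", ["web01", "web02", "nas01"]),
    App("payroll", "https://payroll.example.com", ["app01", "db01"]),
]
MONITORED = {"web01", "web02", "app01", "db01"}

if __name__ == "__main__":
    for name, missing in unmonitored_devices(INVENTORY, MONITORED).items():
        print(f"{name}: not monitored -> {', '.join(missing)}")
```

Starting from the application and working down to its devices keeps the check aligned with what the business cares about, rather than with whatever happens to be in the monitoring console already.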

Map Applications to Devices

Now it’s time to correlate infrastructure with applications. In other words, if server org1east-c goes offline, what applications are affected? What if the NAS doesn’t survive a firmware upgrade? When you can draw direct connections between your applications and the infrastructure, you can focus your monitoring on what those applications depend on and, eventually, route notifications to the right teams as quickly as possible.

The benefit of this exercise is that you can tune your alerts and notifications to reach the right teams right away.

For example, if one of your load-balanced web servers goes offline, you can have an alert sent to the server team to investigate the server. But don’t stop there. Also have an alert sent to the team that supports the website or app that relies on that web server. They may not need to take any corrective action, but they’ll certainly appreciate a heads-up that there may be infrastructure trouble brewing. And you’ll also avoid fielding calls from the web team asking, “What’s going on?”
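The routing idea in that example can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not how any particular monitoring product does it: the mapping tables, team names, application names, and the `recipients_for` helper are all hypothetical. Only the hostname org1east-c comes from the example above.

```python
# Hypothetical mapping: device -> applications that depend on it.
DEVICE_TO_APPS = {
    "org1east-c": ["storefront"],
    "nas01": ["storefront", "file-share"],
}

# Hypothetical ownership: which team supports each device and application.
DEVICE_OWNER = {"org1east-c": "server-team", "nas01": "storage-team"}
APP_OWNER = {"storefront": "web-team", "file-share": "desktop-team"}


def recipients_for(device):
    """Return (team, reason) pairs to notify when `device` goes offline."""
    notify = []
    owner = DEVICE_OWNER.get(device)
    if owner:
        # The infrastructure owner gets the actionable alert.
        notify.append((owner, f"investigate {device}"))
    for app in DEVICE_TO_APPS.get(device, []):
        team = APP_OWNER.get(app)
        # Application teams get an informational heads-up, not a duplicate.
        if team and team != owner:
            notify.append((team, f"heads-up: {app} depends on {device}"))
    return notify


if __name__ == "__main__":
    for team, reason in recipients_for("org1east-c"):
        print(f"{team}: {reason}")
```

Keeping the device-to-application map separate from the ownership tables means a team change or a new dependency is a one-line edit, and the difference between "investigate" and "heads-up" is explicit in what each team receives.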

Routing notifications is rarely the most exciting part of deploying a monitoring solution, and it’s likely the most difficult. The difficulty lies not in the technology, but in the deep dive required to really understand the connection between your applications and your infrastructure.

