How to get started with Network and Server Monitoring

Network and server monitoring is one of the least-loved parts of a network administrator’s job. This is ironic, because it can make your life so much easier. Nobody likes to dig through tons of log files, and we don’t have enough screens to stay logged in to every server, switch, router, and firewall just to keep an eye on them.

It is not uncommon to see monitoring budgets of effectively $0. Organizations spend tens or hundreds of thousands of dollars on equipment, yet put little money, thought, and time into monitoring it. Everyone wants the shiny new toys, but let’s face it, monitoring just isn’t sexy.

Why should you monitor?

Monitoring can prevent outages. Public safety is a critical part of our society. When people dial 911, they expect someone to answer, so the network and servers have to be up and working. Being down for an extended period while someone figures out what is wrong just isn’t an option. Monitoring your servers and network warns you of problems early. For example, you will see that the drive holding your critical database is running out of space before the database stops working.
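As an illustration of the kind of check a monitoring tool runs for you, here is a minimal Python sketch that classifies a volume by its percentage of free space. The 20% warning and 10% critical thresholds, the function names, and the example path are all assumptions chosen for illustration; a real tool would run a check like this on a schedule and send the result to whoever is on call.

```python
import shutil

# Hypothetical thresholds: warn below 20% free, critical below 10% free.
WARN_PCT = 20.0
CRIT_PCT = 10.0


def free_space_status(total_bytes: int, free_bytes: int,
                      warn_pct: float = WARN_PCT,
                      crit_pct: float = CRIT_PCT) -> str:
    """Classify a volume as OK, WARNING, or CRITICAL by percent free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    pct_free = 100.0 * free_bytes / total_bytes
    if pct_free < crit_pct:
        return "CRITICAL"
    if pct_free < warn_pct:
        return "WARNING"
    return "OK"


def check_volume(path: str) -> str:
    """Read the real usage of the volume containing `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    status = free_space_status(usage.total, usage.free)
    pct_free = 100.0 * usage.free / usage.total
    return f"{status}: {path} has {pct_free:.1f}% free"


if __name__ == "__main__":
    # "/" is only an example; point this at the drive holding your database.
    print(check_volume("/"))
```

Separating the pure classification from the disk read keeps the thresholds easy to test and adjust without touching a live server.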

Monitoring can help you plan. If you see that your storage requirements are growing, you can plan to expand capacity. Adding a request for a new Storage Area Network (SAN) to the regular budget process is a lot easier than trying to get one approved in the middle of the fiscal year because you are out of space.
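To show how monitoring history can feed the budget process, here is a rough Python sketch that fits a straight line to periodic used-space samples and estimates how many days remain before a volume fills. It assumes growth is roughly linear, which real storage often is not, and the sample numbers and function names are made up for illustration.

```python
from typing import List, Optional, Tuple

# Each sample is (day_number, used_gb); the values below are illustrative only.
Sample = Tuple[float, float]


def growth_per_day(samples: List[Sample]) -> float:
    """Least-squares slope of used space (GB) against day number."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples need at least two distinct days")
    sxy = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    return sxy / sxx


def days_until_full(samples: List[Sample], capacity_gb: float) -> Optional[float]:
    """Estimated days until capacity is reached, or None if usage is not growing."""
    slope = growth_per_day(samples)
    if slope <= 0:
        return None
    _, latest_used = max(samples)  # sample with the largest day number
    return (capacity_gb - latest_used) / slope


if __name__ == "__main__":
    history = [(0, 400.0), (30, 430.0), (60, 460.0), (90, 490.0)]
    print(days_until_full(history, capacity_gb=1000.0))
```

With this made-up history, usage grows by 1 GB per day and the script prints 510.0, roughly seventeen months of runway, which is plenty of time to get a SAN request into the next budget cycle.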

Monitoring can save you time. Putting a good monitoring tool in place can save time by giving you a single place to look to get a feel for the health of your network and servers. It can also save time when there is a problem by helping you to focus your troubleshooting where it will do the most good. You don’t want to spend your time looking at a server’s event logs when the problem is with the switch it is connected to, or the workstation of the user reporting the problem.

Monitoring can save you money. Just as it saves you time, an effective monitoring system provides information about the health and resiliency of your public safety IT infrastructure. This information allows managers to make informed decisions about investments in technology and people while managing risks and priorities. For example, if network monitoring indicates a device is failing, does it make sense to ignore that warning, only to pay expediting fees and overtime when the device takes the network down?

So where do you begin?

The first step is choosing your metrics. Deciding what to monitor is just as important as the monitoring itself. Every network is different, and what is important to you may not be important to the next guy. Some areas to think about are storage free space, bandwidth utilization, and backups. Many of our clients also monitor specific services, which can warn them of software problems before their users notice. Deciding what to monitor also helps you eliminate noise. Monitoring tools are designed to meet a wide variety of needs, so they present a lot of information about all of your systems. That is a good thing, but it can also be a bad thing, because the critical data can be hidden in the “noise” of all the information that is not important to you.
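One way to picture "deciding what to monitor" in practice is a short list of alert rules: only metrics you have chosen can raise an alert, and a value must stay past its threshold for several samples in a row before anyone is paged. The Python sketch below is a hypothetical illustration of that idea, not any particular product's configuration; the metric names, thresholds, and three-sample default are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    metric: str            # hypothetical metric name, e.g. "db_drive_free_pct"
    threshold: float
    below: bool            # True: alert when the value drops below threshold
    consecutive: int = 3   # samples in a row that must breach before alerting

    def __post_init__(self) -> None:
        if self.consecutive < 1:
            raise ValueError("consecutive must be at least 1")

    def breached_by(self, value: float) -> bool:
        return value < self.threshold if self.below else value > self.threshold


def alerts(rules: List[Rule], history: Dict[str, List[float]]) -> List[str]:
    """Return the metrics whose most recent samples all breach their rule.

    Metrics with no rule are ignored entirely, which is where the noise goes.
    """
    fired = []
    for rule in rules:
        recent = history.get(rule.metric, [])[-rule.consecutive:]
        if len(recent) == rule.consecutive and all(rule.breached_by(v) for v in recent):
            fired.append(rule.metric)
    return fired


if __name__ == "__main__":
    rules = [
        Rule("db_drive_free_pct", 15.0, below=True),
        Rule("wan_util_pct", 90.0, below=False),
        Rule("backup_age_hours", 26.0, below=False, consecutive=1),
    ]
    history = {
        "db_drive_free_pct": [18.0, 14.0, 13.0, 12.0],
        "wan_util_pct": [95.0, 40.0, 97.0],
        "backup_age_hours": [30.0],
        "cpu_temp_c": [70.0, 71.0, 72.0],  # collected, but no rule: never alerts
    }
    print(alerts(rules, history))
```

Here the database drive and the stale backup raise alerts, the brief bandwidth spikes do not because they were interrupted by a normal reading, and the CPU temperature is collected but never alerts because nobody decided it mattered.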

The second step is to decide whether to do it yourself or contract with a vendor. Many IT departments are too busy or short-staffed to take on one more thing, or they just don’t have the in-house skill set for the task. The advantage of going with a vendor is that they already know the tools they use and are experts in configuring them. If you choose a vendor that focuses on organizations like yours, they can also help you decide which metrics to watch and will do all the legwork of setting up the monitoring. If you decide to do it yourself, your next task is picking the right tool.

Monitoring may not be sexy, but in the long run it can save time and money by preventing downtime, shortening the downtime you do have, and giving you the information you need to plan for the future. In organizations where lives depend on your ability to function, it is a must. In future posts I will show real-world examples of how monitoring prevented failures, focused troubleshooting, and provided solid information for planning.