Looking for a monitoring solution for a distributed environment

Background: I work for an MSP. Our customers range from a single server with a handful of desktops to larger environments with hundreds of users. Right now, our monitoring consists of various systems sending (or not sending) email alerts, MRTG collecting performance data via SNMP, and RGE IPSentry running active checks where it can. My boss is not happy with it, and wants to implement One Monitoring System To Rule Them All - and he has tasked me with it.

Not all hosts can be accessed from a central location - in fact, the majority of them only have private IP addresses, and site-to-site VPN is usually not feasible. Therefore, the monitoring solution would need both the ability to deploy proxy servers on customer sites and the ability to accept passive checks.
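To make the "passive checks" requirement concrete, here is a minimal sketch of the central side: sites push results outbound, and the central store flags anything that reports a failure or has gone silent. The class name `PassiveResults`, the status strings, and the 600-second freshness window are all assumptions for illustration, not the API of any product mentioned in this thread.

```python
import time


class PassiveResults:
    """Central store for check results pushed in by remote sites (hypothetical)."""

    def __init__(self, max_age_seconds=600):
        # Results older than this are treated as "gone silent".
        self.max_age = max_age_seconds
        self.last = {}  # (host, service) -> (status, received_at)

    def submit(self, host, service, status, now=None):
        """Accept a result sent outbound by a site-local agent or proxy."""
        received = time.time() if now is None else now
        self.last[(host, service)] = (status, received)

    def problems(self, now=None):
        """List checks that report a failure or have stopped reporting."""
        now = time.time() if now is None else now
        out = []
        for (host, service), (status, seen) in sorted(self.last.items()):
            if now - seen > self.max_age:
                out.append((host, service, "STALE"))
            elif status != "OK":
                out.append((host, service, status))
        return out
```

The staleness test is the part that covers the "not sending" case: a site that stops reporting shows up on the dashboard instead of quietly disappearing.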

Basically, my boss wants to have a single dashboard where all problems will pop up automatically. He thinks Nagios can do it, but I just spent several hours wrestling with NSClient++/NSCA, and it's a huge fucking mess where documentation is years out of date, incomplete, contradictory, and often seems to be plain incorrect. Oh yeah, I'm expected to implement it all in my "spare time". He's prepared to pay for a solution, but no more than a few thousand dollars (i.e. Solarwinds Orion, which at this scale would cost $20k+ based on number of sensors, is not feasible).

Kaseya will monitor all of these, and more, along with loads of scripting options for automating the hell out of the management and monitoring of environments. It is not free. It is not expensive, but it is not cheap either. Yes, there is a learning curve, like all other solutions.

It has a tiny, tiny agent that any end user can install (only requires a couple of clicks), takes almost no resources, and requires only one outbound port for access and NO inbound ports so it is still secure for the vast majority of environments.

My company is NOT a reseller but we do use the system and most of us love it. We are an MSP and integrator, and even the Consulting team (I'm on that team) uses it for some tasks.

Quote:

He's prepared to pay for a solution, but no more than a few thousand dollars

Get a Visa to the USA, apply at my employer (we're always looking for more MSP people), and then once you have a job here tell your boss to go screw himself. "No more than a few thousand dollars" for an MSP-class monitoring solution does not compute. If your company really wanted to offer its customers the best support they can, they would make the investments required to do so. Anything less is a waste of customer money, trust, and time.

This is near and dear to me because again we have our own MSP and our MSP team is amazing.

I'm in the same scenario. We're an MSP with about 150 servers distributed across 50 remote sites. However, long ago I demanded that our customers let us have a VPN to their site so I can access all of them from a central location.

We use Solarwinds Orion NPM + SAM (formerly APM). It lets us do everything we need, is complex but easy to learn, and can monitor hardware from most major vendors (including those you listed).

I believe it supports multiple polling engines. I think that feature was designed more around load balancing than remote site deployment, but it's worth checking into.

Orion has its flaws, and in some areas it's rough around the edges, but they're constantly improving it and I love the product.

Yes, Barmaglot, move to the US, they don't pay you nearly enough at your job.

Foglight will definitely do it; that is essentially what it is designed for: remote agents collecting data and reporting back. I found the setup not too bad, although some of the sensors are a little touchy. The UP/DOWN dashboard interface and dashboards in general are pretty good.

HostMonitor has its RMA agent for remote locations. I am a fan: it is certainly very powerful and inexpensive, but it doesn't have the polish and reports of other products.

Look at KS HostMonitor. It can aggregate many alert sources into a dashboard and receive data from remote agents. It will be a huge job to set up and maintain, though.

Looks interesting, not too complex, and fairly affordable. It looks very 1998, but I don't mind that, and the feature list is good.

moullas wrote:

I'm under the impression Foglight (and vFoglight) from Quest could fit the bill.

It's very powerful, but there is a huge learning curve, both for setting it up (I know you can deploy remote agents to handle data collection) and for creating dashboards, etc.

And it's not free.

Unfortunately it looks like one of those "if you have to ask how much it costs, you can't afford it" things.

AngelZero wrote:

Kaseya will monitor all of these, and more, along with loads of scripting options for automating the hell out of the management and monitoring of environments. It is not free. It is not expensive, but it is not cheap either. Yes, there is a learning curve, like all other solutions.

Looks interesting, and at $2.95/month/agent, not too expensive. I sent a link to my boss, will have a chat with him this coming Sunday.
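For a rough sense of how per-agent pricing adds up over a year, here is a back-of-the-envelope sketch using the $2.95/month/agent figure above. The agent counts are hypothetical, picked only to show how the total scales; real pricing tiers and discounts are not modelled.

```python
PRICE_CENTS_PER_AGENT_MONTH = 295  # $2.95, the per-agent price quoted above


def yearly_cost_dollars(agents):
    """Yearly licensing cost in dollars, computed from cents to avoid float drift."""
    return PRICE_CENTS_PER_AGENT_MONTH * agents * 12 / 100


# Agent counts are assumptions for illustration, not figures from this thread.
for agents in (100, 250, 500):
    print(f"{agents} agents: ${yearly_cost_dollars(agents):,.2f} per year")
```

On those assumed counts, the yearly total only stays within "a few thousand dollars" at the low end of the range.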

AngelZero wrote:

Get a Visa to the USA, apply at my employer (we're always looking for more MSP people), and then once you have a job here tell your boss to go screw himself. "No more than a few thousand dollars" for an MSP-class monitoring solution does not compute. If your company really wanted to offer its customers the best support they can, they would make the investments required to do so. Anything less is a waste of customer money, trust, and time.

You don't know our customers. Stuff like a customer demanding to pay 30% less than last year - after doubling their size and our work during aforementioned year - is sadly commonplace. Still, Kaseya seems to include a ticketing system (and we've been looking to replace ours for a while now) and asset tracking, which may help when negotiating service charges.

Also, I have a USA visa, but only a B1/B2 type.

Skyview wrote:

I'm in the same scenario. We're an MSP with about 150 servers distributed across 50 remote sites. However, long ago I demanded that our customers let us have a VPN to their site so I can access all of them from a central location.

We use Solarwinds Orion NPM + SAM (formerly APM). It lets us do everything we need, is complex but easy to learn, and can monitor hardware from most major vendors (including those you listed).

I believe it supports multiple polling engines. I think that feature was designed more around load balancing than remote site deployment, but it's worth checking into.

Orion has its flaws, and in some areas it's rough around the edges, but they're constantly improving it and I love the product.

While theoretically I could do VPNs into most of our customer sites, it'd be a huge project due to many networks having overlapping private ranges - I'd need to either set up a lot of NAT rules (and then keep them straight), or renumber dozens of networks. Even with all that, I wouldn't get them all - quite a few run basic SoHo routers rather than business-grade firewalls, and while I could probably solve that by deploying software inside the network, it'd be more and more and more work.
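To size up the overlap problem before committing to NAT rules or renumbering, a quick inventory check along these lines could help. This is only a sketch: the site names and ranges are made up, and `find_overlaps` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress
from itertools import combinations


def find_overlaps(site_networks):
    """Return pairs of site names whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in site_networks.items()}
    overlaps = []
    # Compare every pair of sites once, in name order.
    for (a, net_a), (b, net_b) in combinations(sorted(parsed.items()), 2):
        if net_a.overlaps(net_b):
            overlaps.append((a, b))
    return overlaps


# Hypothetical customer sites; names and ranges are invented for illustration.
sites = {
    "customer-a": "192.168.1.0/24",
    "customer-b": "192.168.1.0/24",
    "customer-c": "10.0.0.0/16",
    "customer-d": "10.0.5.0/24",
    "customer-e": "172.16.20.0/24",
}

for a, b in find_overlaps(sites):
    print(f"{a} overlaps {b}")
```

Every pair it reports is a pair that would need NAT or renumbering before both sites could be reached over VPN from the same central monitor.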

Also, we looked at Solarwinds Orion NPM a couple years back, and after a few back-of-the-envelope calculations about how many sensors we'd need to license, the cost quickly passed $20k and kept growing, which put it squarely in the 'unaffordable' range.

Quote:

Stuff like a customer demanding to pay 30% less than last year - after doubling their size and our work during aforementioned year - is sadly commonplace.

Then you drop the customer. It is antithetical to both your company mission and your business strategy to support customers who are unwilling to pay for services rendered. These are business issues, most likely with your company's management team.

Zabbix can do all of this, and its distributed nature and proxies would give you insight into all the disparate networks. The learning curve is steep and you would end up writing most of your own checks.

All monitoring solutions suck to a certain degree. They are either easy and trivial, or complex. The complexity comes from your requirements, and the customization that you have to implement.

The investment you mentioned for something like Solarwinds is probably the bare minimum investment that you'd want to make. If that cost is unpalatable, your management isn't serious about their job, and you need to figure out how to make this someone else's problem.

Yup, AngelZero is right on the money. The other one to look at is LabTech. If you just need the RMM (remote monitoring & management), you can just get LabTech, but it will integrate with another product called ConnectWise, which handles the customer/business/ticket management side of things and can make your life so much easier. We bought out a competitor that was using Kaseya; from what I have seen, it is just as good, and actually better on some of the technical stuff.

You might also want to consider PRTG. Their licensing is extremely fair, and the system overall well thought out. PRTG allows you to deploy a single "Probe" agent in a network and perform checks from this probe device (hence only one firewall rule addition if required).

I've deployed this in several multi-site environments and been quite pleased with performance, flexibility and price.

LabTech and Kaseya both call back home, so no ports need to be opened. The reason I posted, though, is that I would recommend going with agents. We tried a few different agentless setups over the past 5 years (Level Platforms & N-able among them) and none came close to using agents. Granted, there is a big trade-off (you have to install agents on all the machines), but if you want to scale, every large MSP I have talked to is using agent-based tools.

Kaseya looked enticing at first, particularly the ticketing component, but I quickly found out that it doesn't support Hebrew - at all. Anything I type in Hebrew turns into question marks as soon as I save it, so its ticketing system is a complete non-starter for us. Too bad - it could've been something that would've justified the high price. Also, the web GUI is annoyingly slow - not super-slow, but not something I'd want to use for 10-12 hours a day, six days a week.

KS HostMonitor is not nearly as pretentious - unlike Kaseya, which is everything-and-a-kitchen-sink, it only does monitoring - but so far I like what I see. The UI is spartan, but very quick, and the monitoring capabilities are excellent. I particularly like the active remote agent, which functions almost 100% transparently. What I have yet to see, however, is how well it will scale to 100+ agents monitoring some 10-20k+ sensors.

"but I just spent several hours wrestling with NSClient++/NSCA, and it's a huge fucking mess where documentation is years out of date, incomplete, contradictory, and often seems to be plain incorrect."

Don't do it! SNMP has 99.9% of the functionality of NSClient++. Windows comes with SNMPv1, which is ugly (plain text), but if you're reasonably careful it can be OK, or you can go with Net-SNMP and get things to work.

If you PM me I'll send you an Ubuntu VM template with Nagios etc. all configured and ready to go; you just need to add the hosts & IPs and which services you want to monitor. Or I can give you access to a Subversion repo with the configs, just lmk. It literally takes ~4 hours to set up on a new network; I've done it at 6 sites globally so far (using mntos to make a cross-site dashboard).

Just don't ask me to set up pnp4nagios again. I struggled there, although I'm a lot more comfortable with PHP in general now, so it may not be as big a deal.

Edit: And ++ about ALL monitoring being a pain; the whole point is to be able to automate the stuff that is specific to your business. Generic junk is easy, but there is very little generic stuff, and the value for us is in things like graphs of data warehouse load times/time or IOPS/VM etc.

"but I just spent several hours wrestling with NSClient++/NSCA, and it's a huge fucking mess where documentation is years out of date, incomplete, contradictory, and often seems to be plain incorrect."

Don't do it! SNMP has 99.9% of the functionality of NSClient++. Windows comes with SNMPv1, which is ugly (plain text), but if you're reasonably careful it can be OK, or you can go with Net-SNMP and get things to work.

If you PM me I'll send you an Ubuntu VM template with Nagios etc. all configured and ready to go; you just need to add the hosts & IPs and which services you want to monitor. Or I can give you access to a Subversion repo with the configs, just lmk. It literally takes ~4 hours to set up on a new network; I've done it at 6 sites globally so far (using mntos to make a cross-site dashboard).

Problem is, I don't have six sites - I have a hundred plus, and not all of them are directly accessible, some of them are behind really dumb SoHo routers with no port forwarding capability, many have dynamic IPs, dozens of internal IP ranges overlap, etc. Plus SNMP queries over WAN have a bad habit of timing out and triggering false alarms.
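One common way to stop a single WAN timeout from raising an alarm is to alert only after several consecutive failures, similar in spirit to Nagios's `max_check_attempts` setting. The sketch below assumes a hypothetical `FlapFilter` class and a threshold of three; it illustrates the idea and is not how any tool in this thread implements it.

```python
from collections import defaultdict


class FlapFilter:
    """Fire an alert only after `threshold` consecutive failed checks (hypothetical)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = defaultdict(int)  # check id -> current failure streak

    def record(self, check_id, ok):
        """Record one check result; return True if an alert should fire now."""
        if ok:
            # Any success resets the streak, so isolated timeouts never alert.
            self.failures[check_id] = 0
            return False
        self.failures[check_id] += 1
        # Fire exactly once, when the streak first reaches the threshold.
        return self.failures[check_id] == self.threshold


# One stray timeout followed by a real outage on a made-up check id.
flap_filter = FlapFilter(threshold=3)
results = [False, True, False, False, False, False]
fired = [flap_filter.record("site42/snmp", ok) for ok in results]
```

The trade-off is detection delay: with a threshold of three, a real outage is only reported on the third failed poll.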

So far the more I use KS HostMonitor, the more I like it. The predefined checks are good, and I can write my own via scripts as needed. The agent management, in particular, is a breeze compared to Nagios - I install a remote agent on a customer site, plug in the central monitor IP and name/password, it registers and then it's completely transparent - I define sensors on the central monitor, pick which agent to run them with, and it Just Works.

Just works is good! We moved off HostMonitor because of scaling issues (I am currently watching ~1k hosts + their infrastructure) and because I wanted to be able to graph trends etc. and present them to my customers via the web UI.

I do have some problems with checks timing out but that's for the most part configurable (well it's UDP so...) and it's sites that are seriously far away from a topology point of view (Perth comes to mind). Mostly after dealing with Unicenter, Tivoli/IBM Director and NSClient++ I just hate agents in general.

How many sensors did you have in HostMonitor when you started running into scaling issues? I estimate I need to monitor about 500 devices - physical servers, ESX(i) hosts, VMs, firewalls, switches, etc.

Not to hijack the thread, but does anyone have any opinions about WhatsUp Gold, specifically vs Solarwinds Orion? I've run a demo of both, and WhatsUp seems to do most everything that Orion does, but it's not quite as pretty and needs more manual setup. Anything I may have missed that would make Orion worth twice the price of WhatsUp?

Quote:

How many sensors did you have in HostMonitor when you started running into scaling issues? I estimate I need to monitor about 500 devices - physical servers, ESX(i) hosts, VMs, firewalls, switches, etc.

Not sure, we're running ~8k service checks & 1k host checks now but it was not that many at the time.

The work involved was more on the management side (like giving people at other sites a way to log in securely, which Apache is handling now) and setting up version control on changes etc. A lot of the benefit was the host OS; Linux is (IMO) generally easier to script stuff like this in.