We use a combination of Monit and M/Monit, with StatusCake as an external monitor.

Monit + M/Monit is excellent: you can monitor pretty much anything you can think of and set specific thresholds not only for alerts but also for automated actions. You can detect if the webserver or MySQL hangs or dies and attempt a restart or fire a script automatically. The only downside I have found is that certain network issues occasionally seem to escape detection, which is why we added StatusCake as an external monitor ... just in case.
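
For anyone who has not used this kind of setup, here is a rough Python sketch of the check-then-restart idea described above. It is not Monit's implementation or its configuration syntax. The thresholds, the port, and the `systemctl restart apache2` command are all assumptions made up for illustration. Note that a plain TCP connect only catches a dead listener; a hung webserver can still accept connections, which is why real monitors also do protocol-level checks.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def decide(consecutive_failures, alert_after=1, restart_after=3):
    """Pick an action from the number of failed checks in a row.

    The thresholds are hypothetical defaults, not values from any real tool.
    """
    if consecutive_failures >= restart_after:
        return "restart"
    if consecutive_failures >= alert_after:
        return "alert"
    return "none"


def run_cycle(failures, host="127.0.0.1", port=80,
              restart_cmd=("systemctl", "restart", "apache2")):
    """Run one check cycle and return the updated failure count.

    restart_cmd is an assumed example; substitute whatever restarts your service.
    """
    failures = 0 if port_is_open(host, port) else failures + 1
    if decide(failures) == "restart":
        subprocess.run(list(restart_cmd), check=False)
        failures = 0  # give the service a fresh start before counting again
    return failures
```

Calling `run_cycle` from cron or a loop with a sleep gives the basic behaviour: alert on the first failure, restart after three in a row.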

Does Nagios or Munin really help at all for monitoring dozens of virtualized containers? And does Pinguzo support SolusVM? I tried googling it, but the top result is this thread... lol. Good job on the improved SEO, knownhost

Does Nagios or Munin really help at all for monitoring dozens of virtualized containers?

We install agents for Munin and Nagios in VPSes that have management, which solves the per-VPS monitoring for us. We also monitor our Xen host nodes directly, although we're not checking anything Xen-specific. It wouldn't be at all difficult to write a custom Nagios plugin that checks useful stuff like the number of running Xen domains, though. Nagios plugins are very simple to write.
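
To give an idea of how little is involved, here is a minimal sketch of such a plugin in Python. It is an illustration, not something we run: it assumes the `xl` toolstack is available on the host (older Xen installs use `xm` instead), it counts every guest that `xl list` reports without inspecting the state column, and the plugin name and thresholds are made up. The only fixed parts are the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the one-line status output with optional perfdata after the `|`.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_domains(xl_output):
    """Count guests in `xl list` output, skipping the header row and Domain-0."""
    lines = [ln for ln in xl_output.splitlines() if ln.strip()]
    guests = [ln for ln in lines[1:] if ln.split()[0] != "Domain-0"]
    return len(guests)


def evaluate(count, warn_below, crit_below):
    """Map a domain count to a Nagios exit code and status line."""
    perfdata = f"domains={count}"
    if count < crit_below:
        return CRITICAL, f"XEN CRITICAL - {count} domains | {perfdata}"
    if count < warn_below:
        return WARNING, f"XEN WARNING - {count} domains | {perfdata}"
    return OK, f"XEN OK - {count} domains | {perfdata}"


def main():
    """Run `xl list`, print one status line, and return the exit code."""
    try:
        result = subprocess.run(["xl", "list"], capture_output=True,
                                text=True, check=True, timeout=10)
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"XEN UNKNOWN - could not run xl list: {exc}")
        return UNKNOWN
    # Hypothetical thresholds: warn below 5 guests, critical below 1.
    code, message = evaluate(count_domains(result.stdout),
                             warn_below=5, crit_below=1)
    print(message)
    return code

# When saved as a plugin script, finish the file with: sys.exit(main())
```

Keeping the parsing and the threshold logic in separate functions means both can be tested without a Xen host.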

Munin is nice, but it can be very heavy on the server if you have many nodes added to it. Somewhere after 50 it gets very bad, even with good hardware, so you have to choose what you will monitor. Even so, I think it is the best for diagnostics. Zabbix and Observium are not bad either.

This thread is a little old, but some of the information still fits. You can install Pinguzo on all your servers and monitor them through a web browser. Pinguzo, from Softaculous, is still free; the URL is http://pinguzo.com.
It can monitor your servers and VPSes, as well as websites and SMTP services.
The features are: monitoring of CPU, load, network, RAM, and disk.
Email warnings are configured by the user.

Munin is nice, but it can be very heavy on the server if you have many nodes added to it. Somewhere after 50 it gets very bad, even with good hardware, so you have to choose what you will monitor. Even so, I think it is the best for diagnostics. Zabbix and Observium are not bad either.

The default install hammers disk IO really hard, you're right, but you can fix that by using rrdcached. If you're not using it, check it out!

You probably also want to enable CGI mode for HTML and graphs to only generate them when you're actually looking at them rather than generating every single graph and every single HTML page every 5 minutes.

Doing both of the above made our Munin scaling problems go away pretty much entirely.

Thanks. After some 20 nodes we started to apply some mitigation techniques; however, after 40-50 things got too heavy, and we had to start removing graphs and lengthening the polling interval. We still run Munin, but scaled down, and some groups of servers are monitored differently.