Troubleshooting a slowdown requires a lot more metrics. In fact, it requires all of them, since the real cause of a slowdown is most likely complex. If we knew the possible causes in advance, chances are we would have fixed them before they became a problem.

When most monitoring solutions do detect something, they provide just a hint (e.g. “hey, there is a 20% drop in requests per second over the last minute”) and expect us to use the console to determine the root cause.

Of course, this introduces even more problems: how can we troubleshoot a slowdown from the console if it lasts just a few seconds and occurs at random times throughout the day?

You can’t! You will spend your entire day on the console, waiting for the problem to happen again while you are logged in. A blame war starts: developers blame the systems, sysadmins blame the hosting provider, someone says it is a DNS problem, someone else believes it is network related, and so on. We have all experienced this, many times…

Centralizing metrics requires filtering them to keep monitoring costs under control. Time-series databases limit the number of metrics collected, because that number significantly affects their performance: they become congested at scale.

It is a lot easier to provide an illusion of monitoring by using a few basic metrics.

Troubleshooting slowdowns is one of the hardest IT problems to solve, so most solutions simply avoid it.

Netdata collects, stores, and visualizes everything: every single metric exposed by systems and applications.

Due to Netdata’s distributed nature, the number of metrics collected has no noticeable effect on the performance or the cost of the monitoring infrastructure.

Of course, since Netdata is also about meaningful presentation, the sheer number of metrics slows down Netdata development. We, the Netdata developers, need a good understanding of each metric before adding it to Netdata. We need to organize the metrics, document them, and configure alarms for them, so that you, the Netdata users, get the best out-of-the-box experience and all the information required to kill the console for troubleshooting slowdowns.