First of all, I tried increasing the number of pre-forked poller instances for the Zabbix server, changing the value from its default of 5 to 256. Remember that in this case you have to set the maximum number of connections in MySQL (max_connections) higher than 256, since every single poller opens a dedicated connection to the database.
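As a rough illustration of that check, here is a minimal Python sketch. It assumes the poller count is set through the StartPollers parameter in zabbix_server.conf; the helper names and the headroom of 50 extra connections (for the other server processes and the frontend) are hypothetical values chosen for the example, not figures from this setup.

```python
import re


def start_pollers(conf_text, default=5):
    """Return the StartPollers value found in zabbix_server.conf text.

    Commented-out lines are ignored; if the parameter is not set,
    the default of 5 mentioned in the article is returned.
    """
    value = default
    for line in conf_text.splitlines():
        match = re.match(r"\s*StartPollers\s*=\s*(\d+)\s*$", line)
        if match:
            value = int(match.group(1))
    return value


def max_connections_ok(pollers, max_connections, headroom=50):
    """Check that MySQL allows one connection per poller plus some headroom.

    The headroom is an assumption for this sketch, not a Zabbix rule.
    """
    return max_connections >= pollers + headroom
```

You would feed it the contents of the server configuration file and the value reported by MySQL for max_connections, and raise the latter whenever the check fails.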

Below you can see the outcome after applying it (Zabbix server performance).

And the Zabbix data gathering process.

In the first figure, you can see that the Zabbix queue has gone from 48 to approximately 30, and in the second one, the Zabbix busy poller processes went from 100% to 24%. So it is clear that if you have a server with enough resources, there is no problem in starting many pollers. This kind of process is responsible for requesting the data defined in the items, so the more pollers are available, the less overloaded the system is.

Another Zabbix server parameter that you ought to take into account is, for example, Timeout (which specifies how long pollers wait for agent responses). Try not to assign a very high value; otherwise, the system might get overloaded, because a poller stays blocked for the whole timeout whenever an agent does not answer.
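To get a feeling for how the number of pollers and the Timeout interact, the following Python sketch does a back-of-the-envelope estimate. It is only a simplified model built on assumptions (all checks take the same time, a fixed share of them fails and waits for the full timeout), and it is not the way Zabbix computes its own busy percentage. The function names and every number you pass in are hypothetical.

```python
def average_check_time(normal_s, timeout_s, failing_share):
    """Average seconds per check when a share of checks waits for the timeout."""
    return (1 - failing_share) * normal_s + failing_share * timeout_s


def poller_busy_fraction(items, interval_s, avg_check_s, pollers):
    """Estimated fraction of time the pollers are busy (1.0 or more = saturated)."""
    checks_per_second = items / interval_s
    # Seconds of polling work that arrive every second, shared by all pollers.
    work_per_second = checks_per_second * avg_check_s
    return work_per_second / pollers


if __name__ == "__main__":
    # Hypothetical workload: 1000 items every 10 seconds, 5% of checks failing.
    for timeout in (3.0, 30.0):
        avg = average_check_time(0.05, timeout, 0.05)
        for pollers in (5, 256):
            busy = poller_busy_fraction(1000, 10, avg, pollers)
            print(timeout, pollers, round(busy, 2))
```

In this model, raising the timeout increases the average check time, and therefore the load on each poller, while adding pollers spreads the same work over more processes.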

Next week, I will finish this series of articles by covering the client side.

Oct 21, 2012

Let's finish the series of articles about the differences between the measurements of sysstat, top and ps.

What about the system CPU time (%sy) of the first top output from the previous article? It is 39.7%, and that is right as well. You have to take into account that, because this server has a couple of cores, that figure represents the average of both cores. You can check this by running top in interactive mode and pressing the 1 key, which shows the consumption of each core separately.
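The same relationship can be checked by hand from the counters in /proc/stat. The Python sketch below is an illustration under my own assumptions (helper names are hypothetical, and it only looks at the system field, ignoring how top rounds or groups the other states): it parses two snapshots of /proc/stat and computes the system percentage for the aggregated cpu line and for each core.

```python
def parse_proc_stat(text):
    """Map each 'cpu*' label in /proc/stat text to (system_ticks, total_ticks)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Fields: user, nice, system, idle, iowait, irq, softirq, steal.
        # The guest fields are skipped because they are already
        # accounted for inside user and nice.
        values = [int(v) for v in fields[1:9]]
        counters[fields[0]] = (values[2], sum(values))
    return counters


def system_percent(before, after):
    """Percentage of system time per label between two parsed snapshots."""
    result = {}
    for label, (sys_after, total_after) in after.items():
        sys_before, total_before = before[label]
        elapsed = total_after - total_before
        if elapsed > 0:
            result[label] = 100.0 * (sys_after - sys_before) / elapsed
        else:
            result[label] = 0.0
    return result
```

On a Linux machine you would read /proc/stat twice, a second or so apart, and pass both texts through parse_proc_stat. When both cores have accumulated the same number of ticks in that period, the value for the cpu label is the plain average of cpu0 and cpu1.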

Oct 15, 2012

In my previous job, I had to set up a Zabbix infrastructure to monitor more than 400 devices, including switches and servers. The main feature of this architecture was that there were a lot of machines, but the update interval was long (around 30 seconds) and the number of items small.

For this purpose, I wrote a couple of articles related to this issue:

But in my current position, I am starting to introduce Zabbix (2.0.3 on Ubuntu Server 12.04) with the aim of monitoring a few devices where a large number of items and a short monitoring period are required. This situation overloads the Zabbix server: on the one hand, the number of monitored items delayed in the queue increases, and on the other, the poller processes stay busy most of the time.

In addition, I have observed that, from time to time, the agent goes down unexpectedly. If you take a look at the client's log file (in debug mode), the following error lines are dumped.

Below you can observe a figure which shows the Zabbix server performance (queue) for the aforementioned case.

And the other one reflects the Zabbix data gathering process (pay attention to the Zabbix busy poller processes series, in %).

In the first case, the Zabbix queue has averaged more than 50 delayed items, and in the second one, the poller processes are busy about 100% of the time. As a result, Zabbix sometimes draws sporadic dots rather than lines in the graphs. Another effect of this condition is that, if you set a short update interval for an item, you may find gaps in the data when you check the gathered values later.

I should also say that I followed the tuning guide that I mentioned before but, as you can see, the Zabbix server was still struggling.

Oct 6, 2012

Following up on the previous article, sysstat vs top vs ps (I), a curious case that I would like to talk about is what happens when you use more than one core. Let's create a simple Python script which runs a couple of threads that keep the CPU fairly busy.
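The script used for the test is not reproduced here, so what follows is only a minimal sketch of that kind of script, written for Python 3 and based on my own assumptions: two threads, each spinning in a tight loop for a fixed time. The function names are hypothetical. Keep in mind that CPython's global interpreter lock limits how much pure Python threads can run in parallel, so the CPU figure you get with this sketch will not necessarily match the one discussed in this article.

```python
import threading
import time


def burn(seconds, results, index):
    """Spin in a tight loop for roughly `seconds` and store the iteration count."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    results[index] = count


def run_busy_threads(n_threads=2, seconds=1.0):
    """Start n_threads busy threads, wait for them and return their counters."""
    results = [0] * n_threads
    threads = [
        threading.Thread(target=burn, args=(seconds, results, i))
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_busy_threads(2, 60.0) keeps the two threads alive for about a minute, which is long enough to watch the process with top or top -H from another terminal.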

So what is the first weird thing that you can observe in the previous screen? The script is consuming 129% of the CPU. This is right, because this virtual machine has two cores and the script, or rather its two threads, is using both of them; that figure is the sum of the CPU utilization on both cores. You can see this much better if you run top with the -H option, which lists the individual threads.


About the author...

Javier Andrés Alonso holds a Master's Degree in Telecommunication Engineering and a Bachelor's Degree in Telecommunication Technical Engineering (specialising in Telematics) from the Polytechnic School of the University of Alcalá de Henares.