The network connection is stable; this is confirmed by ping graphs. The ping interval is 3 seconds and the IPMI polling interval is 300 seconds. The IPMI library version is 2.0.16-7.el5. StartIPMIPollers=2; 3 hosts are monitored via IPMI, and some other hosts are monitored via agents, SNMP, and simple checks.

Sergey Syreskin
added a comment - 2010 Nov 29 10:37 - edited
Aleksandrs Saveljevs, no, this is not the same problem as ZBX-3188: I have no "host unreachable" errors in the Zabbix log. Besides, all sensors work in my setup. The gaps in data happen periodically: the sensor data becomes reachable, then after some time a network error occurs, then it becomes OK again. Just see the graphs attached to my first post.

richlv
added a comment - 2011 Apr 25 17:54
Could it be that polling IPMI too often makes the device slow, locks it up, or triggers some connection throttling?
How many IPMI items do you have? Do they all have the same interval?

Sergey Syreskin
added a comment - 2011 Apr 26 09:27
There are 3 hosts with 125 IPMI items each. The polling interval is set to 300 seconds for each item.
I'm using Zabbix 1.8.5 now and no longer experience this problem.
I can't remember when the problem disappeared; it could be due to the Zabbix update or to changes I made to the IPMI template some time ago.
The only thing I can say for sure is that I didn't change any settings on the IPMI devices.
Looking at the template in the attached ipmi_error_report.tgz archive, I can see that my current template is definitely different from the old one. The old template had only 19 items.

Chris Witte
added a comment - 2012 Mar 28 16:52 - edited
It seems that the BMC gets too many requests/connections.
From time to time I get the following message when running ipmitool:
# ipmitool sdr -H <HOSTNAME> -U <USER> -P <PASSWORD> -L USER
Get Session Challenge command failed: Node busy
Error: Unable to establish LAN session
Get Device ID command failed
Unable to open SDR for reading
Does Zabbix use an SDR cache? That could improve performance.
ipmitool offers this parameter:
-S <sdr_cache_file>
Use local file for remote SDR cache. Using a local SDR cache
can drastically increase performance for commands that require
knowledge of the entire SDR to perform their function. Local
SDR cache from a remote system can be created with the sdr dump
command.
BMC busy topic: http://old.nabble.com/possible-causes-for-%22ipmi_ctx_open_outofband%3A-BMC-busy%22-td31448014.html
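
A minimal sketch of the two-step workflow the quoted man page describes: dump the SDR once with `sdr dump`, then point later reads at that file with the global `-S` option. It only builds the ipmitool argument lists; the host name, credentials, cache path, and helper names are hypothetical placeholders, and whether Zabbix's OpenIPMI-based poller could use such a cache is not answered here.

```python
import shlex


def ipmitool_base(host: str, user: str, password: str, level: str = "USER") -> list[str]:
    """Common LAN options, mirroring the ipmitool command shown above."""
    return ["ipmitool", "-H", host, "-U", user, "-P", password, "-L", level]


def sdr_dump_cmd(host: str, user: str, password: str, cache_file: str) -> list[str]:
    """One-off step: fetch the whole SDR repository and save it to a local file."""
    return ipmitool_base(host, user, password) + ["sdr", "dump", cache_file]


def sdr_read_cmd(host: str, user: str, password: str, cache_file: str) -> list[str]:
    """Later reads: -S (a global option, so it precedes the command) uses the local cache."""
    return ipmitool_base(host, user, password) + ["-S", cache_file, "sdr"]


if __name__ == "__main__":
    cache = "/var/cache/bmc01.sdr"  # hypothetical path
    for cmd in (sdr_dump_cmd("bmc01", "zabbix", "secret", cache),
                sdr_read_cmd("bmc01", "zabbix", "secret", cache)):
        print(shlex.join(cmd))  # e.g. pass the list to subprocess.run() instead
```

The dump only needs to be repeated if the SDR itself changes; every other read avoids walking the full repository over the LAN.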

Sergey Syreskin
added a comment - 2012 May 22 09:52
My colleague did some testing on this issue and concluded that the IPMI CPU is unable to handle all those requests. According to him, Zabbix opens a separate connection for each request to an IPMI host, and the IBM System x IMM module cannot handle all of them. So he wrote a wrapper script that requests all IPMI items from the host at once, stores them in a cache file, and hands them to Zabbix on request.

Chris Witte
added a comment - 2012 May 22 15:42 - edited
Thanks for your reply. Could you post the wrapper script here?
What about caching the SDR query, as ipmitool does when using the -S parameter?
-S <sdr_cache_file>
Use local file for remote SDR cache. Using a local SDR cache
can drastically increase performance for commands that require
knowledge of the entire SDR to perform their function. Local
SDR cache from a remote system can be created with the sdr dump
command.
I know that FreeIPMI automatically creates a cache file of the SDR, but Zabbix uses OpenIPMI.
Zabbix's IPMI engine would surely perform better if caching were enabled by default.
Chris

Sergey Syreskin
added a comment - 2012 May 22 16:02 - edited
The script is rather simple: it just stores values in a local file with a timestamp. When Zabbix requests a value, the script checks the timestamp and either refreshes its cache first or just returns data from the cache if it is recent enough.
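
The caching logic described above can be sketched as follows, assuming a JSON cache file and a caller-supplied `fetch_all()` that returns every sensor reading from a single BMC session (for example by parsing one `ipmitool sensor` run). `get_value` and `fetch_all` are hypothetical names; this is not the colleague's actual script, which is not included in this thread.

```python
import json
import os
import tempfile
import time
from typing import Callable, Dict


def get_value(sensor: str, cache_file: str, max_age: float,
              fetch_all: Callable[[], Dict[str, str]]) -> str:
    """Return one sensor reading, refreshing the whole cache first if it is stale.

    Hypothetical sketch: fetch_all() is assumed to read every sensor in one BMC session.
    """
    try:
        with open(cache_file) as f:
            cache = json.load(f)
    except (OSError, ValueError):
        cache = None  # a missing or unreadable cache counts as stale

    stale = (not isinstance(cache, dict)
             or time.time() - cache.get("timestamp", 0) > max_age)
    if stale:
        cache = {"timestamp": time.time(), "values": fetch_all()}
        # Write to a temporary file, then rename it, so a concurrent reader
        # never sees a half-written cache.
        directory = os.path.dirname(os.path.abspath(cache_file))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(cache, f)
        os.replace(tmp_path, cache_file)

    # A sensor missing from the cache raises KeyError, i.e. the item gets no value.
    return cache["values"][sensor]
```

Zabbix would invoke such a wrapper once per item (for example as an external check), but only a call made after `max_age` has elapsed actually contacts the BMC.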

Aaron Smart
added a comment - 2012 Aug 24 14:10 - edited
I'm experiencing the same network errors in the server log as Chris above (running 2.0.2) when trying to connect to a Dell PowerEdge 1950 (BMC) and a PowerEdge R210 II (iDRAC 6 Express). Is there some way to make the IPMI poller more accommodating of slow devices?

I know the solution for this issue:
in my experience, if you are using LO-100, you must set the password length to 16 bytes (not 20). After that, IPMI monitoring starts to work.
So Zabbix doesn't use IPMI 2.0, and I can't find where to set it.
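
The 16-versus-20-byte limit matches the IPMI specifications: IPMI v1.5 LAN sessions carry a 16-byte password, while IPMI v2.0 (RMCP+) sessions allow up to 20 bytes. A small check under that assumption (the helper name is hypothetical):

```python
# IPMI v1.5 LAN sessions use a 16-byte password field; IPMI v2.0 (RMCP+) allows up to 20 bytes.
MAX_PASSWORD_BYTES = {"1.5": 16, "2.0": 20}


def usable_ipmi_versions(password: str) -> list[str]:
    """Return the IPMI versions whose password-length limit this password fits."""
    size = len(password.encode("utf-8"))
    return [ver for ver, limit in MAX_PASSWORD_BYTES.items() if size <= limit]


if __name__ == "__main__":
    print(usable_ipmi_versions("x" * 16))  # fits both limits
    print(usable_ipmi_versions("x" * 20))  # too long for an IPMI v1.5 session
```

If a poller opens v1.5 sessions, a 20-byte password would fail to authenticate, which is consistent with the observation above, though the thread does not confirm which session type Zabbix negotiates.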

Andrej Kacian
added a comment - 2013 Mar 14 15:54
I had the same problem too (monitoring 5 hosts with around 10 items each) and was getting unsupported items intermittently every minute or so. Based on a suggestion from the forums [1], I reduced the number of IPMI pollers to just one. Since then, there has been no problem getting IPMI values at all. This was on Zabbix 2.0.3 at the time, and it still works flawlessly on 2.0.5 with just one IPMI poller.
1. https://www.zabbix.com/forum/showpost.php?s=783bdc9aff7d3ea26999f74f4d223e59&p=118389&postcount=4

Alexey Pustovalov
added a comment - 2014 Feb 24 14:27
If an IPMI sensor is located at the end of the sensor table, getting its value can take about 40-50 seconds and sometimes fails with a network error:
10673:20140224:182441.012 In get_value() key:'ipmi.cpu[FAN 1]'
10673:20140224:182441.012 In get_value_ipmi() key:'Zabbix server:ipmi.cpu[FAN 1]'
10673:20140224:182441.012 In init_ipmi_host() host:'[10.100.52.28]:623'
10673:20140224:182441.012 In get_ipmi_host() host:'[10.100.52.28]:623'
10673:20140224:182441.012 End of get_ipmi_host():0x2f9bbf0
10673:20140224:182441.013 End of init_ipmi_host():0x2f9bbf0
10673:20140224:182441.013 In get_ipmi_sensor_by_id() sensor:'FAN 1@[10.100.52.28]:623'
10673:20140224:182441.013 End of get_ipmi_sensor_by_id():0x307fcf8
10673:20140224:182441.013 In read_ipmi_sensor() sensor:'FAN 1@[10.100.52.28]:623'
10673:20140224:182448.020 In got_thresh_reading()
10673:20140224:182448.020 got_thresh_reading() fail: [16777411] Unknown error 16777411
10673:20140224:182448.020 End of got_thresh_reading():NETWORK_ERROR
10673:20140224:182448.020 End of read_ipmi_sensor():NETWORK_ERROR
10673:20140224:182448.020 Item [Zabbix server:ipmi.cpu[FAN 1]] error: error 0x10000c3 while reading threshold sensor
10673:20140224:182448.020 End of get_value():NETWORK_ERROR
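
Two numbers in this log can be unpacked with a little arithmetic: the time between `read_ipmi_sensor()` and `got_thresh_reading()`, and the error value. The sketch below assumes OpenIPMI reports an IPMI completion code `cc` as `0x01000000 | cc` (consistent with the log, since 16777411 == 0x10000c3), which would make this completion code 0xC3. The code table is a small excerpt from the IPMI specification, and the function names are hypothetical.

```python
from datetime import datetime

# Assumption: OpenIPMI reports an IPMI completion code cc as 0x01000000 | cc.
# The log is consistent with this: 16777411 == 0x10000c3.
IPMI_ERR_TOP = 0x01000000

# Small excerpt of the completion-code table in the IPMI specification.
COMPLETION_CODES = {
    0xC0: "Node busy",
    0xC3: "Timeout while processing command",
    0xCB: "Requested sensor, data, or record not present",
}


def decode_openipmi_error(err: int) -> str:
    """Describe an OpenIPMI error value such as the one in got_thresh_reading()."""
    if (err & 0xFFFFFF00) == IPMI_ERR_TOP:
        cc = err & 0xFF
        return f"IPMI completion code 0x{cc:02X}: {COMPLETION_CODES.get(cc, 'unknown')}"
    return f"not an IPMI completion code: {err:#x}"


def log_time(line: str) -> datetime:
    """Parse the 'pid:YYYYMMDD:HHMMSS.mmm' prefix of a Zabbix server log line."""
    _pid, day, rest = line.split(":", 2)
    clock = rest.split()[0]
    return datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")


if __name__ == "__main__":
    start = log_time("10673:20140224:182441.013 In read_ipmi_sensor() sensor:'FAN 1@[10.100.52.28]:623'")
    end = log_time("10673:20140224:182448.020 In got_thresh_reading()")
    print((end - start).total_seconds())
    print(decode_openipmi_error(16777411))
```

Under that assumption, the excerpt shows about 7 seconds spent on the FAN 1 read before it failed with completion code 0xC3 (a timeout); 0xC0 is the "Node busy" code that ipmitool reported earlier in the thread.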

Michael Sphar
added a comment - 2014 Apr 14 17:57
I have had the same experience: reducing the number of IPMI pollers to just one stopped the frequent Network Error messages and gaps.
I saw this on both my production server and a small test VM server. The production server was polling only a few IPMI items plus a lot of other non-IPMI monitoring, and the test server was polling only a few IPMI items and doing no other monitoring. Both servers showed frequent Network Errors and gaps from the items then going unsupported. I reduced the number of IPMI pollers to just one last week and have yet to see a Network Error warning since. Neither server showed all the IPMI pollers as busy.
This is with Zabbix 2.2.2.
Thinking it might have something to do with two different IPMI pollers polling the same device at the same time, I did a simple test: from two different hosts I issued an ipmitool sensor command to the same IPMI device. I observed that the IPMI device sends its output to only one requester at a time. One ipmitool output starts scrolling while the other pauses for a few seconds, then the other starts scrolling and the first one pauses, and this goes back and forth a few times until both are complete.
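
Michael's test suggests the BMC serves one requester at a time, which would explain why a single IPMI poller helps. A minimal sketch of the same idea in general terms, assuming it is acceptable to keep several worker threads but serialize all traffic to any one BMC. `PerHostSerializer` and `fake_read` are hypothetical, and this is not how Zabbix's pollers are implemented.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class PerHostSerializer:
    """Hypothetical helper: many workers overall, but one at a time per BMC."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects lazy creation of per-host locks

    def run(self, host, fn, *args):
        with self._guard:
            lock = self._locks[host]
        with lock:  # other requests to the same host wait here
            return fn(*args)


def demo(workers=8, requests=20):
    """Return the peak number of simultaneous requests seen by each fake BMC."""
    active, peak = defaultdict(int), defaultdict(int)
    stats_lock = threading.Lock()

    def fake_read(host):
        with stats_lock:
            active[host] += 1
            peak[host] = max(peak[host], active[host])
        time.sleep(0.01)  # stand-in for a slow BMC response
        with stats_lock:
            active[host] -= 1

    serializer = PerHostSerializer()
    hosts = ["bmc-a", "bmc-b"]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(serializer.run, hosts[i % 2], fake_read, hosts[i % 2])
                   for i in range(requests)]
        for f in futures:
            f.result()
    return dict(peak)


if __name__ == "__main__":
    print(demo())
```

Compared with StartIPMIPollers=1, this design still polls different BMCs in parallel and only queues the requests aimed at the same one.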

Norbert Wögerbauer
added a comment - 2014 Oct 09 11:48
Same here with 2.2.6.
I noticed that I had an item with an invalid sensor ID. It seems that if there is a problem with any one item, further processing just breaks.
I disabled all items that did not return a value, and the problem disappeared.
Note that this happens even if the sensor is listed and basically available but simply doesn't provide a value (e.g. I have a sensor FAN4 but no fan connected to it)!
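
To find candidates for disabling, one could list the sensors that report no reading. A sketch assuming `ipmitool sensor` output, where each line is pipe-separated and a sensor without a value shows `na` in the reading column (check this against your own BMC's output). The sample text is illustrative, not captured from a real host, and the function name is hypothetical.

```python
SAMPLE_OUTPUT = """\
FAN 1            | 5400.000   | RPM        | ok
FAN4             | na         |            | na
CPU Temp         | 41.000     | degrees C  | ok
"""


def unavailable_sensors(sensor_output: str) -> list[str]:
    """Return sensor names whose reading column (2nd field) is 'na'."""
    names = []
    for line in sensor_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[1].lower() == "na":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    print(unavailable_sensors(SAMPLE_OUTPUT))
```

In practice the input would come from a command like `ipmitool -H <HOSTNAME> -U <USER> -P <PASSWORD> sensor`, and the returned names correspond to the IPMI sensor names configured on the items.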

Jeroen van den Berg
added a comment - 2015 Jun 27 15:16
This issue still exists in 2.4.5, and I can confirm that it works without problems if you disable unavailable sensors.
It looks like unavailable sensors are handled incorrectly.

Ilya Kruchinin
added a comment - 2015 Aug 12 04:10
Disabling unavailable sensors did not help in my case (zabbix_server v2.4.5): the sensors that had been successfully receiving data still had issues.
However, when I set IPMIPollers to 1, the issue disappeared.

pfoo
added a comment - 2015 Dec 13 03:40
Similar behaviour on my Supermicro board, monitored using the latest Zabbix server and agent from the Zabbix Debian repository (2.4.7-1+jessie).
I was actually able to fix the issue by lowering the update interval of one IPMI item from 300s to 60s (all other IPMI items were kept at their 300s interval). If I set this IPMI item to 90s, the issue appears again.
It could be an IPMI session handling issue.

Sascha Plumhoff
added a comment - 2016 Jun 14 17:39
Same here with a Dell PowerEdge R510.
It seems to be a known issue with OpenIPMI; see https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/ipmi :
"IPMI session inactivity timeout for LAN is 60 +/-3 seconds. [...] then the next IPMI check after the timeout expires will time out due to individual message timeouts, retries or receive error."
Reducing the check interval to 45 seconds fixed the problem for me.
The issue naturally appears more often in test environments, e.g. when checking only one item on a server.
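
The intervals reported in the last few comments can be compared with the documented 60 +/- 3 second inactivity timeout. A rough sketch, assuming that all IPMI items of a host share one session and that every item starts at t=0 (Zabbix actually spreads items across their interval, so real gaps can differ); the function names are hypothetical.

```python
import math

# From the Zabbix 3.0 documentation quoted above: LAN session inactivity timeout is 60 +/- 3 s.
TIMEOUT_MIN, TIMEOUT_MAX = 57, 63


def max_gap(intervals: list[int]) -> int:
    """Largest pause (seconds) between consecutive IPMI checks on one host.

    Simplification: every item is assumed to be checked first at t=0, so this
    only approximates the real Zabbix schedule.
    """
    horizon = math.lcm(*intervals)  # the combined schedule repeats after this
    times = sorted({t for i in intervals for t in range(0, horizon + 1, i)})
    return max(b - a for a, b in zip(times, times[1:]))


def session_outlook(intervals: list[int]) -> str:
    """Classify whether the longest pause stays under the inactivity timeout."""
    gap = max_gap(intervals)
    if gap < TIMEOUT_MIN:
        return "kept alive"
    if gap > TIMEOUT_MAX:
        return "expires"
    return "borderline"
```

With these assumptions, `session_outlook([45])` gives "kept alive", `session_outlook([90, 300])` gives "expires", and the 60s setting from the earlier comment, `session_outlook([60, 300])`, comes out "borderline", i.e. within the +/- 3 s tolerance of the timeout.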