A customer has experienced a major performance issue with metric collection. An agent whose inventory contained only 1 RHQ Agent, 1 RHQ Server, and 1 SOA-P Server was spending more than 50 seconds collecting metrics for the 60-second metric collection schedule. As a result, metric collection on the 30-second schedule was delayed by about 80 seconds. After investigation, the conclusion is that too many metrics were being collected on the 60-second schedule. There was no CPU or network bottleneck; there were simply too many individual metrics to collect. In this case, the cause was directly related to JBoss AS5 Queues, of which there were approximately 80. The default metric schedule configuration for Queues appears to collect a large number of metrics at short intervals:
Metric | Description | Type | Enabled | Collection Interval
Consumer Count | The number of consumers on the queue | MEASUREMENT | Yes | 00:01:00
Count | The total message count since startup or last counter reset | MEASUREMENT | Yes | 00:01:00
Count Delta | The message count delta since last method call | MEASUREMENT | Yes | 00:01:00
Created Programmatically | Was this queue created programmatically? If Yes, the queue will not survive a restart of the application server. If No, the queue was created via a deployment XML file. | TRAIT | Yes | 00:10:00
Delivering Count | The number of messages currently being delivered | MEASUREMENT | Yes | 00:01:00
Depth | The current message count of pending messages within the queue waiting for dispatch | MEASUREMENT | Yes | 00:01:00
Depth Delta | The message count delta of pending messages since last method call | MEASUREMENT | Yes | 00:01:00
Message Count | The number of messages in the queue | MEASUREMENT | Yes | 00:01:00
Message Counter History Day Limit | This queue's message counter history day limit - <0: unlimited, =0: history disabled, >0: maximum day count | TRAIT | Yes | 00:10:00
Run State | Run State | TRAIT | Yes | 00:00:30
Scheduled Message Count | The number of scheduled messages in the queue | MEASUREMENT | Yes | 00:01:00
Time Last Update | The timestamp of the last message add | MEASUREMENT | Yes | 00:01:00
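To put the default schedule in numbers, here is a rough Python sketch that counts how many queue metrics fall due under the intervals listed above. It assumes exactly 80 queues (the report only says "approximately 80") and that every listed metric is enabled on every queue. The function and constant names are illustrative only and are not part of RHQ.

```python
# Default queue metric schedule, transcribed from the list above:
# (metric name, type, collection interval in seconds).
QUEUE_SCHEDULE = [
    ("Consumer Count", "MEASUREMENT", 60),
    ("Count", "MEASUREMENT", 60),
    ("Count Delta", "MEASUREMENT", 60),
    ("Created Programmatically", "TRAIT", 600),
    ("Delivering Count", "MEASUREMENT", 60),
    ("Depth", "MEASUREMENT", 60),
    ("Depth Delta", "MEASUREMENT", 60),
    ("Message Count", "MEASUREMENT", 60),
    ("Message Counter History Day Limit", "TRAIT", 600),
    ("Run State", "TRAIT", 30),
    ("Scheduled Message Count", "MEASUREMENT", 60),
    ("Time Last Update", "MEASUREMENT", 60),
]

# Assumption: the report says "approximately 80" queues.
QUEUE_COUNT = 80


def due_per_cycle(schedule, resource_count, interval_seconds):
    """Number of metric values that fall due together on one interval."""
    per_resource = sum(
        1 for _, _, interval in schedule if interval == interval_seconds
    )
    return per_resource * resource_count


def collections_per_minute(schedule, resource_count):
    """Average number of metric values collected per minute across all intervals."""
    return sum(resource_count * 60.0 / interval for _, _, interval in schedule)


if __name__ == "__main__":
    print(due_per_cycle(QUEUE_SCHEDULE, QUEUE_COUNT, 60))
    print(collections_per_minute(QUEUE_SCHEDULE, QUEUE_COUNT))
```

Under these assumptions, 720 queue metrics fall due on every 60-second cycle, and the queues alone account for about 896 collections per minute.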
The recommendation is to increase these collection intervals to 5 minutes or even longer, or at least to turn some of the metrics off. Additionally, much of this issue could be resolved by simply making metric collection schedules run in a non-synchronous fashion.
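One possible reading of "non-synchronous" is staggering: give each schedule a different start offset within its interval so that they do not all come due at the same instant. The Python sketch below illustrates that idea only. It is not how the RHQ agent schedules collection, the function names are hypothetical, and the figure of 720 schedules assumes 80 queues with 9 metrics each on the 60-second interval.

```python
from collections import Counter


def staggered_offsets(schedule_count, interval_seconds):
    """Start offset in seconds for each schedule, spread evenly over one interval."""
    return [i * interval_seconds / schedule_count for i in range(schedule_count)]


def max_due_per_second(offsets):
    """Largest number of schedules that come due within the same one-second slot."""
    slots = Counter(int(offset) for offset in offsets)
    return max(slots.values())


if __name__ == "__main__":
    # Hypothetical load: 720 schedules on a 60-second interval.
    synchronous = [0.0] * 720
    staggered = staggered_offsets(720, 60)
    print(max_due_per_second(synchronous))
    print(max_due_per_second(staggered))
```

With every schedule starting at the same instant, all 720 come due in one burst; spread evenly, no more than 12 come due in any one-second slot. The total work per minute is unchanged, so staggering complements rather than replaces the longer intervals recommended above.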