Filtering Alerts

Hi,

A little help please!

Is there a way to filter the transaction failure alerts we receive by email?

I ask this because there is an ongoing intermittent issue which generates a particular transaction failure alert for all of our configured tests - we are aware of the issue, a third party have taken responsibility and are working to resolve this for us. Unfortunately there seems to be no quick fix.

As I mentioned, the error is intermittent, but it generates a large volume of email alerts for something we already know about, which makes the alerting less meaningful. Genuine alerts for something else could end up being ignored among the endless alerts for this issue. Is there any way of blocking just the alerts generated for this particular error, without changing the original alerting parameters, so that I would still receive alerts promptly for any different, genuine error?

Test measurements are taken across 3 nodes, every 15 minutes.

Alerting has been configured to send an alert when there is a transaction failure on 2 nodes, after 1 consecutive error. I know that I could prevent the alerts by increasing the consecutive errors to more than 1, but that would mean we wouldn't receive alerts for genuine failures for more than 30 minutes (?), which is too long to wait.

5 Replies

Unfortunately, there really isn't a way to filter those alert e-mails without changing the alert parameters. Do you have "Send reminders" checked? If so, you may want to consider either turning off the reminders or increasing the interval at which they are sent. The only other suggestion lies not with Gomez but with filtering in your e-mail application.

I think your last train of thought is heading in the right direction. The 15-minute setting is not a "hard" setting, especially if you have multiple nodes running the same test.

So if you have three nodes doing the same test every 15 minutes, that can translate to any of the following:

Do all in the first minute.

Do one every 5 minutes.

Do all in the last minute.

In reality, the tests are spread out fairly evenly, but drift in timing can result from a particular node's temporary workload.
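To picture the "one every 5 minutes" case, here is a rough sketch of when each node would run if the spread were perfectly even. The function names and the assumption of exact, drift-free spacing are mine for illustration only; real nodes will wander a little with workload.

```python
def staggered_offsets(interval_min, node_count):
    """Return each node's start offset (minutes) within one test cycle,
    assuming the runs are spread perfectly evenly (an idealization)."""
    step = interval_min / node_count
    return [round(i * step, 2) for i in range(node_count)]


def run_times(interval_min, node_count, window_min):
    """List (minute, node_index) pairs for every run inside the window."""
    runs = []
    for node, offset in enumerate(staggered_offsets(interval_min, node_count)):
        t = offset
        while t < window_min:
            runs.append((t, node))
            t += interval_min
    return sorted(runs)


if __name__ == "__main__":
    # Three nodes, one test every 15 minutes per node, 30-minute window.
    for minute, node in run_times(15, 3, 30):
        print(f"minute {minute:>4}: node {node}")
```

With three nodes on a 15-minute interval, this idealized schedule has some node testing the site every 5 minutes.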

I have a similar problem to yours. I increased the setting to 2 consecutive errors to trigger an alert; I can still record and report the problems, but as they are intermittent, they no longer fill up my inbox.

Side note: the consecutive-error setting is only available on the backbone nodes, but I'll raise an ER (soon) to have this setting on the Mobile nodes as well.

Thanks for providing the background information. I'd like to add on to what Ulf said. I think increasing the consecutive error value to 2 could be an option for you.

How many steps are in the test? Do you have any measurements available to you?

I am assuming that since this is a known issue, it is also a temporary one that will be fixed in the near future. With that in mind, it might be worthwhile to temporarily increase the consecutive errors to 2 and run the test more frequently (e.g., every 7 minutes). This might cut out some of the transaction alerts that you are expecting from this known issue. Moreover, it would ensure that you still have visibility and are notified in a timely manner if another issue arises.
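To put rough numbers on that trade-off, here is a minimal sketch of how long a single node might take to log the required run of failures. It assumes a fixed test interval and that an alert needs that many failed runs in a row on the node; the function name is hypothetical, and the extra wait for a second node to fail is not modelled.

```python
def detection_window(interval_min, consecutive_errors):
    """Return (best_case, worst_case) minutes from the start of an outage
    until one node has logged the required run of failed tests.

    Assumes the node tests at a fixed interval and that an alert needs
    `consecutive_errors` failed runs in a row on that node.
    """
    # Best case: the outage begins just before a scheduled run.
    best = (consecutive_errors - 1) * interval_min
    # Worst case: the outage begins just after a run has completed.
    worst = consecutive_errors * interval_min
    return best, worst


if __name__ == "__main__":
    settings = [(15, 1), (15, 2), (7, 2)]
    for interval, errors in settings:
        best, worst = detection_window(interval, errors)
        print(f"every {interval} min, {errors} consecutive: "
              f"{best}-{worst} min to detect on one node")
```

Under these assumptions, 2 consecutive errors at a 7-minute interval has a worst case of about 14 minutes per node, close to what 1 consecutive error at 15 minutes gives today, whereas 2 consecutive errors at 15 minutes can take up to 30.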

There are no reminders configured (thankfully, as the situation would be even worse). I did think that filtering through the email application (Outlook in this case) could be an option for me, but I am not the only recipient of the alerts.

I'd gathered that the tests generally spread evenly over the 15 minutes, so that a different node is tested every 5 minutes. Also, we shall be introducing mobile tests soon, so I have voted for the ER raised.

Unfortunately, we had a genuine incident over the weekend: the site had a complete outage and was down for a maximum of 15 minutes. Because the alerts had been bumped up to two consecutive errors, no alert was received, and I was informed of the situation by someone else. It looks like it may be back to 1 consecutive error and endless alerts for now.
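For anyone wanting to check this kind of scenario before changing settings, here is a small simulation sketch. The alerting model in it is my own simplification, not Gomez's documented behaviour: a node counts as failing once it has the required number of failed runs in a row, and an alert fires when enough nodes are failing at once. The function name, the node offsets and the outage start time are all made up for illustration.

```python
def alert_time(outage_start, outage_len, interval, offsets,
               consecutive_errors, nodes_required, horizon=240):
    """Return the minute at which an alert would fire, or None.

    Simplified model (an assumption, not documented product behaviour):
    a node is "failing" once it has `consecutive_errors` failed runs in
    a row; an alert fires when `nodes_required` nodes are failing.
    """
    outage_end = outage_start + outage_len
    # Build a time-ordered list of (minute, node) test runs.
    runs = sorted(
        (offset + k * interval, node)
        for node, offset in enumerate(offsets)
        for k in range(horizon // interval + 1)
    )
    streak = [0] * len(offsets)
    for minute, node in runs:
        failed = outage_start <= minute < outage_end
        streak[node] = streak[node] + 1 if failed else 0
        failing_nodes = sum(s >= consecutive_errors for s in streak)
        if failing_nodes >= nodes_required:
            return minute
    return None


if __name__ == "__main__":
    # A 15-minute outage starting at minute 1 (hypothetical timing).
    print(alert_time(1, 15, 15, [0, 5, 10], 1, 2))  # 1 consecutive error
    print(alert_time(1, 15, 15, [0, 5, 10], 2, 2))  # 2 consecutive errors
    print(alert_time(1, 15, 7, [0, 2, 5], 2, 2))    # 2 errors, 7-minute tests
```

In this model, the 15-minute outage alerts at minute 10 with 1 consecutive error, never alerts with 2 consecutive errors at a 15-minute interval (no node sees two failed runs in a row), and alerts at minute 12 with 2 consecutive errors at a 7-minute interval.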

These are static alerts. I would like to play with dynamic alerts, but I think they are a bit too complicated, and I feel I may receive more ‘invalid’ alerts. Would it be worthwhile experimenting with dynamic alerts?