I'm working on an application where I want to keep a timeout value computed dynamically using a moving average of say 100 previous response times for service A.

This value + some buffer (~3 seconds) will be used as a timeout for network requests to service A.
The problem is that I want this timeout to adapt to network characteristics. That is not a problem when the user switches from a slow network to a faster one, but in the opposite case the timeout should increase 'gradually', EXCEPT when the connection was only briefly spotty (like in a parking garage), in which case I don't want the timeout to increase, since the condition is transient.

I'm not sure if it's even possible to distinguish between the 2 cases mentioned above.
Even if not, what should I record as the response time when a request times out?
I was thinking of recording (currentTimeoutValue + 2) seconds. This works well for the parking garage case (it doesn't push the average up dramatically), but if someone is going to be on a slower network for a while, it will take many more timed-out requests for the average to catch up with the current network speed.

I guess this is an application on a mobile device which tries to call the service? For your parking garage case: can't the mobile device detect the difference between "no network available" and "network available (though slow)"?
– Doc Brown May 4 '16 at 6:16

Is there any reason why you wouldn't want the worst-case timeout time?
– Snoop May 17 '16 at 22:11

2 Answers

With a moving average, you can't easily identify quick variations. By construction, the moving-average curve always smooths out reality. It reacts to a change in trend with a delay, and the longer the moving-average span, the longer the delay.

Here is how your timeout based on a moving average could evolve in two situations: 1) with a slow increase and 2) with a quick change, for example when you pass through a tunnel with your phone or are in an elevator:

A timeout occurs only if your response time increases fast enough to burn through the buffer time. But a single moving average cannot tell you whether the response time is increasing quickly or slowly, because the increase is averaged out.

You can make your timeout adaptation more aware of how abrupt a change is by using a second moving average: the first would average the last 100 response times and the second only the last 10 or 20. By comparing the two, you can get an idea of how abrupt the change is and adapt the reaction:

keep applying the algorithm as usual for slow degradation (fig 1)

suspend the update of the first moving average and use a fallback strategy (fig 2). You could, for example, use a fixed value in the garage or tunnel case, or switch temporarily to the second moving average and add a larger buffer.

Of course, you'll have to fine-tune how large a difference between the two moving averages should trigger the switch.
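The two-average idea can be sketched roughly as follows. This is only an illustration, not a tuned implementation: the class name `DualAverageTimeout`, the `spike_ratio` threshold of 2.0, and the fixed `fallback_s` of 10 seconds are all assumptions made for the example, and only the spans (100 and 10) and the 3-second buffer come from the discussion above.

```python
from collections import deque


class DualAverageTimeout:
    """Hypothetical helper: timeout from a long and a short moving average."""

    def __init__(self, long_span=100, short_span=10, buffer_s=3.0,
                 spike_ratio=2.0, fallback_s=10.0):
        self.long = deque(maxlen=long_span)    # slow-moving baseline
        self.short = deque(maxlen=short_span)  # recent behaviour only
        self.buffer_s = buffer_s
        self.spike_ratio = spike_ratio  # assumed threshold, needs tuning
        self.fallback_s = fallback_s    # assumed fixed fallback timeout

    @staticmethod
    def _mean(values):
        return sum(values) / len(values)

    def is_abrupt(self):
        """True when the short average has jumped well above the long one."""
        if not self.long or not self.short:
            return False
        return self._mean(self.short) > self.spike_ratio * self._mean(self.long)

    def record(self, response_s):
        """Record a response time; freeze the long average during a spike."""
        self.short.append(response_s)
        if not self.long or not self.is_abrupt():
            self.long.append(response_s)

    def timeout(self):
        """Baseline average plus buffer, or the fallback during a spike."""
        if not self.long or self.is_abrupt():
            return self.fallback_s
        return self._mean(self.long) + self.buffer_s
```

Note that this sketch never resumes updating the long average while the short one stays high, so a real implementation would also need a rule (for example, a maximum spike duration) for deciding that the slow network is the new normal.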

I assume this is an application on a mobile device which tries to call the service. If your mobile device can detect the difference between "no network available" and "network available (though slow)", then you are in a much better situation, since if the network is not available, you simply do not change the timeout.

For the other, "standard" case, you could try something like

newTimeout = (currentTimeout + 1) * c

where c is a value somewhere between 1 and 2, whenever a request exceeds the timeout, and apply the "moving average" only when a request completes within the timeout. Moreover, you should keep the timeout below a reasonable upper limit to prevent it from growing without bound.
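As a rough sketch of this rule, assuming a hypothetical function `next_timeout`, a factor `c` of 1.5, a cap `max_timeout` of 60 seconds, and the 3-second buffer from the question (none of these values are prescribed by the answer):

```python
def next_timeout(current_timeout, recent_times, timed_out,
                 c=1.5, buffer_s=3.0, max_timeout=60.0):
    """Return the timeout (in seconds) to use for the next request.

    recent_times holds response times of successful requests only,
    e.g. the last 100 of them.
    """
    if timed_out:
        # The request exceeded the timeout: grow it multiplicatively.
        candidate = (current_timeout + 1) * c
    elif recent_times:
        # The request succeeded: fall back to moving average plus buffer.
        candidate = sum(recent_times) / len(recent_times) + buffer_s
    else:
        # No history yet: keep the current value.
        candidate = current_timeout
    # Cap the result so repeated timeouts cannot grow it without bound.
    return min(candidate, max_timeout)
```

With these assumed values, a timed-out request at a 5-second timeout raises it to 9 seconds, and the next successful request brings it back to the moving average of successful response times plus the buffer.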

Thanks for the answer. Our data strongly suggests that the network is not entirely gone in a lot of situations. Maybe in a parking garage there is no network, but in other situations, like in a remote area, the network will be poor, though only for a short while. Basically, I'm looking for a way to cover both the case where the network is poor for a short time and the case where it is poor for a longer time, and to adjust the average accordingly.
– VishalHemnani May 4 '16 at 14:57


@Armageddon: so what? Poor network for a short while or for a long time: it does not matter, the timeout increases quickly in both cases, which is fine. If your network gets better after a minute, you get quick responses -> the moving average over the last 100 response times will quickly bring your timeout down to a low value. If it does not, the timeout will stay at a high value.
– Doc Brown May 4 '16 at 15:56