Wednesday, June 25, 2014

An article by Josee Loudiadis [pictured], Director of Network Intelligence, Alcatel-Lucent, covers the "signaling spikes that can overwhelm available signaling capacity, undermining the customer experience and the operator’s reputation with it" and analyzes the signaling behavior of several common applications.

"There are three kinds of “signalasis”: microbursts that can be measured in seconds, extended bursts that can last minutes to hours, and suddenly sustained signaling growth where signaling jumps significantly at one point in time and continues to increase over weeks and months. Facebook’s 60% jump on Nov 2012 is an example of the latter".

On April 29th, CPU overload alarms reported that RNCs (Radio Network Controllers) were inundated with requests. The signaling spike could be matched to Viber – flows showed that Viber servers were no longer responding. Viber was down. But why would this app outage have such a significant impact on signaling resources? The answer is in Viber’s call failure handling: the app would retry repeatedly to connect with the server, creating a larger signaling wave as more users tried unsuccessfully and repeatedly to connect.
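
To make the amplification concrete, here is a minimal sketch that counts connection attempts during an outage under two retry policies. It is not Viber's actual client logic: the 5-second retry interval, the 10-minute backoff cap, and the function names are all hypothetical, and the model assumes every attempt triggers a fresh signaling exchange. Exponential backoff is included only as a point of comparison; the article does not say it was the remedy.

```python
def count_attempts(outage_seconds, users, next_delay):
    """Total connection attempts made by `users` clients during an outage.

    `next_delay(attempt)` returns the wait (in seconds) before the next retry,
    where `attempt` is the zero-based index of the attempt that just failed.
    """
    attempts_per_user = 0
    elapsed = 0.0
    attempt = 0
    while elapsed < outage_seconds:
        attempts_per_user += 1          # one failed attempt at time `elapsed`
        elapsed += next_delay(attempt)  # wait before trying again
        attempt += 1
    return attempts_per_user * users


def fixed_retry(attempt):
    # Hypothetical policy: retry every 5 seconds, forever.
    return 5.0


def exponential_backoff(attempt):
    # Hypothetical policy: double the wait each time, capped at 10 minutes.
    return min(5.0 * 2 ** attempt, 600.0)


if __name__ == "__main__":
    for policy in (fixed_retry, exponential_backoff):
        total = count_attempts(outage_seconds=3600, users=1000, next_delay=policy)
        print(f"{policy.__name__}: {total} attempts")
```

With these made-up parameters, 1,000 clients over a one-hour outage generate 720,000 attempts under fixed-interval retry and 12,000 under capped backoff, which illustrates how an app-side retry policy can dominate the signaling load seen by the network.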

Microbursts: Microsoft Exchange and iOS

GWs subjected to 36% signaling microburst at midnight due to iOS-based Microsoft Exchange

This case exemplifies other short-term outages where the signaling spike exceeded the signaling capacity on a daily basis. A 36% signaling jump occurred every day at midnight, but the reason for the spike remained mysterious. The WNG narrowed it down: the signaling was initiated by devices trying to reach the Microsoft Exchange server. This interaction lasted less than 1 second. It involved only iPhone devices and was more prevalent with iOS version 6.1. Equipped with this information, operators could contact Apple and identify the root cause. A fix was issued in a subsequent iOS version update.
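
A recurring spike of this kind can be picked out of signaling counters by looking for time-of-day slots that exceed the day's typical load on every observed day. The sketch below is one simple way to do that, not the method the WNG uses: the `(day, slot, count)` sample format, the function name, and the 30% threshold are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def recurring_spikes(samples, threshold=0.30):
    """Return time-of-day slots that spike on every observed day.

    `samples` is an iterable of (day, slot, count) tuples, where `slot`
    identifies a time-of-day bucket (for example, seconds since midnight)
    and `count` is the number of signaling events seen in that bucket.
    A slot "spikes" on a day when its count exceeds that day's median
    slot count by more than `threshold` (0.30 means 30%).
    """
    per_day = defaultdict(dict)
    for day, slot, count in samples:
        per_day[day][slot] = count

    spiking = None
    for slots in per_day.values():
        baseline = median(slots.values())
        today = {s for s, c in slots.items() if c > baseline * (1 + threshold)}
        # Keep only slots that have spiked on every day seen so far.
        spiking = today if spiking is None else spiking & today
    return sorted(spiking or [])
```

Requiring the spike on every day filters out one-off events, so a slot that jumps 36% each midnight is reported while a single unrelated burst on one afternoon is not.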