Skype talks of “perfect storm” that caused outage, clarifies blame

Confused reporting, combined with vague explanations from the Skype team, has prompted Skype to clarify how its service went belly-up for more than 30 hours.

The outage, which began on August 16 and cleared up roughly two days later, has led to many a missive on both the merits of Skype in particular and VoIP in general. Although most of Skype's user base utilizes only the free aspects of the service, the company has come under fire for the outage and its response to it. Thus, we're not too terribly surprised to see Skype trying to be transparent about the outage and reassure users that it won't happen again.

Skype spokesperson Villu Arak took to the company blog again to defend both Skype and Microsoft, the latter having been indirectly blamed by the media for Skype's outage, and the former being portrayed as passing the blame to Microsoft.

Skype attempted to say yesterday that Windows Update was not the cause of the Skype outage but its catalyst. Yet many media outlets only heard "we blame Microsoft," and this constituted the headlines and ledes adopted by many in the press. In turn, the accusation created significant confusion because as we all know, Microsoft's big Patch Tuesday updates are generally a monthly event. Why did this happen just now if Microsoft is to blame?

Without going into technical detail, Arak said that this outage was created by a "perfect storm" of conditions that Skype had not encountered before. In previous instances where there were mass reboots of PCs on the Skype P2P network, "there had not been such a combination of high usage load during supernode rebooting," he said. Arak was emphatic: "We don't blame anyone but ourselves."

"Skype's peer-to-peer core was not properly tuned to cope with the load and core size changes that occurred on August 16," Arak noted.

Working with Microsoft, Skype checked over all of the patches made available on Patch Tuesday and ruled out any of them playing a specific role in the outage. The only possibility left was a flaw relating to the self-healing algorithm, one in which high load and mass reboots created a kind of authentication black hole where fewer and fewer supernodes were available to service more and more authentication requests. What's still confusing, however, is why the problems showed up on the 16th when Patch Tuesday arrived two days earlier, on the 14th. One must assume that supernode changes are (or at least were) rather slow-moving or that Skype is not telling us the whole story.
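To make the "authentication black hole" idea concrete, here is a toy sketch of how such a spiral could unfold. It is not Skype's algorithm, which the company has not published. The function name `simulate_supernode_drain`, the fixed per-node capacity, and the rule that a quarter of the supernodes drop out in each overloaded round are all assumptions chosen purely for illustration.

```python
from typing import List


def simulate_supernode_drain(supernodes: int,
                             pending_requests: int,
                             capacity_per_node: int,
                             rounds: int) -> List[int]:
    """Toy model of a load spiral (hypothetical, not Skype's real logic).

    Each round, the pending authentication requests are spread evenly
    over the surviving supernodes. If the per-node load exceeds the
    assumed capacity, a quarter of the supernodes (at least one) drop
    out, which pushes the per-node load even higher in the next round.
    Returns the supernode count at the start and after every round.
    """
    counts = [supernodes]
    for _ in range(rounds):
        if supernodes > 0:
            load_per_node = pending_requests / supernodes
            if load_per_node > capacity_per_node:
                # Assumed failure rule: 25% of nodes give up per round.
                supernodes -= max(1, supernodes // 4)
        counts.append(supernodes)
    return counts


# Normal day: 5,000 requests over 100 supernodes stays within capacity.
healthy = simulate_supernode_drain(100, 5_000, 100, 5)

# Mass reboot plus heavy usage: 20,000 login attempts hit the same core.
overloaded = simulate_supernode_drain(100, 20_000, 100, 3)
```

In this sketch the healthy network never loses a node, while the overloaded one shrinks every round, because each departure raises the load on whatever remains. That is the general shape of the failure Arak describes, under the stated assumptions.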

Conspiracy theorists in my inbox are convinced that there's more going on here than meets the eye, as some find it difficult to reconcile the gap between Patch Tuesday and the start of the outage. However, it is worth noting that the earliest signs of the flaw were reported very early on Thursday, before 9 AM GMT. While Microsoft begins seeding Windows Update notifications sometime on Tuesday, automatic installation and the resulting reboots typically happen in the very early morning of the following day. One would need Microsoft to reveal the exact timing of this last round of updates, but it's quite possible that the bulk of reboots stemming from Patch Tuesday actually occurred on Wednesday, especially for European users who are 5+ hours ahead of the US. Hence, the window may be significantly less than 48 hours.
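The arithmetic behind that "less than 48 hours" estimate can be sketched as follows. Everything here other than the dates is an assumption: the reboots are placed at 3:00 AM local time on Wednesday, August 15, the two UTC offsets stand in for central European and US East Coast summer time, and the helper name `hours_before_first_reports` is made up for this example. Since the first reports came in before 9 AM GMT, the figures are upper bounds.

```python
from datetime import datetime, timedelta, timezone

# Earliest signs of trouble: Thursday, August 16, before 09:00 GMT.
FIRST_REPORTS = datetime(2007, 8, 16, 9, 0, tzinfo=timezone.utc)


def hours_before_first_reports(local_reboot: datetime,
                               utc_offset_hours: int) -> float:
    """Hours between a local-time reboot and the first outage reports."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    reboot = local_reboot.replace(tzinfo=local_zone)
    return (FIRST_REPORTS - reboot).total_seconds() / 3600


# Assumed reboot time: 03:00 local on Wednesday, August 15.
assumed_reboot = datetime(2007, 8, 15, 3, 0)

europe_gap = hours_before_first_reports(assumed_reboot, 2)    # UTC+2
us_east_gap = hours_before_first_reports(assumed_reboot, -4)  # UTC-4
```

Under these assumptions the gap is at most 32 hours for European machines and 26 hours for the US East Coast, both well short of two full days.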

For its part, Skype declares the bug fixed. "We'd like to reassure our users across the globe that we've done everything we need to do to make sure this doesn't happen again. We've already introduced a number of improvements to our software to ensure our users will not be similarly affected—in the unlikely possibility of this combination of events recurring," Arak said.

Ken Fisher / Ken is the founder & Editor-in-Chief of Ars Technica. A veteran of the IT industry and a scholar of antiquity, Ken studies the emergence of intellectual property regimes and their effects on culture and innovation.