In a nutshell, Skype says a bug in its Windows client software caused clients, including many supernodes, to crash, which overloaded the remaining supernodes and set off a chain reaction of problems.

On Wednesday, December 22, a cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers. Because of a bug identified in a version of the Skype for Windows client (version 5.0.0.152), the delayed responses from the overloaded servers were not properly processed, causing Windows clients running the affected version to crash.

Around 50 percent of all Skype users globally were running the 5.0.0.152 version of Skype for Windows, and the crashes caused approximately 40 percent of those clients to fail. These clients included 25–30 percent of the publicly available supernodes, which also failed as a result of this problem.
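To put those two percentages together, here is a rough back-of-the-envelope calculation. It assumes the 40 percent failure rate applies uniformly to everyone on the affected version; the variable and function names are illustrative only and do not come from Skype.

```python
# Figures quoted in Skype's post-mortem (approximate).
share_on_affected_version = 0.50       # users running 5.0.0.152
crash_rate_on_affected_version = 0.40  # of those clients, the share that failed


def crashed_share_of_all_users(version_share: float, crash_rate: float) -> float:
    """Fraction of ALL users whose client crashed, assuming the crash
    rate applies uniformly across the affected version."""
    return version_share * crash_rate


share = crashed_share_of_all_users(
    share_on_affected_version, crash_rate_on_affected_version
)
print(f"Roughly {share:.0%} of all Skype clients crashed")
```

Under that assumption, about one in five Skype clients worldwide went down at roughly the same time.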

If you had the latest Skype for Windows (version 5.0.0.156), an older version of Skype for Windows (4.0 versions), Skype for Mac, Skype for iPhone, Skype on your TV, or Skype Connect or Skype Manager for enterprises, you were not initially affected by this problem. However, with 25–30 percent of Skype’s supernodes going down, it quickly became a network-wide problem.

A supernode is important to the P2P network because it takes on additional responsibilities compared to regular nodes. It acts like a directory, supporting other Skype clients and establishing connections between them by creating local clusters of several hundred peer nodes per supernode.

Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25–30 percent fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes. A significant proportion of users were also restarting crashed Windows clients at this time. This massively increased the load as they reconnected to the peer-to-peer cloud.
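A simple way to see why the load was disproportionate: if the same total load is spread evenly over fewer supernodes, each survivor carries more than its normal share. The sketch below assumes an even redistribution and ignores the extra reconnect traffic from restarting clients, so it understates the real spike; the function name is hypothetical.

```python
def load_multiplier(failed_fraction: float) -> float:
    """Per-supernode load relative to normal, assuming the same total
    load is spread evenly over the supernodes that remain."""
    if not 0 <= failed_fraction < 1:
        raise ValueError("failed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - failed_fraction)


# The 25-30 percent range reported by Skype.
for failed in (0.25, 0.30):
    print(f"{failed:.0%} of supernodes down -> "
          f"{load_multiplier(failed):.2f}x load per surviving supernode")
```

Even before counting the wave of reconnecting clients, each remaining supernode would have been handling roughly a third more traffic than usual.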

To deal with the problem, Skype essentially introduced “thousands of instances” of the Skype software into its P2P network and created temporary supernodes. The biggest lessons learned from this, Rabbe writes, are:

More investment in its infrastructure so that the system becomes and stays reliable.

More rigorous testing procedures that don’t let buggy software out into the market.

This is not the first time Skype’s systems have come under pressure because of software bugs. In August 2007, Skype had software problems as well, which in turn caused a flood of log-in requests and crashed the network.