BlackBerry outages due to message backlog

The BlackBerry outages are all down to a message backlog, apparently. A failed backup switch in Europe left undelivered messages piling up and disrupting service around the world, BlackBerry maker RIM said in a press conference this evening.

It's working to fix the situation, but users all over the world are still suffering, some without BBM, others without email or internet as well. Read on for the full story, especially if you're one of those affected.

RIM previously announced through its official Twitter account that a core switch failure in its infrastructure was to blame, and now it's clarified things a little with a follow-up announcement. Basically, this failure created a backlog of messages, which set off a domino effect through its infrastructure all over the world.

And things still aren't looking too good if you use a BlackBerry, with no promised date for when services will be back up and running, and no mention of any recompense.

CTO for software David Yach said: "On Monday RIM's infrastructure based in Europe experienced a core switch failure," confirming what the company had already announced. "Now all of our network switches have multiple redundancies and the network is designed to automatically fail over to a redundant switch with no impact to users."

However, this backup went tits up, to use a technical term, and the resulting backlog of messages rippled through its worldwide infrastructure. Or as Yach put it: "In this case, however, the fail over did not function as expected, despite the fact that we regularly test our fail over systems and process these to minimise this type of service impact to our customers.

"As a result, a large backlog of messages has been generated. We've had to throttle traffic to stabilise service while we process this substantial backlog of messages in a controlled manner. This is why we're seeing ongoing issues and why we're seeing impact to other regions around the world."

He said the company was sure this was the cause, but "we will be revealing a full and complete evaluation once service is restored fully in order to confirm the root cause and the reason for the subsequent instability."

Obviously it's taking some time to work through all these messages, so if you are experiencing problems, rest assured your messages will be delivered. Eventually. Yach made no mention of recompense, only that his priority was to get the service working for customers again.

In terms of impact, he said some customers haven't had any problems, while some saw "varying degrees of delays or in some cases service interruptions." He said there was no evidence of a breach or hack, and added the company has "global teams working around the clock on this, and are focussed on… minimising the impact on our customers."

The problems started on Monday in Slough, and soon spread across Europe, to the Middle East and Africa, then to the US and Canada.