Once Again, RIM Blames Upgrade For Outage; Time To Do Weekend Upgrades

from the just-a-suggestion dept

Last year when RIM's Blackberry service had a major outage, the company blamed a software upgrade, which made many folks question why RIM would be doing a major software upgrade in the middle of the week. Since most Blackberry users are corporate users, it would seem to make sense to do upgrades over the weekend -- or at least late at night. So, with this latest outage, it's a bit surprising to see RIM fall back on the old "software upgrade" excuse again. The company doesn't say that's definitely what happened, but suggests that's what caused it, failing to explain why the company would run an upgrade in the middle of the afternoon on a weekday.

Where I work, the outage window starts at 2 AM Eastern, so even the West Coasters only see it at 11 PM. Sucks for me, but this is also when usage is at its lowest. The bad part is you won't see what the system is like under load until the next morning. This makes me wonder if the upgrade went smoothly at night and then broke horribly under mid-day load, so a roll-back needed to happen.
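For anyone who wants to check the time-zone arithmetic in that comment, here is a minimal Python sketch. The helper name `window_start_in` and the calendar date are made up for illustration; only the 2 AM Eastern start time comes from the comment, and the sketch assumes the system has time-zone data available for `zoneinfo`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def window_start_in(start: datetime, target_tz: str) -> datetime:
    """Convert a timezone-aware window start to another time zone."""
    return start.astimezone(ZoneInfo(target_tz))


# Hypothetical date; the 2 AM Eastern start is the only detail from the comment.
eastern_start = datetime(2008, 2, 13, 2, 0, tzinfo=ZoneInfo("America/New_York"))
pacific_start = window_start_in(eastern_start, "America/Los_Angeles")

print(eastern_start.strftime("%a %H:%M %Z"))
print(pacific_start.strftime("%a %H:%M %Z"))
```

The Pacific start lands at 11 PM on the previous calendar day, which matches the comment: a 2 AM Eastern window is still late evening on the West Coast.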

yawn

I don't know a single company that makes changes on the weekend unless it's a MAJOR change (usually a cutover of sorts). The majority of change requests happen during the week, with Wednesday being the most popular day. RIM fails at properly testing their changes, not at scheduling. Besides, if this happened on Sunday, everyone would still be bitching about it.

I think it's crazy to think that any company is actually going to achieve 24/7/365 uptime when capacity only keeps going up, and there's really no way to say to customers, "Look, you'll possibly have downtime for 4-12 hours." Unlike websites, which are not necessarily expected to be up 24/7 everywhere all the time, Blackberries apparently are, given the amount of criticism flung at RIM. Can they do a better job of informing customers? Of course the answer is yes. The problem is that when we criticize services for being down as a result of an update, we leave no room for error, and people end up doing their best to hide the truth.
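To put the 4-12 hour figure in perspective, here is a rough Python sketch of what that much downtime would mean as an availability percentage. It assumes the downtime is spread over one non-leap year, which the comment does not actually say, and `availability_percent` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the percentage of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1 - downtime_hours / period_hours)


# Assumption: the 4-12 hours mentioned in the comment is total downtime per year.
for hours in (4, 12):
    print(f"{hours:>2} h down per year -> {availability_percent(hours):.3f}% available")
```

Under that assumption, even the 12-hour case stays above 99.8%, which is well short of literal 24/7/365 but still a small fraction of the year.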

The reason to update on weekdays is ...

If you update on a weekend and everything goes wrong, then there is no way for the BB users to get help/answers from their techs or get messages the old-fashioned way (ha ha!) using their email client.

When you upgrade during the week and something breaks, people aren't completely cut off; they just have to get their work done without the BB.

Also, everyone talks about the "massive outages." Although it was massive in terms of the number of users cut off, RIM has an outstanding uptime record. Whether it's the most efficient setup or not, handling that much traffic 24/7/365 with only one big outage a year is nothing to be alarmed about. I heard that on the radio yesterday and just thought I'd add it to the discussion.

Ridiculous

A system upgrade caused that? In my experience you upgrade during your downtime (middle of the night, weekends, etc.) when the fewest users are affected, you build in redundancy in case you have to fall back, and YOU TEST BEFORE YOU RELEASE.

Single Point of Failure

This is one of the many things I dislike about RIM and Blackberries. It adds yet another network to your e-mail delivery system: the company’s network, the internet, RIM's network, and the mobile carrier's network. Other solutions, such as Microsoft ActiveSync or something like Apple's IMAP access, only depend on three of the above four (a little simplified, of course). I still got my e-mail when RIM's network wasn't working, but then again I'm not using a Blackberry. That should tell you something.

Re: Single Point of Failure

[Single Point of Failure,] This is one of the many things I dislike about RIM and blackberries.
Absolutely. I can't believe any competent IT person would recommend these things. All email going through another company's network... no thanks!
I love how people tout the "end to end encryption" on Blackberrys. I wouldn't call an encrypted connection terminating at a foreign company "end to end".

One more example of decisions being made by high-level management without regard for the programmers in the trenches. Every time you see a glitch like this, I can guarantee there are a bunch of programmers sitting back saying to each other, "See, I told you that would happen, but noooo, they wouldn't listen, and now WE have to work our asses off to fix the mess."