Microsoft's Danger Sidekick data loss casts a shadow on cloud computing

Microsoft has demonstrated that the dark side of cloud computing has no silver linings. After a major server outage occurred on its watch last weekend, users dependent on the company have just been informed that their personal data, including photos, "almost certainly has been lost."

While occasional service outages have hit nearly everyone in the business, knocking Google's Gmail offline for hours, plunging RIM's BlackBerrys into the dark, or leaving Apple's MobileMe web apps unreachable to waves of users, Microsoft's high-profile outage has impacted users in the worst possible way: the company has unrecoverably lost nearly all of its users' data and, a week later, has no alternative backup plan for recovering any of it.

The outage and data loss affect all Sidekick customers of the Danger group Microsoft purchased in early 2008. Danger maintained a significant online services business for T-Mobile's Sidekick users. All of T-Mobile's Sidekick phone users rely on Danger's online service to supply applications such as contacts, calendars, IM and SMS, media player, and other features of the device, and to store the data associated with those applications.

When Microsoft's Danger servers began to fall offline last Friday, October 2, users across the country couldn't use the services at all; even after functionality began to return on Tuesday, October 6, users still didn't have their data back. This Saturday, after a week of efforts to solve the crisis, T-Mobile finally announced to its Sidekick subscribers:

"Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger."

A new report from Engadget says that T-Mobile has suspended sales of its Sidekick models and is warning: "Sidekick customers, during this service disruption, please DO NOT remove your battery, reset your Sidekick, or allow it to lose power."

Sidekick and the iPhone

Danger's Sidekick platform bears some resemblance to the iPhone; Danger brought the GSM Sidekick to market by partnering exclusively with T-Mobile. The partnership involves custom network services that make features of the device unusable on other networks, and of course the phones are physically incompatible with the CDMA services operated by Verizon and Sprint. In some ways, Microsoft's purchase of Danger is exactly the fix recommended by some pundits for Apple's iPhone: a third party who could swoop in and break the iPhone's exclusive partnership with AT&T by bringing Verizon into the mix. In Danger's case, the "Pink Project" operated by Microsoft not only failed to achieve this intended goal, but failed in large part because the goal was simply a bad idea.

After all, if exclusivity were inherently a bad thing, it wouldn't be used to successfully bring competing new models to the crowded smartphone market; Danger's Sidekick, Apple's iPhone, RIM's BlackBerry Storm and Palm's Pre all gained their visibility in the market because of concerted marketing by their exclusive mobile partners. All have experienced some launch issues which would have been far worse and more complex to resolve had their hardware makers tried to simultaneously launch them on multiple carriers, as Microsoft planned to do with its Pink "Windows Phone," using components borrowed from the Danger acquisition.

However, the Danger Sidekick also has some significant differences from Apple's iPhone business model. First, the iPhone is designed to plug into a computer running iTunes for initial setup, and while not entirely mandatory, it is designed to regularly sync with a desktop system. This process involves backing up all of the device's application data to the user's local computer, and allows the user to restore the device later. Apps running on the iPhone also run as local software and do not require an external service to be available. Most applications are designed to work offline, as a significant chunk of the iPhone platform is made up of iPod touch users. Apps are only updated and/or removed by the user.

The closest thing to Danger's online services is Apple's MobileMe, which is sold separately as an optional package of services that can sync, update, and push messages, contacts, calendars, bookmarks and other data to the phone, to associated desktop computers and for presentation via the web. After a problematic rollout last year, plagued by slow performance and frequent outages, Apple's MobileMe has matured into a reliable service. Even so, an interruption in MobileMe services wouldn't result in users being unable to use apps on their iPhones, nor would it risk the loss of data on the device or backed up by the user's copy of iTunes.

The dark side of clouds

More immediate types of cloud services take away users' control in managing their own data. In addition to the Danger services for Sidekick users, Microsoft also independently runs a MyPhone service for its Windows Mobile platform. It provides certain mobile and web publishing features (but not push messaging) comparable to Apple's MobileMe.

However, Microsoft's MyPhone performs its backups of users' phone data directly to the company's servers, and not to the user's local system. That means a Danger-like failure on the server end of MyPhone could easily result in unrecoverable data loss for Windows Mobile users, too.

Users have reason to be wary about keeping all their data on a vendor's cloud service without also maintaining their own local backup. If Apple's MobileMe service loses your data, the company won't do much to help you restore it because it also provides a variety of ways for users to back up and restore their own data locally: directly from apps such as Address Book and iCal, by using a local backup system like Time Machine, and by using iTunes to back up mobile devices at sync. Apple's MobileMe cloud services are run as an accompanying value-added service in addition to the maintenance tools users are given to secure their own data.

Other vendors have very different ideas about accountability for data in the cloud. In 2006, a relatively small number of Google's Gmail users experienced a security-related loss of their email and contacts. At the time, Google could only offer to reach out to the people who were affected "to apologize and to work with them to restore the email from any personal backup they might have." Google's strategy moving forward is highly dependent upon "non-local" cloud computing, with the company's Gmail being joined by its online Docs, Picasa, and Gtalk clients (which store all their data on Google's servers) as well as future plans to deliver Chrome OS as a web-client substitute to the conventional local operating system. That will largely replace the entire idea of local apps under the user's own control with online apps that the vendor can change, update, or drop at any time.

Palm's new WebOS in the Palm Pre similarly banks on the cloud to provide web-based apps that are updated and replaced by the network operator, not by the local user as is the case with the iPhone. Amazon's Kindle also demonstrated the potential for the network to take control of users' data after the company revoked certain books from users' devices, an action for which it has since apologized and paid to settle. Delegating all control to the cloud sounds great until there's a problem that the cloud vendor has no interest in or capacity to resolve for the user. It then quickly becomes a frustrating nightmare.

Is Danger in Microsoft?

Some users commenting on the week-long outage and its resulting data loss crisis at Danger were quick to absolve Microsoft of any responsibility in the incident, suggesting that Microsoft only bought the company last year and that it did not originally design the service. While Danger ran its services for years prior to the acquisition and experienced outages along the way, it had never before lost all of its users' data across the board. The apology that long-term Danger partner T-Mobile issued to Sidekick users, voicing its frustration and dashed hopes, was clearly worded to highlight Microsoft's involvement in the incident.

Microsoft's takeover of Danger in early 2008 should have given the software giant the time to fortify and secure Danger's online operations. Instead, it appears the company actually removed support to cut costs. According to a source familiar with Danger before and after the Microsoft acquisition, T-Mobile's close partnership with the original Danger was leveraged and then betrayed by Microsoft when Steve Ballmer's company decided there would be more money involved in dropping its exclusive deal with T-Mobile to partner with Verizon on the side.

Microsoft also shirked the Sidekick support obligations to T-Mobile that it acquired along with Danger. The source stated that "apparently Microsoft has been lying to them [T-Mobile] this whole time about the amount of resources that they've been putting behind Sidekick development and support [at Danger] (in reality, it was cut down to a handful of people in Palo Alto managing some contractors in Romania, Ukraine, etc.). The reason for the deceit wasn't purely to cover up the development of Pink but also because Microsoft could get more money from T-Mobile for their support contract if T-Mobile thought that there were still hundreds of engineers working on the Sidekick platform. As we saw from their recent embarrassment with Sidekick data outages, that has clearly not been the case for some time."

That indicates that Danger's high-profile cloud services failure didn't occur in spite of Microsoft's ownership, but rather because of it. This has led observers to question the company's commitment to its other cloud services, not just Windows Mobile MyPhone, but also the Azure Services Platform of cloud computing efforts that the company has had on the drawing board for years. Azure is designed to allow third parties to build applications that are dependent upon Microsoft's data centers.

In covering the Danger debacle at Microsoft, Ina Fried of CNET wrote, "while outages in the cloud computing world are common (one need only look at recent issues with Twitter or Gmail), data losses are another story. And this one stands as one of the more stunning ones in recent memory."

Fried added, "The Danger outage comes just a month before Microsoft is expected to launch its operating system in the cloud--Windows Azure. That announcement is expected at November's Professional Developer Conference. One of the characteristics of Azure is that programs written for it can be run only via Microsoft's data centers and not on a company's own servers. It should be pointed out that the Azure setup is entirely different from what Danger uses: the Sidekick uses an architecture Microsoft inherited rather than built (Microsoft bought Danger last year). Still, the failure would seem to be enough to give any CIO pause."

Daniel Eran Dilger is the author of "Snow Leopard Server (Developer Reference)," a new book from Wiley available now for pre-order at a special price from Amazon.

Comments

There has to be more to this story. What cloud service doesn't do at least nightly backups?

I doubt this is going to slow down cloud computing. Data loss can happen anywhere, hardware failure can happen anywhere. Regardless of what platform you use, you need a backup plan for your data and services and have procedures in place for the worst case scenario.

When Apple lost about 1% of email data while moving from .mac to MobileMe back then, I considered that unacceptable and a major sign of incompetence in that area (cloud-based services).

Of course, MS dwarfs them here. This is not the usual Redmond photocopy, this is a serious enlargement. A 100% proof of incompetence, negligence and lack of organization. Any part-time admin with such a back-up strategy would be held personally liable.

Still, it underlines one major point: cloud-based services without 100% and up-to-date local back-ups (client-side) are not ready for prime time anytime soon. If even the richest companies having full control over each and every piece of software and every device involved can't manage it, then nobody can.

The latest I've read is that this came from a failed SAN migration of some variety. I'm waiting for confirmation of that, but if true, it's almost inexcusable (even if it is inherited technology).

Still, when you outsource key functionality like running your database architecture, you can't be absolutely certain that proper precautions are in place the way you can when you control the process. If you can't control a copy of your data, you literally have no control of your business. It's that simple.

Deutsche Telekom, parent of T-Mobile, has already suffered a major problem in Germany this year. In January of 2009 the mobile phone service of T-Online was completely disabled for 20 hours due to software problems.

They later offered a whole (Sun)day of free texting for their German subscribers to make up for the inconvenience.

Don't trust your vendor. If financial account data had been lost, that would be another story, but still, lost data is inconvenient at best.

Losing everyone's contacts, calendar entries, notes, to-dos is inconvenient. That's probably how some brainiac justified not instituting a proper recoverability strategy for this data. Probably saved a few hundred grand by "swimming naked", at least until the tide went out...

Why on Earth anyone would trust these people with their critical data after this fiasco is beyond me. But I'll bet they'll be lining up for Azure in the next several months...

In IT the costs for redundancy and back up (as well as support staff, training, procedure development, test labs, etc.) have to be justified. To justify the risk mitigation, you have to put a value on the data or service being backed up or provided. What is the value of users' data? It's stated right on Danger's web site: Customer Loyalty. Loyalty means trust. (btw- When they say customer, I think they are referring to T-Mobile, not the Sidekick owners.)

M$ milked that trust for all it was worth by cutting costs and riding the wave of profits until disaster inevitably struck.

T-Mobile's response in fingering Microsoft is right on. Microsoft might not have designed the initial infrastructure, but they certainly gutted it and created a ticking time bomb. Even the statement that they were going to try to recover the data but thought it was 'probably' gone is pretty amazing. They aren't even sure if the data is gone or not but, *shrug*, assume so. Wow, nice customer service.

The problem isn't with technology or that things like 'clouds' aren't reliable yet. It's pure economics. Your data will always be worth more to you than to the corporation. If they think they can get away with a few accidental data losses, but still keep a majority of the customers (who are locked in, in this case), then what's the advantage of them investing in redundant/backup infrastructure?

At some point I imagine that the FCC will step in to regulate how a service can market its off-site storage service as 'reliable' or 'safe' when in reality it is not. Perhaps a big fine every time they lose users' data, similar to when somebody drops the F-word on TV or radio accidentally?

I would imagine there are a lot of unhappy Sidekick users out there now. Maybe we can cheer them up by flashing our iPhones around.

I find this desire to move to the cloud strange anyway, as it represents some of the worst problems associated with the days of the mainframe. For me personally, heavy reliance on the cloud would make my iPhone unusable; there are just too many places with no connection at all. If I couldn't use my iPhone offline I'd have to get rid of it.

As to syncing, I'm to the point now that I have a lot of confidence that both my MacBook Pro and iPhone will be in sync via MobileMe. Plus I have the local backups. So I'm confident I won't lose everything. This is a very good thing as this is the first PDA / cell phone I've found worth using.

I'm a bit worried though that Apple still hasn't seen the wisdom of giving the user local control right on the iPhone. The iDisk app is a good idea but I'd feel much better if I could designate which files are critical and need to be stored on the iPhone also.

At some point I imagine that the FCC will step in to regulate how a service can market its off-site storage service as 'reliable' or 'safe' when in reality it is not. Perhaps a big fine every time they lose users' data, similar to when somebody drops the F-word on TV or radio accidentally?

The FTC would certainly be interested too as there seems to be some pretty massive fraud involved here. If nothing else this is a situation where a massive class action suit might actually be justified. Of course the only ones making out will be the lawyers, but even if they get $25 per customer it will hurt MS a little bit. It might not help the user community much but I suspect that they will all become new iPhone owners.

Some users commenting on the week-long outage and its resulting data loss crisis at Danger were quick to absolve Microsoft of any responsibility in the incident, suggesting that Microsoft only bought the company last year and that it did not originally design the service.

Are they kidding?

When I get a new client, from day one I make sure they are backed up and that they have an offsite component to it.

So how valuable are Mozy- or Carbonite-type backup services? At just under 5 bucks a month, do you think a customer's data on a MS server or anyone's outside server can be backed up? Can it be backed up, or are Carbonite and services like it not designed that way? If they are, that would definitely absolve a corporation of having to justify "the costs for redundancy and back up (as well as support staff, training, procedure development, test labs, etc.)" when another service offers protection and a million dollar insurance promise!