Office 365 outages due to "routine maintenance"

I wonder whether the "routine maintenance" issue that affected Office 365 today, including Exchange Online, was a symptom of some of the preparatory work that Microsoft needs to do for the forthcoming migration to the Wave 15 set of products.

You can't expect to make an omelette without breaking some eggs and you can't expect to upgrade a massive multi-tenant environment without some disruption. At least, no one has a recipe to auto-magically transform the Office 365 datacenters so that they're running Exchange 2013, SharePoint 2013, and Lync 2013. It all has to be done through thorough preparation and execution and, in the case of Exchange, by moving every single mailbox to a new Exchange 2013 server.

Today's outage didn't just affect Exchange, as the SharePoint service was also down. I have not seen any reports saying that the Lync service was similarly disrupted.


Some reports state that the problem originated with Windows Azure Active Directory (WAAD), the authentication mechanism used for Office 365. Given the close working relationship that exists between Exchange and Active Directory, whether on-premises or in the cloud, it wouldn't come as a big surprise if the service suffered following an Active Directory problem.

In any case, we await with interest the details of the Microsoft post-mortem for the incident. Maybe it will reveal details of their migration plan... or then again, maybe not...

Discuss this Blog Entry 1

The fact that Microsoft cannot perform "routine maintenance" without causing a service disruption is more than embarrassing ... it exposes a fundamental flaw in their ability to succeed. Let's not forget that the November outage was due to a lack of available resources and the architecture's inability to scale.
Other services -- Google Apps, Rackspace, and others -- seem quite capable of routine maintenance and major updates without disrupting the user base.