Microsoft Reports Major BPOS Outages, SLAs Affected

Microsoft this week disclosed that its Business Productivity Online Services (BPOS) had three service outages that affected BPOS customers in North America in August and September.

Morgan Cole, a Microsoft employee, admitted in a blog post that BPOS had service outages on Aug. 23, Sept. 3 and Sept. 7. All of the outages were associated with a Microsoft network upgrade effort, which initially knocked out the service for two hours on Aug. 23. A fix led to additional problems in September, including trouble with the "sign-in service and administrative portals," Cole explained.

On Sept. 7, Microsoft had a problem with BPOS that had "more widespread customer impact, although the duration was relatively short," Cole stated, without explaining the nature of the problem. Microsoft is currently monitoring this situation after isolating "suspect traffic," according to the blog post.

A comment on that blog post by "Jim Glynn" indicated that Microsoft has credited some of its customers affected by the Aug. 23 BPOS outage. However, Glynn noted that customers should contact their BPOS representative to request the compensation Microsoft offers when it fails to meet its service level agreement (SLA).

Uptime is a prime consideration for organizations using software-as-a-service (SaaS) applications instead of the more traditional customer premises-installed solutions. Microsoft's "all-in" organizational move to the cloud, providing services to businesses and organizations, hinges on meeting its SLAs. However, SLAs don't change the fact that businesses using hosted applications depend on external infrastructure that they do not control.

Compensation for not meeting the SLA may not be equivalent to the costs of lost business time, but it is common practice for service providers, according to Robert Mahowald, research vice president for SaaS and cloud services at the IDC research and consulting firm.

"Web applications rely on access to the Internet, which of course adds another potential weak link in the chain of getting access to information and functionality," Mahowald explained in a phone interview. "But it's pretty much common practice for SaaS providers to guarantee 'three nines' of uptime…which is about 28 hours a year in which they will not be accessible. Most of that is supposed to be scheduled downtime. Essentially, it's pretty much common practice for providers to pay service credits in recompense for the lost opportunity and to not pay any monetary fine."

The two-hour outage on Aug. 23 appears to have violated Microsoft's SLA guarantee for BPOS applications. BPOS is expected to be available 99.9 percent of the time per month, or as Microsoft's FAQ specifies: "Microsoft provides a 99.9 percent uptime Service Level Agreement for Exchange Online, SharePoint Online, Office Live Meeting and Office Communications Online."
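As a rough check on that claim, the arithmetic can be sketched as follows. This is only an illustration: it assumes a 31-day August, assumes the full two hours counted as downtime under the SLA, and uses made-up function names that do not come from Microsoft's documents.

```python
MINUTES_PER_DAY = 24 * 60

def allowed_downtime_minutes(days_in_month, sla_percent=99.9):
    """Downtime (in minutes) a month can absorb before dropping below the SLA."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (100 - sla_percent) / 100

def uptime_percent(days_in_month, downtime_minutes):
    """Monthly uptime percentage for a given amount of downtime."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return 100 * (total_minutes - downtime_minutes) / total_minutes

# Assumption: August has 31 days and the outage lasted 120 minutes.
budget = allowed_downtime_minutes(31)   # roughly 44.6 minutes
august = uptime_percent(31, 120)        # roughly 99.73 percent
print(f"Allowed downtime: {budget:.1f} min, August uptime: {august:.2f}%")
```

Under these assumptions, a 99.9 percent monthly target leaves about 45 minutes of downtime, so a single two-hour outage would by itself push the month's uptime to roughly 99.73 percent.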

BPOS users are credited based on a calculation of the monthly uptime percentage, according to Microsoft's Exchange Online SLA document. If the service availability dips below 99.9 percent, then the service credit is 25 percent of the monthly service fees. If it dips below 99 percent, Microsoft pays out 50 percent of the monthly service fees. Lastly, Microsoft pays the full monthly service fee if the service availability dips below 95 percent.
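The credit tiers described above can be expressed as a small lookup. This is a minimal sketch based only on the thresholds reported in this article; the function name is hypothetical, and the actual SLA document may define uptime, exclusions and claim procedures differently.

```python
def service_credit_percent(monthly_uptime_percent):
    """Return the share of the monthly fee credited, per the tiers in the article."""
    if monthly_uptime_percent < 95:
        return 100   # full monthly service fee
    if monthly_uptime_percent < 99:
        return 50
    if monthly_uptime_percent < 99.9:
        return 25
    return 0         # SLA met, no credit

# Example: a month with about 99.73 percent uptime (an assumed figure).
print(service_credit_percent(99.73))
```

For a customer whose monthly uptime came to about 99.73 percent, this sketch would place the credit in the 25 percent tier.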

Microsoft informs its BPOS customers and the public about BPOS uptime problems via a "Microsoft Online Service Notifications" RSS feed. According to that feed, Microsoft restored services on Sept. 7 for multiple applications, including Exchange Online, SharePoint Online, Office Live Meeting, Office Communications Online, plus a few others.

Microsoft had planned to conduct maintenance on some of its BPOS services on Sept. 11 in its North American datacenters. However, the company has now postponed its network upgrade plans, according to the RSS feed.

Mahowald wasn't aware of any disaster scenarios for SaaS providers, but the prospect is "bound to happen," especially for educational institutions and governments that may have outsourced important operations by relying on SaaS. In such cases, SLAs will become even more important.

"It's an incredibly important issue to understand. It's no longer about simply saying, on a functional basis, 'does your application do what mine does and what's the price'," Mahowald said. "I think understanding the SLA behind it and actually having some teeth in the SLA is going to become an even more important distinction than it is right now -- perhaps more important than price as you go up the chain with mission-critical applications."