Microsoft botched all aspects of Exchange outage response

Microsoft blew it Tuesday when its Exchange Online hosted email service went dark for much of the business day, an expert said. But he wasn't referring to Exchange's technical failure.

"Three hundred and sixty-four days of excellent service can be undone on the 365th if a company doesn't communicate effectively," said Gene Grabowski, senior strategist at Levick, a Washington, D.C.-based firm that specializes in crisis communications.

Grabowski was talking about Tuesday's outage of Exchange Online, the hosted service whose failure left hundreds, if not thousands, of companies without email. Exchange Online, an off-premises service bundled with most Office 365 business plans and also sold separately, has been a key component in Microsoft's attempt to convince enterprises to adopt the "rent-not-own" software subscription model and shift more infrastructure to Microsoft-run data centers.

Grabowski blasted Microsoft, not for the outage -- those inevitably happen, he said -- but for its lack of communication to customers during the hours their email was unavailable, and even after the problem had been resolved.

Countless Twitter messages Tuesday took the Redmond, Wash. giant to task on the same subject, as did users on a fast-growing discussion thread on Microsoft's support forum, where what little the company shared did appear.

"Down since 8:30am central with no ETA to return to service," Josh Widup tweeted Tuesday, five hours after the time he cited.

"[Microsoft] needs to provide better communication about cause, ETA for resolution, and steps to avoid recurrence of similar issues in the future," said someone identified as ISTE_IT on the support forum. "Hearing nothing is very frustrating and does cause customers to be justifiably angry."

"Most utilities have learned what Microsoft still has to learn," said Grabowski. "And Microsoft is the equivalent of a utility. They're vital for business and personal use, and have become for all intents and purposes a utility."

Learn what utilities have learned

Grabowski, who has advised electric utilities on outage response and other organizations and businesses on urgent communications practices, said Microsoft failed at all three critical components of such messaging when Exchange Online went dark.

"First, you have to get out there with the narrative of what happened and what you're doing about it," said Grabowski. "You have to do that within the first hour; you can't let your customers wait. In your absence, customers will create the narrative, and it will almost always be negative."

Microsoft dropped the ball there: The company didn't publicly acknowledge the outage until 11:07 a.m. PT, five hours after email vanished, and then only in a single tweet from the Office 365 Twitter account. Even that tweet didn't describe the problem as the outage it was, at least from customers' perspective, but called it "email delays."

In the interval, as Grabowski warned, users vented their frustrations on Twitter and the support forum thread, with nearly all of their comments negative.

"Second, you must tell customers that you're working on a remedy, and in real time be informing customers as you do," said Grabowski. "If you don't know the cause, tell them that, but share what the situation is."

Power companies and cable TV providers, he pointed out, staff telephone hotlines to take incoming customer calls during an outage. "But you can't just take calls, you must have a way to communicate outwardly through Twitter, email, Facebook, Instagram and other means," Grabowski said. "If possible, you should show and demonstrate visually what you're doing. You need to get out there and communicate."

A hidden dashboard

Microsoft did nothing of the sort, but that was by design. The single source of information -- and that was unreliable for many customers -- was the "Service Health Dashboard," a Web portal designed to show which Microsoft services, if any, are offline or degraded. But the SHD is available only to enterprise administrators; individual users cannot see it.

"Microsoft does that because that's what their customers have told them they want," said James Staten, an analyst with Forrester Research. "IT has told Microsoft, 'I don't want you circumventing me, I don't want you talking to my users.' But the more Microsoft listens to the customer, the more it hurts them. They have to decide whether to do what the [enterprise] customer wants or to do the right thing."

"That's absurd," Grabowski said of the admin-only policy for the dashboard. In the same breath, he knocked the idea of the dashboard. "You shouldn't be sending anyone anywhere," he argued. "They're reaching out to you because they want questions answered, and those answers should be on the platform where customers pose questions."

Other firms, including Google and Apple, make their service status dashboards public. Admittedly, those companies are more consumer-oriented than Microsoft, particularly compared with a business service like Exchange Online. But Salesforce, which is as business-leaning as Microsoft, offers a publicly accessible dashboard where users can go, if necessary, to learn some scant details about problems or outright outages.

Experts needed

The third important element in an outage response, Grabowski said, was an expert, someone from the provider -- the power utility, or in this case, Microsoft -- who would reach out to the media, answer customers' questions online and generally be the "face" of the firm during an emergency.

"Where was their expert?" Grabowski asked. "They needed to have someone to talk to reporters. When the public is critical, an expert is necessary to set them straight in real time. They needed an expert and needed to advertise the fact that they had one."

Microsoft had no such representative; the closest the company came were the support technicians who added messages to the discussion thread. And although that thread was extensive, most users whose email suddenly quit working had no idea it existed.

"So much of Microsoft's behavior was for the company's convenience, not the customers'," Grabowski said. "There are much better ways to answer customers in a crisis than Microsoft showed. It's their responsibility to guard against arrogance, and practice openness and transparency. Now [digital] service providers are suffering from the same kind of criticisms that they have chided brick-and-mortar companies about."

Grabowski and others said it was also important for Microsoft to manage the back end of the outage by explaining what went wrong and what the company would do, or had done, to ensure the same problem doesn't crop up again. Yet as of mid-day Thursday, Microsoft had said nothing about Tuesday's downtime. Outlets it has used in the past for such post-mortem discussions, like its service-specific blogs, were notably empty of any explanations.

That, too, was a failure. "I don't think we'll ever know what went wrong," said an IT administrator at a firm that was without email Tuesday because of the outage. That person asked not to be identified because they were not authorized to speak to the media. "Microsoft's response was terrible."

Copyright 2016 IDG Communications. ABN 14 001 592 650. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.