Why Twitter should open up about its infrastructure

Whatever investments Twitter is making to improve the reliability of its system aren’t working, or at least not as well as they should be. The world’s favorite micro-blogging site blamed Thursday morning’s approximately two-hour outage on problems within its data centers — specifically the parallel failure of its running system and its backup system — and it’s the second time in less than two months that Twitter’s infrastructure has brought the site down. Maybe it’s time for Twitter to talk openly about what it’s doing in there.

In Twitter’s own words: “The cause of today’s outage came from within our data centers. Data centers are designed to be redundant: when one system fails (as everything does at one time or another), a parallel system takes over. What was noteworthy about today’s outage was the coincidental failure of two parallel systems at nearly the same time.”

If Twitter wants to remain opaque about its practices, that’s fine — but it shouldn’t expect any slack from upset users or investors. By contrast, we have a pretty good idea where Google’s data centers are and what’s going on inside them, and we know nearly everything about Facebook’s operations. When Amazon Web Services has an outage, it might take days, but the company provides a detailed post-mortem report explaining what went wrong.

Even if Twitter’s infrastructure team is filled with very smart engineers, there’s certainly benefit to be derived from public discussion about what it might be doing right and wrong. Clearly, something isn’t right; the site is down too often considering how much smaller it is than the aforementioned services. While the outages aren’t disruptive enough to anyone’s business to warrant an AWS-style explanation, users deserve something better than having two hours of downtime blamed on an “infrastructural double-whammy.”