British Airways Outage: Is hypersensitivity about cyber security making us ignore basic reliability of IT systems?

The British Airways outage is not a one-off case. The cases of SouthWest and Delta are still fresh in memory. The common factor in all of them: no external attack was involved.

On Saturday, British Airways (BA) cancelled all its flights from Heathrow and Gatwick airports in London because of an IT failure. This severely disrupted its global operations, as Heathrow is its hub airport. BA also operates many flights from Gatwick.

The airline announced: “Following a major IT system failure this morning, we've cancelled all flights to and from Heathrow and Gatwick for the rest of today.”

Though BA had announced that the problem would affect its operations on Saturday, many flights were cancelled on Sunday as well.

“Many of our IT systems are back up today. All my British Airways colleagues on the ground and in the air are pulling out all the stops to get our operation back up to normal as quickly as we possibly can, we’re not there yet,” said CEO Alex Cruz in a video posted on YouTube and Twitter.

While the exact cost of this disruption is yet to be estimated, different analysts put the figure anywhere between GBP 100 million and GBP 150 million. That, however, is the combined cost of only the direct revenue loss, compensation to passengers and overtime payments to employees, if any. BA is also trying to deliver customers' luggage by courier. That will add to the cost. Also, aircraft and crew are stranded at the wrong locations. Aviation experts estimate it will take at least two weeks to normalize operations, which will make passengers cautious about booking with BA. That indirect revenue loss is difficult to estimate but will add to the cost.

And we are not even talking of reputational cost.

In short, it is a nightmarish situation. Except for the Icelandic volcanic ash cloud that shut down flying across northern Europe in 2010, nothing of this magnitude has happened in the recent past.

What caused the glitch? Was it a DDoS attack or something similar? BA CEO Alex Cruz clarified that it was not due to a cyberattack but due to a ‘power supply issue’.

While BA did indicate that it was suffering from a ‘major IT systems failure’ without describing what it was, we are not yet sure whether the airline is calling it a major failure because of its complexity or just because of its impact.

Last August, Delta Air Lines in the US went through a similar outage. The airline cancelled more than 2,100 flights. The loss due to the problem was estimated to be USD 150 million. Later, the glitch was attributed to a small fire in one of its datacenters, which was quickly extinguished. Yet, it wreaked havoc on passengers, throwing the airline's operations into chaos for multiple days.

Just a few weeks before the Delta outage (in July 2016), SouthWest Airlines had gone through similar problems. It had to cancel 2,300 flights due to an IT failure. The loss was estimated to be between USD 54 million and USD 82 million, including revenue loss and added costs. It was later traced to a single router failure.

And this is not restricted to airlines. HSBC, Royal Bank of Scotland, one of the public sector banks in India—they have all gone through similar situations, where the glitches have been found to be quite simple.

The glitch does not have to be major for the impact to be major.

What is common to all these cases? Huge disruptions in business, but with no external attack involved, not even a DDoS kind of attack. They have all been caused by internal IT glitches.

In short, they are failures; they are not targeted crimes, with external actors involved.

Contrast this with the impact of the global ransomware attack earlier this month. Described as “The Biggest Attack in History”, it hogged media headlines for days even as WannaCry became an everyday term. It was clearly an attack by external actors. Yet, we did not hear of any major business anywhere in the world being impacted in a significant manner. All that it has fetched the attackers so far is about USD 110,000, going by the Twitter bot @actual_ransom, which is tracking the payments to the three accounts associated with the attack.

Are we barking up the wrong tree?

Wrong priorities?

Go to an IT conference or talk to an IT leader about the top challenges before her. You will rarely hear anyone talking about the reliability of IT systems. If you persist, the looks you get will tell you it is such a ‘90s issue’. In short, the reliability of today’s IT systems is taken for granted.

IoT security vulnerabilities, ransomware and attacks on critical infrastructure are issues that people love to deliberate on, while assuming that all is well when it comes to the upkeep of their systems.

Unfortunately, the cases of BA, Delta and SouthWest have shown us that it is not.

And these are large global corporations, not obscure small companies running small businesses.

Yet, small glitches bring them to their knees, and there is still little concern around this issue.

Whether it is IT or social issues, ideology or patriotism, it is easier to make people act by inducing fear of an external enemy, very often an imaginary one. But while that may be a good strategy in politics, businesses do not run on jingoism. The losses of BA, SouthWest and Delta are all real, even if you ignore the reputational loss altogether.

Well, the dangers of cyberattacks that we keep hearing about may well be real. But so is the probability of system failures. You are only as reliable as your weakest link.

Yet, there seems to be a false sense of confidence about the dependability and reliability of IT systems. Vendors, who often build and influence the discourse and priorities for CIOs, are silent on it too, as there is not much business opportunity there.

It is not just Indian CIOs or, for that matter, CIOs alone. In the World Economic Forum’s annual Global Risks Report 2017 (GRR), ‘critical information infrastructure breakdown’ now features in quadrant III, denoting a low-impact, low-likelihood risk, even as ‘cyberattacks’ features in quadrant I, denoting a high-impact, high-likelihood risk. Even ‘data fraud/theft’ is perceived to be a much likelier risk than ‘critical information infrastructure breakdown’. In fact, over the last three years, the perceived risk associated with ‘critical information infrastructure breakdown’ has declined continuously.

The WEF GRR just reiterates what we all know from anecdotal evidence: stakeholders no longer see information infrastructure breakdown as a high-likelihood, high-impact risk.

The BA, Delta and SouthWest cases may well be wake-up calls, reminding stakeholders that this confidence may be a little misplaced. If IT infrastructure maintenance and reliability received even 20% of the attention given to the possibility of organized cyberattacks, the risk of infrastructure failure would come down significantly.

Else, naysayers will have a field day blaming automation in general and new decisions in particular, just as some employee unions of BA have blamed the IT glitch of Saturday on BA’s decision to outsource IT jobs to TCS in India last year.