Network failures: Why they happen and how to prevent them

The corporate WAN is the beating heart of the enterprise: a vital link in your ICT environment that must be kept up and running. Unfortunately, as systems become more complex, equipment ages and IT teams are stretched thinner, outages are becoming more common. So what’s causing them, and how can IT managers mitigate risk more effectively?

Major costs

The direct costs of network outages range from lost employee productivity and possible regulatory fines to the costs of any resulting IT investigation. There’s also a direct revenue impact from the potentially massive hit to the organisation’s reputation, which can drive customer attrition and depress the share price.

The recent ICT incident at British Airways illustrates the potentially catastrophic repercussions of a simple network outage. A suspected power surge at a UK datacentre damaged servers and had a disastrous knock-on effect, taking out check-in and other operational facilities. It left planes grounded and tens of thousands of passengers stranded for days, and may have cost the airline as much as £80 million in lost revenue, passenger compensation and other costs.

Every incident is different, but IHS estimates that ICT downtime costs an average mid-size North American company $1 million a year, and a large organisation over $60 million. Separate data from IDC claims unplanned app downtime – potentially resulting from network outages – costs Fortune 1000 firms up to $2.5 billion every year. Meanwhile, Blue Wireless founder Ivan Landen estimates network downtime costs can range “from $500/hr for when a few people are sitting idle, to $100,000/hr when it impacts business logistics, customer transactions, etc.”
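
To put the hourly figures in context, here is a rough back-of-the-envelope sketch in Python. It simply multiplies an outage's duration by an hourly rate, using the two ends of Landen's quoted range. The four-hour duration and the function name outage_cost are assumptions made purely for illustration; a real estimate would need your own organisation's numbers.

```python
# Hourly cost figures from Landen's quoted range (USD per hour of downtime).
LOW_IMPACT_RATE = 500        # a few people sitting idle
HIGH_IMPACT_RATE = 100_000   # logistics and customer transactions affected


def outage_cost(hours: float, hourly_rate: float) -> float:
    """Estimate the cost of an outage as duration multiplied by hourly rate."""
    if hours < 0 or hourly_rate < 0:
        raise ValueError("hours and hourly_rate must be non-negative")
    return hours * hourly_rate


# Hypothetical four-hour outage, chosen only for illustration.
duration_hours = 4
low = outage_cost(duration_hours, LOW_IMPACT_RATE)
high = outage_cost(duration_hours, HIGH_IMPACT_RATE)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

Even under this simplified model, the same four-hour outage could cost anywhere from a couple of thousand dollars to several hundred thousand, depending on which parts of the business it touches.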

Top causes of network outage

So what are the main reasons behind a typical network outage? They can be broadly grouped into a few basic categories:

1. Human error: This is rumoured to have played a part in the BA outage. Networks are complicated systems, and too many manual processes can introduce configuration errors and similar mistakes.

2. Power failure: Reportedly the main culprit behind BA’s woes, this can occur as a result of environmental issues like lightning strikes as well as problems with fans, capacitors, control ICs and power components.

3. Security: Failure to patch software and firmware properly can lead to network outages, while DDoS attacks, designed to overload network links and take entire organisations offline, are on the rise.

4. Hardware: Probably the main cause of network issues, and one that will only get worse as kit gets older.

5. Link failure: Fixed infrastructure is exposed to physical damage, which can lead to outages and congestion.

How to mitigate network risk

Now that we understand the main causes of network downtime, how do IT managers mitigate risk to keep vital systems up and running? Key investments should include:

In the end, the right strategy will very much depend on your organisation, budget and resources. But given the increasing reputational and financial impact of outages, IT leaders must address any issues upfront before they have a chance to escalate.