Overview

The BT core network successfully withstood the full force of Hurricane Sandy to keep customers connected

When Hurricane Sandy hit New York in October 2012, Phil Dobson, head of major incident response at BT Global Services, led the BT team responsible for mitigating the storm’s effects. Assisting, unseen, were design experts who’d used advanced modelling techniques to create the ultimate resilient network, ready for precisely such an event.

The BT core network stood firm. Some customers experienced outages, naturally, but these were mostly caused by power failures beyond its reach. BT and its service provider partners were praised in newspapers and new media, while Phil and his team got down to reviewing the hurricane’s legacy and planning for the next onslaught, wherever on the planet that might occur.

Designed-in resilience

Today, networks are the platform on which business is conducted. For most companies any network downtime is unwelcome. For many it’s unthinkable. That’s why resilience is a primary consideration when selecting a service provider.

In October 2012, Hurricane Sandy threatened to put BT’s reputation for resilience to the test. The tenth hurricane of the 2012 season, classified as a Category 2 storm, Sandy was the most impactful since Katrina in 2005. Developing in the western Caribbean Sea, it quickly intensified and first made landfall near Kingston, Jamaica. It then hit Cuba, the Bahamas and, finally, the north-east US seaboard.

Contingency planning

In situations such as this, BT calls on its Threat Assessment and Response Group (TARG), a virtual team of key personnel drawn from different lines of business within the company. “We knew the storm was coming,” says Phil Dobson, head of major incident response at BT Global Services Customer Operations, “so we had time to assess likely impacts and draw up comprehensive contingency plans.”

BT has a strict redundancy policy for its core network, with dual bearer capabilities and dual management domains. Terrestrial and sub-sea cable routes are selected in pairs to keep them physically separate, while major cities have multiple network nodes, each connected to two other nodes. This provides enormous diversity and flexibility. Continuous monitoring means a link failure is detected within two seconds at most, triggering automatic rerouting.
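The benefit of connecting each node to two others can be illustrated with a minimal sketch. It assumes a hypothetical four-node topology (two nodes in New York, two in London, with invented node names) and uses a simple fewest-hops breadth-first search to find a path. BT’s actual topology, routing protocols and two-second detection mechanism are not described here and are not modelled; the sketch only shows why a single link failure still leaves an alternative route.

```python
from collections import deque

# Hypothetical topology (names invented for illustration, not real BT sites):
# every node connects to exactly two others, forming a ring that includes two
# physically separate transatlantic links (NYC-1/LDN-1 and NYC-2/LDN-2).
LINKS = {
    ("NYC-1", "NYC-2"),
    ("NYC-1", "LDN-1"),
    ("NYC-2", "LDN-2"),
    ("LDN-1", "LDN-2"),
}


def neighbours(node, links):
    """Return the set of nodes directly connected to `node`."""
    result = set()
    for a, b in links:
        if a == node:
            result.add(b)
        elif b == node:
            result.add(a)
    return result


def shortest_path(src, dst, links):
    """Breadth-first search for a fewest-hops path; None if unreachable."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours(node, links)):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def reroute_after_failure(src, dst, links, failed_link):
    """Drop the failed link (in either direction) and recompute the path."""
    remaining = {link for link in links if set(link) != set(failed_link)}
    return shortest_path(src, dst, remaining)
```

With all links up, traffic from NYC-1 to LDN-1 takes the direct link; if that cable fails, `reroute_after_failure("NYC-1", "LDN-1", LINKS, ("NYC-1", "LDN-1"))` returns the path via NYC-2 and LDN-2 over the second transatlantic route. A ring like this survives any single link failure but not two, which is why real designs typically add further diversity.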

The TARG team used meteorological data to identify the elements of the BT network most likely to be at risk. Phil Dobson continues: “For example, we looked at storm surge predictions and the height above sea level of key locations, such as our transatlantic cable landing stations and main network points of presence. From this we could assess the level of risk and make sure there was sufficient capacity in the network to reroute traffic if necessary.”

With power cuts an almost certain occurrence, the TARG team also commissioned a review of back-up power arrangements at core network and access sites, making sure all standby generators were fully functional and fuel supplies sufficient to cope with days of disruption to commercial electricity supplies. It also created a comprehensive communications plan covering both customers and key service suppliers, such as the network operators responsible for local access routes.

Weathering the storm

Hurricane Sandy came ashore near Atlantic City, New Jersey, on 29th October 2012. The storm surge hit New York soon after, flooding streets, tunnels and subway lines, and cutting power to thousands of homes and businesses in and around the city. Damage was estimated at over US$63 billion.

Despite the devastation, the BT IP Connect network core remained fully functional throughout. However, around 600 services were affected in some way, and around 100 BT customers experienced either some loss of service or reduced resilience. These incidents were mainly due to power failures affecting customers’ equipment or the local exchanges of access network providers.

The BT network operations team used a proactive war-room model to manage potential service failures, remotely diagnosing issues and directing appropriate remedial action. For issues beyond its direct control, the BT team worked with local service providers. It also liaised directly with customers to assess the business impact of each local service failure.

Phil Dobson confirms: “Armed with this information we were better placed to help our service provider partners focus on priority issues and, if necessary, leverage our influence via predetermined escalation processes.” Meanwhile, BT Service Managers kept customers informed, via email and conference calls, of the expected timescales for full service restoration.

Grateful customers

Thanks to the often heroic efforts of BT and its service provider partners, most services were restored within five days. Many stories of BT people going above and beyond the call of duty have since emerged. The incident was officially closed by BT on 9th November 2012.

“Technical design precautions are vital, of course, and those built into our global network are second-to-none,” says Rogier Bronsgeest, president of Global Customer Service at BT Global Services, “but in such potentially disastrous circumstances it’s our people that make the difference. At the end of the day, they’re the ones who bring the BT service promise to life.”

Customers, too, were quick to recognise those efforts. The New York Stock Exchange, which remained closed for an unprecedented two days following the storm, thanked BT, along with a number of other service providers, in a two-page advertisement in the New York Times.

While the storm may have passed, work for BT continues. Phil Dobson concludes: “In our post-incident reviews we look at what worked particularly well and where we could have done better. It’s part of our continuous learning process, so that we’re able to respond even better the next time.”