Zetta Blog

6 Data Outages And Lessons Learned From 2016

Data outages can have a major impact on businesses of all sizes and across all industries. This year we witnessed a number of major outages at large companies, along with the consequences of that downtime.

According to our State of Disaster Recovery survey for 2016, the number one cause of IT downtime is still power outages, and most of the outages on our list were also caused by power failures, which reaffirms the trend. Hardware error came in a close second at 52%. The survey also revealed the cost and frequency of downtime events: more than half of companies report experiencing a downtime event that lasted more than 8 hours in the past five years, and 67% estimate that their business would lose $20,000 for every day of downtime. For some businesses, losses like that could be fatal. So how were the major companies that made headlines in 2016 affected by their outages?
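For a rough sense of scale, the snippet below converts the survey's $20,000-per-day figure into the cost of a single outage. It assumes losses accrue evenly around the clock, which is a simplification (an outage during business hours likely costs more than one overnight), and the `downtime_cost` helper is hypothetical.

```python
def downtime_cost(hours_down: float, loss_per_day: float = 20_000.0) -> float:
    """Estimate outage cost, assuming losses accrue evenly over a 24-hour day."""
    return loss_per_day * hours_down / 24.0


# The survey's 8-hour threshold and a three-day outage, at $20,000 per day
for hours in (8, 72):
    print(f"{hours} hours down: ${downtime_cost(hours):,.2f}")
```

At that rate, an 8-hour outage works out to roughly $6,667, and a three-day outage to $60,000.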

1. Twitter

Twitter’s year started off with some big headaches, to say the least. In January, Twitter experienced a severe data outage: hundreds of millions of users around the world were unable to access the social networking service from any of their devices for several hours.

When Twitter users attempted to access the network, they saw error messages reading “network is over capacity” and “internal error.” According to Twitter, a recent code change caused the problem. Throughout the day, users could neither access the website nor share updates. The code change brought down four of Twitter’s five public APIs, which is why so many users were affected.

Although Twitter resolved the issue within a few hours, the incident took a significant toll on its stock. On the day of the outage, the share price fell to $16.43, nearly 7% below its opening price. Even months later, it still hasn’t recovered to its pre-outage level.

2. Amazon Web Services

On June 4th, Amazon Web Services experienced significant outages in Sydney, Australia, bringing down a number of major websites and possibly leaving shoppers unable to complete financial transactions in stores.

EC2 instances and EBS volumes in the region lost power, which led to increased errors and latency for websites running on AWS. Major storms were hitting the area at the time, and some speculate that they caused the outage, though Amazon denies that the storms were the cause. The outage lasted several hours before it was resolved.

Quite a few websites were affected, including car sharing service Go Get, online streaming platform Stan, ticketing platform TryBooking and food ordering service Menulog. Shoppers were left stranded in stores, unable to process payments or withdraw money from ATMs, though a number of the companies involved later confirmed that their payments were not processed by AWS and that AWS was not to blame. Either way, this downtime event shows how much we rely on data being available in our daily lives, and how critical it is that companies take precautions to minimize events like these. A Cloud Disaster Recovery solution can help minimize the effect of such an event.

4. Southwest Airlines

A router failed and triggered other crashes, slowing down the airline’s systems. This eventually overloaded other functions and caused them to freeze. Router failures are fairly common, but this one had severe consequences. Unfortunately, the backup systems and DR deployments Southwest had in place also failed, which prolonged the downtime.

Passengers were left stranded at airports across the country, unable to check in for their flights or get boarding passes at the kiosks. Canceled flights and missed bookings are estimated to have cost Southwest Airlines $25 million in lost revenue. Hotel and meal accommodations for stranded travelers, along with employee overtime, are estimated at between $28 million and $57 million. In total, the outage could cost the company as much as $82 million, and some consider it the worst outage in the airline’s history.

Southwest Airlines’ stock also dropped significantly after the outage.

5. Delta Airlines

On August 8th, 2016, Delta Airlines passengers experienced some major inconveniences, to say the least. Customers were unable to check in or get boarding passes for their flights. This was the beginning of a three-day outage for the airline that would eventually lead to 2,300 flight cancellations and $150 million in lost pre-tax income.

The massive outage began with a circuit breaker that needed to be reset at Delta’s Atlanta headquarters. This caused a transformer to shut down, cutting power to the data center. The systems switched over to backup power, but not all of the servers were connected to that source, which led to cascading failures. Some also speculate that the outage was caused by a cyber attack.

Customers immediately took to Downdetector and began reporting their issues.

Delta’s $150 million loss is massive, but the airline also suffered other consequences. Customers were outraged, and its reputation took a serious hit. Of the 167 complaints filed with the U.S. Department of Transportation, 92 dealt with this outage. Travelers, especially those who were stranded for hours, will likely think twice before booking a flight with Delta.

6. SSP Worldwide

SSP Worldwide, an insurance software house, experienced a 10-day outage in the UK that began in late August and stretched into September. The outage had a major impact on the company and its customers.

The power outage, which began on August 26th, was caused by a blackout that hit SSP’s Solihull data center and fried its HPE storage area network (SAN). Recovery involved restoring daily backups from backup disks, re-coupling them with application data, and finally reconfiguring the systems for individual customers. A number of disks were also damaged by the power loss, which added to the complexity and length of the recovery.

Customers and insurance brokers were unable to access SSP services during this period, which caused a number of problems. The downtime limited how much work the brokers could do and cost them business, including clients who were too impatient to wait for the problem to be resolved. The outage had serious ramifications for the brokers’ own customers as well. Forty percent of the brokers use SSP’s services to remind their customers when it’s time to renew their vehicle insurance. Because of the outage, those customers weren’t notified that their insurance was running out and were potentially uninsured without even knowing it. This was a serious liability issue for the business and could have caused problems for the uninsured customers as well.

Data Vulnerabilities: Is Your Data Protected?

With everything that we’ve seen go wrong with this year’s data outages, how can you ensure that your data is protected in the event of a disaster? The first step is disaster recovery planning. Hoping for the best in a bad situation simply won’t cut it.

Make sure you have a documented DR plan in the event of a disaster. That means knowing your RPO and RTO. Your recovery point objective (RPO) is how much data you can afford to lose, and your recovery time objective (RTO) is how long you can afford to be without your data. Testing your DR plan is also a critical part of DR planning, yet most IT pros still don’t test their plans more than once a year, and 2 in 5 companies don’t have a DR plan at all.

You also need to send your backups offsite on a regular basis. If a disaster strikes your building, you don’t want the only copy of your data to be in that building. Cloud backup solutions keep your data offsite at all times, since data stored in the cloud is by definition offsite. For the most stringent RTOs, disaster recovery as a service lets you fail over to the cloud quickly after a disaster, so you can keep business operations running until you recover your data from the cloud.


Secure Backup Solution

According to our State of DR survey, one in three IT pros has been hit by a virus or malware attack. Unfortunately, cyber attacks are likely to keep rising, and businesses must keep up with the latest in data security to lessen their chances of being attacked. That makes choosing a secure backup and disaster recovery solution even more critical.

Make sure to ask your backup provider about their business, technical, and process controls. Limited access to data, encryption of data in flight and at rest, and ongoing log and security reviews are just a few of the security measures to look for. If you work in an industry like healthcare or finance, you also need to comply with government regulations such as HIPAA and SOX. Many of these regulations are designed to help protect client privacy and add extra layers of security, so a solution that can meet those backup compliance requirements is a must.


2016 certainly taught us about the consequences of severe data outages. Loss of revenue is just the beginning of the problems companies face: stock price drops, lost customers and damaged reputations can have major ramifications long after an outage ends. That’s why getting serious about data protection is critical to lessening the effects of data outages and the consequences of prolonged downtime.