2018 Website Outages: Key Lessons from Popular Website Downtime

Now that companies depend on the cloud for access to key services and business operations, downtime has a larger impact on productivity.

Uptime is just as critical to small businesses as it is to major ecommerce retailers on Black Friday. Even the public relies on various services like Alexa and email to be available throughout the day.

Looking at major outages over the past year provides insight into how companies prepare to handle these events.

Reviewing the major outages of 2018, it’s apparent that the most common causes of problems are not sophisticated DDoS attacks or malware. Problems ran the gamut from misconfigurations and upgrade issues to power failures and cut cords.

This isn’t to say that DDoS attacks didn’t happen. As usual, the gaming industry was plagued by attacks on its major networks, including Xbox and PlayStation. Most of the incidents were the work of hackers or disgruntled players looking to get ahead.

Downtime was a major concern at this year’s US midterm elections, and ProPublica partnered with Uptime to monitor access issues for voters. The ProPublica ElectionLand project also checked in on non-technical issues to ensure the vote was fair.

Let’s take a look at some of the year’s largest outages.

Problems in the Google Cloud

Back on February 15, both businesses and online game players were affected by a problem with Google’s Cloud Datastore. The problem affected many high-profile Google Cloud customers, as well as users of Snapchat and Pokémon Go.

Later in the year, an unrelated problem grounded the cloud again. On July 17, a load balancing problem took down major services like Spotify and Snapchat, as well as websites that rely on Google services.

Storms Cause Major Downtime for Multiple Providers

On March 2, a Nor’easter interrupted network connectivity at a major data center located in the path of the storm.

Power outages triggered by the storms caused problems at the Equinix data center in Ashburn, VA. This outage was linked to an Amazon Web Services (AWS) outage that occurred on the same day.

The AWS outage interrupted service to tools like Slack, Atlassian and Twilio. Alexa was reportedly silent for many as well.

On September 4, Microsoft Azure servers were damaged by a lightning storm at a Texas data center. When the lightning caused the center to revert to backup power, cooling systems didn’t come on in part of the center. Even though equipment sustained damage, all data remained intact.

Amazon Shoppers Couldn’t Access Deals on Prime Day

Next to Black Friday, Amazon Prime Day is one of the largest shopping days of the year. Unfortunately, many shoppers couldn’t get in on July 16 to access deals. Despite early connectivity problems, Amazon Prime Day broke sales records as expected.

A Rough Month for Facebook

November proved to be especially troubling for Facebook. The company experienced downtime with all apps, beginning with Instagram on November 8. Then, it suffered two Facebook outages, alongside problems with Instagram, WhatsApp, and Messenger as the month progressed.

Wrapping It Up

These problems are just a small sampling of major downtime events throughout the year, but they illustrate an important point: most downtime is outside our control. These incidents ran the gamut, from weather to configuration issues and other anomalies difficult to plan for. Hacking, espionage and nefarious plots to break the internet were not the issue.

Though bad actors in the gaming industry are a continuing problem, DDoS and other attacks are still a minor contributor to downtime. Of course, certain industries are more susceptible to attack. Healthcare IT teams have their work cut out for them, as 34% of ransomware attacks in 2017 struck this sector.

By having a contingency plan and practicing transparency with users, companies were able to minimize the damage.

Good preparation is key to handling problems as they arise. Continuously monitor your domain for issues with a monitoring tool like Uptime.com. You’ll receive alerts as problems happen, giving your team the ability to react quickly.
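To make the idea of continuous monitoring concrete, here is a minimal sketch of a single availability check in Python. It is an illustration only, not how Uptime.com or any other monitoring service works internally. The function names (fetch_status, is_up, check_sites) and the rule that any 2xx or 3xx response counts as "up" are assumptions made for this example.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: no answer at all.
        return None


def is_up(status: Optional[int]) -> bool:
    """Assumption for this sketch: any 2xx or 3xx response is healthy."""
    return status is not None and 200 <= status < 400


def check_sites(urls, fetch=fetch_status, alert=print):
    """Check each URL once and call alert for every site that looks down.

    fetch and alert are parameters so the check can be wired to a real
    notification channel, or replaced with stand-ins for testing.
    """
    down = []
    for url in urls:
        status = fetch(url)
        if not is_up(status):
            down.append(url)
            alert(f"ALERT: {url} looks down (status={status})")
    return down
```

Calling check_sites(["https://example.com"]) from a scheduler or a simple loop with a pause between runs would give a basic periodic check. A hosted monitoring service adds what this sketch leaves out, such as checks from multiple locations, retries before alerting, and alert routing to the right people.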



Sharon McElwee is Uptime.com's content manager. Her focus is helping the team create great content for Uptime.com users. Sharon lives in Central Virginia, USA, and enjoys scrapbooking, building websites and hiking.