Problems In The Cloud: Trouble With Amazon Servers

Technology progresses through the years. It advances so quickly that anything that can’t keep up disappears into oblivion. Many of the processes that make the world go round now rely on technology. So, it is not surprising that chaos follows once problems arise with these complex technologies.

This premise was proven (yet again) by a glitch that affected Amazon servers last month (Amazon may just want to consider our Dell Services: http://www.harddriverecovery.org/raidcenter/dell_poweredge_data_recovery.html). The glitch disrupted the business of many popular websites and even that of the ordinary people using them. It only goes to show that technology is, in a way, just like us: imperfect and capable of making costly mistakes.

Amazon’s S3 cloud service experienced an outage of several hours on Tuesday that caused problems for many websites and mobile apps that rely on it, including Medium, Business Insider, Slack, Quora and Giphy.

The company said earlier on Tuesday that it was experiencing “high error rates” on the platform affecting a large part of the east coast of the US. Then on Tuesday afternoon, Amazon posted on its service health dashboard that the issue had been resolved.

So, what happened to the company’s cloud service that resulted in this (costly) problem?

The Amazon Simple Storage Service (S3) is used by tens of thousands of web services for hosting and backing up data, including the Guardian, which was heavily affected.

The problem had also affected some internet-connected devices, such as smartphone-controlled light switches.

The outage even affected a site called “is it down right now?” which monitors when other sites are down.

As highly as we think of technology, it is still vulnerable to human error.

Amazon.com Inc. said a human error at its cloud business caused sweeping outages across the internet for several hours earlier this week.

Amazon said efforts to fix a billing system bug caused prolonged disruptions Tuesday. An Amazon Web Services employee working on the issue accidentally switched off more computer servers than intended at 9:37 a.m. Seattle time, resulting in errors that cascaded through the company’s S3 service, Amazon said in a statement Thursday. S3 is used to house data, manage apps and software downloads by nearly 150,000 sites, including ESPN.com and aol.com, according to SimilarTech.com.

A major failure from what appears to be a minor maintenance procedure highlights that AWS, and the cloud computing industry in general, still have some maturing to do, said Ed Anderson, an analyst at Gartner Inc.

“The fact that an incorrect keyboard entry could bring down an entire region shows they have some operational issues,” Anderson said. “Even though they are the world’s biggest cloud provider, they still have some work to do to refine their processes.”

Amazon said it is “making several changes as a result of this operational event.”

The damage has been done, but it helps a lot that the servers have been recovered and restored. The company issued a statement saying it is looking into the matter and will change the processes it follows so that mistakes like this one can be avoided in the future.

An enormous number of sites, including Airbnb, Business Insider, Expedia, Medium, Netflix, Quora, Slack, Trello, and the Securities and Exchange Commission, experienced issues related to the outage, VentureBeat reported at the time.

“S3 has experienced massive growth over the last several years and the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected,” Amazon said.

Amazon has already issued a public apology to those affected by this major glitch, but the incident also raises awareness that machines aren’t that reliable after all. In spite of all the technological advancements modern life has afforded us, human error remains the most commonly reported reason for tech malfunctions.

It is high time to evaluate the life we have lived so far and determine if we need all this technology in our lives. Soon enough, science will bring artificial intelligence to life, along with more mind-blowing discoveries that will set the pace for how we all live our lives, whether we like it or not.