Additional backup capacity to be added for Gmail to ensure delivery during network failures

Many of you may have noticed outages or issues with several Google services yesterday, but none were hit as hard as Gmail. Google acknowledged yesterday on its app status dashboard that Gmail was experiencing complete outages or at least service delays for most of the day, and has taken to its official blog today to explain what happened.

According to its post, an extremely rare "dual network failure" knocked out separate redundant paths, taking down Gmail's capacity starting at about 6am PST yesterday. Although engineers were made aware of the outage right as it happened, it took most of the day to clear up the issue and mail didn't start flowing at a regular pace until about 4pm PST.

Google says that 71 percent of messages delivered during the period were unaffected, and across the other 29 percent that were hit by the outage the average delivery delay was a mere 2.6 seconds. Naturally some of us noticed delays much longer than this, and Google does say that roughly 1.5 percent of mail was delayed over two hours.

Just as we would expect, the Gmail team apologized profusely for the delays and downtime yesterday, saying that they want to ensure "that Gmail users get the experience they expect." The plan is to make Gmail delivery more resilient, even during network failures, by adding extra backup capacity and reviewing internal policies for dealing with rare failures such as this one.

Just in case you were worried, this small outage didn't drop Gmail below its beloved 99.9 percent uptime.

Reader comments

Actually, if you are on a paid-for Google Apps for Business account, then this is definitely not 99.9% uptime. If you pay Google for their services, then they are bound to their SLA. This SLA counts as downtime every minute in which the service had at least a 5% "user error rate," and measures percentage uptime from those downtime minutes divided by the number of minutes in the month. Google "Google SLA" (I can't post links here) to see where I'm coming from.

168 hours in week * 4 weeks in a month * 60 minutes/hour = 40,320 minutes.
If they were down for about 10 hours, then that's 600 minutes.
600/40,320 = ~0.0149, or 1.5% downtime (their SLA website only goes to the tenths place). That is about 98.5% uptime, which falls into the category listed as "< 99.0% - >= 95.0%."

TL;DR? If you're a paying customer who was affected by this outage, you could be entitled to 7 days' credit on your bill.
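
The arithmetic in the comment above can be checked with a short Python sketch. It uses only the commenter's own figures (a 28-day month and roughly 10 hours of downtime) and the single SLA tier the comment quotes. The function names are hypothetical, and the sketch does not model the 5% "user error rate" condition, so treat it as an illustration rather than an official SLA calculation.

```python
# Figures taken from the reader comment, not from Google's SLA text.
MINUTES_PER_MONTH = 168 * 4 * 60  # 168 hours/week * 4 weeks * 60 min = 40,320
DOWNTIME_MINUTES = 10 * 60        # "about 10 hours" = 600 minutes


def uptime_percent(downtime_minutes, total_minutes):
    """Share of the month, in percent, that was not downtime."""
    return 100.0 * (1 - downtime_minutes / total_minutes)


def in_quoted_credit_tier(uptime):
    """True if uptime is in the tier the comment quotes: < 99.0% and >= 95.0%."""
    return 95.0 <= uptime < 99.0


uptime = uptime_percent(DOWNTIME_MINUTES, MINUTES_PER_MONTH)
print(f"Downtime: {100 - uptime:.1f}%")
print(f"Uptime:   {uptime:.1f}%")
print(f"In the '< 99.0% - >= 95.0%' tier: {in_quoted_credit_tier(uptime)}")
```

Under these assumptions the uptime rounds to 98.5%, which is below 99.0% and therefore inside the tier the commenter associates with a 7-day credit.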

Here's something that's driving me crazy... I get this notification on my phone saying "Unable to send message," showing me the title of an old message that failed to send. So I stopped the sending process, and I still got notifications that it could not send. So I deleted the email. I'm still getting this notification, which takes me to the Sent folder in Gmail, and there are no messages in there. How can I stop this notification coming to my phone every day? Please reply.

I'm having problems with my Google Play Music unlimited account... Every time I try to download some music by pinning it, a few moments after the download has started it says that at the moment it is impossible to download music... Does anyone else have the same problem?