You can all relax now. The near-unprecedented outage that seemingly affected all of Google's services for a brief time on Friday is over.
The event began at approximately 4:37pm Pacific Time and lasted between one and five minutes, according to the Google Apps Dashboard. All of the Google Apps services reported being back …

Re: Ah...

In one place I worked, our machines would have most of their problems on Thursdays. Different machines, different architectures, different models of storage devices... we couldn't figure it out. This was a raised floor, halon protected computer room with a combination lock on the door.

So, with nothing else to try, one Wednesday I prepared to spend the night in the computer room. Sure enough, about 2:00 AM the cleaning crew came in with a big buffer machine, preparing to run it over the raised tiles.

I chased them out and next day confronted the facilities manager about (a) giving the cleaning crew the combination to a secure room, and (b) letting them bang a floor buffing machine against our disk arrays.

He looked at me like a guy who'd seen his first kangaroo. He couldn't fathom why I wouldn't want the floors polished in the computer room. I finally gave up, got some tools, took the lock apart, and changed the combination.

As I write this, I now realize that I did not pass on the combination when I left the company. Oops.

Re: @ Captain DaFT - Ah...

Re: @ Captain DaFT - Ah...

Oh, Apple fanboi alert!

Compared to Apple? I presume? Absolutely.

Imagine an Apple search engine, where content is filtered beyond your control/knowledge and you can only experience the Internet as Apple thinks you should experience it. Total information freedom nightmare. No thanks!

You Apple sheeple can keep your "think different". Go buy another overpriced "ooh, shiny" iCrap tablet that Samsung will outperform in every possible way for a whole lot less.

@ AC 07:35 Re: Ah...

More likely... It took that long to install the new, improved, back-door high-volume pipe direct to the NSA Utah "data collection center". Can't hook that stuff up while the system is live, you know...

Re: Ah...

Not necessarily. I worked at a place with a big red 'Push' button that was not protected, was right beside the exit, and, more importantly, right beside some equipment that I occasionally had to lean over to work on. The second time I tripped the power off, my boss warned me that one more time and I would be fired. The third time it went off, I was at my desk; I jumped up and screamed "NOT MY FAULT". The big red button was shortly thereafter covered by a flip-up plastic case.

Graph seems wrong

Re: Graph seems wrong

Methinks more likely a failure of our heroic churnalism soviet. According to the attributed source: "Google.com was down for a few minutes between 23:52 and 23:57 BST on 16th August 2013." which fits perfectly with the lifted graph.

I suppose there's scope for some disparity as the fault propagated across Google's infrastructure but some reference to the obvious contradiction in the article is surely warranted.

In lieu of the Reg headstone icon which seems to have been removed for our protection -->

Obvious explanation

The artificial singularity that powers the Googleplex creates a non-negligible effect on spacetime around Mountain View. The graph shows the time that their servers perceive. Since the singularity slows down time, it took an extra 15 minutes after the event began in the outside world before Google's servers registered it. The stalwart team of boffins at Vulture Central merely corrected Google Coordinated Universal Time to regular Pacific Daylight Time.

Re: See, you keep telling them but they don't listen

The US Army and Air Force use ferrets for cable runs at Site R and Cheyenne Mountain. Some Airman at Cheyenne Mountain AFS came up with the idea after watching his pet ferret drag a loose CAT5 cable through a cardboard tube while trying to think up a way to do cable runs easier than how they'd been doing them previously.

I believe FEMA uses them at Mount Weather too; I'm not 100% sure on that facility, but it would make sense. Tearing out walls in bunkers under mountains isn't cheap or easy, and the Military as well as DHS tend to prefer cheap and easy, especially in places like Raven Rock and Mount Weather, which by their nature have to be up and ready 99.9% of the time just in case.

So anyway NSA using hamsters may be closer to something "fo' reals" than you might think.

No hamster icon, but Paris is about as intelligent as a small rodent, and much less intelligent than the Mustelidae (ferrets, weasels, etc).

Holy undergarments

40% of the world's traffic? I suppose it adds up once you count all the services they have, their price point and general reliability... I mean, I use their DNS, so if that was affected (and I think it was) then that would be a bit of a kick in the nuts for other site access too.

Re: Holy undergarments

@AC 5:42

171.70.168.183

171.69.2.133

128.107.241.185

64.102.255.44

Perhaps ironically under the circumstances I had to Google it too. I've settled on OpenDNS myself, not least because they were the only service I saw competently and promptly address that phishing/poisoning débâcle a few years ago. The redirection for unresolvable queries is a bit naff though. Still, gifthorses...

Re: Holy undergarments

OpenDNS is not really a good thing to use for a server that needs to know if a hostname is valid or not. OpenDNS will reply with a fake address that points to them for invalid hostnames. That's fine if you want a special "that hostname doesn't exist" notice page, but for a mail server, not knowing that the hostname is invalid is a waste of system resources... NXDOMAIN is the better response.

Google DNS is fast, though using resolver.qwest.net is faster at the moment.
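
If you want to check how your own resolver behaves, a quick and dirty sketch (assuming a box with Python 3 handy; the bogus hostname below is purely a made-up example) looks something like this:

# Does the system resolver return NXDOMAIN for a name that shouldn't exist,
# or does it "helpfully" hand back a landing-page address the way OpenDNS does?
import socket
import uuid

# Random label under example.com - illustrative only, should not resolve anywhere.
bogus = "nonexistent-%s.example.com" % uuid.uuid4().hex

try:
    addrs = sorted({info[4][0] for info in socket.getaddrinfo(bogus, 80)})
    print("Bogus name resolved to %s - resolver is redirecting NXDOMAIN" % addrs)
except socket.gaierror:
    print("Resolution failed (NXDOMAIN) - the answer a mail server wants to see")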

Re: Is it really that bad?

I'm guessing that your tongue is firmly in your cheek there, but here goes for a po-faced dumb response. I just checked, and I have exactly 1001 (decimal) bookmarks. I could spend a *long* time without needing to search.

Who?

Reminds me of a cartoonish jigsaw puzzle that's an old fave of mine. It's called "Computers: The Inside Story" and featured a minicomputer (bear with me—the puzzle IINM dates back some 30 years or so). Most of the joke was all the funny things that went on "inside" the minicomputer, but up top was the computer's responses to an unstated question. It isn't long before you realize the query was, "Why did the chicken cross the road?"

Nothing worthwhile in this post.

I believe that is 4.4.4.4. I have seen this all over the place as a DNS server forward, or in individual workstation settings at sites where Active Directory refuses to work properly. ISTR that it used to be owned by MCI/UUnet (or some other equally obscure provider) and is now Level 3.

"The Reg contacted the folks in Mountain View to see if they can account for the outage, but a spokesperson only directed us to the aforementioned dashboard. We'll fill you in with any further information as it emerges."

It was…

Upgrade Complete

To help prevent inadvertent collection of personal data, as reported in earlier stories, the NSA and Google are proud to unveil the new Fast Updatable Collection Keystone Utilities (FUCKU) system, which went online at 21:53 Friday, August 16th.

Fully integrating the new system required a brief restart of the Internet, er, our systems. We do not anticipate further interruptions of service.

Re: Upgrade Complete

I agree. This length of time is probably about the time required for a skilled spook to install new hardware at Google.

If you have a choice between buying a product or service from the USA and somewhere else of comparable quality, choose somewhere else. Hitting the entire USA in the wallet is the only way to stop this crap.

So now we know Google can account for 40% of internet traffic

Re: So now we know Google can account for 40% of internet traffic

Yeah, I was wondering the same thing. If you're providing for a huge amount of traffic, there is still going to be a large amount that is semi-redundant, to account for load sharing and redundancy in the case of an outage. While Google might own large amounts of its own fibre, there are large tracts of fibre that it still doesn't own, which would show across the public networks.

Would love to know the amounts of public and private bandwidth they have and saturate :D

Google -

Recovery

What I'm impressed by is that everything seems to have run perfectly once Google came back to life.

What do engineers/admins of these kinds of huge systems think? I would have expected load balancers etc. to have gone out of whack after receiving normal traffic, then zero traffic, then 50% above normal, in the space of 5 minutes. That strikes me as the perfect recipe for the kind of cascade failure we've heard so much about of late.

Re: @DAM (was: Who the hell cares?) @Jake

"Any luck on getting the publication ban on your work on the Manhattan Project lifted yet, Jake?"

I'm more interested in the story of how he stole fire from the gods to give to us. Is there anything this man hasn't done? More importantly, why does he insist on telling us about it all the time? Still, I suppose a 13 year old in rural Cornwall has to find something to do ...

It's alive!

Explanation

It will be interesting to see how Google explain this event.

It is difficult to think up reasons for the outage that don't put dents in Google's claims of being reliable enough to trust one's entire business to. After all, if you've trusted your entire business to Google's cloud (Docs, mail, everything) then when Google are down there's nothing you can do; you're not working. There's not even a phone number you can call.

At least if you have your own IT you can go and harry the IT guys.

Companies are very bad at risk management. It always seems that they refuse to consider highly unlikely scenarios that have devastating consequences. For instance, how many outfits are there that have all their IT in a cloud and have an effective Plan B up their sleeve just in case? Companies like Google are highly unlikely to go offline completely for a long stretch, but if all your IT is Googlised and they do vanish for a few days, your business is guaranteed to be in deep trouble.

So what exactly would a good Plan B be? There's no easy way to start using another cloud, because there is no way to do a bulk export of everything (docs, calendars, contacts, sheets, mail, etc.) that you can bulk import into another cloud. In fact such a thing would be the very last thing that Google, Microsoft, etc. would want to give you. I know that you can get at the data piecemeal, but file-by-file and user-by-user exports and imports are no way to perform disaster recovery.

Synchronising a cloud with your own IT is more like it, but surely the whole point of a cloud is to avoid having your own IT. Such synchronisation is available only because the cloud providers offer it as a way to get going with a cloud; I don't expect that it will be something that will work reliably and well forever.

And if you're going to have your own IT then what exactly is the cloud for anyway? Backup?

To me, and presumably anyone else that cares about coping with the ultimate what-if problems, clouds just don't meet the requirements. However, with the likes of Microsoft, Apple and Google trying very hard to push their customers onto their respective clouds, and a large fraction of those customers being happy (or stupid) enough to go along with that, what choice will there be for those that want to do things on their own IT?

Clouds also bring big national risks. Say Google got to the position where 50% of American companies were wholly dependent on Google's cloud for their docs, sheets, contacts databases, etc. That would mean that 50% of the US economy is just one single hack attack away from difficulty and possibly disaster. Is that a healthy position for a national economy to be in? Isn't that a huge big juicy target for a belligerent foe, be they an individual or nation state? After all, Google's networks have been penetrated before (they blamed the Chinese as it happens); why not again?

Re: Explanation

"Companies are very bad at risk management. It always seems that they refuse to consider highly unlikely scenarios that have devastating consequences. "

I used to work for a large British company.

One of their Manchester offices was damaged by an IRA bomb in the '90s.

The staff were relocated; the servers replaced; but whilst the backups had been completed diligently, and kept safe in the firesafe, no one was allowed access to the site to retrieve them for many weeks, by which time they were virtually useless....

Re: Explanation

During my PhD, back in the late '80s/early '90s (so before anything net other than email and Usenet), my thesis was stored on 3.5" 'floppies' (1.4MB eventually). I had three sets:

A daily working set (didn't always have my own computer with a hard drive)

A travelling backup set that was updated daily and lived in my backpack (in a plastic disc box)

A home set that came in once a week to be updated.

The lab postdoc told of a guy, back before computers were available for such tasks, who gave his handwritten thesis manuscript to a typist to type up, as was common practice. She put it on the back of her moped and set off across town. When she got there only a few pages were left. This was my motivation for backing stuff up, as well as an incident during my honours year (we were the first year to use computers to produce our theses). I took a 400k disc out of a computer, put it in my lab coat pocket and demonstrated a physiology lab. When I went back it would not work. Fortunately I had a backup, but I lost a morning's work.

Re: Explanation

Floppies bit me too; I ended up having to get the bus home, copy the files onto another disk, then bus back into town, and just made the hand-in!

After that I got into the habit of emailing my NTHell World account and hoping Eudora would pull it down before I busted my mailbox limit (or the dial-up connection dropped) :(

For my final year I got into the habit of emailing my final year project to myself every time I was about to shut my laptop down - it came in rather handy when I deleted a completed section and didn't notice for a week, and when Office decided it was going to corrupt the document because I'd had the audacity to edit it in both Office XP and Office 2003.

One copy on my laptop - a more often than daily copy in my Google Mail account* - and then Eudora pulling those to my desktop at home, and back to my laptop as I went along :)

Re: Explanation

Nothing in my life is more reliable than Google, certainly not my electric power provider or ISP. If either is out (and at least one, usually electricity, is out for at least a few minutes each month), it doesn't much matter if Google is up or down.

Re: What time was that then?

El Reg have a very clear, years-old policy that all articles are published based on the conventions of the country in which it was written. In this case, it's clearly stated it's the San Francisco office issuing this article, so PST, and US English.

It's similar for their Australian office.

They don't have the personnel to convert every single article to make it sound like it was written in London - especially not at 1am GMT on a Saturday morning!

Re: Should the cord be stretched across the room like that?

That's pretty impressive

Either the whole thing failed or they power-cycled it; either way, I'm impressed that a worldwide distributed system with that much traffic came up again in a minute or so and dealt with the backlog seemingly without issue.

I noticed

I thought my net went down, until I noticed IRC was still chugging along as normal. Then I thought it was Virgin Media DNS, as by the time I'd entered Google's DNS in my system instead, it appeared to work, so I blamed Virgin Media. My bad :)

On the 40%

> I also find it strange that 40% of world wide internet traffic would be affected by that.

There are plenty of people who use google as their address bar. Instead of going to facebook.com they just type facebook (or more likely 'f' and it gets autocompleted to facebook) and then click on the first result in the corresponding Google search. For them Google down = Internet down.

I am guilty of doing this with some sites myself. If you don't remember the URL, the default behavior is to do a search for the site. From then on, the first autocomplete result in the address bar will be the search for the site instead of the site URL, so it's pretty much a self-perpetuating behavior. With search results being displayed as fast as they usually are, there is no incentive to modify such behavior.

I'm sorry,

Two minutes.....

Sounds like one hell of a multiple site failover, assuming they've managed to completely automate it (that's the hard bit, BTW). It can be done (and maybe was done deliberately, as part of a drill). I hope this wasn't caused by a single point of failure; there simply shouldn't be one.

Re: Two minutes.....

If it was deliberate--and I can't see it as having been a cascading fail/failover/single-point failure--I'd expect that the purpose was the big diff that it made possible. After all, Google is all about data collection and analysis.

Re: Even simpler ..

Heck, that was just how long it took to enable the NSA's Global Google data tap! Try as they might, they still were unable to coordinate the swap of fiber cables from one plug to another all over the world in less time...

Is it me..

Or is the fact that (whatever the cause of it going down, which of course it shouldn't have) the whole of the Google infrastructure (i.e. something dealing with 40% of the world's traffic) came back after only a few minutes' downtime actually quite remarkable? I would have thought that was a pretty impressive feat.

"When Google goes dark, the Internet knows fear"?

Isn't this only the second time Google has gone down?

I mean ever.

But I can well believe the 40% of traffic - Youtube alone is probably most of that.

On top of that, a lot of applications poke Google.com to determine if they've got Internet access or not - because they are quite simply the world-wide server farm that's least likely to have gone offline.
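
(If anyone's curious, that sort of connectivity poke is usually nothing fancier than the following rough Python sketch - the host, port and timeout are just illustrative choices, not what any particular app actually uses:)

# Minimal "am I online?" check of the kind many applications do: if a TCP
# connection to Google opens quickly, assume the internet is reachable.
import socket

def internet_up(host="www.google.com", port=443, timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("online" if internet_up() else "offline - or Google is down again")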

Along with every other commentard, I would really, really like to know what happened - and how they fixed it so fast. Most of the other "cloudy" services don't appear to have even realised they're down in the time it took Google to bring it back up.

I was trying to use a couple sites that use Google's doubleclick ad network and the pages would not load. At all. I know Google rarely goes down but it seems incredibly stupid to me that Google's ad network holds its customers completely hostage and prevents them from loading at all if it can't be found.

More likely bad site design; I know I would never build anything that relied on outside connections to load...

And for that purpose, part of my testing is usually to kill the internet on the test server, then try the site, see how it works, and ensure I have no external dependencies remaining... (sure, SOME things I offload onto cloud storage, such as media etc., since their content delivery networks are better than a single server)

Re: I know what must have happened. And it's not like we weren't warned.

Or maybe . . . .

. . . a Streetview car got too close to a datacentre, and while slurping, erm, checking whether there were any available wifi networks, accidentally started a fatal loop (much akin to setting up 3 accounts on a VAX cluster to forward e-mails in a ring) until it fell over.

We were warned !.

Wot ? Google went offline ?

To be honest, I didn't notice. I'm one of those old-fashioned persons who still stick to AltaVista. I have no G*-whatsoever account - no Gmail, no YouTube, no Gdocs - no nothing. I'd rather be offline than use Chrome. Still, my world hasn't come to an end yet. Go figure.