Posted by timothy on Tuesday September 01, 2009 @04:22PM from the works-here-never-mind dept.

JacobSteelsmith was one of many readers to note an ongoing problem with Gmail: "As I type this, GMail is experiencing a major outage. The application status page says there is a problem with GMail affecting a majority of its users. It states a resolution is expected within the next 1.2 hours (no, not a typo on my part). However, email can still be accessed via POP or IMAP, but not, it appears, through an Android device such as the G1." It's also affecting corporate users: Reader David Lechnyr writes "We run a hosted Google Apps system and have been receiving 502 Server Error responses for the past hour. The unusual thing about this is that our Google phone support rep (which paid accounts get) indicated that this outage is also affecting Google employees as well, making it difficult to coordinate."

Hahahaha, I'm from Houston, but grew up even further in the south (think mid Georgia/mid Alabama). Yeah, it's a little tricky, but it'll build character, no?

For the rest of the world, who don't quite grok our quaint pronunciation here in the good ole South, the word ya'll is pronounced like [yawl], similar to [yawn]. It should also be interesting to note that I have used one of the two forms of the contraction, and I'm often told I use it wrong, but it's a little closer to how we pronounce it. The other spelling is y'all, but that would be pronounced like [ya-awl] and that's just a little too hick-ish even for me.

So if you can get to [yawl] then just tack an extra [ull] on the end and you'll have ya'll'll. You might notice that I tend to conjoin a lot of words, but that's just the spoken style where I've grown up, and as literary style derives from spoken style, well, yeah.

So, ya'll'll have to get a kick out of reading this, and just shake yer heads and mumble something about "that poor southern boy" and if you'd be so kind, drop a dollar in me alms cup as you pass by.

Ok, I've tried now to enlighten the world to some Southern-isms, and I tried to do at least part of it in properly written English, so we'll see. Also, I know it's WAY OT, so hit me with the mod, let's get this over with.

This really isn't all that odd. "Y'all" may LOOK like a contraction of "You all" (because really, it is), but it has become lexicalized in several dialects of English. It now functions as a single word that is the standard second person plural personal pronoun.

So just as you can get "he'll", you can also get "Y'all'll". The GP misspelled it. :)

Because your company and personal sandbox are a valid representation of a mail system that serves millions of people. When either of your servers does that, you can post bullshit like this.

Hell, even the company I work for has outages for both proactive and reactive maintenance, and that's only for 5000 people.

To say that because you've never had an outage you never will have an outage is absurd.

On top of this, saying that google should "have a backup" is silly. Do you even understand how redundancy works? Do you even understand how web based mail systems work? I really don't think so from this comment. If the error has nothing to do with servers falling over and is an issue with routing then you can have all the redundancy you want, but it won't make a difference.

At this stage any comments are merely conjecture. Until Google makes a press release advising what happened, comments like "have a backup" are just troll posts.

Because your company and personal sandbox are a valid representation of a mail system that serves millions of people. When either of your servers does that, you can post bullshit like this.

The parent poster's simple little postfix system doesn't NEED to serve millions of people. That's a feature: by not needing the immense complexity that goes along with running a web-based email system serving millions of people, his system is smaller, simpler, and less prone to problems.

It's impressive that Google's Gmail runs as well as it does given its size, but smaller, simpler solutions are almost always preferable. For company email (especially in a small company, not some behemoth company with 100k employees needing lots of mail servers), it simply makes more sense to use a small, simple mail server like the parent's postfix system, rather than to rely on some external vendor's multimillion-user system. Especially since the software needed to run that system is all available for free.

it simply makes more sense to use a small, simple mail server like the parent's postfix system, rather than to rely on some external vendor's multimillion-user system. Especially since the software needed to run that system is all available for free.

Have you ever tried setting up one of these things? I am not a novice and I can set up web servers without much headache, but things like postfix and exim are very old and have weird configuration files with their own syntaxes.

The howtos you can find are literally hundreds of pages long and mostly outdated.

And if you have everything set up after two days, try adding a spam filter.

Yeah, that's funny. You know what's also funny? The treadmill I bought 3 years ago and never used is in mint condition. I've never had a problem with it sitting there under the pile of clothes in the corner. I read that 24 Hour Fitness has TONS of problems with their treadmills going down, but mine just keeps going without a single issue. I guess they just bought the wrong brand. Stupid idiots.

I would look at it this way. There is absolutely no excuse for 24 Hour Fitness to have a single hour where they do not have functioning treadmills.

I would have to agree. Although how they use the treadmills in fault tolerant arrangements is important. Do they simply route people to a working treadmill, for example, when one fails? Or do they operate in an active-active clustering arrangement, where a person uses two treadmills simultaneously and fails over to a single treadmill when one stops? I imagine co-location of the treadmills would be a key success criterion in the latter configuration.

I dunno - I've been using G-Mail and Google Apps since each was introduced, and this is the first time one of their outages has impacted me, or anyone else that I talk to (true, that's not a lot of people, but...).

Why is the parent modded funny? I think it's an honest comment. I've been using Gmail for 5 years now (precisely since September 2004) and this is only the second outage that I've experienced which prevented me from logging in.

The only thing that bugs me is the Gmail user interface. Sometimes it doesn't record my actions (such as reading messages) and has an indefinite "Loading..." message which forces me to reload the whole page. But, this could also be something related to Safari.

I'm the guy that switched our email service to Google. See, it only costs us $50/year/user and this has been the first outage in over a year. We used to pay a full time sysadmin to manage the mail server and would average about 12-20 hours of total downtime per year (maintenance, outages, etc.).

Obviously, the switch to Google has been much better for the corporate bottom line. Not to mention that we also get calendaring, wiki/sites, docs, and chat for the same price.

So to meet 99.99% uptime, you have to have less than about 52 minutes of downtime, total, planned and unplanned, in a year. That's really hard. Really. Think about it, few enterprise systems can do that (Peoplesoft update in 50 minutes? HA!). But here, a 1.2hr outage puts them firmly out of the four nines club.
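For anyone who wants to check the four nines arithmetic, here is a minimal sketch in Python. It assumes a 365-day year and treats the 1.2 hour figure from the status page as the only downtime for the year; the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def allowed_downtime_minutes(availability):
    """Yearly downtime budget in minutes for a target availability between 0 and 1."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def achieved_availability(downtime_minutes):
    """Availability actually achieved given total downtime in minutes over one year."""
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    outage_minutes = 1.2 * 60  # the 1.2 hour estimate quoted in the story
    budget = allowed_downtime_minutes(0.9999)
    print(f"four nines budget: {budget:.2f} minutes per year")
    print(f"outage: {outage_minutes:.0f} minutes")
    print(f"availability if that is the only outage: {achieved_availability(outage_minutes):.5%}")
```

Under those assumptions the budget comes out to 52.56 minutes, so a single 72 minute outage already exceeds it, while still fitting comfortably inside a three nines (99.9%) budget.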

Why does it matter that the outage affects multiple organizations? If you're studying the reliability of an e-mail service (presumably to decide whether or not you want to invest in a local e-mail infrastructure, or use someone else's), shouldn't you care more about the reliability of that service as provided to you? What's the difference between you having X hours of downtime a year, versus you and 1000 other organizations having X hours of downtime a year? How is the latter worse for you?

I've used Google mail for years. This is the first outage I've heard of, and it hasn't even affected me. I can tell you the in-house exchange server at my company has caused far more trouble than this for our employees in the past 8 months.

Lies, gmail has been down several times in the past, many times for hours at a time. The difference between google and other service providers isn't the number of outages (gmail has more than even hotmail), it is how they are dealt with. E.g. the App Engine outage [google.com].

Really? We had several outages that affected my old company, who was using it as their main email provider. No notice either for one scheduled outage, just "GMail is down for maintenance." Granted, we're in Australia so the times they took it down might have been fine in the US, but a notification would have been good beforehand.

I don't know. Our local email has gone down a few times since I've been here, and this is the first I've heard of Gmail being down.

Also our local email search sucks horribly. I can find a trivial personal message from 4 years ago on Gmail in a fraction of the time I can find suddenly-important work email from six months ago, if I find it at all.

This describes many smaller and even moderate-sized organizations. Every non-tech office I'm familiar with suffers from frequent (compared to gmail) and severe (day long or multi-day) e-mail outages.

Ditto, the team of geniuses/smart people working at google are frankly far better at their job than Bob, our IT guy. Bob's nice and all, but, well, he's not exactly google material, and there's only one of him, etc. Gmail goes down once, Bobmail goes down once a week.

No one said it's perfect but internet mail services have had at least one or two downtimes and all of the online mail services have been more reliable than my company's mail server. I'd say Google's doing quite well to be honest.

We handed our mail over and it's the first time I've ever had a problem with them as a corporate mail provider. Almost two years. There may have been one other short outage, but I don't remember it being during business hours.

I doubt you could run a mail server more reliably. And, for the difference in cost, I'd stay with Gmail.

We handed our mail over and it's the first time I've ever had a problem with them as a corporate mail provider. Almost two years. There may have been one other short outage, but I don't remember it being during business hours.

I doubt you could run a mail server more reliably. And, for the difference in cost, I'd stay with Gmail.

Just wanted to add that despite GMail's outage, POP was still working. Email on my iPhone was working the whole time, for example.

We've had longer outages locally... but we're a small company so when the exchange went out it took everything out with it: exchange, domain and by extension of domain--file servers.

While we may have had 3-4 hours or so of domain related outages this year they were times when we couldn't do anything anyway. We've never had JUST our exchange go out since it's on the same system as our domain.

If Gmail goes out for 2 hours and we have 4 hours of general down time per year, then Gmail (despite being more reliable) actually increased our email down time by 50% over hosting locally.
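The 50% figure follows if the 2 hours of Gmail downtime are added on top of the 4 hours when the whole domain is down anyway. A quick sketch of that arithmetic in Python, assuming the two kinds of downtime never overlap (the function name is made up for illustration):

```python
def downtime_increase_percent(baseline_hours, added_hours):
    """Percentage increase in yearly downtime when added_hours is stacked on baseline_hours."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return 100.0 * added_hours / baseline_hours


if __name__ == "__main__":
    local_outage_hours = 4  # domain-wide outages that take email down too
    gmail_outage_hours = 2  # hosted mail outage on top of that
    total = local_outage_hours + gmail_outage_hours
    increase = downtime_increase_percent(local_outage_hours, gmail_outage_hours)
    print(f"total email downtime: {total} hours, an increase of {increase:.0f}%")
```

If the hosted outage happened to overlap a local one, the increase would be smaller, so 50% is the worst case for these numbers.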

You're confusing cause and effect. Presumably your organization is running on Windows Small Business Server (IT people shudder when they hear that name for a reason) and so when Exchange goes down, it does cause problems for SBS, and vice versa.

However, if you take Exchange out of the equation, or if you give it nothing to do, then all your domain problems go away. Now instead of having 4 hours of downtime you get about 1 hour every three years. (that's been about my experience overall with Gmail, of course

So much for handing your email over to Google because it's more reliable than hosting locally...

Yeah. As my subject heading says...

Seriously, though: it probably depends on who's doing the hosting. I'm sure there's slashdot readers for whom setting up (and maintaining) their own mail server is a short task done before breakfast without breaking a mental sweat. I'm further sure I could learn to be one of those people, but I'm betting the time I'd invest in doing that is less than the amount I've time I've sp

I'm sure there's slashdot readers for whom setting up (and maintaining) their own mail server is a short task done before breakfast without breaking a mental sweat.

Not really. The problem is that, as with most hard tasks, it's easy to trivialize in the abstract. In 10 seconds, I can type a command-line script that will answer port 25. In a slightly longer time I can pull down the appropriate packages for the Linux distro of choice and configure them with whatever domain name. In slightly longer, I can configure appropriate countermeasures and firewalling. Given budget and time, I can deploy a suite of additional features including redundancy (local and/or remote), var
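To show how little "answering port 25" actually buys you, here is a rough Python sketch of a listener that sends an SMTP-style greeting and nothing more. Everything in it is an assumption for illustration: the placeholder hostname, the default of port 2525 (binding the real port 25 normally needs elevated privileges), and the helper names. It accepts no mail, does no queueing, filtering, or authentication, which is exactly the gap between the ten-second version and a real mail server.

```python
import socket
import socketserver
import threading


class BannerHandler(socketserver.StreamRequestHandler):
    """Sends a 220 greeting, then refuses every command except QUIT."""

    def handle(self):
        # Placeholder hostname; a real server would announce its own name.
        self.wfile.write(b"220 mail.example.invalid toy listener\r\n")
        for raw in self.rfile:
            if raw.strip().upper().startswith(b"QUIT"):
                self.wfile.write(b"221 Bye\r\n")
                break
            self.wfile.write(b"502 Command not implemented\r\n")


def start_listener(host="127.0.0.1", port=2525):
    """Start the toy listener in a background thread and return the server object."""
    server = socketserver.ThreadingTCPServer((host, port), BannerHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def read_banner(host, port, timeout=5.0):
    """Connect, read the greeting line, send QUIT, and return both replies."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("rb")
        greeting = reader.readline()
        conn.sendall(b"QUIT\r\n")
        goodbye = reader.readline()
    return greeting, goodbye


if __name__ == "__main__":
    srv = start_listener(port=0)  # port 0 lets the OS pick a free port
    print(read_banner(*srv.server_address))
    srv.shutdown()
    srv.server_close()
```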

That was dumb. I have handed over our email because it's more reliable than hosting locally. This was the first time we've been affected in over a year and it was for a little more than an hour. That's an order of magnitude better uptime than we had before.

Can you beat Google's uptime? I doubt it. Sure, it's not impossible, but you won't be doing it for less than $50/user.

I have been using gmail for years and it's still far more useful and reliable than any other competing service I've tried (including paid services!). When they lose some of my email or are totally down for a week, then I will start complaining. But I probably still wouldn't switch.

Or someone will get congratulated and promoted. It depends on the response to diagnose and fix the issue, whatever it is. Major outages aren't always the fault of some apocryphal guy asleep at the switch.

Why would you fire the guy who caused it? He would probably be the most careful employee after that. People learn from mistakes. Firing people, even for big mistakes, isn't a solid business model, and it's bad HR.

I doubt it. Once you get out of high school and work in the real world, you'll find that just because something happens, people don't always get fired.
Why? Because usually you'll be firing one of your best employees, a 20 percenter, one of the ones that actually does the work and knows what is going on. And even if it wasn't a 20 percenter, you don't want to send out the message that if you do something and it causes a problem, you're going to get fired.

I don't know that this is actually news-worthy.
I have never worked for a company which has not suffered email outages, no matter how their email is supported.
Granted, GMail has a large list of client companies, but you are a fool of the highest order if you think the name will protect you from outages.

They have state of the art redundancy (I presume), and they have been extremely reliable so far. So this really IS surprising and interesting. Hopefully nothing major happened to Google's infrastructure. I even opened CNN to see whether some 9/11 event is not in progress or something... GOOD thing they are back online now.

I think the real story here is that it outlines the downside to moving everything to The Cloud, as a lot of people are trying to promote these days. As you said, email outages are pretty common even at large enterprises. The difference is, CIOs like to be able to go and yell at someone in their office for an outage, and know that it's being worked on in some measurable fashion. They don't like it when your answer is, "I don't know what's going on. Ask Google."

The Cloud is great, as long as it always works. But, in my experience, downtime is far less tolerated in hosted solutions than it is in on-site infrastructure. And stories like this make executives nervous about this stuff.

This is front-page worthy because it lets us all know that we're not losing our frigging individual minds over this, that it IS a collective problem. The fact that google knew what the problem was and fixed it before it had time to hit the front page of /. just goes to show that they are trying and they do care.

Personally, this just renews my confidence in Goog, regardless of what the twats are doing inside the beltway...

As I type this, I can get in to GMail just fine, but a friend in Texas can't (I'm in Nevada). Guess Google likes us better.

And kudos to the Google team for updating the status when they say they will. Looks like the script they use automatically puts current time + 1 hour in as the default next update time, and they're posting updates before that expires. Too many times, something simple like that gets overlooked.

Google Postini is the service you need for message archiving. Looks a bit pricy as 1 year retention is 25 dollars per user, and up to 10 years is 45 dollars per user. If you want to host your email with Google, I would think Postini would be a necessity for legal discoveries.

Feel like I'm feeding a troll, but johnjones's ID is so low that I feel this silliness may be taken seriously:

how do you get the data out of gmail to switch providers ?

Same way you would do any remote hosted email migration. POP and IMAP. Additional tools are provided for Google Apps (their for-pay version).

ever serviced a discovery litigation from google ?
(you know where they judge you guilty if you dont come up with the data)

sorry but there is a good reason to keep this stuff on site and working...

Umm, an hour of downtime doesn't mean your data is gone. I'll also echo earlier comments -- locally hosted email generally has more problems, as no company but the largest enterprise has the same magnitude of IT equipment and experience as Google.

I've never really understood why so many Slashdotters have this attitude about hosted services. Perhaps they are local IT folks for smaller companies, and fear for their jobs?

I've never really understood why so many Slashdotters have this attitude about hosted services. Perhaps they are local IT folks for smaller companies, and fear for their jobs?

It's the same reason Slashdot has:

such a large component of libertarians

every tech/science story hit with a slew of +5ed comments questioning the basic underlying premise of the research and/or machinery

every story about a study tagged with "correlationisnotcausation"

etc.

...and that reason is that code-hackers, having succeeded in something most people find impossible, go on to generalize that they must simply be hypercompetent, and therefore anything done by others must be questionable by comparison. Thus, hosted services, being run by mere mortals, can't be as good as something set up by one's own brilliant self.

Umm, an hour of downtime doesn't mean your data is gone. I'll also echo earlier comments -- locally hosted email generally has more problems, as no company but the largest enterprise has the same magnitude of IT equipment and experience as Google.

I've never really understood why so many Slashdotters have this attitude about hosted services. Perhaps they are local IT folks for smaller companies, and fear for their jobs?

Could be in part that.
Another explanation is that most who work as local IT folks (for any kind of business) know that when anything breaks, it's always considered their fault (they are the people-facing shields, not the actual service providers elsewhere). And every time anything remote "breaks", or suffers any kind of trouble, THEY will know it (because people will complain to them). Therefore, they both consider remote services less reliable than the average person does (they know about more outages) as wel

I think it's just the psychological impact of the lack of control. It's the same reason that people fear flying more than driving (one of the reasons, anyway) or that it's much scarier when you're the passenger during a dangerous maneuver than if you are driving the car and doing the same thing yourself.

Umm, an hour of downtime doesn't mean your data is gone. I'll also echo earlier comments -- locally hosted email generally has more problems, as no company but the largest enterprise has the same magnitude of IT equipment and experience as Google.

I've never really understood why so many Slashdotters have this attitude about hosted services. Perhaps they are local IT folks for smaller companies, and fear for their jobs?

It's more than that. There are more moving and breakable parts between you and a hosted provider than between you and an internal service, which changes the math a bit.

Some of the single points of failure are shared between both approaches too, so they're a wash for a small implementation. If you're a small company and your non-redundant core switch fails, your email is down either way, because you can't get to your email server or to your hosted provider, no matter how redundant your provider is. There are various components for which this is true, which helps to mitigate the benefit of a hosted service where your mail server is replaced by a massively redundant cluster.

You also have additional dependencies. If you're a small business with a single T1 to the internet, let's say, and the telecom bunker outside your building catches fire and you lose internet access, you've got problems. With a local email service, internal mail works, but you can't send email to or receive email from external users (let's pretend you don't have an offsite secondary MX or an outbound mail spool where this stuff queues, mostly invisibly to users). For organizations that are hugely dependent on internal email, that's quite a bit better than having no access to your (hosted) email at all.

Additionally, you get concerns about "If we outsource this today and we have problems in 2 years, will we still have somebody here who can design/build/find a better solution, or will it cost us a fortune in consultants if we let the in-house expertise lapse?".

You also have support issues. Google specifically is well-known for only doing things that can be automated (and doing them well, mind you). Support isn't always one of those things, and small companies are well-acquainted with getting the shaft from vendors because your business isn't worth enough for them to care (check out the quality differences between the enterprise and SMB versions of various products for examples). Given the importance of email to most organizations today, folks are a bit reluctant to hand it over to an outsider with minimal financial incentive to devote resources to their specific problems.

If you're a 5-person business, outsourcing email is likely a good idea, but once you start getting into the teens and twenties or so, it's probably worth a look at your particular circumstances before continuing that assumption.

Full disclosure: I'm currently a local IT guy for a smaller company, with enough on my to-do list that if I thought outsourcing email would work well for my users and save us time & money, I'd be all over it.

I have been the full time sysadmin responsible for the mail server. I have had the job of keeping the mail service up. It's not cheap. You need redundant networking, redundant servers, redundant storage, redundant staff, and the glue to make sure it all works. For anyone spending less than a couple hundred thousand a year on IT, it's damn near impossible to beat Google's uptime for hosted mail.

As for your other concern about getting the data out of Gmail - you use the same protocols the rest of the Internet uses - IMAP/POP and SMTP. Not rocket science.

It's a good idea as long as you're allowed to do something about it. At some companies, you just use it and suffer. From what I've seen, the culture at Google allows people to make contributions (e.g., Labs) to fix things.

Upside: shows confidence in your products; makes it more likely that your engineers will spot problems if they use the software and services themselves; can increase how motivated people are to improve the products

Downside: tainted dogfood kills the engineers who would have investigated the issue

Cloud computing is exactly the kind of buzzword-laden, idiotic fad that tends to be loved both by corporate marketing droids and technophobic Baby Boomers, both of whom have roughly equivalent levels of intelligence.

All it is going to take is a single major, successful DDoS attack against Google or some other cloud provider, and the cloud will go to the memetic rubbish bin where it belongs.

If you're one of the intellectual cripples who has difficulty understanding why cloud computing is a bad idea, ask yourself the question of whether or not you're going to be able to access your email if Google goes down, or if web access outside your ISP's own subnet does.

Yes, I have a Gmail account, but it is a convenience linked to my WoW blog, and a spam trap at best. It isn't something which I rely on for anything truly important, because I'm old enough to remember decentralised email, and to have more fucking sense.

Darn fool kids; they never learn. We keep seeing the same old mistakes being made, over and over and over again. I'm reminded of the old Frantics [youtube.com] song, here.

Dumb terminal/"cloud" computing? Boot to the head. Creating a single, centralised point of failure which is just waiting for a DDoS attack. Genius.

Binary subpackaging of libraries? Boot to the head. Given what bandwidth and disk space is at these days, any claim that it saves space is totally bogus, and the only thing it does do is add needless complexity, and reduce reliability. Put the whole thing in a single package, and stop thinking you're smart for doing otherwise. You're not.

Writing opaque package management in C, with a dep list a mile long, when a system written in shell, awk, and using the graph/dep management ability of Make itself would work probably more effectively? Boot to the head. Although sorry; I keep forgetting that Awk isn't considered a "real," programming language. You might want to let the guys using it for AI research know that, though; they could forget otherwise.

Being a snot nosed, latte sipping, yuppie CS graduate who thinks they know how to code, and then spawning atrocities like Dbus? Boot to the head. The kernel hardware notification system and udev work perfectly well by themselves. Adding more daemons when you don't need to simply adds unnecessary complexity, which again potentially reduces robustness.

Writing opaque, non-standard, dynamic GUI "automounter" garbage for Crapbuntu instead of teaching users how to edit /etc/fstab? Boot to the head. Use things which are easily locatable, and written in text which can likewise be edited easily. Then again, I guess I can't expect the Stallmanite 14 year olds who code Linux's userland these days to know about real UNIX philosophy, now can I?

Causing GRUB to default to "quiet splash," in Crapbuntu so that when the boot process inevitably fails due to the distro coming with Bit Torrent servers by default, the user can't see the daemon that is causing the boot process to fail, and are thus left with a totally opaque, unfixable black screen that they can't recover from? Boot to the fucking head, x100.

Most of the long distance in the country dropped that day, triggered by 4ESS switches hitting a bug, detecting it, and going offline (with load shifted to other switches). Increased load made the bug in question more likely to be hit, so those switches would in turn drop and shift load away (sometimes back to the originator). 9 hours of basically no long-distance service.
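The feedback loop described there (a switch drops, its load moves to the others, and the extra load makes them more likely to drop too) can be sketched as a toy model in Python. This is only an illustration under loud assumptions: it is not how the 4ESS software behaved, the "bug" is reduced to a fixed load threshold, dropped switches never recover, and all names and numbers are invented.

```python
def simulate_cascade(loads, trip_threshold, first_failure):
    """Return the order in which switches drop offline in a toy load-shedding model.

    loads: starting load on each switch (arbitrary units)
    trip_threshold: load above which a switch is assumed to hit the bug and drop
    first_failure: index of the switch that drops first
    """
    loads = list(loads)
    online = set(range(len(loads)))
    dropped = []
    pending = [first_failure]
    while pending:
        idx = pending.pop(0)
        if idx not in online:
            continue
        online.discard(idx)
        dropped.append(idx)
        if not online:
            break  # nothing left to take the load
        # Spread the dropped switch's load evenly over the survivors.
        share = loads[idx] / len(online)
        loads[idx] = 0.0
        for other in sorted(online):
            loads[other] += share
            if loads[other] > trip_threshold and other not in pending:
                pending.append(other)
    return dropped


if __name__ == "__main__":
    # Four switches at 60 units each; one initial failure pushes the rest to 80.
    print(simulate_cascade([60, 60, 60, 60], trip_threshold=75, first_failure=0))
    print(simulate_cascade([60, 60, 60, 60], trip_threshold=100, first_failure=0))
```

With headroom above the redistributed load (threshold 100), only the first switch drops; with too little headroom (threshold 75), the single failure takes every switch down in turn.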

And just think, it was a year and a half before Berners-Lee announced the "World Wide Web" and Linus announced that he was working on this "Linux" thing.