1sockchuck writes "Lightning, floods, car crashes, and coding snafus had starring roles in major Internet outages of 2010. Data Center Knowledge reviews the year's business downtime, including outages for banking and e-commerce sites and several incidents that knocked state government services offline. Meanwhile, Pingdom focuses on downtime for major social media sites and Wikileaks. Then there's the guy who got drunk and shot up a server."

Are you one of those Californian hippies? Kombucha is some good shit, man. We've had such bad downtime lately from all the storming and flooding, especially in San Diego.

Some people would be happy to postpone their field tests until the next year, but for me it's a missed opportunity to get the hell out of the office for fresh air. Plus, the street leading to my gym is also flooded. I'm having a sad.

And then you have the cheap bastards and/or businesses without money. Time and materials are often needed to perform preventive maintenance. So, I'd say we're seeing a lot more reactive vs. proactive support as a result.

Preventative maintenance on server farms has pretty much been proven a losing proposition.

Nothing is usually done on a routine basis in medium to large server farms until some automated reporting software indicates a significant malfunction.

With today's failover technology, even that is often easier to deal with AFTER a blade fails and its workload is instantly migrated to a hot spare.

There is just not that much preventative work you can do these days. It's all AFTER-the-fact replacement.

If you think firmware on modern devices is perfect, you're sorely mistaken. Just as with any other software, firmware on shipped servers can be buggy and lead to data loss. That can range from a nuisance to extended downtime. Things like BIOS, backplane, RAID controllers, and HDDs may require updates to prevent data corruption or total loss. These are all things that can be scheduled ahead of time in a controlled manner rather than being blindsided by issues during normal business operation.

And that's strictly hardware. OS, backup job review, and patching are all part of the preventive maintenance routine. If you wait till it's too late, you could be kicking yourself later and wishing you HAD spent the money over time rather than in one lump sum of man-hours with extended downtime.

No! It is not. That's like saying having two bridges side by side is preventative maintenance, and that would be incorrect. While it's great for downtime mitigation, the infrastructure as a whole still must be maintained.

You wouldn't let your company limp along on the last remaining failed-over server for an extended period of time, would you? The answer should be "No".

Of course you can do PM. You check the charge level of each cell in your battery packs, you check the temp in all the critical portions of the converter stages of the UPSs, and you test the ATS by killing the utility power and running the generators under real production loads, which also tests your AC units' ability to quick cycle. And that's just the infrastructure side.
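For what it's worth, the first two checks above are easy to script once the readings are in hand. Here is a rough Python sketch, not anybody's production tooling: the thresholds, the `Reading` type and the `find_pm_issues` function are all made up for illustration, and real limits would come from your battery and UPS vendor specs.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; use the vendor's numbers.
MIN_CELL_CHARGE_PCT = 90.0
MAX_CONVERTER_TEMP_C = 60.0


@dataclass
class Reading:
    name: str     # label of the cell or converter stage
    value: float  # charge in percent, or temperature in degrees C


def find_pm_issues(cell_charges, converter_temps):
    """Return a list of findings for readings outside the assumed limits."""
    issues = []
    for r in cell_charges:
        if r.value < MIN_CELL_CHARGE_PCT:
            issues.append(f"cell {r.name}: charge {r.value:.1f}% is low")
    for r in converter_temps:
        if r.value > MAX_CONVERTER_TEMP_C:
            issues.append(f"stage {r.name}: temp {r.value:.1f}C is high")
    return issues


if __name__ == "__main__":
    cells = [Reading("A1", 98.0), Reading("A2", 72.5)]
    temps = [Reading("inverter", 48.0), Reading("rectifier", 66.0)]
    for line in find_pm_issues(cells, temps):
        print(line)
```

The ATS and generator test is a different animal, since it means actually pulling utility power, so it stays a scheduled hands-on job rather than something a script like this covers.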

I would like to add our brand new little 'stimulus' rural sewer company that cut the main, and ONLY, fiber line to our little M&P ISP's backbone while digging a hole. No internet for 2,500 people and businesses on "Main Street" (we actually have one of those) for about a day. Keep in mind the fiber is actually above ground here - and runs right past my house, which is not fun to think about when I pay the bill for my 12/2.5 MB line, but that's neither here nor there.