Troubleshooting a downed mission-critical system can be terrifying, but a slow, methodical approach can save you time

If you've been in IT for more than a few minutes, chances are you've seen it happen: A mission-critical production system falls flat on its face, and you have absolutely no idea why or how to even begin to fix it. Moments of true terror punctuating the monotony of too many project meetings, application rollouts, and systems upgrades are really what make IT interesting -- and one reason why it's not for everyone.

Troubleshooting seemingly inexplicable failures can be one of the most stressful parts of the job. Unplanned downtime of a mission-critical system can invite the harshest scrutiny from coworkers and management in even the smallest of organizations, and it only gets worse as the size of the enterprise grows and the stakes get higher. That additional pressure often leads even the best engineers to make very dumb mistakes, further compounding the problem and prolonging the downtime.

Staying cool under pressure isn't easy no matter how many times you've been tossed into the fire, but there are five easy rules you can add to your emergency troubleshooting processes to get to a resolution faster, conclusively prove the cause of the outage, and avoid making things worse.


This story, "5 rules for better troubleshooting," was originally published by InfoWorld.