Why you should have a torch in your server room.

The first law of systems administration is Murphy's law. If you don’t know what Murphy’s law is, you haven’t been a systems administrator very long.

Here is a story of what happened when I’d just started a new job.

It was my third day. I'd just started as a senior sysadmin for a large company whose headquarters were in a six-floor building in a suburban office park. I was replacing a developer who had managed the servers part time, as his primary responsibility was building a database for the company. He’d decided to move on, and the company figured they needed someone who was solely responsible for managing the company’s servers. When I arrived on the job, documentation on server configuration was non-existent. The server room was in the basement. I worked on the sixth floor.

At about 10:30 on the morning of my third day, the building’s electricity went out. I found out later that some guys in hardhats down the road had done something exceptionally clever with a backhoe, knocking out the power not only to our building, but also to half the suburb our office park was located in. When the electricity went out, I was in a meeting with my manager. Being new, I knew only that the servers were connected to several UPS units. I hadn’t yet had the time to go down and audit whether or not the servers were configured to shut down properly. I asked my manager whether he knew if my predecessor had set the servers to shut down gracefully when they got a message from the UPS telling them that the electricity had just gone out. My manager wasn’t really sure. There was only one way to find out. I decided to go down to the basement and check.

With the power out, the building’s elevators were unusable. Luckily for me, the stairwell had a skylight. I’m overweight and not especially fond of stairwells. I did my best to jog down the six flights of stairs. It was really something between a controlled fall and a quick amble. When I reached the basement level, I found it completely dark. Being my third day, I didn’t know whether or not the building had a backup generator. If it did, it certainly wasn’t working. The basement was so dark that at first I couldn’t even see where the server room door was. After I let my eyes adjust, the location of the server room was given away by the dim green light of the swipe card lock. I wasn’t going to be able to do much if it was this dark in the server room.

I headed back up to ground level and asked the receptionist if she had a torch. She didn’t, but offered me her cigarette lighter. It wasn’t one of those cool metallic lighters, but one of those cheap ones that you can buy at the gas station. It would make light and, as the saying goes, any port in a storm.

In the movies, people can see a great distance when holding a small flame. I learned that this isn’t the case with a cheap plastic cigarette lighter. It is also difficult to keep the lighter lit for more than a few seconds. They aren’t designed to run continuously. After a bit of fiddling around, I was able to get it to cast a little bit of light and was able to find the server room door. My swipe card worked. I entered the server room.

The blinking lights and the whir of fans indicated that the servers were still operational and weren’t in the process of shutting down gracefully. The two UPS units were beeping, indicating that the continued operation of the servers was measured in minutes. Each UPS had LED bars on the side indicating how much charge was left in the battery. There were four lit LEDs on the first UPS and three on the second. I found the desk that had a single monitor, keyboard and mouse connected to a KVM switch. All of the servers were connected to the KVM switch.

It was at this point that I discovered the monitor had no power. The monitor was not plugged into a UPS. Of course I needed to plug the monitor into a UPS. Working out where cables go using illumination from a cigarette lighter is challenging, especially considering the cable spaghetti that covered the floor. I was lucky in that there was one outlet left on one of the UPS units. I connected the monitor cable. The monitor switched on. The screen showed nothing.

The KVM switch also required power. There were no outlets left on either UPS. The only way I could get power to the KVM switch was to find a server with a redundant power supply and to unplug one supply from the UPS, replacing it with the adapter for the KVM switch. Not the world’s best solution, but better than having these servers crash when the UPS batteries ran out. With the very hot cigarette lighter in my hand, shuffling the cables took a few minutes. I was beginning to wonder if these small lighters could get so hot that they could explode when I finally brought the KVM switch online. I looked over at the status LEDs on the UPS units. Both now had only two lit LEDs. I still had some time, but the batteries were definitely draining.

The servers had not been configured to shut down gracefully. The two UPS units had been installed merely to safeguard against brownouts. I spent the next few minutes using the KVM switch to flick between servers, entering the necessary commands to gracefully shut them down. As every administrator knows, it can take a while to bring down a server properly. After the appropriate commands had been input, I waited a few more minutes as disks whirred and finally fell silent. In the dark silence I sat with a grin. I had done all right. My actions had ensured that the servers came down gracefully.

Before I had more than a few moments to savor my victory, the fluorescent light overhead flickered on as the building’s electricity supply was restored. I sighed and began bringing each of the servers back up.
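
The piece missing from these servers was a rule that turns a UPS status report into a shutdown decision. What follows is only a minimal sketch of such a rule, not the configuration of any particular UPS or monitoring tool. The names UpsStatus and should_shut_down, the fields, and the threshold values are all hypothetical, and a real setup would read the status from the UPS vendor's own software and then call the operating system's shutdown command.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """One reading from a UPS. Hypothetical fields, not a real vendor API."""
    on_battery: bool        # True once mains power has been lost
    charge_percent: float   # remaining battery charge, 0 to 100
    runtime_minutes: float  # estimated minutes left at the current load


def should_shut_down(status, min_charge_percent=50.0, min_runtime_minutes=10.0):
    """Return True when a graceful shutdown should begin.

    The thresholds are illustrative assumptions. They should leave more
    time than the slowest server needs to stop its services cleanly.
    """
    if not status.on_battery:
        # Mains power is present, so there is nothing to do.
        return False
    # On battery: shut down once either safety margin has been used up.
    return (status.charge_percent <= min_charge_percent
            or status.runtime_minutes <= min_runtime_minutes)


if __name__ == "__main__":
    reading = UpsStatus(on_battery=True, charge_percent=40.0, runtime_minutes=12.0)
    if should_shut_down(reading):
        print("Battery margin exhausted: begin graceful shutdown")
```

Checking both charge and estimated runtime is a deliberate choice in this sketch: an ageing battery can report a healthy charge while offering very little runtime under load.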

Discuss this Article 1

What would be more helpful would be an article on how to have the server manage power outages automatically, e.g. lights out. This sysadmin was not armed with the light of a cell phone? Did not contact the building's facilities staff, who would have a flashlight, or follow up to ensure some lights were on a backup generator? What about follow-up concerning lessons learned and what actions were taken to safeguard the systems in the future? Just moving your office to be next to the servers would be a help.