ITAPPMONROBOT

At the turn of the 21st century, Initrode Global's server infrastructure began showing cracks. Anyone who had been in the server room could immediately tell that its growth had been organic. Rackmounted servers sat next to recommissioned workstations, with cables barely secured by cable ties. Clearly there had been some effort to clean things up a bit, but whoever put forth that effort gave up halfway through.

It wasn't pretty, but it worked for years. As time passed, though, a proprietary gateway server used to communicate with credit processing agencies began crashing more and more frequently. And these were bad crashes, too: the kind where the server wouldn't respond to ping and had to be restarted manually. It wasn't really a big deal for the admin, Erik, to hit the restart button on the server when he was there, but that was only 40 hours a week. The credit union needed it to be active 24/7, but was unwilling to hire 24-hour staff for the datacenter. The problem kept getting worse, so the IT manager called a meeting.

"OK guys, what can we do about this?" asked Laura, the IT manager. "Can you guys in dev fix this?"

"No," began Erik, before anyone in dev could respond. "The issue is with the server, not our software."

"Well, when does the support contract end?"

"Two years ago."

"Great. And we can't replace the unit while we're in a budget freeze..." Laura wasn't sure what to do. "Well, what's our workaround for now? What happens when it goes down?"

"Right now, I just hit the restart button."

"OK, well, we'll have to replace it once I get the budget approved. For now, though, what can we do? We need this online all the time." Laura sighed and began tapping her pen on the table. "No one has any other ideas?"

At this point the room fell silent and everyone tried to avoid making eye contact with Laura. Erik had a script running that would ping the server every few minutes and alert him if it didn't respond, so he could halfway proactively keep things running. But the server had to be restarted manually whenever it crashed, so there was no easy way to fix it remotely.

"We could build an admin robot," Erik joked.

Hours later, Erik was in the datacenter, hitting the restart button again, disappointed that the meeting had ended without a workable solution. Laura walked into the room and greeted Erik.

"So, at our meeting earlier, you suggested building a robot." Laura had apparently taken his suggestion seriously. "Is that something we can really do?"

"Well, I was just ki... I mean, I don't know anything about circuitry, or how to build robots." Erik tried to keep his tone somewhere between serious and kidding, so he could gauge Laura's reaction.

It was then that he idly looked at his computer, which had just ejected a disk image DVD he'd burned.

It sparked an idea, but it was too absurd to say out loud. Still, he couldn't help but chuckle at the thought.

"What?" Laura asked.

"It's nothing," Erik responded. "It's stupid."

"We're desperate. Do you have an idea?"

"No, it was really stupid." Erik sighed. "I just had the idea that a CD ROM drive in an old system could eject and hit the reset button. It was a ridiculous idea."

"Wait," Laura began, "could you really do that?"

It was another uncomfortable moment for Erik, but she seemed serious, so he just went for it. "Uh, yeah, I could, but it's hardly the best solution... I mean, I'd have to position the servers just right, somehow get the heights and alignment correct, and update the polling script to eject the CD ROM drive any time it didn't respond to ping."

And that was exactly what Erik found himself spending the rest of the afternoon setting up. He found an old PC, updated his script to ping the server every two minutes and eject if there was no response, and with the help of a few phone books found the perfect height and position on the floor. At any point while he was setting it up, he expected Laura to jump out from a corner and yell "just kidding," but it never happened. Finally, Erik stood up, and ashamedly admired his work. He slapped a label on it that read "ITAPPMONROBOT," and another below with big underlined letters that read "DO NOT MOVE."
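The story doesn't show Erik's script, but the logic it describes (ping every two minutes, eject on no response) can be sketched roughly as follows. This is an illustration, not the original: it assumes a Linux-style `ping` (`-c`, `-W` flags) and the Linux `eject` utility, and the names `ping_ok`, `eject_tray`, `check_once`, and `monitor` are made up for the example.

```python
import subprocess
import time

POLL_INTERVAL_SECONDS = 120  # "every two minutes", per the story


def ping_ok(host):
    """Return True if a single ping gets a reply (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "5", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def eject_tray():
    """Push the tray out, then pull it back in (assumes the Linux `eject` tool)."""
    subprocess.run(["eject"], check=False)
    time.sleep(2)  # give the tray time to reach the button
    subprocess.run(["eject", "-t"], check=False)


def check_once(host, ping=ping_ok, eject=eject_tray):
    """Ping once; if the host is silent, fire the tray. Returns True if ejected."""
    if ping(host):
        return False
    eject()
    return True


def monitor(host, interval=POLL_INTERVAL_SECONDS):
    """Poll forever. Note that nothing here checks whether the host still exists."""
    while True:
        check_once(host)
        time.sleep(interval)
```

Calling `monitor()` with the gateway's address would start the loop. Passing `ping` and `eject` as parameters is only there so the decision logic can be exercised without real hardware.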

Years later, and long after Erik had left, the faulty server was taken offline and replaced with a new one working under a new IP address. During the swap, ITAPPMONROBOT was moved to a neglected corner of the server room, plugged back in, and promptly forgotten. It spent the last weeks of its life dutifully opening and closing its CD ROM drive every two minutes, reaching in vain for the restart button that it'd never touch again.

Awww, come on guys, this is not a WTF! This is classic seat-of-the-pants, make-the-best-of-a-bad-situation techie innovation.

Amen. The situation is a WTF, but the solution is a brilliant hack, and not in the Paula Bean sense. He reused old hardware to create a workaround. It was better than what they had before. No budget was spent. Brilliant!

Wouldn't a UPS controlled by the 2nd machine have been easier? Just cycle it when the machine stopped responding.

It's also a good diagnostic tool--the weakest component will fail first, and much more quickly, if you're power-cycling the machine 10 times a day.

Mind you, all my critical servers are set up this way (a ring o' UPSes, where machine N's USB port controls machine ((N+1) mod M)'s UPS)--although the monitor script will refuse to power-cycle a given machine more than once every 12 hours. If it doesn't come back up the first try, I want someone to check out the hardware just in case magic smoke is leaking out of something.
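For anyone curious, the ring-plus-cooldown idea might look something like the sketch below. It only models the bookkeeping: the actual UPS control call is left as a caller-supplied function, and `watched_machine` and `CycleLimiter` are hypothetical names, not anything from my real script.

```python
import time

MIN_SECONDS_BETWEEN_CYCLES = 12 * 60 * 60  # the 12-hour rule


def watched_machine(n, m):
    """Machine n (0-based) controls the UPS of machine (n + 1) mod m."""
    return (n + 1) % m


class CycleLimiter:
    """Refuses to power-cycle the same machine twice within the cooldown."""

    def __init__(self, min_interval=MIN_SECONDS_BETWEEN_CYCLES, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last_cycle = {}  # machine index -> time of last power cycle

    def try_cycle(self, target, power_cycle):
        """Call power_cycle(target) unless target was cycled too recently."""
        now = self._clock()
        last = self._last_cycle.get(target)
        if last is not None and now - last < self._min_interval:
            return False  # leave it down; a human should look at the hardware
        power_cycle(target)
        self._last_cycle[target] = now
        return True
```

With M = 4 machines, machine 3 watches machine 0, which closes the ring. The clock is injectable so the cooldown can be checked without waiting 12 hours.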

I used to build a circuit with a 555 timer, a 74LS00 and a couple of 74LS191's which would count very slowly to 16 (it took about 4 minutes) but would reset the counter to zero every time the hard disk light blinked. The "overflow" output pin on the 4-bit counters was connected to the RESET signal on the motherboard. No disk activity for 4 minutes and the machine gets reset. With an appropriate interface chip the serial port could be monitored too (I used to need this when I ran a BBS).
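In software terms that circuit is a watchdog timer. Here is a rough behavioral model, assuming a clock tick about every 15 seconds (4 minutes divided by 16 counts). It models what the circuit does, not how it is wired, and the class name `DiskActivityWatchdog` is invented for the example.

```python
class DiskActivityWatchdog:
    """Counts clock ticks; disk activity clears the count; overflow means reset."""

    def __init__(self, limit=16, on_reset=lambda: None):
        self.limit = limit        # ticks allowed with no disk activity
        self.count = 0
        self.on_reset = on_reset  # stands in for the motherboard RESET line

    def disk_activity(self):
        """The hard disk light blinked: start counting from zero again."""
        self.count = 0

    def tick(self):
        """One pulse from the slow clock. Returns True if the reset fired."""
        self.count += 1
        if self.count >= self.limit:
            self.count = 0
            self.on_reset()
            return True
        return False
```

As long as `disk_activity()` is called at least once every 16 ticks, `on_reset` never fires.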

This reminds me of a story an ex-IBM employee told me about one of their old mainframes, which didn't quite work right. Every now and then it would just stop working for whatever reason and had to be reset, losing lots of important information.

This happened fairly often and they decided to rebuild it from the ground up to fix whatever was wrong with it. No luck: it still stopped working quite regularly, so they posted a guy near it at all times so that when it shut down he just had to hit the button and it'd restart. Eventually, the guy got so pissed off at the machine he just kicked it; somehow it started up again with no lost information at all.

The next day he shuffled over to my teacher and said, "Last night when it stopped I just kicked it, and it just started working again."

My teacher looked at him and asked, "Where did you kick it?" So the guy pointed it out, and my teacher got out a piece of chalk and put a circle where he had kicked the machine. "Next time it breaks, do that again."

So, eventually it broke again and the guy kicked it. The machine whirred back to life the way it did last time. After a week or two of doing this without losing any information they decided to report it to a senior manager.