It has been a very difficult several days, and we are still far from out of the woods.

This past Saturday morning the air conditioning in our server closet started acting up, apparently cycling on and off. Around noon that day, we deemed it bad enough to come to the lab. It's a good thing, because the AC was completely down when we got here. We shut most machines down and restarted the AC. It seemed to hold. But later that day our monitors showed the temperature increasing again, even with a small number of machines running. We came back to the lab and shut down everything except the web servers. That small load is OK even with no AC.

That's the way it has been, off and on, since. The physical plant people have been here several times. They have been doing a good job, even though low staffing levels have cut into the time that they can give us. The current diagnosis is that the AC has a bad condenser fan. Now it is a matter of getting the part - not trivial, unfortunately. In the meantime, they rigged up a piggyback fan, which did help some. Just not enough to run the project.

Back when I ran VAXes in an air-conditioned computer room (I know, I'm showing my age) we had temperature sensing attached to the UPS, which could be set to signal the dependent servers to shut down if a temperature emergency arose (too high for too long).

I'd be surprised if there were not something available relatively cheaply nowadays to give a warning or signal the servers to shut down gracefully if a temperature emergency arises. It would save you having to dash into work, and would be a safeguard for your data integrity too, removing the need for you guys to jump when the rude mechanics say "frog".
____________

How about free?

Most systems have temperature monitoring built in these days, and many Linux distributions have the tools either available or built in.

The problem of course is that it's an inexact science. Monitoring a CPU doesn't help you if a hard disk gets too hot (and they tend to be the first to 'die' now as modern CPUs usually have built-in thermal protection).

Then there's the lack of standards. There's a communication standard (I2C), but no standard as to what sensors should be implemented or what values represent a particular temperature. I got caught out by this - I implemented thermal monitoring on a system which promptly shut itself down, because the seemingly reasonable high temperature limit I'd set was misinterpreted as a value that was lower than the ambient temperature in the server room.
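
For anyone who wants to try the "free" route, here is a rough sketch of the idea rather than a tested production script. It assumes a Linux box that exposes the kernel's thermal zones under /sys/class/thermal, where each temp file holds millidegrees Celsius. The 70 C limit, the 5 minute hold time and the "plausible" range are made-up numbers for illustration, and the names OverheatTimer and read_zone_temps are mine. It only trips when the reading stays too high for too long, and it ignores values outside the plausible range so that a misread sensor scale does not cause a shutdown by itself. It does not look at hard disk temperatures at all.

```python
import glob
import time

LIMIT_C = 70.0              # hypothetical limit; pick one that suits your hardware
HOLD_SECONDS = 300          # hypothetical: how long the limit must be exceeded
PLAUSIBLE_C = (5.0, 110.0)  # assumed sanity range; readings outside it are ignored


def read_zone_temps(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return temperatures in Celsius from the Linux thermal sysfs files."""
    temps = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                # The kernel reports these values in millidegrees Celsius.
                temps.append(int(f.read().strip()) / 1000.0)
        except (OSError, ValueError):
            continue  # zone missing or unreadable; skip it
    return temps


class OverheatTimer:
    """Trips only when the hottest plausible reading stays above the limit
    for at least hold_seconds ("too high for too long")."""

    def __init__(self, limit_c=LIMIT_C, hold_seconds=HOLD_SECONDS):
        self.limit_c = limit_c
        self.hold_seconds = hold_seconds
        self.hot_since = None  # time at which the current hot spell started

    def update(self, temps_c, now):
        plausible = [t for t in temps_c
                     if PLAUSIBLE_C[0] <= t <= PLAUSIBLE_C[1]]
        if not plausible or max(plausible) <= self.limit_c:
            self.hot_since = None  # cooled down, or nothing usable: reset
            return False
        if self.hot_since is None:
            self.hot_since = now
        return now - self.hot_since >= self.hold_seconds


def main():
    timer = OverheatTimer()
    while True:
        if timer.update(read_zone_temps(), time.monotonic()):
            # Placeholder: a real script would call the site's own
            # graceful shutdown procedure here.
            print("temperature emergency: shut down now")
            break
        time.sleep(30)
```

Calling main() starts the polling loop. Log what read_zone_temps() returns on your own hardware for a day or two before trusting any limit, which is exactly the step I skipped.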

Anyway, glad to see the project survived the AC woes - that was too close for comfort. Thanks to the team for being on top of the problem even at the weekend.
____________
Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking.

We're taking the project off line for the night, partly (mostly) for temperature concerns, but also to let the back end queues drain and give more I/O to the thumper root mirror re-sync that became necessary.

I do find it hard to believe that the condenser fan is hard to get.
Most AC equipment uses fairly standardized motors, unless this unit is either very old or a one-off odd duck.
Grainger has about every motor used in the refrigeration field in their online catalog. HVAC motors...
____________
***************************************
I am still the kittyman.
Accept no imitations.