How to ban a machine from executing jobs

Known to work with HTCondor version: 7.0

Suppose jobs are mysteriously failing on a particular machine, and a hardware problem such as memory corruption is suspected. It is probably a good idea to turn HTCondor off on that machine until the problem is solved. In addition, so that no mistakes are made, it may also be a good idea to take the machine out of the HTCondor pool, in case HTCondor is restarted prematurely.

Do this by adding to the HTCondor configuration visible to the condor_collector daemon:
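
A minimal sketch of such an entry, assuming the COLLECTOR_REQUIREMENTS expression is used to filter incoming ClassAd updates. The hostname bad.machine.hostname is a placeholder, not a real machine:

```
# Assumption: bad.machine.hostname is a placeholder; substitute the
# fully qualified name of the machine to be banned.
# The collector discards any ClassAd update for which this expression
# evaluates to False, so updates from the named machine are rejected.
COLLECTOR_REQUIREMENTS = Machine =!= "bad.machine.hostname"
```

The =!= operator ("is not identical to") always yields True or False, so ClassAds that have no Machine attribute still pass the filter. If the configuration already defines COLLECTOR_REQUIREMENTS, the new clause would likely need to be joined to the existing expression with && rather than replacing it. The condor_collector daemon has to reread its configuration (for example via condor_reconfig) before the change takes effect.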

The existing machine ClassAd will take time to expire (~15 minutes); restart the condor_collector daemon if the expiration interval is too long to wait.

This work supported in part by NSF grants MCS-8105904, OCI-0437810, OCI-0850745, and/or ACI-1321762. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.