Adopting Intel Cluster Checker as an overnight quality gate

Imagine a cluster that runs Intel® Cluster Checker on a fixed schedule and reports failures automatically. With a few extra tools, some of which may already exist on your system, one can have a self-checking cluster in just a few steps. The key to automating Intel Cluster Checker is a script that ties these tools together.

Within the script, one could update a file on the system that Ganglia (or Nagios) checks for monitoring. The script could also be set up to email a local tracker upon completion, so that all pertinent log information reaches the correct people. Other notification channels can be added in the same way. Once the script is set up, it can be run in a batch system, such as Slurm (or PBS), and scheduled on a daily, weekly, or monthly basis using cron.
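As a rough illustration of such a script, the following Python sketch runs the checker, writes a one-line status file for a monitoring tool to poll, and builds an email report. It is a minimal sketch under several assumptions, not a reference implementation: the bare `clck` command line, the status file path, the email addresses, the local SMTP relay, and the convention that a non-zero exit code means failure are all placeholders that should be verified against the product documentation and replaced with site-specific values.

```python
import smtplib
import subprocess
import sys
from datetime import datetime, timezone
from email.message import EmailMessage
from pathlib import Path


def run_check(command):
    """Run the checker command and return (exit_code, combined_output)."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the same log
            text=True,
            check=False,               # a failing check must not raise
        )
    except OSError as exc:
        # Command missing or not executable: report it as a failure.
        return 127, f"could not start {command[0]}: {exc}\n"
    return result.returncode, result.stdout


def write_status(status_file, exit_code):
    """Write a one-line status that a monitoring agent could poll."""
    # Assumption: exit code 0 means the check passed.
    state = "OK" if exit_code == 0 else "FAIL"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{state} exit={exit_code} checked={stamp}\n"
    Path(status_file).write_text(line)
    return line


def build_report(exit_code, output, sender, recipient):
    """Build an email carrying the captured log output."""
    state = "passed" if exit_code == 0 else "FAILED"
    msg = EmailMessage()
    msg["Subject"] = f"Cluster check {state}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(output or "(no output captured)")
    return msg


def main():
    # All values below are hypothetical placeholders for a real site.
    command = ["clck"]  # assumption: add site-specific options as needed
    code, output = run_check(command)
    write_status("/var/tmp/clck_status.txt", code)
    report = build_report(code, output, "clck@localhost", "hpc-admins@localhost")
    with smtplib.SMTP("localhost") as smtp:  # assumption: local mail relay
        smtp.send_message(report)
    return code
```

To use the sketch as a standalone script, one would add the usual `if __name__ == "__main__": sys.exit(main())` guard at the bottom and call the file from a batch job script. The job itself can then be submitted on a schedule, for example from a crontab entry that runs the site's job submission command each night.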

By taking the ideas above and adding some site-specific flavor, Intel Cluster Checker execution can be fully customized to detect and report a cluster failure without user intervention. One can then run jobs with confidence, knowing that the cluster is being checked regularly and that the right people are informed when something goes wrong.

See the Intel Cluster Checker product documentation for more details.

To download the latest release, log in to the Intel® Registration Center and click on the Intel® Cluster Checker product.