December 10, 2013

We are currently experiencing some performance degradation on the InfiniBand network. Jobs that generate large amounts of multi-node traffic may be impacted. This issue is due to Mellanox firmware defects that cause inter-switch links (ISLs) to drop and not retrain. Although the ISLs are redundant, losing paths degrades performance. The resolution is intrusive and requires cluster downtime.

We plan to update the firmware on all IB switches in the cluster at the next downtime, 12/28/13.

November 15, 2013

The HPC system will be shut down during the maintenance detailed below. HPC jobs that are still running on December 28th will be canceled by 8am. If you have any questions, please contact help-hpc@uky.edu.

University of Kentucky Academic Planning, Analytics and Technology (APAT) has scheduled electrical maintenance for the McVey Hall Data Center’s Uninterruptible Power Supply (UPS) systems and the building switchgear on Saturday, December 28, 2013, starting at approximately 6:00am. The entire building will experience several short interruptions in power over a couple of hours while UK’s Physical Plant Division (PPD) electricians perform required maintenance on the automated distribution switchgear. At the same time, two of the Data Center’s UPS systems will be taken down for electrical system changes. The third UPS system will remain on to maintain network, Active Directory (AD), Domain Name System (DNS), and F5 availability.

The UPS system downtimes are expected to last up to 8 hours. During the downtime, most systems, including SAP, Exchange, the VM-Farms, Blackboard and its peripheral systems, will be unavailable. This means individuals will be unable to access data in SAP, receive or send email from UK Exchange accounts, or view course information on Blackboard. The functional outage time will vary depending on the system.

For questions about the upcoming electrical maintenance, please contact Butch Adkins at butch@uky.edu or 859-218-1716.

November 10, 2011

We have added four new C6100 compute nodes to the cluster, each currently configured with four GPUs (NVIDIA Tesla M2070). The GPU nodes are identical to the basic compute nodes (12 cores and 36 GB), except for the externally attached GPUs.

July 21, 2011

0730 The system is currently experiencing issues with the cluster file systems. This may affect login sessions, as the HOME directories may be impacted. There is no ETR yet, as the root cause is still being identified. Check this page for more updates.