Ninth Annual Workshop for the Energy Efficient HPC Working Group (EE HPC WG)

Monday, November 12th, 9:00AM-5:30PM, SC18, Dallas, Texas, USA

This annual workshop is organized by the Energy Efficient HPC Working Group. It closes the gap between facility and IT systems with regard to energy efficiency analysis and improvements. For sustainable HPC, and especially for exascale computing, power and energy are major concerns that can only be addressed by taking a holistic view combining the HPC facility, the HPC system, HPC system software, and HPC application needs.

This workshop is unique in that it provides a forum for sharing power and energy best practices and lessons learned from supercomputing centers around the world. Discussion and audience participation are encouraged. There are presentations, panels, and discussions. Presenters are mostly from major governmental and academic supercomputing centers. The panels encourage discussion around more controversial topics and include panelists from supercomputing centers, academic institutions, and the vendor community.

"If you can't measure it, you can't improve it." We now have some sites that are extremely well instrumented for measuring power and energy. Lawrence Berkeley National Laboratory’s supercomputing center is a prime example, with sensors, data collection, and analysis capabilities that span the facility and computing equipment. We have made major gains in improving the energy efficiency of the facility as well as the computing hardware, but there are still large gains to be had with software, particularly application software. Just tuning code for performance isn’t enough; the same time to solution can have very different power profiles. We are at the point where measurement capabilities allow us to see cross-cutting issues, such as the cost of spin waits. These new measurement capabilities should provide a wealth of information for seeing the tall poles in the tent. This panel will explore how we can identify these emerging tall poles.

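As a rough illustration of the point that equal time to solution can hide very different energy costs, the sketch below integrates two power traces of the same duration. The traces, the one-second sampling interval, and the names `busy_wait` and `blocking_wait` are made-up assumptions for illustration only; they are not measurements from any site.

```python
def energy_joules(power_watts, dt_seconds):
    """Integrate evenly spaced power samples with the trapezoidal rule."""
    total = 0.0
    for a, b in zip(power_watts, power_watts[1:]):
        total += 0.5 * (a + b) * dt_seconds
    return total

# Hypothetical node power traces in watts, sampled once per second.
# Both runs finish in the same 4 seconds; only the waiting strategy differs.
busy_wait = [300.0, 300.0, 300.0, 300.0, 300.0]      # cores spin while waiting
blocking_wait = [300.0, 200.0, 200.0, 200.0, 300.0]  # cores idle while waiting

e_spin = energy_joules(busy_wait, 1.0)
e_block = energy_joules(blocking_wait, 1.0)
print(f"spin wait: {e_spin:.0f} J, blocking wait: {e_block:.0f} J")
```

With these assumed values, the spinning run uses 1200 J and the blocking run 900 J for the same time to solution.
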
BOF: Data Analytics for System and Facility Energy Management

Several leading-edge supercomputing centers across the globe have been developing or acquiring data acquisition and management systems to support energy efficiency studies and reporting, to understand rare and important events, and to help energy providers with energy scheduling and management. This BoF presents experiences from HPC centers regarding data source integration and analytics, and seeks those interested in hearing about or sharing experiences with dynamic power and energy management. It is important to get the community together to share solutions and educate each other about the challenges, success stories, and use cases.

BOF: Power API and Redfish: Standardizing Power Measurement and Control for HPC

The HPC community faces considerable constraints on the power and energy of HPC installations. A standardized, vendor-neutral API for power measurement and control is needed for portable solutions to these issues at the various layers of the software stack. In this BoF, we discuss the Power API and Redfish, two APIs for measurement and control of power and energy on large systems. The BoF will introduce newcomers to these efforts, differentiate the goals of the two APIs, and discuss interoperability. An interactive panel discussion with experts from the involved organizations will facilitate discussion between both API communities, with ample time for audience questions and comments.

With power becoming a first-order design constraint on par with performance, it is important to measure and analyze energy efficiency trends in supercomputing. To raise awareness of greenness as a first-order design constraint, the Green500 seeks to characterize the energy efficiency of supercomputers for different metrics, workloads, and methodologies. This BoF discusses trends across the Green500 and highlights from the current Green500 list. In addition, the Green500, Top500, and Energy Efficient HPC Working Group have been working together on improving power measurement methodology, and this BoF presents case studies from sites whose power submissions meet the highest quality level of the measurement methodology.

BOF: A Look Ahead: Energy and Power Aware Job Scheduling and Resource Management

Energy and power aware job scheduling and resource management (EPAJSRM) capabilities are implemented or planned for large-scale HPC systems at about 10 sites worldwide. Some of the sites are interested in using these capabilities to allow an application to provide hints and other relevant information to an EPAJSRM job scheduler. Another important capability is notifying applications of power management decisions, such as changes in power usage targets, and making them aware of what is happening in the machine that might have made a job run slower. This BoF explores these capabilities from the perspective of three different sites.

The goal of HPC system procurement is to identify optimal solutions to both technical and financial targets that maximize the contribution of the system to the organization's mission. It is important to consider total cost of ownership, cooling and power requirements, the requirements for interfacing with the HPC facility, and power management and control. In this BoF, we present energy efficiency highlights from recent procurements by four sites and encourage audience participation. The presenting sites are from Japan, on power efficiency; Europe, on cooling enhancements; and the United States, on future efficiency requirements as well as on facility integration and efficiency.

BOF: The Facility Perspective on Liquid Cooling: Experiences and Proposed Open Specification

As compute densities increase, there is growing demand to cool power-dense equipment more effectively and to improve energy efficiency with compressor-less cooling. This BoF will explore the steps necessary to take advantage of warm liquid cooling in the data center and introduce an open specification for a secondary fluid warm liquid-cooled rack. Lawrence Berkeley National Laboratory and China Institute of Electronics steer this initiative and seek input from the HPC community. This BoF will feature a panel of seasoned operations managers from major supercomputing centers who will talk about strategies for effectively enabling warm-water cooling, including a discussion of the need for industry standards.