John Mellor-Crummey, professor of computer science (CS) and of electrical and computer engineering (ECE) at Rice University, is participating in the design of the 2021 exascale supercomputer Aurora with Argonne National Laboratory, Intel and other Department of Energy (DOE) partners.

Aurora is expected to become the first supercomputer in the United States to break the exaFLOPS barrier.

“I see it as a real opportunity to try to influence the hardware and software development for this system,” Mellor-Crummey said. “The goal is to design more usable supercomputers that the laboratory will receive three to five years in the future.”

Mellor-Crummey’s involvement in supercomputing dates from 1989, when he joined the Center for Research on Parallel Computation (CRPC) at Rice University. Led by the late Ken Kennedy, CRPC was a multi-institutional NSF-funded research center active from 1989 to 1999.

As part of the effort to build software systems for supercomputers, Mellor-Crummey, his students and research staff developed HPCToolkit, a suite of multi-platform performance tools widely used for pinpointing performance bottlenecks in parallel programs.

Based on their early experiences with HPCToolkit on Titan, a supercomputer installed at Oak Ridge National Laboratory in 2012 that uses graphics processing units (GPUs) to accelerate computation, Mellor-Crummey’s research team advocated for changes in GPU hardware to support better attribution of performance losses. In response to their suggestions, new NVIDIA GPUs released after 2015 include support for sampling-based performance analysis.

Currently, Mellor-Crummey’s group is developing a new generation of analysis tools for the emerging GPU-accelerated Summit and Sierra supercomputers, recently ranked first and third on the list of the top 500 supercomputers.

Mellor-Crummey has also been co-leading development of an application programming interface for performance and correctness tools for OpenMP 5.0. The latest version of the OpenMP programming model for multithreaded processors enables an application developer to offload computation to an attached accelerator, such as a graphics processing unit.

Keren Zhou, a Ph.D. student in Mellor-Crummey’s team, said they are extending HPCToolkit on emerging GPU devices to support a variety of GPU programming models, including CUDA, OpenMP and RAJA.

“Our tool is the first complete implementation that composes code-centric profile views on both CPU and GPU, by overcoming many difficulties like merging CPU and GPU call trees, analyzing GPU binary files and constructing control flow for GPU programs,” he said.

“We believe our tool will make significant contributions to the exascale project by helping scientists understand the complicated behaviors of parallel programs and in turn improve the performance of critical applications,” Zhou said.

“Hardware technology has been evolving very rapidly over the last 10 years, and the pace of change is accelerating. Supercomputers are using the newest hardware technology to deliver high performance, and there are significant challenges in using them efficiently,” Mellor-Crummey said.

When the Aurora supercomputer is delivered in 2021, Mellor-Crummey hopes to empower its developers through performance measurement tools.

“Next-generation systems such as Aurora will be radically different from the computers we use today. It will be difficult for application developers to exploit the full power of these systems without guidance from performance measurement tools. I’m pleased to be collaborating with Intel and the DOE laboratories to define features of next generation processors for these systems,” he said.

Cintia Listenbee is a Communications and Marketing Specialist in Computer Science.