Previous Projects

This project aims to address auto-tuning’s two principal limitations: an interface ill-suited to the forthcoming ubiquitous hybrid SPMD programming model, and a scope limited to fixed-function numerical routines.

Our work is one component of a larger DOE X-Stack2 project (X-Tune), a collaboration between the University of Utah, Lawrence Berkeley Lab, the University of Southern California, and Argonne National Lab. Building on the algorithmic and pathfinding work of the CACHE institute in conjunction with the CHiLL/ROSE auto-tuning framework, we at LBL are researching and developing tools that automatically implement code transformations that minimize vertical (i.e., from DRAM) data movement and aggregate horizontal (i.e., MPI) data movement. To that end, we are leveraging the CHiLL/ROSE compiler to automatically transform and autotune numerical methods including Multigrid, the Spectral Element Method, and block eigensolvers like LOBPCG.

The CACHE Institute is focused on Communication Avoiding and Communication Hiding at Extreme Scales. The project is a collaboration between researchers at Lawrence Berkeley National Lab (LBNL), Argonne National Lab (ANL), the University of California at Berkeley (UCB), and Colorado State University (CSU).

Researchers in the Performance and Algorithms Research group work closely with researchers from the Computer Architecture Group and the Center for Computational Science and Engineering on co-designing algorithms, implementations, and architectures to maximize performance and energy efficiency in the context of combustion simulations.

miniGMG is a compact benchmark for understanding the performance challenges associated with geometric multigrid solvers found in applications built from AMR MG frameworks like CHOMBO or BoxLib when running on modern multi- and manycore-based supercomputers. It includes both productive reference examples and highly optimized implementations for CPUs and GPUs. It is sufficiently general that it has been used to evaluate a broad range of research topics including PGAS programming…

The starting point of our Application Performance Characterization project (Apex) is the assumption that each application or algorithm can be characterized by several major performance factors that are specific to the application and independent of the computer architecture. A synthetic benchmark then combines these factors to simulate the application's behavior. Thus, the performance of the benchmark should be closely related to that of the corresponding application. Such…

This work evaluates existing and emerging large-scale HEC architectures using a set of in-depth studies from full applications. The novel aspect of this research is the emphasis on full applications, run with real input data and at the scale desired by application scientists in the domain. These problems are much more complicated than those in traditional benchmarking suites such as the NAS Parallel Benchmarks or the LINPACK benchmark, and therefore reveal the kinds of performance issues…

Jonathan Carter (PI)
For the past 15 years, CPU performance has improved at an exponential pace, doubling approximately every 18 months with remarkable consistency. In order to maintain performance improvements within the conservative power envelope allowed by practical system design, the historical trend of increasing clock rates at an exponential pace has given way to a chip-scale multiprocessor (CMP) design strategy where the performance of individual CPU cores stays…