X-Tune: Auto-tuning for Exascale

Automatic performance tuning (auto-tuning) has emerged as an effective means of providing performance portability from one architecture to the next. Rather than hoping a compiler can deliver optimal performance on ever more novel multicore architectures, or, worse, hand-tuning every kernel manually, auto-tuned kernels and applications tune themselves for the target CPU, network, and programming model.
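In its simplest form, auto-tuning is an empirical search: generate several variants of a kernel (here, differing only in tile size), time each variant on the target machine, and keep the fastest. The following is a minimal sketch in Python using a blocked matrix transpose as the kernel; the kernel, problem size, and candidate tile sizes are illustrative assumptions, not X-Tune's actual search space.

```python
import time

def transpose_tiled(a, n, tile):
    """Transpose an n x n matrix (stored as a flat row-major list)
    using a blocked (tiled) loop nest. The tile size controls cache
    behavior and is the tuning parameter being searched."""
    out = [0.0] * (n * n)
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out

def autotune(n=512, candidates=(8, 16, 32, 64, 128)):
    """Empirically time each candidate tile size on this machine
    and return the fastest one."""
    a = [float(k % 101) for k in range(n * n)]
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        t0 = time.perf_counter()
        transpose_tiled(a, n, tile)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile

# Example: pick the best tile size for this machine.
# best = autotune()
```

Real auto-tuners such as CHiLL search far richer spaces (loop orders, unroll factors, fusion choices), but the structure is the same: enumerate variants, measure, select.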

Our work is one component of a larger DOE X-Stack2 project (X-Tune), a collaboration between the University of Utah, Lawrence Berkeley National Laboratory, the University of Southern California, and Argonne National Laboratory. Building on the algorithmic and pathfinding work of the CACHE Institute in conjunction with the CHiLL/ROSE auto-tuning framework, we at LBL are researching and developing tools that automatically implement code transformations that minimize vertical (i.e., from DRAM) data movement and aggregate horizontal (i.e., MPI) data movement. To that end, we are leveraging the CHiLL/ROSE compiler to automatically transform and auto-tune numerical methods including multigrid, the spectral element method, and block eigensolvers such as LOBPCG.
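One way such transformations reduce vertical data movement is by fusing successive sweeps of a stencil so the grid is read from DRAM once rather than once per sweep, at the cost of some redundant computation near tile boundaries. The following is a minimal sketch of that trade-off on a 1-D Jacobi-style smoother; the kernel and the fusion strategy are illustrative assumptions, not CHiLL's actual generated code.

```python
def sweep(g):
    """One Jacobi-style sweep: 3-point average over interior points."""
    n = len(g)
    out = g[:]
    for i in range(1, n - 1):
        out[i] = (g[i - 1] + g[i] + g[i + 1]) / 3.0
    return out

def two_sweeps_naive(g):
    """Two sweeps as two full passes: the grid is streamed through
    memory twice (twice the vertical data movement)."""
    return sweep(sweep(g))

def two_sweeps_fused(g):
    """Two sweeps fused into one pass: intermediate values are
    recomputed on the fly instead of being stored and re-read.
    Redundant arithmetic is traded for reduced memory traffic."""
    n = len(g)

    def s1(j):
        # Value the first sweep would produce at point j.
        if 1 <= j <= n - 2:
            return (g[j - 1] + g[j] + g[j + 1]) / 3.0
        return g[j]

    out = g[:]
    for i in range(1, n - 1):
        out[i] = (s1(i - 1) + s1(i) + s1(i + 1)) / 3.0
    return out
```

Both versions compute identical results; the fused version touches the input array once, which is the kind of vertical-data-movement saving the project's transformations target.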

Software

HPGMG-FV (a scalable, compact benchmark developed under the ExaCT project for understanding the challenges of geometric multigrid on petascale and exascale systems built from multicore processors and manycore accelerators). X-Tune leverages this code for compiler research.

miniGMG (a compact geometric multigrid benchmark developed under the CACHE project for optimization, architecture, and algorithmic research at small scale). X-Tune leverages this code for compiler research.