All computing systems, from mobile devices to supercomputers, are becoming energy limited. This has motivated the adoption of heterogeneous computing to significantly increase energy efficiency. A critical ingredient for the pervasive use of heterogeneous computing is the ability to run applications on different compute engines without software redevelopment. Our C-FAR work in the past two years has focused on performance portability at two distinct levels. At the higher level, Tangram supports the expression of algorithm hierarchies that allow generic library code to be auto-configured to near-expert-code performance on each hardware target. At the lower level, MxPA compiles existing OpenCL kernels with locality-centric work scheduling and code generation policies. In this talk, I will present the important dimensions of performance portability, key features of Tangram/MxPA, experimental results, and some comparisons with existing industry solutions.

The rise of rich media in mobile devices and massive analytics in data centers has created new opportunities and challenges for computer architects. On the one hand, commercial hardware has been undergoing rapid transformation to drastically increase the throughput of processing large amounts of data while keeping power consumption in check. On the other hand, computer architecture has evolved too slowly to facilitate hardware innovations, software productivity, algorithm advancement, and user-perceived improvements. In this talk, I will present some major challenges facing the computer architecture research community and some recent advancements in throughput computing. I will argue that we must rethink the scope of computer architecture research as we seek to create growth paths for the computer systems industry.