In the past few years, general-purpose microprocessors have steadily gained more cores and hardware threads per system. Application developers have typically either optimized for one specific parallel architecture and core count or kept their code sequential, and so have not extracted the full performance that multiple cores can deliver.

This tutorial is aimed at researchers and commercial software developers who want to extract maximum performance from current systems while keeping their code scalable to future architectures. It covers open, well-established industry methods for exploiting all levels of parallelism provided by modern HPC solutions. Attendees will also learn about techniques that have delivered performance benefits in real-world scenarios, as well as upcoming techniques and technologies that are currently being researched or productized.

The tutorial starts with a brief introduction to multi-/many-core and cluster systems, parallel programming models, and tools. We then dive into well-established techniques for extracting performance from the cores available in your system and for scaling that performance forward to future architectures. The material is structured around the three key levels of parallelism available in today’s and future HPC systems: data-, thread-, and cluster-level parallelism (e.g. Ct, Cilk, and MPI, respectively). For each level we introduce the key challenges and show how to tackle them by adapting the program code and using sophisticated tools to tune for optimal performance.

Level of Tutorial
Introductory: 20%
Intermediate: 50%
Advanced: 30%

Prerequisites
Attendees are expected to have a basic understanding of parallel HPC systems (multi-core, clusters, etc.) and of implementing and developing high-performance parallel and cluster software.