Profiling OpenMP* applications with Intel® VTune™ Amplifier XE


Parallelism delivers the performance that High Performance Computing (HPC) requires. It spans several layers: superscalar execution, vector instructions, threading, and distributed memory with message passing. OpenMP* is a commonly used threading abstraction, especially in HPC. Many HPC applications are moving to a hybrid shared-memory/distributed-memory programming model that uses both OpenMP* and MPI*.

This article focuses on the OpenMP parallel model, and in particular on profiling the performance of OpenMP-based applications. Intel® VTune™ Amplifier XE is a performance profiling tool from Intel that is well suited to finding performance bottlenecks in OpenMP code. The article walks through the steps to profile an OpenMP application and describes the common performance issues that Intel® VTune™ Amplifier XE can reveal.