OpenMP is an application programming interface for high-performance computing systems that enables the processor cores within a compute node to work on multiple components of a larger problem simultaneously. It is widely used in programming parallel scientific applications and is often combined with the Message Passing Interface (MPI), a standard that enables different computing processes to communicate with one another. The annual workshop, which took place on September 27-28 in Barcelona, promotes and advances parallel programming with OpenMP and offers a forum for discussing related issues, trends, and research.

In parallel computing, large, complex calculations are divided into many small steps. Using OpenMP, researchers can organize complicated parallel programs into so-called tasks — self-contained, user-defined work packages managed by a scheduler. The user can influence task scheduling using a directive called taskyield, which prompts the scheduler to suspend the execution of one task in favor of another. This can help hide MPI communication latency and achieve higher degrees of parallelism. For this reason, taskyield is an important factor in the interaction between OpenMP and MPI.

However, OpenMP does not precisely define the expected behavior of taskyield. It specifies that the current task may be suspended and replaced by a different one but does not provide any guarantees. Hence, some implementations simply ignore the taskyield directive and continue executing the current task. This not only prevents the most efficient workflow but can even cause the program to hang — a so-called deadlock — when the suspended task is waiting for work that only another task can complete. In addition, OpenMP does not provide a way to detect what taskyield actually does in a given OpenMP implementation, making it difficult for users to judge whether fine-grained communication tasks will help or hurt performance.
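The deadlock risk can be sketched in a few lines (an illustrative example of ours, not from the paper). Task A waits for a flag that only task B sets; with a single thread, A can only make progress if taskyield really suspends it so B can run. A no-op taskyield would hang here, so the sketch bounds the wait with a timeout:

```c
#include <time.h>

/* Returns 1 if task B managed to run (taskyield switched tasks, or B
 * happened to run first); returns 0 if the wait timed out -- i.e. the
 * pattern would deadlock without a working taskyield. */
int yield_or_timeout(void) {
    volatile int flag = 0;
    int completed = 0;
    #pragma omp parallel num_threads(1)
    #pragma omp single
    {
        #pragma omp task shared(flag, completed)
        {   /* Task A: waits for task B, yielding while it waits. */
            time_t start = time(NULL);
            while (!flag && time(NULL) - start < 2) {
                #pragma omp taskyield  /* ask the scheduler to run B */
            }
            completed = flag;
        }

        #pragma omp task shared(flag)
        flag = 1;  /* Task B: releases task A. */

        #pragma omp taskwait
    }
    return completed;
}
```

In a real application the busy-wait would typically be a blocking MPI call (such as a receive whose matching send is issued by another task), which is where the interaction between taskyield and MPI becomes critical.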

To tackle this problem, Schuchart et al. examined the advantages and disadvantages of several possible implementations of taskyield. Specifically, they explored the impact of these implementations on the task design of the Blocked Cholesky Factorization, a component of a larger application (also called an application kernel) that requires frequent data exchange between processes, and compared the correctness and performance of the implementations. Because the paper is aimed at both developers and users of OpenMP, Schuchart and his colleagues also presented a black-box test that can detect which variant of taskyield a given OpenMP compiler and runtime provides.

“Our motivation is to help developers and users by informing them about the weak spots of the taskyield feature and providing a black-box tool that offers insights into the inner workings of OpenMP implementations,” says Schuchart. In the future he and his colleagues intend to explore other types of applications and add to the research on OpenMP tasks.

This work was a successful collaboration between HLRS and the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan.