On the agenda:
• The foundations of OpenACC
• About OpenACC’s merge with OpenMP
• Code and performance portability vs accelerator specificities
• The key features of the next OpenACC specification release

Interview by Stéphane Bihan

Mr Poole, can you remind us of the foundation of OpenACC and its mission statement?

The foundation of OpenACC comes from some work we were doing in the OpenMP accelerator subcommittee in 2011. This work was largely based on the PGI accelerator compiler as well as some effort that was going on at Oak Ridge National Laboratory with Cray. As this work developed, we realized that it would take a lot longer before the OpenMP standard became available, and there were also some concerns that the way the problem was being approached was not enough like OpenMP. As a result, we needed to get some convergence at that time rather than wait for OpenMP. We needed developers to start using one model. CAPS had had a compiler available since 2007 and PGI since 2008. We therefore formed a non-profit organization whose mission was to unify these models and to produce a performance-portable programming model. That’s why we called it OpenACC.

NVIDIA, another important player in the equation, was mainly focused on CUDA at that time. Why this shift towards programming directives for accelerators?

Directives are important because they require relatively few changes to parallelize a code and they keep it more maintainable. We simply needed to offer another way of using software with GPUs. There were libraries, there was CUDA, and in the middle of these two we just needed a simpler way of writing code.

As you may know, OpenMP directives look like comments, particularly in Fortran. This means that a compiler that does not understand those directives will just ignore them. This is a great benefit when you want to produce code that can run in multiple places. OpenACC has exactly the same intention: preserve readability and portability, but with features designed, among other things, to handle two address spaces directly, since the GPU has an address space distinct from the CPU’s. This is a very different model from OpenMP, notably regarding data movement, and also because the distinct memory regions vary from one accelerator to another. So it’s important to keep in mind that once you get onto an accelerator, you program it in a slightly different way, a more restricted way.
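As an illustration of the two points above (directives that a non-OpenACC compiler simply ignores, and explicit data movement between two address spaces), here is a minimal sketch in C. The function name saxpy and its signature are assumptions chosen for the example, not something taken from the interview; only the copyin and copy data clauses and the parallel loop directive come from the OpenACC specification.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The name and signature are illustrative assumptions. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin(x[0:n]): x is copied from host memory to device memory on entry.
     * copy(y[0:n]):   y is copied to the device on entry and back on exit.
     * A compiler without OpenACC support ignores this pragma
     * and runs the loop serially on the CPU. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built with an ordinary C compiler, the function behaves as plain serial code, possibly with an unknown-pragma warning; built with an OpenACC-aware compiler, the same source is offloaded and the data clauses describe the transfers between the two address spaces.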

What still distinguishes OpenACC from OpenMP, especially now that they have their own extensions for accelerators?

OpenMP has two different ways of programming accelerators. There is an offload mode where, once you get onto the device, you just use regular OpenMP. This is a problem for devices that have multiple memory regions. Late in the development of OpenMP 4.0, NVIDIA contributed to the teams directives, which provide some support for running on any accelerator. This work is still ongoing, and because almost everyone in OpenACC is also a member of the OpenMP technical committee, OpenACC supports that effort as well. You see a lot of conversations happening twice: one, more performance-oriented, inside the OpenACC organization and another inside the OpenMP organization. With all that information in mind, what should we do? OpenACC has more advanced features that roughly correspond to a two-year lead, and it takes OpenMP more time to develop them. We continue working on some interesting features going forward by pushing the performance boundaries inside OpenACC, especially for C++ codes. That is the area we are trying to explore right now.
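For comparison, the OpenMP accelerator support mentioned here can be sketched as follows in C. This is an illustration under assumptions: the function name dot and its signature are invented for the example, and the combined target teams distribute parallel for construct is only one of several ways to express such a loop in OpenMP 4.0 and later.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the dot product of x and y over n elements.
 * The name and signature are illustrative assumptions. */
double dot(int n, const double *x, const double *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    double sum = 0.0;

    /* target:      offload the region to the device.
     * teams:       create a league of thread teams on the device.
     * distribute:  split loop iterations across the teams.
     * map clauses: describe data movement between host and device.
     * Without OpenMP support the pragma is ignored and the loop runs serially. */
    #pragma omp target teams distribute parallel for \
        reduction(+:sum) map(to: x[0:n], y[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

As with the OpenACC directives, the annotated loop remains valid serial C when the compiler does not recognize the pragma.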

Can users expect a merge between the two programming models or, if not, how can OpenACC become more of a de facto standard?

OpenACC is a de facto standard; you can just look at all the important sessions here at the GPU Technology Conference. There are talks that discuss the differences between the two programming models, how to write high-performance code with OpenACC, some of the efforts we are making right now in C++ and deep copy… There is also a whole session track on climate modeling, where OpenACC has been used a lot recently.

So, no merge?

Well, there are several ways to answer that. One is that it does not have to be a merge, because OpenMP is looking at what OpenACC is doing and then deciding what it likes, and we are helping them. At another level, GCC is implementing OpenACC, and this work is being done in the same development branch as the OpenMP work. So there is a lot of code reuse between the two approaches, and I think we could potentially see a de facto merge coming out of these implementations.