OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

I have some questions and comments about the teams construct introduced in RC2.

It looks like the "omp teams" construct is trying to divide up the threads on accelerators into different teams, and "omp distribute" is used to fine-tune the scheduling of loop iterations among the teams. These two constructs are the major additions to the accelerator support compared to the previous technical report (TR1).

Question 1:
---------------
Will the following code example, roughly based on TR1, still be valid in RC2?

Question 2:
---------------
Can you provide a few examples in Appendix A to showcase how users can use "omp target", "omp teams", and "omp distribute" together to port a loop to a GPU?

From a user's point of view, it is best to have an easy migration path from traditional OpenMP code to accelerator code. My fear is that with "omp teams" and "omp distribute", users will have to replace their favorite "omp for" with something totally alien.

Comment 1:
---------------
I feel like it is too early to introduce thread teams on accelerators when the support for a single team is still being defined. Could you share what the urgent needs are to define multiple teams on accelerators?

My understanding is that the threads on the host (CPUs) are mostly managed as a single team, though nested teams are allowed. Thread subteams on CPUs have been discussed for quite a while but have still not been formally introduced.

Comment 2:
--------------------
For accelerator support, I think the number one issue is the lack of a clear canonical architecture model. Part of the success of OpenMP so far relies on its simple, generic SMP architecture model: identical cores/processors attached to a single shared memory. Users and compiler developers can immediately get the big picture and work together.

What is the generic architecture for accelerators? Without a generic architecture for accelerators in mind, it is very hard for users to program and compiler developers to implement the accelerator directives.

It is interesting that the OpenMP specification has sections about execution model and memory model. But the assumptions about the targeted hardware architectures are not really articulated. It may not be necessary when OpenMP tries to support the simple SMP machines. But it may become necessary when dealing with complex types of accelerators.

This code is legal and will work as expected on an Intel Xeon Phi. On a GPU, however, the implementation will likely confine the parallel region to a single block (a thread block in NVIDIA terms). This means that the code may be leaving significant performance opportunities behind.
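To make the difference concrete, here is a rough sketch that is not taken from the specification drafts. It assumes the combined-construct spelling that was later standardized in OpenMP 4.0 (the RC2 wording may differ in detail), and the function names saxpy_single_team and saxpy_teams are made up for illustration. The first version keeps the familiar "parallel for" inside "target", so a GPU implementation would likely run it within one thread block; the second asks for a league of teams and distributes the iterations across them.

```c
#include <assert.h>

/* Hypothetical example: y = a*x + y with "target" + "parallel for".
   On a GPU this would likely be confined to a single team (thread block). */
void saxpy_single_team(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same loop, but iterations are first distributed across teams and then
   workshared among the threads of each team (OpenMP 4.0 combined form). */
void saxpy_teams(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. On a build without offloading support the target regions fall back to the host (or the pragmas are ignored entirely), so the sketch can be checked on an ordinary CPU.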

Leo wrote:
Question 2:
---------------
Can you provide a few examples in Appendix A to showcase how users can use "omp target", "omp teams", and "omp distribute" together to port a loop to a GPU?

Yes, we are working on examples.

Leo wrote:
From a user's point of view, it is best to have an easy migration path from traditional OpenMP code to accelerator code. My fear is that with "omp teams" and "omp distribute", users will have to replace their favorite "omp for" with something totally alien.

Only users who want to exploit the power and performance of GPU-like devices will have to use the teams and distribute constructs. However, even host codes could likely benefit from these new constructs if the implementation properly exploits the extra knowledge they provide.

Leo wrote:
Comment 1:
---------------
I feel like it is too early to introduce thread teams on accelerators when the support for a single team is still being defined. Could you share what the urgent needs are to define multiple teams on accelerators?

My understanding is that the threads on the host (CPUs) are mostly managed as a single team, though nested teams are allowed. Thread subteams on CPUs have been discussed for quite a while but have still not been formally introduced.

See my response to question 1 as to why we felt we needed to add contention groups and thread groups. Without teams, OpenMP would be ignoring GPUs, the current leading group of accelerators (or devices). The teams construct does not create subteams; it creates "new" teams. Think of a team as an entirely new OpenMP instance: everything gets reset, and the team works completely independently from the spawning thread and its team.
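One way to picture these independent teams is a two-level loop nest. The following is only a sketch under the same assumption as before (OpenMP 4.0 syntax, which may not match RC2 exactly), and row_sums is a hypothetical name. The outer loop is distributed across the teams; inside each iteration, the initial thread of a team opens its own ordinary parallel region, with its own worksharing and reduction, unaware of what the other teams are doing.

```c
#include <assert.h>

/* Hypothetical example: sums[r] = sum of row r of a rows x cols matrix
   stored in row-major order in a[]. */
void row_sums(int rows, int cols, const double *a, double *sums)
{
    assert(rows >= 0 && cols >= 0);
    #pragma omp target map(to: a[0:rows*cols]) map(from: sums[0:rows])
    #pragma omp teams
    #pragma omp distribute
    for (int r = 0; r < rows; r++) {
        double s = 0.0;  /* private to the team executing this row */
        /* Each team runs its own parallel region over the columns. */
        #pragma omp parallel for reduction(+:s)
        for (int c = 0; c < cols; c++)
            s += a[r * cols + c];
        sums[r] = s;
    }
}
```

Because no synchronization is needed between rows, no team ever has to wait on another, which is the kind of independence described above.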

Leo wrote:
Comment 2:
--------------------
For accelerator support, I think the number one issue is the lack of a clear canonical architecture model. Part of the success of OpenMP so far relies on its simple, generic SMP architecture model: identical cores/processors attached to a single shared memory. Users and compiler developers can immediately get the big picture and work together.

What is the generic architecture for accelerators? Without a generic architecture for accelerators in mind, it is very hard for users to program and compiler developers to implement the accelerator directives.

It is interesting that the OpenMP specification has sections about execution model and memory model. But the assumptions about the targeted hardware architectures are not really articulated. It may not be necessary when OpenMP tries to support the simple SMP machines. But it may become necessary when dealing with complex types of accelerators.

OpenMP continues to assume a shared memory model. The only thing that the target construct does is allow a tunnel from one device type to another device type; here device type can be thought of as hardware type.
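As a small illustration of that tunnel, here is a sketch assuming the map clause syntax of OpenMP 4.0 (RC2 may spell things differently); device_sum is a hypothetical name. The code inside the target region reads and writes v and total as ordinary variables, and the map clauses only describe what has to pass through the tunnel in each direction.

```c
#include <assert.h>

/* Hypothetical example: sum n values inside a target region. */
double device_sum(int n, const double *v)
{
    assert(n >= 0);
    double total = 0.0;
    /* v goes to the device; total goes in and comes back out. */
    #pragma omp target map(to: v[0:n]) map(tofrom: total)
    {
        for (int i = 0; i < n; i++)
            total += v[i];
    }
    return total;
}
```

The loop is left serial on purpose so that the only OpenMP feature in play is the data mapping itself.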