I have the following code, which repeats a large number of calculations for every element in vector y and returns the results in vector z. The main program makes numerous calls to this subroutine, which compiles without error and appears to execute without a problem.

Within the accelerator region, the only difference between this version and the first is the addition of the variables ilo and ihi, which divide the workload among the available devices. I've checked omp_get_thread_num(), ilo, and ihi before entering the accelerator region, and all have the expected values. This version compiles and appears to run correctly the first time it is called, but on the second call it fails with the following message:
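For anyone following along, here is a minimal C sketch of one common way to compute ilo and ihi per OpenMP thread: a contiguous block split of the index range. This is an assumption, not the actual code from this thread (which isn't shown here). The helper name block_range and the zero-based, inclusive bounds are illustrative, and a Fortran version would shift both bounds by one.

```c
#include <assert.h>

/*
 * Hypothetical helper: split the index range [0, n) into nparts contiguous
 * blocks and return block `part` as the inclusive range [*ilo, *ihi].
 * The first n % nparts blocks get one extra element, so block sizes differ
 * by at most one. An empty block is reported as *ihi < *ilo.
 */
void block_range(int n, int nparts, int part, int *ilo, int *ihi)
{
    assert(n >= 0 && nparts > 0 && part >= 0 && part < nparts);
    int base  = n / nparts;          /* minimum elements per block */
    int extra = n % nparts;          /* blocks that get one more   */
    int start = part * base + (part < extra ? part : extra);
    int len   = base + (part < extra ? 1 : 0);
    *ilo = start;
    *ihi = start + len - 1;
}
```

Inside the parallel region, each thread would call something like block_range(n, omp_get_num_threads(), omp_get_thread_num(), &ilo, &ihi) and then work only on y[ilo..ihi] in its accelerator region.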

It might be a problem with calling acc_set_device_num() more than once. Try putting that call in a conditional so it only happens once.
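The suggestion above can be sketched as a small C guard, assuming each OpenMP thread keeps its own record of which device it has already selected. The helper name should_select_device and the NO_DEVICE sentinel are hypothetical; only acc_set_device_num() and the acc_device_nvidia device type mentioned in the note come from the OpenACC runtime API.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel: this host thread has not selected a device yet. */
#define NO_DEVICE (-1)

/*
 * Hypothetical helper (not part of any OpenACC or PGI API).
 * *bound is per-thread state that must survive across calls to the
 * subroutine. Return values:
 *    1  first call on this thread: `device` is recorded and the caller
 *       should now call acc_set_device_num() exactly once
 *    0  already bound to `device`: skip the call
 *   -1  already bound to a different device: this sketch never
 *       re-selects, so the work division itself would have to change
 */
int should_select_device(int *bound, int device)
{
    assert(bound != NULL && device >= 0);
    if (*bound == NO_DEVICE) {
        *bound = device;   /* remember the choice for later calls */
        return 1;
    }
    return (*bound == device) ? 0 : -1;
}
```

In the parallel region, a thread would then call acc_set_device_num(dev, acc_device_nvidia) only when should_select_device(&bound, dev) returns 1. One way to make bound persist per thread across calls is a file-scope static int bound = NO_DEVICE; marked with #pragma omp threadprivate(bound), which OpenMP only guarantees to preserve between parallel regions under conditions such as an unchanged thread count and dynamic threads disabled. The -1 case corresponds to a thread being asked to move to a different device, which this guard deliberately refuses to do.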

Brent, thanks so much for your help. I had assumed that acc_set_device_num() could be called at any time outside an accelerator region, so that processing could be redirected at any point and as often as needed. As you suggested, however, that is not the case. I convinced myself of this by inserting a call to acc_shutdown() just before each OpenMP thread finishes. The program then runs without the error, but so slowly that I'd be better off confining all the work to a single device. As currently written, though, the program can't work as intended if I put the call to acc_set_device_num() in a conditional, as you suggested. I guess I'll need to revise the flow of work in the main program.

Maybe it's only me, but this seems to be a real limitation. Can anyone from PGI comment on the chances of improving on this in future revisions?