I'm working on a fairly complicated piece of code and trying to make it GPU-enabled. I've already asked some questions about it. Right now I have a problem with Not a Number (NaN) results.
I have a loop that I want to compile and execute on the GPU:

After executing it on the GPU, some elements of the vvect array are NaN. They are not NaN when the code is executed on the CPU.
The funny thing is that when I remove the copy() clause from the code and leave only:

Code:

!$acc region do local(ijk,i,j,k)

the resulting array contains only zeros. This is odd, because the compiler adds the clause

Code:

Generating copy(vvect(:,igfy:igfyp1))

on its own, so there should not be any difference.

So, any ideas where the NaNs are coming from, and why these two versions of the directive give different results?

I thought about emulating the GPU and writing out all the variables in each iteration, but I understand that I cannot emulate the GPU using the PGI Accelerator model, right? If I could, I would check all the variables that are used to compute the vvect elements. So, is there a way to check this other than moving from the PGI Accelerator model to CUDA Fortran?
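One low-tech way to check the variables without emulation is to scan the arrays on the host for NaNs, once before the accelerator region (the inputs) and once after it (the results copied back). The block below is only a minimal sketch in standard Fortran: the array `v` and the names `n_bad` and `first_bad` are hypothetical stand-ins, and the NaN is planted by hand purely for illustration.

```fortran
program nan_scan_sketch
  ! Sketch only: v stands in for any array used to compute vvect.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real    :: v(5)
  integer :: i, n_bad, first_bad

  v = 1.0
  v(3) = ieee_value(v(3), ieee_quiet_nan)   ! plant one NaN for the demonstration

  ! ieee_is_nan is elemental, so it can be applied to the whole array.
  n_bad = count(ieee_is_nan(v))

  ! Find the index of the first NaN, or 0 if there is none.
  first_bad = 0
  do i = 1, size(v)
     if (ieee_is_nan(v(i))) then
        first_bad = i
        exit
     end if
  end do

  print *, 'number of NaNs:', n_bad, ' first NaN at index:', first_bad

  ! Self-check for this sketch.
  if (n_bad /= 1 .or. first_bad /= 3) error stop 'unexpected NaN scan result'
end program nan_scan_sketch
```

If the inputs are clean on the host but the results come back with NaNs, the problem is more likely in how the data reaches the GPU than in the arithmetic itself.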

We have a saying in Poland: "Who asks does not wander." So, I asked you and then partially solved my problem on my own. ;)
OK, so the NaNs are caused by the rcsqf array, which is used in the calculation of vvect. This array is declared as follows:

Code:

real(kind(zzz)), dimension(:), allocatable, save, target :: rcsqf

The other arrays are declared similarly, but without the "target" attribute. I assume there is some problem with pointers. How can I correctly copy the values of the rcsqf array to the GPU?

I see this "$p" suffix only for this array, which is the only one declared as "target".
Adding rcsqf to the region's copy clause does not change anything, not even the compiler message above (it does not change from copyin to copy).

When I copy the rcsqf values to another array on the GPU and then write out this temporary array, I get something like:

Sending all the code would be difficult because I'm working on a program that belongs to someone else. I have the source code of only one procedure, and I execute it by starting the main program with special parameters. I don't think I'm allowed to send this code to anybody.

In the original program, a1 and p1 refer to the same memory locations. However, the accelerator compiler can't preserve the pointer / target relationship of the data that is copied to the GPU. So the compiler will allocate and copy data for a1 and for p1 separately. On the host, p1(i) would get the same value that was just stored by a1(i)=0.0; on the GPU, p1(i) would get uninitialized memory, because the GPU copy of p1 would be at a different place in memory.
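The difference can be illustrated with a small sketch. This is an emulation in standard Fortran that runs entirely on the host, not accelerator code: `dev_a1` and `dev_p1` are hypothetical stand-ins for the two separate GPU allocations, `host_b` and `dev_b` are hypothetical result arrays, and the sentinel `stale` stands in for whatever happens to be in uninitialized GPU memory (in practice that could be anything, including NaN bit patterns).

```fortran
program alias_sketch
  ! Emulation only: shows why losing the pointer/target aliasing changes results.
  implicit none
  integer, parameter :: n = 4
  real, parameter    :: stale = -999.0   ! stand-in for uninitialized GPU memory
  real, target  :: a1(n)
  real, pointer :: p1(:)
  real :: dev_a1(n), dev_p1(n)           ! hypothetical separate "device" buffers
  real :: host_b(n), dev_b(n)
  integer :: i

  ! Host: p1 is associated with a1, so a store through a1 is visible through p1.
  a1 = 1.0
  p1 => a1
  do i = 1, n
     a1(i) = 0.0
     host_b(i) = p1(i)        ! reads the 0.0 just stored
  end do

  ! Emulated device: two independent buffers with no aliasing between them.
  dev_a1 = stale
  dev_p1 = stale
  do i = 1, n
     dev_a1(i) = 0.0
     dev_b(i) = dev_p1(i)     ! still reads the stale value
  end do

  print *, 'host  :', host_b
  print *, 'device:', dev_b

  ! Self-checks for this sketch.
  if (any(host_b /= 0.0))  error stop 'host aliasing did not behave as expected'
  if (any(dev_b /= stale)) error stop 'emulated device buffers behaved as aliased'
end program alias_sketch
```

On the host the read through p1 sees the value just written through a1; in the emulated device case it does not, which matches the description above of p1 picking up unrelated memory contents on the GPU.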