OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

An array with the ALLOCATABLE attribute must be in the allocated state. Each thread's copy of that array must be allocated with the same bounds.

This seems to imply that it's the programmer's responsibility to allocate the memory for each thread's copy of an allocatable array, whereas p. 101, lines 22-24, state:

On entry to any parallel region, each thread’s copy of a variable that is affected by a copyin clause for the parallel region will acquire the allocation, association, and definition status of the master thread’s copy...

which seems to imply that the OpenMP implementation is responsible. Could I have clarification regarding this? Thanks.

This points out some of the differences between Fortran ALLOCATABLE and Fortran POINTER. In many compilers, they are implemented using essentially the same mechanism (some kind of descriptor or "dope vector"), but the semantics are perhaps subtly different.

Page 101 states that if a variable has the POINTER attribute, the association status (associated or disassociated) will be copied from the master to each thread, and if associated, each thread's copy will be associated with the same target. This is probably clear.

The top of page 102 states that for any other variable (non-POINTER), each copy becomes defined with the value of the master thread's copy. For ALLOCATABLE, the question the committee struggled with is what to do if you have a master thread which has allocated an array of size 10, and you have three worker threads, where one thread has allocated the array with size 10, one has allocated it with only size 5, and a third has not allocated the array at all. Should the implementation reallocate all the arrays and copy the data? If the implementation does the allocation, at what point do the arrays get deallocated? Or, if the programmer allocated the array to size 5, should the implementation only copy the first 5 elements?

In Fortran 90 and 95, ALLOCATABLE array allocation and deallocation are fully under the control of the programmer. It would be wrong for a compiler to insert allocates, because there would be no place to put the matching deallocate. So, for OpenMP 3.0, the rules are as stated: the ALLOCATABLE arrays must be allocated by the programmer to the correct size.
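As a rough illustration of what this rule asks of the programmer, here is a minimal sketch. It is not taken from the spec: the program name, the array `work`, the size `n`, and the counter `mismatches` are made up for the example. Each thread allocates its threadprivate copy with the same bounds in an earlier, non-nested parallel region, and a later non-nested region then uses copyin. It assumes both regions run with the same number of threads and that dynamic thread adjustment is off, so that the threadprivate copies persist between the regions.

```fortran
program copyin_allocatable
  use omp_lib
  implicit none
  integer, parameter :: n = 10            ! illustrative size
  integer, allocatable, save :: work(:)   ! hypothetical threadprivate array
  !$omp threadprivate(work)
  integer :: mismatches

  ! Keep the team size fixed so threadprivate data persists between regions.
  call omp_set_dynamic(.false.)

  ! First region: every thread, including the master, allocates its own
  ! copy with the same bounds.
  !$omp parallel
  allocate(work(n))
  work = -1
  !$omp end parallel

  ! Serial part: only the master thread's copy is defined here.
  work = 42

  ! Second region: copyin copies the master's values into each thread's
  ! already-allocated copy.
  mismatches = 0
  !$omp parallel copyin(work) reduction(+:mismatches)
  if (any(work /= 42)) mismatches = mismatches + 1
  !$omp end parallel

  print *, 'threads whose copy differs from the master:', mismatches

  ! The programmer is also responsible for the matching deallocation.
  !$omp parallel
  deallocate(work)
  !$omp end parallel
end program copyin_allocatable
```

The final parallel region mirrors the first one: since the programmer did the allocation on every thread, the programmer also does the deallocation on every thread.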

When OpenMP moves to Fortran 2003, the rules change; assignment to an allocatable array in Fortran 2003 implies checking whether the array is allocated and is of the right size, and if not, reallocating the array. With the F2003 rules, OpenMP can change its copyin rules to allow the implicit allocate and matching implicit deallocation according to the language.
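The Fortran 2003 behavior described above can be seen in plain serial Fortran, with no OpenMP involved. The following is a small sketch with made-up names (`a`, `b`); it only shows the language rule for intrinsic assignment to an allocatable array, not how any OpenMP implementation handles copyin.

```fortran
program realloc_on_assignment
  implicit none
  integer, allocatable :: a(:), b(:)
  integer :: i

  allocate(a(5))
  a = 0   ! scalar assignment: a keeps its 5 elements

  ! The right-hand side has 10 elements, so under Fortran 2003 rules
  ! a is deallocated and reallocated with 10 elements.
  a = [(i, i = 1, 10)]
  print *, 'size(a) =', size(a)

  ! b is not allocated yet; the assignment allocates it with the shape of a.
  b = a
  print *, 'allocated(b) =', allocated(b), ' size(b) =', size(b)
end program realloc_on_assignment
```

Some compilers only apply these semantics when a Fortran 2003 (or later) mode or a specific option is enabled, so it is worth checking the compiler documentation before relying on it.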

Thanks for the informative response. If I understand, I suppose this means that for a program that has an ALLOCATABLE array in a copyin clause to be conforming, each thread must have allocated its threadprivate copy in a previous parallel region, and that previous parallel region and the parallel region containing the copyin clause (and any parallel regions in between the two) must satisfy the constraints listed on p. 82, lines 12-15? e.g.:

If this is the case, a simple example in the next version of the OpenMP API spec illustrating how to use this new OpenMP 3.0 feature would be helpful (though perhaps unnecessary if that future version used the Fortran 2003 semantics?).

If not, please clarify, and thanks for bearing with me, as Fortran "isn't my native tongue".

See OpenMP 3.0 spec, 2.9.2, page 82, lines 9-18. The guarantee that you are looking at the same thread is there only for parallels not nested in another parallel; with nested parallels there is no such guarantee. Note that you use num_threads(2) on the first nested parallel, so even if the outer parallel is removed, the program would be guaranteed to work only if it decides to use just 2 threads (say with OMP_NUM_THREADS=2 etc.).

Because of this, the solution in this thread wouldn't be valid in a nested region, so COPYIN cannot be used in a nested parallel directive.

Is there any other solution to copy some allocatable threadprivates in a nested zone? It seems that it is not possible... Any ideas?

The reason I said that it was possibly a compiler/run-time problem is because there is one compiler that this works with - though it isn't gcc. The answer you received from the gcc folks is correct. The OpenMP V3.0 spec doesn't allow it. I should have been more specific with my first response. Sorry about that. There was a great deal of discussion during the Version 3.0 work on this issue and it was decided that it needed more investigation before implementations could move forward. The problem is associating the threadprivate variables with the correct thread when you are using a thread pool to distribute the work.

Since most compilers don't support threadprivate in nested parallel regions, I am afraid that the only solution is to do it yourself (if you really need it). Basically you have to set up the variables yourself and then only use them from the nested regions with the appropriate thread. Arrays would be the most natural since you could use the nesting level and thread number to access an element.
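One way to read the "do it yourself" suggestion is sketched below. This is only an illustration under stated assumptions, not something taken from any compiler or from the spec: the names `buf`, `n_outer`, `n_inner`, `outer_id`, and `inner_id` are invented, and the sketch assumes two levels of nesting with fixed team sizes. Because a thread number is only unique within its own team, the sketch indexes the shared array by the ancestor thread number at level 1 together with the thread number in the inner team.

```fortran
program nested_per_thread_storage
  use omp_lib
  implicit none
  integer, parameter :: n_outer = 2, n_inner = 2, n = 4   ! illustrative sizes
  ! Hypothetical stand-in for a threadprivate array: one slice of buf per
  ! (outer thread, inner thread) pair, allocated once by the programmer.
  real, allocatable :: buf(:,:,:)
  integer :: outer_id, inner_id

  call omp_set_dynamic(.false.)
  call omp_set_max_active_levels(2)   ! allow two active nested levels

  allocate(buf(n, 0:n_inner-1, 0:n_outer-1))
  buf = 0.0

  !$omp parallel num_threads(n_outer) private(outer_id, inner_id)
  !$omp parallel num_threads(n_inner) private(outer_id, inner_id)
  ! Thread number of this thread's ancestor in the outer (level 1) team.
  outer_id = omp_get_ancestor_thread_num(1)
  ! Thread number within the inner team.
  inner_id = omp_get_thread_num()
  ! Each thread touches only its own slice, so no synchronization is needed.
  buf(:, inner_id, outer_id) = real(10*outer_id + inner_id)
  !$omp end parallel
  !$omp end parallel

  print *, buf(1, :, :)
  deallocate(buf)
end program nested_per_thread_storage
```

If the runtime delivers fewer threads than requested, or serializes the inner region, the indices still stay within the allocated bounds; some slices are simply left untouched.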