OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

2.7 says "Note – When storage is shared by an explicit task region, it is the programmer's responsibility to ensure, by adding proper synchronization, that the storage does not reach the end of its lifetime before the explicit task region completes its execution." When exactly are all explicit tasks tied to the current thread supposed to be waited for? Say, is:

valid? I.e., is the point at which all explicit tasks tied to the current thread are waited for right after the last statement in the implicit task, or later (e.g. after all reduction variable merging, running destructors, etc.)? Mandating the earlier point might penalize even OpenMP 2.5 code, or code that never uses tasks. Since tasks can be created in other functions called from the implicit task, not necessarily in the implicit task itself, the compiler would probably need to add a function call that waits for all pending tasks before the reduction merging code, lastprivate copying and running destructors. With OpenMP 2.5 that wasn't necessary. Or is the above undefined, so that the note quoted above applies in this case (and the commented-out taskwait is needed)?
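To make the note from 2.7 concrete, here is a minimal sketch (not the snippet from the post above; the function name sum_with_tasks and its variables are made up for illustration). A block-scope variable is shared with explicit tasks, and a taskwait keeps it alive until those child tasks have finished:

```cpp
// Hypothetical helper: sums 1..n using one explicit task per term.
long sum_with_tasks(int n) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        long partial = 0;  // block scope: lifetime ends at the closing brace
        for (int k = 1; k <= n; ++k) {
            #pragma omp task shared(partial) firstprivate(k)
            {
                #pragma omp atomic
                partial += k;
            }
        }
        // Programmer-supplied synchronization: wait for the child tasks
        // generated above while partial is still alive.
        #pragma omp taskwait
        total = partial;
    }
    return total;
}
```

Without the taskwait, some tasks could still be pending when partial reaches the end of its lifetime, which is exactly the situation the note makes the programmer responsible for. If the pragmas are ignored (compilation without OpenMP support), the function simply runs serially.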

There is one case where you might need a barrier (related to Jakub's previous question). It is unspecified whether the implicit barrier at the end of the parallel region is executed before the block goes out of scope. So with:

i is allowed to go out of scope before the tasks, possibly executed in the barrier at the end of the region, are completed. So if you want to ensure that i remains in scope you would need an explicit barrier before the end of the parallel region. Again, this is a programmer requirement, not an implementation requirement.
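A sketch of the explicit-barrier workaround described here, under the assumption that each implicit task declares its own block-scope i and shares it with one explicit task (TeamResult and run_with_barrier are hypothetical names, not from the thread):

```cpp
struct TeamResult { int threads; int total; };

// Hypothetical helper: every thread of the team creates one task that
// reads that thread's block-scope variable i.
TeamResult run_with_barrier() {
    int threads = 0, total = 0;
    #pragma omp parallel shared(threads, total)
    {
        int i = 1;  // block scope: one instance per implicit task
        #pragma omp atomic
        ++threads;
        #pragma omp task shared(i, total)
        {
            #pragma omp atomic
            total += i;
        }
        // Explicit barrier: all explicit tasks generated by the team so far
        // complete here, while every i is still in scope.
        #pragma omp barrier
    }
    return { threads, total };
}
```

After the call, total should equal the number of threads that executed the region, since each task added its creator's i exactly once before the barrier released the team.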

I am still not sure I understand exactly when variables go out of scope. Local variables declared inside the parallel block are expected to go out of scope before the implicit barrier. So

is invalid, because by the time the task is executed i might already be out of scope in the implicit task. Now, do variables in private/firstprivate clauses on the parallel go out of scope before or after the implicit barrier?

If they go out of scope before the implicit barrier, then what makes reduction special? If they are still in scope at the implicit barrier, what if their destructors contain #pragma omp task? Then tasks would need to be scheduled after the implicit barrier (of course, they could then be forced to be if (0) tasks).

Another issue is task firstprivate variables with constructors. While for POD firstprivate vars the implementation can copy the variables into some buffer and defer creation of the task's stack until it is actually scheduled to run, I doubt the implementation would be allowed to construct vars with constructors in a temporary buffer: that would mean one extra pair of user-visible copy-ctor/dtor calls per such firstprivate variable. But if the task is not untied and the implementation creates the task stack right away, switches to the new task context temporarily to run all the copy-ctors, then switches back and lets the new task wait until it is actually scheduled to run, the user could observe a tied task executed by two different threads (e.g. if the copy-ctors call omp_get_thread_num () and so does the body of the task). Is that ok?

Ok, with the help of some folks on the OpenMP language committee that actually read the specification instead of just editing it, here are the answers:

First, it is unspecified when the variables in a reduction clause go out of scope. This is part of the reason for the user-supplied synchronization requirement.

Second, the reduction example is non-conforming. See Section 2.9.3.6. (p. 95, line 26 in the public comment draft) in the Restrictions section for reduction. You can't access the reduction variable in an explicit task. So that's what makes reductions special.

For the third item, the scope of the private and firstprivate variables is the task, so any destructors are called when the task is finished. You are correct that the implementation cannot introduce any extra ctor/dtor calls for privatization. However, the specification does not constrain the points at which construction occurs (except, of course, that it must occur before the variables are used). I would expect most implementations to construct the private variables when the task is encountered and place them in a data area that implements the data environment for the task. So it would be perfectly reasonable for the thread that executes the constructor to be different from the thread that executes the task.
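For anyone who wants to observe this, here is a rough sketch. Tracked, TaskStats and run_tasks are hypothetical names, and the serial fallback for omp_get_thread_num is an assumption added only so the code also builds without OpenMP. The copy constructor records the constructing thread, and the task body compares it with the executing thread:

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }  // serial fallback (assumption)
#endif

// Hypothetical type: remembers which thread constructed each object and
// counts how many objects are currently alive.
struct Tracked {
    static int live;
    int ctor_thread;
    Tracked() : ctor_thread(omp_get_thread_num()) {
        #pragma omp atomic
        ++live;
    }
    Tracked(const Tracked&) : ctor_thread(omp_get_thread_num()) {
        #pragma omp atomic
        ++live;
    }
    ~Tracked() {
        #pragma omp atomic
        --live;
    }
};
int Tracked::live = 0;

struct TaskStats { int tasks_run; int ran_on_other_thread; };

TaskStats run_tasks(int n) {
    int run = 0, moved = 0;
    {
        Tracked original;
        #pragma omp parallel
        #pragma omp single
        {
            for (int k = 0; k < n; ++k) {
                // Each task gets its own copy-constructed Tracked.
                #pragma omp task firstprivate(original) shared(run, moved)
                {
                    if (original.ctor_thread != omp_get_thread_num()) {
                        #pragma omp atomic
                        ++moved;
                    }
                    #pragma omp atomic
                    ++run;
                }
            }
        }  // implicit barrier: all tasks have completed past this point
    }
    return { run, moved };
}
```

Whether ran_on_other_thread is nonzero depends on the implementation and on scheduling, so no particular value should be expected. What the sketch does check is that every task runs and that constructor and destructor calls balance (Tracked::live returns to 0), consistent with the statement that no extra ctor/dtor pairs are introduced.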