OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

Dear forum, I have been trying to improve the threading performance of a nested loop. Two versions are given below, one using forall + OMP WORKSHARE and one using do + OMP DO. I get a factor of 1.7 speedup going from 1 to 4 processors with the DO version (which is disappointing), and no speedup at all with the forall version. What am I doing wrong? The first index of the arrays is n/2+1.

WORKSHARE is very loose about how work is assigned to threads; DO gives you a lot more control. Further, not all compilers implement WORKSHARE, and those that do may not implement it well. For the particular compiler you are using, I would have to investigate further to find the exact strategy it uses. As for the disappointing speedup with the DO loops, that may be due to other factors such as memory bandwidth and parallel overhead, and would also require more investigation.
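To make the comparison concrete, here is a minimal sketch of the two approaches side by side. It is not the original poster's code: the array names, the sizes n and m, and the element-wise multiply are all placeholders I made up for illustration. Only the first extent n/2+1 comes from the post. It times a PARALLEL DO version with an explicit static schedule against a PARALLEL WORKSHARE version wrapping a FORALL, and checks that both give the same result.

```fortran
program workshare_vs_do
  use omp_lib
  use iso_fortran_env, only: real64
  implicit none

  ! Hypothetical problem sizes; only the first extent n/2+1 is from the post.
  integer, parameter :: n = 1024, m = 1024
  integer :: i, j
  real(real64), allocatable :: a(:,:), b(:,:), c_do(:,:), c_ws(:,:)
  real(real64) :: t0, t1

  allocate(a(n/2+1, m), b(n/2+1, m), c_do(n/2+1, m), c_ws(n/2+1, m))
  call random_number(a)
  call random_number(b)

  ! Version 1: OMP DO. The outer (column) loop is split across threads,
  ! and the schedule clause says explicitly how iterations are divided.
  t0 = omp_get_wtime()
  !$omp parallel do private(i) schedule(static)
  do j = 1, m
     do i = 1, n/2+1
        c_do(i, j) = a(i, j) * b(i, j)   ! placeholder loop body (assumption)
     end do
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()
  print '(a, f10.6, a)', 'parallel do        : ', t1 - t0, ' s'

  ! Version 2: OMP WORKSHARE. How the FORALL is divided into units of
  ! work is left to the compiler; there is no schedule clause to tune.
  t0 = omp_get_wtime()
  !$omp parallel workshare
  forall (i = 1:n/2+1, j = 1:m) c_ws(i, j) = a(i, j) * b(i, j)
  !$omp end parallel workshare
  t1 = omp_get_wtime()
  print '(a, f10.6, a)', 'parallel workshare : ', t1 - t0, ' s'

  ! Both versions perform the same multiplications, so results must match.
  if (any(c_do /= c_ws)) error stop 'DO and WORKSHARE results differ'
  print '(a)', 'results match'
end program workshare_vs_do
```

With gfortran this should build with something like `gfortran -fopenmp workshare_vs_do.f90`, and you can vary OMP_NUM_THREADS from 1 to 4 to compare the timings. Keep in mind that a loop body this light does very little arithmetic per byte loaded, so even the DO version may be limited by memory bandwidth rather than by the number of threads, and the WORKSHARE timing will depend entirely on what your compiler chooses to do with the FORALL.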