OpenMP® Forum


removing the assignment to big_array is causing the compiler to optimise away most of the code

I also agree with this:

I strongly suspect the lack of scaling of the code is due to memory bandwidth contention: the code is basically just repeatedly trawling through the coord array with no re-use.

but I don't have enough cores to verify the "extreme" case (which would need 10, and my maximum is 8). In your case, please check which i5 model you have: the i5-2540M, for example, has only 2 cores with HT (http://ark.intel.com/products/50072), and I couldn't find an i5 with 4 cores (though there are a lot of i5 models running at 2.6GHz and I didn't look at all of them).

Thanks Mark and Fernando. I found the main problem in my code, which was this line: dx=minval((/x2,xx2/))-maxval((/x1,xx1/)). If I replace the Fortran intrinsic functions with simple 'if-else' statements, the problem is gone. The 32-processor parallel runtime is now only 6 seconds (with the nn loop moved to be the outermost loop).

I guess the OpenMP compiler does not like those intrinsic functions; it seems to prefer simple basic statements when it comes to performance optimization.
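For anyone reading later, the replacement described above might look roughly like the sketch below. It assumes x1, x2, xx1 and xx2 are default real scalars (the thread does not state their types), and the names dx_intrinsic, dx_ifelse, upper and lower are made up for illustration. The small program only checks that both forms give the same value for one input; it says nothing about timing.

```fortran
module overlap_mod
  implicit none
contains
  ! Form quoted in the post: builds two 2-element array constructors.
  pure function dx_intrinsic(x1, x2, xx1, xx2) result(dx)
    real, intent(in) :: x1, x2, xx1, xx2
    real :: dx
    dx = minval((/x2, xx2/)) - maxval((/x1, xx1/))
  end function dx_intrinsic

  ! Same arithmetic written with plain if-else statements (hypothetical rewrite).
  pure function dx_ifelse(x1, x2, xx1, xx2) result(dx)
    real, intent(in) :: x1, x2, xx1, xx2
    real :: dx, upper, lower
    ! smaller of the two upper values
    if (x2 < xx2) then
      upper = x2
    else
      upper = xx2
    end if
    ! larger of the two lower values
    if (x1 > xx1) then
      lower = x1
    else
      lower = xx1
    end if
    dx = upper - lower
  end function dx_ifelse
end module overlap_mod

program check_overlap
  use overlap_mod
  implicit none
  real :: a, b
  a = dx_intrinsic(0.0, 2.0, 1.0, 3.0)
  b = dx_ifelse(0.0, 2.0, 1.0, 3.0)
  print *, a, b
  ! Both forms should agree: min(2,3) - max(0,1) = 1
  if (a /= b) error stop "mismatch between the two forms"
end program check_overlap
```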

The problem with using minval/maxval may not be the intrinsics as such: it might be that the compiler is repeatedly allocating/deallocating temporary storage for the likes of (/x1,xx1/), which could cause contention for a lock somewhere low down.
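If temporary storage for the array constructors is indeed the cause, one way to keep the statement on a single line without building any arrays is to use the scalar min/max intrinsics instead of minval/maxval. The sketch below is only an illustration under assumptions: whether a given compiler actually allocates temporaries for (/x1,xx1/) is compiler-dependent, and the loop, the array arguments, the dx > 0.0 test and the name total_overlap are all hypothetical, not taken from the original code.

```fortran
module overlap_sum_mod
  implicit none
contains
  ! Hypothetical loop: sums the positive values of dx over all elements.
  function total_overlap(x1, x2, xx1, xx2) result(total)
    real, intent(in) :: x1(:), x2(:)
    real, intent(in) :: xx1, xx2
    real :: total, dx
    integer :: i
    total = 0.0
    !$omp parallel do private(dx) reduction(+:total)
    do i = 1, size(x1)
      ! Scalar min/max take their arguments directly, so no array
      ! constructor such as (/x1,xx1/) is needed inside the loop.
      dx = min(x2(i), xx2) - max(x1(i), xx1)
      if (dx > 0.0) total = total + dx
    end do
    !$omp end parallel do
  end function total_overlap
end module overlap_sum_mod

program demo
  use overlap_sum_mod
  implicit none
  real :: x1(4), x2(4)
  x1 = (/0.0, 1.0, 2.0, 3.0/)
  x2 = (/1.0, 2.0, 3.0, 4.0/)
  ! Contributions are 0.5 + 1.0 + 0.5; the last element gives dx < 0 and is skipped.
  print *, total_overlap(x1, x2, 0.5, 2.5)
end program demo
```

Compiled without OpenMP enabled, the directives are treated as comments and the loop runs serially with the same result.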