OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

ftinetti wrote:It's hard to tell without actually seeing the subroutine(K) code. Is it a problem to post that subroutine (or, at least, the declarative part)? Are there saved data in the subroutine, for example? Are you checking that the results do not change? Just guessing...

Indeed, it is very complicated to post the subroutine. It contains calls to other subroutines, some of them generate random numbers. So, on top of this, the results are not the same each time I run the program.

Obviously this makes debugging very difficult. This is why I tried the same OpenMP coding paradigm in another version of the program without random numbers but, necessarily, in another kind of loop. In that case I can reproduce the results that I get from the serial version. To be precise, this happens only when the number of threads is equal to 1, because otherwise there is a variable conflict. But this is a separate problem: I know its source but I don't have a solution. For starters, I wanted to understand why the execution speed is not affected by the number of threads.

Regal wrote:Indeed, it is very complicated to post the subroutine. It contains calls to other subroutines, some of them generate random numbers. So, on top of this, the results are not the same each time I run the program.

I can't think of anything else without any code at hand... Based on your description of the 1-thread performance, it seems there is some data problem (static, or local uninitialized, or...).

Many Fortran random number generating routines use variables with the SAVE attribute to preserve state between calls. In parallel these variables are shared, and cause race conditions which, in addition to being a bug, can cause very poor speedup due to contention for the cache lines where these variables are stored.
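To illustrate the point about SAVE'd state, here is a minimal free-form sketch, not the poster's actual generator: the state is passed in by the caller, so each thread can own a private copy and no SAVE variable is shared. The names `lcg_mod`, `next_uniform` and `per_thread_rng`, the Park-Miller constants and the seed values are all assumptions made for the example. Seeding neighbouring threads with consecutive integers is only a placeholder and says nothing about the statistical quality of the streams.

```fortran
! Hypothetical example: generator state owned by the caller, not SAVE'd.
module lcg_mod
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Park-Miller "minimal standard" step; updates the caller's state.
  function next_uniform(state) result(u)
    integer(i8), intent(inout) :: state
    double precision :: u
    state = mod(16807_i8 * state, 2147483647_i8)
    u = real(state, kind(1.0d0)) / 2147483647.0d0
  end function next_uniform
end module lcg_mod

program per_thread_rng
  use lcg_mod
  !$ use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: k, tid
  integer(i8) :: state
  double precision :: total

  total = 0.0d0
  !$omp parallel private(state, tid) reduction(+:total)
  tid = 0
  !$ tid = omp_get_thread_num()
  state = 12345_i8 + tid        ! private, nonzero seed per thread (assumed values)
  !$omp do
  do k = 1, n
     total = total + next_uniform(state)
  end do
  !$omp end do
  !$omp end parallel
  print *, 'mean of samples =', total / n
end program per_thread_rng
```

Because `state` is private, threads never write to the same memory location, which removes both the race condition and the cache-line contention described above. The lines starting with `!$` are compiled only when OpenMP is enabled, so the same source also builds as a serial program.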

How convinced are you that the one-thread parallel version is actually giving you the correct result?

Random number generators are often designed to give different results on different runs by using a seed based on reading a clock. For debugging purposes it can be useful to use a hard-coded seed, which ensures the same random sequence is generated on each run.
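As a sketch of the hard-coded seed idea, assuming the code relies on the standard intrinsic `random_number` (if it uses its own generator, the same principle applies to whatever seed argument that generator takes). The program name and the seed values are arbitrary choices for the example.

```fortran
! Hypothetical example: fix the seed of the intrinsic generator for debugging.
program fixed_seed
  implicit none
  integer :: n, i
  integer, allocatable :: seed(:)
  real :: x(3)

  call random_seed(size=n)            ! seed length depends on the compiler
  allocate(seed(n))
  seed = [(12345 + 37*i, i = 1, n)]   ! arbitrary fixed values (assumption)
  call random_seed(put=seed)          ! same seed on every run

  call random_number(x)
  print *, x
end program fixed_seed
```

With the same compiler and the same seed array, successive runs should print the same three numbers, which makes serial and parallel results comparable.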

Thank you guys for the input you have provided so far. There are obviously a number of issues, but for the moment I would like to point out something very strange that I just noticed.

So, I commented out every line of the code containing anything OpenMP-related by adding the character "c" at the beginning of the line, as we do in Fortran. I also removed the -fopenmp flags from the makefile. I recompiled and ran the program again. To my great surprise, it ran at the speed of the parallel version! There were no compilation errors and the binary was updated, so there is no way the old binary was used.

I really don't understand what's going on here. Isn't "c" a valid comment character where OpenMP is concerned? Or is there some kind of memory lock in the system? Please enlighten me.
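On the "c" question: in fixed-form source the OpenMP directive sentinels are `!$omp`, `c$omp` and `*$omp` starting in column 1, so a line that begins with `c$omp` is still a live directive when the compiler's OpenMP flag is on, and a plain comment when it is off. One way to check what a given build actually does is a small fixed-form test like the sketch below; the program name `check_omp` is made up for the example. Lines starting with `c$` are compiled only when OpenMP is enabled.

```fortran
c     Hypothetical fixed-form check: build it once with -fopenmp and
c     once without, then compare the printed message.
      program check_omp
      implicit none
      integer nthreads
c$    integer omp_get_max_threads
      nthreads = 0
c$    nthreads = omp_get_max_threads()
      if (nthreads .gt. 0) then
         print *, 'OpenMP enabled, max threads =', nthreads
      else
         print *, 'OpenMP disabled: sentinel lines were comments'
      end if
      end
```

If the build without -fopenmp reports OpenMP as disabled and the timings still do not change, the cause is more likely elsewhere (for example a stale or different source file) than in the comment character.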

MarkB wrote:It looks like you have successfully disabled OpenMP! Perhaps the previous sequential time was a mistake or machine glitch: have you been able to reproduce it?

For the sequential mode I keep another version of the program which still runs much slower, as I reported previously. There are no real differences between the two (sequential and parallel), except the OpenMP loop and some subroutine splitting in order to be able to apply the OpenMP directives. The calculations are the same.

So, putting everything OpenMP in comments should get me back to sequential execution times, yet the speed increase is still there! Disabling or activating the OpenMP lines has no effect whatsoever: my binary always runs in high-speed mode. It is as if the system remembers something about it. Is that possible? I am totally baffled.

Regal wrote:So, putting everything OpenMP in comments should get me back to sequential execution times, yet the speed increase is still there! Disabling or activating the OpenMP lines has no effect whatsoever: my binary always runs in high-speed mode. It is as if the system remembers something about it. Is that possible? I am totally baffled.

I think the most likely explanation is that you have inadvertently changed something since you compiled the slow version.

As for the difference between the "slow" and "fast" versions, it could certainly be as you say; I have to double and triple check that.

But how can I explain the fact that activating or deactivating the OpenMP lines has no effect on speed? It makes no sense, because the execution time is in minutes, not milliseconds. This is why I thought about some caching system in OpenMP. Or should I explicitly delete the old object files? The makefile is supposed to update them after each change in the source code, no?