Nope: as it stands, you cannot tell whether you are looking at a weakly parallelizable algorithm or at compiler performance. To get good data, you would also have to provide a GCC run without OpenMP; comparing the two runs would show the speedup that is actually due to OpenMP.
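To make the comparison concrete, here is a minimal sketch of what could be computed once both timings exist. The function names are made up for illustration, and the serial-fraction estimate is the Karp-Flatt metric, which is my addition rather than anything the benchmark itself reports. It assumes wall-clock times in seconds from the same GCC version and flags, differing only in -fopenmp.

```c
#include <assert.h>

/* Speedup of the OpenMP build over the serial baseline.
 * t_serial:   wall time of the GCC build WITHOUT -fopenmp (seconds)
 * t_parallel: wall time of the GCC build WITH -fopenmp (seconds) */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: 1.0 would mean perfect scaling on `threads` threads. */
double efficiency(double s, int threads)
{
    assert(threads > 0);
    return s / (double)threads;
}

/* Karp-Flatt metric: the serial fraction implied by a measured speedup `s`
 * on `threads` threads. A large value points at the algorithm (or at
 * parallel overhead) rather than at the compiler's code generation. */
double serial_fraction(double s, int threads)
{
    assert(s > 0.0 && threads > 1);
    double inv_p = 1.0 / (double)threads;
    return (1.0 / s - inv_p) / (1.0 - inv_p);
}
```

For example, a serial run of 8 s and an 8-thread run of 2 s give a speedup of 4 and an efficiency of 0.5. If another compiler shows the same speedup on the same test, the limit is probably the algorithm and not GCC.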

compiler optimization

(Sorry, I didn't read closely enough to see whether it already does this, and I didn't look in the source to check.) Anyway, one thought: is this built with gcc -march=native? Another would be to try GCC's profile-guided optimization for it, and/or clang's equivalent if it exists.
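In case it helps, a rough sketch of what I mean. The build lines in the comment use a placeholder prog.c for the benchmark's own source, and -O2 is just my assumption for the base optimization level. The function below is only a made-up stand-in for the kind of code where a profile can help, since its branch depends on the data.

```c
/* Build steps with GCC (prog.c is a placeholder, not a real file name
 * from the test suite):
 *
 *   Tuned for the build machine's CPU:
 *     gcc -O2 -march=native prog.c -o prog
 *
 *   Profile-guided optimization, in three steps:
 *     gcc -O2 -march=native -fprofile-generate prog.c -o prog
 *     ./prog        (a representative run writes the profile data)
 *     gcc -O2 -march=native -fprofile-use prog.c -o prog
 */
#include <assert.h>
#include <stddef.h>

/* Stand-in workload: sums only the positive entries. How often the branch
 * is taken depends on the input, which is the kind of information a
 * profile run records for the compiler. */
long sum_positive(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (v[i] > 0)
            total += v[i];
    }
    return total;
}
```

Note that -march=native binaries may not run on older CPUs, so results from such a build are only comparable on the same machine.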
Cheers.
-roger-