Comment

But I don't think modules will land this decade, if ever. Because adding a fifth parameter-passing method is more important than, say, not crashing whenever you throw an exception from a shared library. Who cares about modules or encapsulation or terrible copy-paste coding (sorry, "textual inclusion"), now you can have rvalue-references and a dozen standard optional random number generators!

Comment

Having written high-performance, massively parallel software, I find my experience with OpenMP quite unimpressive. Gains are typically sublinear, more log-like. It's okay as a stopgap for upgrading old non-thread-safe software. IMHO, new vector-unit technology can be dramatically faster. In my experience there are two levels of parallelization: instruction-level parallelism, which is best left to vectorization techniques, and task-level parallelism, which is an issue of software design and not a coding problem.

Comment

This depends very much on your algorithm. Most importantly, on how large the thread startup and copying overhead is compared to the actual workload of each thread, and on how much of your algorithm needs to run inside critical sections.

If you apply it to the outermost loops, with each loop iteration doing lots and lots of work with minimal critical sections, gains can be very close to linear, in my experience.

OpenMP is a wonderful, wonderful way to make software multi-threaded without major rewrites.

Comment

Agreed. We use it for stochastic simulations, where each path numerically solves a set of stochastic partial differential equations with different noise realizations. The processes only need to communicate infrequently, to average results. Speed-up is basically linear up to at least 12 cores (unless you become limited by memory bandwidth).