> Dear OpenMPI users
>
> I am dealing with an arithmetic-precision problem. In fact, I have two
> variants of my code: one in single precision, one in double precision.
> When I compare the two executables built with MPICH, one can observe
> an expected difference in performance: 115.7 s in single precision
> against 178.68 s in double precision (+54%).
>
> The thing is, when I use OpenMPI, the difference is much bigger:
> 238.5 s in single precision against 403.19 s in double precision (+69%).
>
> Our experience has already shown that OpenMPI is less efficient than
> MPICH on Ethernet with a small number of processes. This explains the
> difference between the first set of results with MPICH and the second
> set with OpenMPI. (But if anyone has more information about that, or
> even a solution, I am of course interested.)
> However, using OpenMPI also widens the gap between the two precisions.
> Is this an amplification of the OpenMPI+Ethernet performance loss, is
> it a separate issue in OpenMPI, or is there an option I can use?
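
As a quick sanity check on the numbers quoted above, here is a small Python sketch (the language is just my choice for illustration; only the timings come from your message). It reproduces your +54% and +69% figures and also computes the OpenMPI/MPICH ratio for each precision.

```python
# Wall-clock timings (seconds) quoted in the original message.
timings = {
    "MPICH": {"single": 115.7, "double": 178.68},
    "OpenMPI": {"single": 238.5, "double": 403.19},
}


def pct_increase(base: float, new: float) -> float:
    """Relative increase of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


if __name__ == "__main__":
    # Cost of going from single to double precision, per MPI implementation.
    for impl, t in timings.items():
        inc = pct_increase(t["single"], t["double"])
        print(f"{impl}: double is +{inc:.0f}% vs single")

    # Slowdown of OpenMPI relative to MPICH, per precision.
    for prec in ("single", "double"):
        ratio = timings["OpenMPI"][prec] / timings["MPICH"][prec]
        print(f"{prec}: OpenMPI / MPICH = {ratio:.2f}x")
```

With these inputs OpenMPI comes out at roughly 2.06x the MPICH time in single precision and 2.26x in double precision.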

It is also unusual that the performance difference between MPICH and
OMPI is so large. You say that OMPI is slower than MPICH at small
process counts. Can you confirm that this is because the MPI calls
themselves are slower? Some of the biggest performance differences I've
seen between MPI implementations had nothing to do with the performance
of MPI calls at all. They had to do with process binding or other
factors that affected the computational (non-MPI) performance of the
code. The performance of the MPI calls was basically irrelevant.

In this particular case, though, I doubt binding is the explanation,
since neither OMPI nor MPICH binds processes by default.
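
If you want to check binding directly rather than rely on the defaults, one option on Linux is to have every launched process print its CPU affinity mask. Below is a minimal Python sketch using `os.sched_getaffinity` (Linux-only). The script name `check_affinity.py` is hypothetical, and the "bound/restricted" label (comparing the mask against `os.cpu_count()`) is only a heuristic of mine: cgroup or container CPU limits can also shrink the mask. Launching it with the same mpirun command line you use for the application, e.g. `mpirun -np 4 python3 check_affinity.py`, should show whether each MPI implementation restricts processes to particular cores.

```python
import os
import socket


def report_affinity() -> set:
    """Print and return the set of CPUs this process may run on (Linux only)."""
    cpus = os.sched_getaffinity(0)  # 0 means "the calling process"
    total = os.cpu_count()
    # Heuristic: a mask smaller than the machine suggests binding or a cpuset limit.
    state = "unbound" if len(cpus) == total else "bound/restricted"
    print(f"{socket.gethostname()} pid={os.getpid()}: "
          f"{len(cpus)}/{total} CPUs {sorted(cpus)} ({state})")
    return cpus


if __name__ == "__main__":
    report_affinity()
```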

Still, can you do some basic performance profiling to confirm which
aspect of your application is consuming so much time? Is it a
particular MPI call? If your application is spending almost all of its
time in MPI calls, do you have some way of judging whether the faster
performance is acceptable? That is, is 238 secs acceptable and 403 secs
slow? Or are both timings unacceptable, e.g., the code "should" be
running in about 30 secs?
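
For the profiling itself, even a crude split of wall-clock time into "compute" and "mpi" buckets answers the first question. The sketch below shows the idea in Python. `do_compute` and `do_exchange` are hypothetical stand-ins for your kernels and your MPI calls. In the real code, the same bracketing can be done with `MPI_Wtime()` around each MPI call, or with a PMPI-based profiling tool, usually with little or no source modification.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (e.g. 'compute' vs 'mpi')."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def report(self) -> dict:
        """Print per-phase totals and return each phase's share of timed time."""
        grand = sum(self.totals.values())
        shares = {k: (v / grand if grand > 0 else 0.0) for k, v in self.totals.items()}
        for k, v in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            print(f"{k:>10}: {v:8.3f} s ({shares[k]:6.1%})")
        return shares


# Hypothetical stand-ins for the application's computation and MPI exchanges.
def do_compute():
    sum(i * i for i in range(200_000))


def do_exchange():
    time.sleep(0.01)


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(5):
        with timer.phase("compute"):
            do_compute()
        with timer.phase("mpi"):
            do_exchange()
    timer.report()
```

If the "mpi" share differs sharply between the MPICH and OpenMPI runs while the "compute" time stays the same, the MPI calls are the likely culprit. If the compute time itself changes between the two builds, look at binding or other factors affecting the non-MPI part of the code.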