I'm measuring barrier synchronization performance with the v1.5.1 build of Open MPI. For now I am running on a single node with 5 processes, and I'm getting pretty weak results, as follows:

Testing procedure: start the timer just before each process enters the barrier and stop it as soon as the process leaves the barrier. Repeat this N times and calculate the average.

I am wondering whether this is the expected performance on a single node. I presume Open MPI automatically uses shared memory for barrier synchronization when all processes are on the same node, which I would expect to give better performance than this. Is there a way to determine which transport layer I am actually using? I would also greatly appreciate any tips on how to tune this performance.