On Fri, Nov 6, 2009 at 5:43 PM, amjad ali <amjad11 at gmail.com> wrote:
> Hi all,
> Suppose that the grid/mesh is decomposed for n processors, such
> that each processor has a number of elements that share their sides/faces
> with different processors. What I do is start non-blocking MPI
> communication on the partition boundary faces (faces shared between any two
> processors), and then start computing values on the internal/non-shared
> faces. When I complete this computation, I call WAITALL to ensure MPI
> communication completion. Then I do the computation on the partition boundary
> faces (the shared ones). This way I try to hide the communication behind
> computation. Is it correct?
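
For concreteness, here is a minimal C sketch of the scheme you describe. The
buffer layout (one contiguous segment of `count` doubles per neighbor rank)
and the two compute routines are illustrative stand-ins, not your actual code;
the demo main() just has each rank exchange with itself so it runs on a single
process.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the real flux computations. */
static void compute_interior_faces(void) { /* internal, non-shared faces */ }
static void compute_boundary_faces(const double *recvbuf) { (void)recvbuf; }

static void exchange_and_compute(double *sendbuf, double *recvbuf, int count,
                                 const int *neighbors, int n_neighbors,
                                 MPI_Comm comm)
{
    MPI_Request *reqs = malloc(2 * (size_t)n_neighbors * sizeof *reqs);

    /* 1. Post nonblocking receives and sends for the partition-boundary faces. */
    for (int i = 0; i < n_neighbors; i++) {
        MPI_Irecv(&recvbuf[i * count], count, MPI_DOUBLE, neighbors[i],
                  0, comm, &reqs[2 * i]);
        MPI_Isend(&sendbuf[i * count], count, MPI_DOUBLE, neighbors[i],
                  0, comm, &reqs[2 * i + 1]);
    }

    /* 2. Work on interior faces.  This step must neither read recvbuf nor
     *    overwrite sendbuf while the requests are still pending. */
    compute_interior_faces();

    /* 3. Demand completion before touching any boundary data. */
    MPI_Waitall(2 * n_neighbors, reqs, MPI_STATUSES_IGNORE);

    /* 4. recvbuf is now safe to read: do the shared faces. */
    compute_boundary_faces(recvbuf);

    free(reqs);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Single-process demo: each rank "exchanges" with itself. */
    enum { COUNT = 4 };
    double sendbuf[COUNT] = {1, 2, 3, 4}, recvbuf[COUNT] = {0};
    int neighbors[1] = { rank };
    exchange_and_compute(sendbuf, recvbuf, COUNT, neighbors, 1, MPI_COMM_WORLD);

    printf("rank %d received %g %g %g %g\n", rank,
           recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);
    MPI_Finalize();
    return 0;
}
```

Note the ordering in step 1: posting the receives before (or together with)
the sends gives the library a place to put early-arriving data.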
There are two issues here. First, correctness. The data for messages that
arrive while you are computing may be written into memory asynchronously
with respect to your program. Be sure that you are not depending on values
in memory that may be overwritten by data arriving from other
ranks. Second, overlap is good, but whether you actually get any overlap
depends on the details.
For example, the work of communicating with other ranks, sending and receiving
messages, and so forth must be done by something. With Ethernet, a lot of that
work is done by the OS kernel, in general by some core on each node. If you
expect to be using all the cores in a node to run your program, who is left
to do the communications work? Some implementations will timeshare the
processors, giving the appearance of overlap, but not actually running
faster, while other implementations simply won't do any work until the
WAITALL that demands progress. If you have multicore nodes, and you don't
need every last core to run your program, it can help if you only allocate
some of the cores on each node to your program, leaving some "idle" to run
the OS and the communications. The job control system should have a way to
do this.
You can test to find out if you are getting any overlap, by artificially
reducing the actual communications work to near zero and seeing if the
program runs any faster.
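One way to run that experiment, sketched below under hypothetical names: time
the same step loop once with the real message size and once with the payload
shrunk to a single element, keeping everything else identical.
`one_solver_step` is a stand-in for one exchange-plus-compute iteration of
your solver, not a real routine.

```c
#include <mpi.h>
#include <stdio.h>

/* Stand-in for one exchange+compute iteration of the solver. */
static void one_solver_step(int msg_count) { (void)msg_count; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int real_count = 10000, iters = 100;  /* illustrative sizes */

    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) one_solver_step(real_count);
    double t_full = MPI_Wtime() - t0;

    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) one_solver_step(1);  /* comm ~ zero */
    double t_tiny = MPI_Wtime() - t0;

    /* If t_full is close to t_tiny, communication is already hidden behind
       computation; if t_full is much larger, you are getting little overlap. */
    printf("real comm: %g s   near-zero comm: %g s\n", t_full, t_tiny);

    MPI_Finalize();
    return 0;
}
```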