Toon Knapen wrote:
> For instance, I wonder if any real-life application got a 50% boost by
> just changing the switch (and the corresponding MPI implementation). Or,
> what is exactly the speedup observed by switching from switch A to
> switch B on a real-life application?
I could not agree more. We should always keep in mind that a parallel
application mostly computes and, from time to time, sends messages :-)
Interconnect people often lose track of this, and using micro-benchmarks
with no computation yields a warped picture of the real problems (the
focus on message rate, for example).
Improving communication overhead and patterns is way down on the
optimization list, well after playing with various compiler flags or
understanding how the cache works. It usually has a very low
improvement/investment ratio. It's no surprise that most clusters out
there are using Gigabit Ethernet. In addition to being near-free, it is
enough for a lot of real-world usage. However, this ceases to be true
when large scaling is involved; the definition of "large" depends on the
application, of course.
There are simple things that MPI applications should not do (like
calling MPI_Barrier after each message); looking at an MPI trace makes
them obvious. This is where the improvement/investment ratio is the
greatest for communications.
Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf