6: Blaming MPI for Programmer Errors

A natural tendency when an application breaks is to blame the MPI
implementation, particularly when your application "works" with one
MPI implementation and (for example) seg faults in another. While no
MPI implementation is perfect, they do typically go through heavy
testing before release. It is quite possible (and likely) that your
application actually has a latent bug that is simply not tripped on
some architectures / MPI implementations.

This sounds arrogant (especially coming from an MPI implementer), but
the vast majority of "bug reports" that we receive are actually due to
errors in the user's application (and sometimes they are very subtle
errors). For example, some compilers initialize variables to default
values (such as zero). Others do not. If your code accidentally
depends on a variable having a default value, it may work fine on
some platforms / compilers, yet cause errors on others.
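
To see how subtle this can be, here is a minimal, hypothetical sketch
of that kind of latent bug in an MPI program (the program is purely
illustrative): rank 0 never assigns "value" before sending it.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value;   /* "value" is never initialized */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Bug: nothing ever assigns "value".  If the platform happens
               to zero new stack memory, this "works"; elsewhere, rank 1
               receives whatever garbage happened to be there. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

No MPI implementation can save you here; the bug is entirely in the
application, even though switching implementations may change whether
you happen to notice it.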

Before submitting a bug report to the maintainers, double and triple
check your application. Use a memory-checking debugger, such as the
Linux Valgrind package, the Solaris bcheck command-line
checker, or the Purify system. All of these debuggers will report on
the memory usage in your application, including buffer overflows,
reading from uninitialized memory, and so on. You'd be surprised what
will turn up in your application.
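
As a concrete (again, hypothetical) example of what these tools catch,
consider a receive posted with a count larger than its buffer:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int rank;
        int *buffer = (int *) malloc(10 * sizeof(int));  /* room for 10 ints */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int data[20] = { 0 };
            MPI_Send(data, 20, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Bug: the receive claims the buffer holds 20 ints, so MPI
               writes past the end of the 10-int allocation.  It may
               appear to work with one implementation and crash with
               another. */
            MPI_Recv(buffer, 20, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(buffer);
        MPI_Finalize();
        return 0;
    }

A memory-checking debugger run on each process will point directly at
the write past the end of the allocation, regardless of whether the
program happens to survive it on your platform.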

Where to Go From Here?

So what did we learn here?

Ensure your environment is set up correctly. You only need
to do this once.

If anything, realize that you are not alone if you run into MPI
problems. The problems discussed in this column are all relatively
easy to fix. So even if you can't get your MPI application to run -
don't despair. The solution is probably just a few Google searches or
a system administrator away.

Stay tuned - next column, we'll continue the list with my Top 5, All
Time Favorite Evils to Avoid in Parallel.

This article was originally published in ClusterWorld Magazine. It
has been updated and formatted for the web. If you want to read more
about HPC clusters and Linux, you may wish to visit
Linux Magazine.

Jeff Squyres is the Assistant Director for High Performance Computing
for the Open Systems Laboratory at Indiana University and is one of
the lead technical architects of the Open MPI project.