The decade-old Open MPI project, beloved of the HPC community, has shipped code for a major feature release that brings it close to a complete MPI 3 implementation.

Speaking to The Register, Cisco's Jeff Squyres (who along with Richard Graham, then of Los Alamos and now with Mellanox, was one of the instigators of the Open MPI project ten years ago) said that perhaps one of the most important characteristics of the new release is that it wraps up much-needed FORTRAN support in the interface.

While that will probably raise giggles among the cool kids, FORTRAN is not only alive and well in the HPC community, it remains one of the key languages in the world of big number crunchers.

“This is the first set of MPI-FORTRAN bindings that are actually compliant with the language,” Squyres said.

MPI, he explained, offers handles to languages and compilers that connect them to the underlying MPI objects – but “all of those handles used to be simple integers. With this release, the handles are now unique types.” With those types FORTRAN-compliant, he said, the user – the scientist writing software for a particular function – gets better assistance from the FORTRAN compiler.

“There are also strict prototypes for all the MPI function engines”, Squyres said, which again means users get better assistance in fixing errors.

The 1.7.4 Open MPI release also comes close to a full implementation of the MPI 3 standard, he said, with only one major item – support for one-sided operations – missing.

A big item in the new release is support for “non-blocking collectives” in the communications model. While there are a host of communication types beyond simple point-to-point, Squyres chose non-blocking broadcasts as a good example of what this means for the HPC user.

“It can be really useful to ask MPI to start a broadcast, while the compute thread goes away to do computations because that's what it does, and come back later to see if the broadcast is finished.”

For example, he said, a much-used operation in fluid dynamics is an all-reduce operation: a vector of numbers is distributed across thousands of cores, and the user wants the partial results from every core combined, with the combined result handed back to all of them.
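As a rough illustration of what an all-reduce produces, here is a minimal sketch in plain Python. It is not Open MPI code: the "ranks" are simply entries in a list, and the function name `allreduce` and the sample numbers are made up for the example. The point is only the shape of the result, in which every participant ends up holding the same combined vector.

```python
from operator import add


def allreduce(per_rank_vectors, op=add):
    """Combine equal-length vectors element by element.

    Returns one copy of the combined vector per simulated rank,
    which mirrors the idea that every participant in an all-reduce
    receives the full result.
    """
    combined = list(per_rank_vectors[0])
    for vector in per_rank_vectors[1:]:
        combined = [op(a, b) for a, b in zip(combined, vector)]
    # Hand each simulated rank its own copy of the result.
    return [list(combined) for _ in per_rank_vectors]


# Three simulated ranks, each holding a partial result (made-up values).
partials = [[1, 2], [10, 20], [100, 200]]
results = allreduce(partials)
print(results[0])  # [111, 222]
```

Passing a different operator, for instance `op=max`, gives an element-wise maximum instead of a sum.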

A non-blocking broadcast means you can start an iteration of the calculation, send a broadcast to collect the result – and instead of waiting for responses to the broadcast, go ahead with the next iteration.

“I can overlap iteration and computation, without waiting for the communication,” Squyres said.
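The start-now, check-later pattern Squyres describes can be sketched with Python's standard library. This is an analogy only, using a background thread in place of the network: `slow_allreduce`, `compute_next_iteration` and `overlapped_step` are hypothetical names, and the sleep stands in for communication time. In MPI 3 itself, the corresponding calls for an all-reduce would be `MPI_Iallreduce` to start the operation and `MPI_Wait` to complete it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_allreduce(per_rank_values, delay=0.05):
    """Stand-in for a collective: sleep to mimic network time, then sum."""
    time.sleep(delay)
    return sum(per_rank_values)


def compute_next_iteration(n=10000):
    """Stand-in for useful local work done while the collective is in flight."""
    return sum(i * i for i in range(n))


def overlapped_step(per_rank_values):
    """Start the collective, compute in the meantime, then collect the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the operation and return immediately with a handle.
        pending = pool.submit(slow_allreduce, per_rank_values)
        # Do the next chunk of computation while the operation runs.
        local = compute_next_iteration()
        # Come back later and wait only if it has not finished yet.
        total = pending.result()
    return total, local


total, local = overlapped_step([1, 2, 3])
print(total)  # 6
```

The handle returned by `submit` plays roughly the role of an MPI request object: the caller holds on to it while computing and only blocks when the result is actually needed.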

Other features in the release include a new binding policy that makes “extensive” use of process affinity by default, so that a process can be forced to remain on a given core, and better CUDA support, courtesy of contributions from Nvidia.

What is Open MPI anyway?

So why would there be a fuss about a feature release of a message-passing standard? Haven't we been doing this practically forever?

The Message Passing Interface, MPI – the standard that underlies both the open source efforts and proprietary implementations – dates back to the 1990s, and its longevity makes it the lingua franca of the HPC community.

Supercomputing, Squyres explained to The Register, has requirements that were never fulfilled by commercial middleware busses, nor by the world of TCP sockets.

In particular, HPC needs extremely low latency.

A very well-tuned TCP implementation might be able to get pings of 10 microseconds between a couple of servers in the same data centre, but the supercomputer user wants latency closer to a tenth of that. That's one of the aims of MPI, and both its proprietary and open source implementations.

The other, Squyres said, is that the middleware should present communications to the users as simply as possible.

“It's a very high-level network abstraction,” he explained. “The person writing the application is a chemical engineer, or a physicist or a biologist. They want to crunch some numbers, and they understand their own algorithms and math, but they don't understand the network.

“They want to tell their software 'take this gigabyte of double-precision numbers and send it to number 38'. Under MPI, a miracle occurs: the data gets sent, but the user doesn't have to know whether it's on a core a couple of racks away, or on the same server sent via shared memory.”

The MPI standard first got turned into Open MPI by Squyres and Graham in a project that began in 2004 and shipped its first code in 2005.

While there are alternatives to MPI – SGI's SHMEM (symmetric hierarchical memory access), which has an open implementation, and PGAS (partitioned global address space), for example – Squyres believes that MPI underlies as much as 90-95 per cent of HPC code out there.

The penetration of the Open MPI implementation is harder to assess, since it's included in Linux and some OpenBSD distributions. ®