Threaded MPI for the Epiphany Architecture

Brown Deer Technology (BDT) is excited to announce that we have demonstrated the use of MPI for programming the Epiphany architecture. Programming is straightforward and the performance has been excellent. In this post I will give an overview of what is being developed and highlight one application: a 2D FFT.

MPI, the overlooked API for Epiphany. BDT suggested using MPI to program Epiphany in 2013 and demonstrated it with a simple proof-of-concept example. The example attracted little attention at the time. In retrospect this is surprising, since MPI is a perfect match for the architecture. What is different today is that we have gone beyond a proof of concept applied to a simple algorithm (a trivially parallel calculation of Pi). We have used a real library, libcoprthr_mpi, to achieve high performance on non-trivial examples.

Why is MPI the right programming model for Epiphany? It’s the architecture. Look at the Epiphany architecture and the similarities to a distributed parallel cluster with a 2D network topology are hard to miss. The independent RISC cores, each able to follow its own control flow, play the role of conventional processors. The 2D Network-on-Chip plays the role of the inter-node network. The differences are just as striking, however. Limited per-core resources make it difficult, if not impossible, for a core to run a full process image as conventional MPI does. On the Parallella platform, Epiphany is also integrated as a coprocessor controlled from the ARM host, which requires offload semantics. We addressed these challenges by designing a threaded MPI implementation built on the COPRTHR API for Epiphany. MPI suits Epiphany so well because performance on this architecture depends on carefully orchestrating inter-core data movement to maximize data reuse. MPI's explicit message passing puts exactly that movement in the programmer's hands.
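The small C sketch below makes the cluster analogy concrete. It models the 16 cores of an Epiphany III as a 4×4 mesh and maps MPI-style ranks to mesh coordinates and nearest neighbours, much as `MPI_Cart_create` and `MPI_Cart_shift` do for a cluster. Several parts are illustrative assumptions and are not part of libcoprthr_mpi:

- the row-major rank ordering;
- the non-wrapping mesh edges;
- every name in the code (`rank_to_coords`, `neighbor`, `hops`, `NO_NEIGHBOR`).

```c
#include <assert.h>

/* Hypothetical 4x4 mesh for the 16-core Epiphany III (8x8 would model
   the 64-core Epiphany IV). */
#define MESH_ROWS 4
#define MESH_COLS 4
#define NO_NEIGHBOR (-1) /* plays the role of MPI_PROC_NULL at a mesh edge */

enum direction { NORTH, SOUTH, WEST, EAST };

/* Row-major placement (an assumption): rank r sits at row r / MESH_COLS,
   column r % MESH_COLS. */
static void rank_to_coords(int rank, int *row, int *col) {
    assert(rank >= 0 && rank < MESH_ROWS * MESH_COLS);
    *row = rank / MESH_COLS;
    *col = rank % MESH_COLS;
}

/* Inverse mapping; coordinates off the mesh have no rank because the
   mesh is assumed not to wrap around (no torus links). */
static int coords_to_rank(int row, int col) {
    if (row < 0 || row >= MESH_ROWS || col < 0 || col >= MESH_COLS)
        return NO_NEIGHBOR;
    return row * MESH_COLS + col;
}

/* Rank of the adjacent core in direction d, or NO_NEIGHBOR at an edge. */
static int neighbor(int rank, enum direction d) {
    int row, col;
    rank_to_coords(rank, &row, &col);
    switch (d) {
    case NORTH: return coords_to_rank(row - 1, col);
    case SOUTH: return coords_to_rank(row + 1, col);
    case WEST:  return coords_to_rank(row, col - 1);
    case EAST:  return coords_to_rank(row, col + 1);
    }
    return NO_NEIGHBOR;
}

/* Minimum number of mesh hops between two ranks (Manhattan distance),
   a rough proxy for the cost of a message between them. */
static int hops(int a, int b) {
    int ra, ca, rb, cb;
    rank_to_coords(a, &ra, &ca);
    rank_to_coords(b, &rb, &cb);
    int dr = ra > rb ? ra - rb : rb - ra;
    int dc = ca > cb ? ca - cb : cb - ca;
    return dr + dc;
}
```

Some example values:

- `neighbor(0, EAST)` is rank 1.
- `neighbor(0, NORTH)` is `NO_NEIGHBOR`.
- `hops(0, 15)` is 6, the corner-to-corner distance on a 4×4 mesh.

Setting `MESH_ROWS` and `MESH_COLS` to 8 models the 64-core Epiphany IV instead.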

MPI wins again. Attempts to replace MPI have been a recurring theme in industry and academia for years. Today most of the industry's excitement is reserved for the new wave of GPU and accelerator APIs. Yet time and again, MPI wins out on many parallel platforms, and for Epiphany it wins again. Given the decades of MPI expertise across the industry, this should be welcome news for developers.

Proof is in the performance. We have tested threaded MPI on several common algorithms that require inter-core communication, using both the production 16-core Epiphany III and the 64-core Epiphany IV. The example discussed below is a 2D FFT. It shows not only how easy it is to program Epiphany with MPI but also how well this parallel programming model performs. The MPI 2D FFT achieves on-chip performance of over 1600 transforms per second for a 128×128 complex FFT on the 16-core Epiphany III coprocessor. The 2D FFT is known to be challenging because it requires significant data movement, including a transposition of the data. This level of performance is therefore extremely good, especially given Epiphany’s power efficiency, and rivals that of many more prominent chips in the industry.
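As a rough sanity check, the arithmetic below converts the quoted figure into time per transform and an effective flop rate. It rests on two assumptions:

- the conventional $5N\log_2 N$ operation-count estimate for a complex FFT, not a measured count for this implementation;
- single-precision complex data at 8 bytes per point, which the post does not state.

```latex
\begin{aligned}
N &= 128 \times 128 = 16384 = 2^{14} \\
t_{\text{transform}} &= \frac{1}{1600\ \text{s}^{-1}} = 625\ \mu\text{s} \\
\text{flop per transform} &\approx 5 N \log_2 N = 5 \times 16384 \times 14 = 1\,146\,880 \\
\text{effective rate} &\approx 1600\ \text{s}^{-1} \times 1\,146\,880 \approx 1.84 \times 10^{9}\ \text{flop/s} \\
\text{data per core} &= \frac{16384 \times 8\ \text{B}}{16} = 8192\ \text{B} = 8\ \text{KiB}
\end{aligned}
```

On these assumptions, each core holds an 8 KiB slice of the array, well within the 32 KB of local memory per Epiphany III core. If the transpose is done as an all-to-all block exchange, each core keeps 1/16 of its slice and sends the other 15/16 (7.5 KiB) to other cores, about 120 KiB of on-chip traffic per transpose.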

A 2D FFT with MPI. The source code for the 2D FFT implemented with threaded MPI can be found in the Parallella examples on GitHub. Normally an MPI program is launched by running mpiexec from the shell. Here, the host program instead makes an equivalent call that offloads the parallel work to threads executing on the Epiphany coprocessor. Existing COPRTHR API calls handle device and memory management. Within the thread function, the 2D FFT is written with standard MPI syntax and semantics, following the same conventional parallel algorithm one would use on a distributed cluster.
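The post does not spell out the algorithm's steps, so the following is an assumption. A common distributed 2D FFT works in four steps:

1. Give each rank a contiguous block of rows.
2. Run 1D FFTs on the local rows.
3. Perform a distributed transpose.
4. Run 1D FFTs on the local rows again.

The transpose is the communication-heavy step. The C sketch below simulates only that step, in a single process:

- ranks are array indices;
- `memcpy` stands in for the message passing that `MPI_Alltoall` (or an equivalent sequence of point-to-point calls) would perform;
- integers stand in for complex samples so the result can be checked exactly.

The sizes and all names (`fill`, `global_at`, `distributed_transpose`) are hypothetical. This is not the code of the Parallella example.

```c
#include <assert.h>
#include <string.h>

#define N 8              /* global array is N x N (hypothetical size) */
#define NRANKS 4         /* simulated MPI ranks, one per core (hypothetical) */
#define BLK (N / NRANKS) /* rows per rank = edge of each exchanged block */

typedef int elem; /* stands in for a complex sample; ints make checks exact */

/* local[p] holds the BLK contiguous global rows owned by rank p. */
static elem local[NRANKS][BLK][N];

/* Fill with A[r][c] = r * N + c so the transpose is easy to verify. */
static void fill(void) {
    for (int p = 0; p < NRANKS; p++)
        for (int i = 0; i < BLK; i++)
            for (int c = 0; c < N; c++)
                local[p][i][c] = (p * BLK + i) * N + c;
}

/* Global element (r, c) as seen through the row-block distribution. */
static elem global_at(int r, int c) {
    assert(r >= 0 && r < N && c >= 0 && c < N);
    return local[r / BLK][r % BLK][c];
}

/* Assumed 2D FFT flow: row FFTs on local rows, distributed_transpose(),
   row FFTs again. Only the transpose is modelled here. */
static void distributed_transpose(void) {
    static elem sendbuf[NRANKS][NRANKS][BLK * BLK];
    static elem recvbuf[NRANKS][NRANKS][BLK * BLK];

    /* 1. Pack: rank p copies the BLK x BLK block in columns
          q*BLK .. q*BLK+BLK-1 into its send slot for rank q. */
    for (int p = 0; p < NRANKS; p++)
        for (int q = 0; q < NRANKS; q++)
            for (int i = 0; i < BLK; i++)
                for (int j = 0; j < BLK; j++)
                    sendbuf[p][q][i * BLK + j] = local[p][i][q * BLK + j];

    /* 2. Exchange: block p -> q lands in rank q's receive slot p. This is
          the step an MPI_Alltoall would perform across cores. */
    for (int p = 0; p < NRANKS; p++)
        for (int q = 0; q < NRANKS; q++)
            memcpy(recvbuf[q][p], sendbuf[p][q], sizeof sendbuf[p][q]);

    /* 3. Unpack: rank q transposes each received block into place, so
          global element (r, c) ends up at global position (c, r). */
    for (int q = 0; q < NRANKS; q++)
        for (int p = 0; p < NRANKS; p++)
            for (int i = 0; i < BLK; i++)
                for (int j = 0; j < BLK; j++)
                    local[q][j][p * BLK + i] = recvbuf[q][p][i * BLK + j];
}
```

After `fill()` and `distributed_transpose()`, `global_at(r, c)` returns `c * N + r` for every `r` and `c`, which is the transposed matrix. Each rank still owns a contiguous block of rows, which is the layout the second pass of row FFTs needs.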

Availability. The COPRTHR MPI library is being actively developed as part of a larger COPRTHR-2 initiative. In the meantime, we are making a limited preview available so that anyone can try the 2D FFT example. With a few modifications, the preview library works with the existing COPRTHR-1.6.1 release, even though the final library will be part of a new software package. The preview library can be downloaded from the BDT website and is free to use under the terms and conditions provided.

Programming the Epiphany coprocessor is about to get a lot simpler. The new COPRTHR-2 software with threaded MPI support is under active development, and we hope to provide a beta release soon.