IBM has ended development of Blue Gene, so I would not recommend that any developers expend any significant effort on optimising codes for Blue Gene execution.

IBM is now fully committed to future development of its Power8 processor through the new OpenPOWER consortium. NVIDIA is part of the consortium, and NVIDIA and IBM have announced plans to work closely together on future hardware integration. This has led to the recent decision by the US DoE to purchase two massive supercomputers for delivery in 2017 (see link above).

The performance of the Power8 looks very good. Each single-socket DCM CPU (dual-chip module -- essentially one CPU formed by "glueing" together two chips, like AMD's Interlagos CPU) has up to 12 cores, with a huge 8MB of L3 cache per core, and an impressive memory bandwidth of 192 GB/s (128 GB/s read and 64 GB/s write -- see HotChips presentation above). With the cores running at 4GHz, and each core capable of 4 double precision FMA operations per cycle, the peak performance is 384 GFlops per socket (12 cores x 4 GHz x 4 FMAs per cycle x 2 flops per FMA). This is not as high as a top-of-the-line x86 CPU, or a GPU, but with the huge cache and memory bandwidth it should achieve a higher percentage of peak on many real world applications. Together with tight integration with NVIDIA GPUs in the future, this is a chip with significant potential.
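The peak figure and the ratio of memory bandwidth to compute can be checked with a few lines of arithmetic. This is only a sketch: the function names are illustrative, and it assumes the usual convention that one FMA counts as two floating-point operations. The inputs are the numbers quoted above.

```python
def peak_gflops(cores, clock_ghz, fma_per_cycle, flops_per_fma=2):
    """Theoretical peak double-precision GFlops for one socket."""
    return cores * clock_ghz * fma_per_cycle * flops_per_fma


def bytes_per_flop(bandwidth_gb_s, gflops):
    """Machine balance: bytes of memory traffic available per peak flop."""
    return bandwidth_gb_s / gflops


# Figures quoted in the text: 12 cores, 4 GHz, 4 FMAs per cycle per core,
# and 192 GB/s aggregate memory bandwidth (128 read + 64 write).
peak = peak_gflops(cores=12, clock_ghz=4.0, fma_per_cycle=4)
balance = bytes_per_flop(128.0 + 64.0, peak)

print(f"peak = {peak:.0f} GFlops, balance = {balance:.2f} bytes/flop")
# prints: peak = 384 GFlops, balance = 0.50 bytes/flop
```

A balance of about half a byte per peak flop is the quantity behind the claim that bandwidth-bound applications should reach a higher fraction of peak; the same two functions can be applied to the published figures of any other processor for comparison.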

We initially bought an IBM Power S822L system, and then upgraded to an S824L system with 2 NVIDIA K40 GPUs. In the process we have learned various things which will hopefully be helpful for others thinking of buying a Power8 system:

historically, IBM systems have been "big-endian" -- if that term means nothing to you then read this Wikipedia article

Power8 systems are designed for virtualisation and can support both big-endian and little-endian flavours of various Linux distributions, but you need to purchase the correct underlying hypervisor when you buy your system

if you want to use GPUs in your Power8 system then you need to use a little-endian Linux for compatibility -- this may also be true for other third-party PCIe cards

the S824L system, which is designed to house two K40 GPUs, runs a "bare bones" (i.e. no hypervisor) little-endian Ubuntu OS -- this is my personal recommendation for anyone interested in HPC

because IBM has been big-endian historically, the compilers are very mature for big-endian systems, but less mature for little-endian systems

to obtain the maximum memory bandwidth, it is very important that the memory slots are fully populated -- this probably means using low-density DIMMs, and accepting that there is then no growth path for additional memory, other than swapping out low-density DIMMs for high-density DIMMs

our system is available for benchmarking/testing by UK researchers who are members of a CCP or HEC; there may also be benchmarking systems available at Daresbury's Hartree Centre

we are doing some benchmarking and will make the results available on this website
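Since several of the points above hinge on whether the installed Linux is big-endian or little-endian, a quick check of the running system can be useful. The following is a minimal sketch using only the Python standard library; the helper names are illustrative. It also shows the kind of byte swap needed when binary data written on a big-endian system is read on a little-endian one.

```python
import struct
import sys


def native_byte_order():
    """Return 'little' or 'big' by inspecting how the integer 1 is stored."""
    # "=I" packs a 32-bit unsigned int in the machine's native byte order.
    first_byte = struct.pack("=I", 1)[0]
    return "little" if first_byte == 1 else "big"


def swap_u32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    # Write the value big-endian, then read the same bytes back little-endian.
    return struct.unpack("<I", struct.pack(">I", value))[0]


order = native_byte_order()
assert order == sys.byteorder  # the interpreter reports the same answer

print("this system is", order + "-endian")
print(hex(swap_u32(0x01020304)))  # prints 0x4030201
```

On a little-endian Ubuntu installation of the kind recommended above, the first line of output should report little-endian; on a traditional big-endian Power Linux it should report big-endian.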