The final analysis and future plans

A week ago we finished our 5 days of intensive work optimising CP2K (and to a lesser extent GS2) for Xeon Phi processors. As discussed in previous blog posts (Day4, Day3, Day2, Day1), this was done in conjunction with research engineers from Colfax, and built on the previous year's work on these codes by EPCC staff through the Intel-funded IPCC project.

MPI and vectorisation: Two ends of the optimisation spectrum

Day four of this week of intensive work optimising codes for Xeon Phi saw a range of work. The majority of the effort focussed on the vectorisation performance of CP2K and GS2; looking at the low level details of the computationally-intensive parts of these codes and seeing whether the compiler is producing vectorised codes, and if not is there anything that can be done to make the code vectorise.

Moving from OpenMP to vectorisation and MPI

Reality hit home a bit on the third day of our intensive week working with Colfax to optimise codes for the Xeon Phi.

After further implementation and analysis work it appears that the removal of the allocation and deallocation calls from some of the low level routines in CP2K will improve the OpenMP performance on Xeon and Xeon Phi, but only because there is an issue with the Intel compiler that is causing poor performance. The optimisation can see a reduction in runtime of around 20-30% for the OpenMP code, but only with versions 15 and 16 of the Intel compiler, on v14 there is a much smaller performance improvement.