We model, simulate and visualise the particle dynamics and radiation phenomena that are relevant to the physics of laser particle acceleration, and we develop massively parallel computing schemes.

The Particle-in-Cell (PIC) algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of its electrons and ions in electromagnetic fields that obey Maxwell's equations.

How does the Particle-in-Cell Algorithm work?

The PIC algorithm solves the so-called Vlasov-Maxwell system of equations. To solve it, the electric and magnetic fields are discretised on a grid that divides the simulated volume into cells.

Charged particles such as electrons and ions are modeled by macro-particles. Each macro-particle represents up to several hundred real particles as a single spread-out particle distribution. The motion of the macro-particles is governed by the electric and magnetic fields on the grid.
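
The idea of a macro-particle can be illustrated with a minimal sketch. It assumes a one-dimensional periodic grid, an electric field only, non-relativistic motion and normalised units. The names MacroParticle, gather_field and push are hypothetical and chosen for illustration; they are not taken from PIConGPU.

```python
import math
from dataclasses import dataclass


@dataclass
class MacroParticle:
    x: float        # position in units of the cell size
    v: float        # velocity in cells per unit time
    weight: float   # number of real particles represented (assumed value)
    q: float = -1.0  # charge of ONE real particle (normalised units)
    m: float = 1.0   # mass of ONE real particle (normalised units)

    @property
    def charge(self):
        return self.weight * self.q

    @property
    def mass(self):
        return self.weight * self.m


def gather_field(field, x):
    """Linearly interpolate a periodic grid field to position x."""
    n = len(field)
    cell = math.floor(x)
    frac = x - cell
    left = field[cell % n]
    right = field[(cell + 1) % n]
    return (1.0 - frac) * left + frac * right


def push(p, e_field, dt):
    """Advance one macro-particle by dt using the field at its position."""
    e_here = gather_field(e_field, p.x)
    # charge/mass equals q/m of a single real particle, so the macro-particle
    # follows the same trajectory a real particle would.
    p.v += p.charge / p.mass * e_here * dt
    p.x = (p.x + p.v * dt) % len(e_field)
```

Because charge and mass both scale with the weight, their ratio is unchanged. The weight only matters later, when the macro-particle contributes its charge and current to the grid.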

The particle motion in turn creates currents. Following Ampère's law, these currents create magnetic fields. Changing magnetic fields in turn induce electric fields, as described by Faraday's law.
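
How the two laws couple fields and currents on a grid can be sketched in one dimension. The example assumes a staggered, periodic grid with a transverse electric field E_y at the cell nodes, a magnetic field B_z halfway between the nodes, and normalised units (c = 1, eps0 = 1). This is a generic finite-difference scheme chosen for illustration; which field solver a given code uses is not stated here, and the function names are hypothetical.

```python
def update_b(e_y, b_z, dt, dx):
    """Faraday's law in 1D: dB_z/dt = -dE_y/dx.

    b_z[i] sits between nodes i and i + 1 of the periodic grid.
    """
    n = len(e_y)
    return [b_z[i] - dt / dx * (e_y[(i + 1) % n] - e_y[i]) for i in range(n)]


def update_e(e_y, b_z, j_y, dt, dx):
    """Ampere's law with Maxwell's term in 1D: dE_y/dt = -dB_z/dx - J_y.

    b_z[i - 1] with i = 0 wraps around to the last entry (periodic grid).
    """
    n = len(e_y)
    return [
        e_y[i] - dt / dx * (b_z[i] - b_z[i - 1]) - dt * j_y[i]
        for i in range(n)
    ]


# One field step: currents from the particles enter through j_y.
e_y = [0.0, 1.0, 0.0, 0.0]
b_z = [0.0, 0.0, 0.0, 0.0]
j_y = [0.0, 0.0, 0.0, 0.0]
b_z = update_b(e_y, b_z, dt=0.5, dx=1.0)
e_y = update_e(e_y, b_z, j_y, dt=0.5, dx=1.0)
```

In this form the current appears as a source term in the update of the electric field, while the magnetic field is advanced from the spatial variation of the electric field.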

These new fields then act back on the particles.

What is so new about PIConGPU?

GPUs deliver very high computational performance because many processors work in parallel. To make the most of this performance, the processors should work independently of each other. For the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the grid cells have to be computed from the velocities of particles that move freely between cells. This motion leads to memory access patterns in which current data and particle data are located in different places in memory, and parallel processes can disturb each other's execution when they access the same part of the memory.
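
The conflict in the deposition step can be made visible with a small sketch. It assumes nearest-grid-point deposition on a one-dimensional periodic grid with non-negative positions, and it runs serially; on a GPU each loop iteration would be a separate thread. The second function shows one generic way to avoid conflicting writes, namely grouping particles by cell first. It is not the data model referred to in this text, and both function names are hypothetical.

```python
import math


def deposit_scatter(positions, velocities, charges, n_cells):
    """Particle-parallel view: every particle adds q * v to its own cell.

    Returns the current and the number of writes each cell received.
    Any cell with more than one write is a potential conflict when the
    particles are processed by parallel threads.
    """
    current = [0.0] * n_cells
    writes = [0] * n_cells
    for x, v, q in zip(positions, velocities, charges):
        cell = math.floor(x) % n_cells
        current[cell] += q * v
        writes[cell] += 1
    return current, writes


def deposit_by_cell(positions, velocities, charges, n_cells):
    """Cell-parallel view: group particles by cell, then sum per cell.

    Each cell only reads its own particle list and writes its own entry,
    so no two cells ever touch the same memory location.
    """
    by_cell = {}
    for x, v, q in zip(positions, velocities, charges):
        by_cell.setdefault(math.floor(x) % n_cells, []).append(q * v)
    return [sum(by_cell.get(cell, ()), 0.0) for cell in range(n_cells)]
```

Both functions return the same current. The difference lies in who writes where: in the first, three particles in one cell mean three writes to the same address; in the second, the price is that the particle grouping has to be kept up to date as particles move.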

Recently, this problem was solved in our group using a new data model for particle and grid-based data and asynchronous data transfer [3,4].

All that with a single GPU?

No, because a single GPU does not have enough memory to simulate large physical systems. This makes it necessary to use more than one GPU and to distribute the simulated volume among the GPUs.
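
Distributing the simulated volume can be sketched as a simple one-dimensional slab decomposition. The functions devices_needed, decompose and owner_of are hypothetical helpers, and the memory figures in the example are made-up numbers for illustration only.

```python
def devices_needed(total_bytes, bytes_per_device):
    """Smallest number of devices whose combined memory holds the data."""
    return -(-total_bytes // bytes_per_device)  # ceiling division


def decompose(n_cells, n_devices):
    """Split n_cells into contiguous slabs, one per device.

    Slab sizes differ by at most one cell.
    """
    base, extra = divmod(n_cells, n_devices)
    slabs = []
    start = 0
    for rank in range(n_devices):
        size = base + (1 if rank < extra else 0)
        slabs.append(range(start, start + size))
        start += size
    return slabs


def owner_of(cell, slabs):
    """Return the index of the device that stores the given cell."""
    for rank, slab in enumerate(slabs):
        if cell in slab:
            return rank
    raise ValueError(f"cell {cell} is outside the simulated volume")


# Example with assumed numbers: 10 GB of simulation data, 4 GB per device.
n_devices = devices_needed(10 * 1024**3, 4 * 1024**3)
slabs = decompose(1000, n_devices)
```

Cells at the edge of a slab need field values and particles from the neighbouring slab, which is where the data transfer between GPUs comes from.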

The problem with this approach is the data transfer between GPUs. Data first has to be transferred from the GPU to the main memory of the computer housing the GPU. It then has to be sent via the network to the other computers housing the other GPUs.

This process normally takes a long time and GPUs have to wait for the end of the data transfer before continuing their computations.

We were able to solve this problem by interleaving the data transfer between GPUs and the computation on a single GPU, so that the GPUs can execute the algorithmic steps continuously without interruption [3,4]. This was only possible because we got help from ZIH, TU Dresden, which provided an efficient library for data transfer between GPUs and tools to measure the performance of our code.
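
The principle of interleaving transfer and computation can be sketched as follows. In this illustration a background Python thread stands in for the asynchronous GPU and network transfer, a short sleep stands in for the transfer time, and a simple three-point stencil stands in for the real algorithmic steps. None of the names below belong to PIConGPU or to the ZIH library; they are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_guard_cells(left_neighbour, right_neighbour):
    """Stand-in for the slow transfer of boundary values from neighbours."""
    time.sleep(0.05)  # pretend transfer latency
    return left_neighbour[-1], right_neighbour[0]


def stencil(left, centre, right):
    """Placeholder for the per-cell work of one time step."""
    return 0.25 * left + 0.5 * centre + 0.25 * right


def step_overlapped(local, left_neighbour, right_neighbour):
    """Update a local slab (at least two cells) while the transfer runs."""
    n = len(local)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the transfer in the background.
        pending = pool.submit(fetch_guard_cells, left_neighbour, right_neighbour)
        # 2. Meanwhile update the interior cells, which need no remote data.
        for i in range(1, n - 1):
            new[i] = stencil(local[i - 1], local[i], local[i + 1])
        # 3. Only now wait for the transfer to complete.
        guard_left, guard_right = pending.result()
    # 4. Finish the two border cells that depend on the neighbours.
    new[0] = stencil(guard_left, local[0], local[1])
    new[-1] = stencil(local[-2], local[-1], guard_right)
    return new
```

As long as the interior update takes at least as long as the transfer, the waiting time in step 3 vanishes and the transfer cost is hidden behind useful work.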

What does this mean for simulations?

We want to speed up simulations in order to shorten the time between starting a simulation and receiving the final result. With GPUs, this speedup can mean that a simulation that normally takes a week finishes within a few hours.