PIConGPU is a relativistic Particle-in-Cell code running on graphics processing units (GPUs) as well as regular multi-core processors. It is open source and freely available for download.
PIConGPU is an extremely scalable, platform-portable application for particle-in-cell simulations. While we mainly use it to study laser-plasma interactions, it has also been used in astrophysics studies of the Kelvin-Helmholtz instability.
PIConGPU was a finalist for the prestigious Gordon Bell Prize in 2013 and has since been one of the flagship applications for a number of leading-edge high-performance computing (HPC) systems (Titan, JUWELS Booster, Frontier). Through this work, PIConGPU has established strong ties with many national and international partners; in particular, the hardware-agnostic libraries it builds on, Alpaka and LLAMA, have now been adopted in the CERN LHC software stack as well. Another collaborative effort driven by PIConGPU is the standardization of data formats for plasma physics via openPMD, which is becoming one of the leading data standards in the community.
A snapshot from a simulation of an ultrashort, high-intensity laser pulse (orange-striped sphere) driving a plasma wave in ionized helium gas on the Oak Ridge Leadership Computing Facility's (OLCF) Summit supercomputer. Purple areas highlight the electron density. Streams depict the stronger (red) and weaker (green and blue) electric fields. A video of this simulation is available on YouTube.
This image was generated using ISAAC, a tool for visualizing simulations in real time on the Frontier supercomputer being built at OLCF. Image courtesy of Felix Meyer/Helmholtz-Zentrum Dresden-Rossendorf.
The PIC algorithm approximates the solution of the so-called Vlasov-Maxwell equations. In this approach, the electric and magnetic fields are discretized on a grid that divides the simulated volume into cells.
Charged particles such as electrons and ions are modeled by macro-particles. Each macro-particle represents up to several hundred real particles with the same momentum and describes their motion as that of a single, spatially spread-out particle distribution. The motion of the macro-particles is governed by the electric and magnetic fields on the grid.
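The macro-particle idea and its "spatially spread-out" shape can be illustrated with a minimal Python sketch. It assumes a periodic 1D grid with field values stored at integer positions and uses linear (cloud-in-cell) interpolation as the particle shape; the `MacroParticle` class and the function names are hypothetical and are not PIConGPU's data structures.

```python
import math
from dataclasses import dataclass


@dataclass
class MacroParticle:
    """Hypothetical record: one macro-particle standing in for `weight` real particles."""
    x: float       # position in units of the cell size
    p: float       # momentum shared by all represented real particles
    weight: float  # number of real particles represented
    charge: float  # charge of ONE real particle


def gather_cic(field, x):
    """Linear (cloud-in-cell) interpolation of a periodic 1D grid field to position x.

    The particle is treated as a uniform cloud one cell wide, so it "sees" a
    distance-weighted mix of the two nearest grid values.
    """
    n = len(field)
    i = math.floor(x)
    frac = x - i  # distance past the left grid point, in [0, 1)
    return (1.0 - frac) * field[i % n] + frac * field[(i + 1) % n]


def physical_charge(particles):
    """Total real charge carried by a list of macro-particles."""
    return sum(mp.weight * mp.charge for mp in particles)
```

For example, with the field `[0.0, 1.0, 2.0, 3.0]` a particle at `x = 1.5` sees the value 1.5, and two macro-particles with weights 200 and 100 and a per-particle charge of -1 carry a total charge of -300 in these units.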
The particle motion in turn creates electric currents. Following Ampère's law (with Maxwell's correction), these currents change the electric field, and the magnetic field is then updated from the electric field according to Faraday's law.
These new fields then act back on the particles, starting the next iteration cycle.
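The field half of this cycle can be sketched in one dimension with a minimal Python example. It assumes normalized units (c = ε0 = μ0 = 1), a periodic grid, and a staggered (Yee-type) layout with E_y at integer and B_z at half-integer positions; the function name and layout are illustrative assumptions, not PIConGPU's field solver. The deposited current J_y enters the Ampère-Maxwell update of E_y, and the new E_y then updates B_z through Faraday's law.

```python
def maxwell_step(ey, bz, jy, dt, dx):
    """Advance 1D fields by one time step (normalized units, periodic boundaries).

    Layout (assumed): ey[i] sits at x = i*dx, bz[i] at x = (i + 1/2)*dx.
    Returns new lists; the inputs are not modified.
    """
    n = len(ey)
    # Ampere-Maxwell law: dEy/dt = -dBz/dx - Jy  (the current changes E).
    # bz[i - 1] wraps to bz[-1] at i = 0, which gives the periodic boundary.
    ey_new = [ey[i] - dt / dx * (bz[i] - bz[i - 1]) - dt * jy[i] for i in range(n)]
    # Faraday's law: dBz/dt = -dEy/dx  (the updated E changes B).
    bz_new = [bz[i] - dt / dx * (ey_new[(i + 1) % n] - ey_new[i]) for i in range(n)]
    return ey_new, bz_new
```

A single current spike at cell 3 (with `dt = 0.5`, `dx = 1.0`) lowers E_y there by `dt * J_y` and produces an antisymmetric pair of B_z values on either side. In 1D this explicit scheme is stable only for `dt <= dx`. In a full PIC loop, this step would alternate with gathering the fields to the particles, pushing them, and depositing the next current.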
Graphics processing units (GPUs) achieve very high computational performance because many processors work in parallel. To make the most of this performance, the processors should work independently of each other. For the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the grid cells have to be computed from the velocities of particles that move freely between cells. This motion leads to memory access patterns in which current data and particle data are located at different places in memory, and parallel processes can interfere with each other's execution when they access the same part of memory. Our group solved this problem with a novel data model for particle and grid-based data combined with asynchronous data transfer. This makes it possible to simulate hundreds of billions of macro-particles on a GPU compute cluster.
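One generic way to avoid the write conflicts described above is to group particles by the tile of cells they occupy, let each worker accumulate into a private buffer for its tile, and merge the buffers afterwards. The minimal Python sketch below illustrates that idea for linear (cloud-in-cell) deposition on a periodic 1D grid; the tile size, the function names, and the use of a thread pool are assumptions for illustration and do not describe PIConGPU's actual data model.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def deposit_reference(xs, qs, n_cells):
    """Sequential cloud-in-cell deposit on a periodic 1D grid (ground truth)."""
    grid = [0.0] * n_cells
    for x, q in zip(xs, qs):
        i = math.floor(x)
        frac = x - i
        grid[i % n_cells] += q * (1.0 - frac)
        grid[(i + 1) % n_cells] += q * frac
    return grid


def deposit_tiled(xs, qs, n_cells, tile=4, workers=4):
    """Conflict-free deposit: bin by tile, accumulate privately, then reduce.

    Assumes 0 <= x < n_cells for every particle.
    """
    # 1) Bin particles by tile so each worker reads a contiguous group.
    bins = defaultdict(list)
    for x, q in zip(xs, qs):
        bins[math.floor(x) // tile].append((x, q))

    # 2) Each tile writes only to its own buffer of tile + 1 cells
    #    (one guard cell on the right for particles near the tile edge).
    def accumulate(tile_id):
        start = tile_id * tile
        local = [0.0] * (tile + 1)
        for x, q in bins[tile_id]:
            i = math.floor(x)
            frac = x - i
            local[i - start] += q * (1.0 - frac)
            local[i - start + 1] += q * frac
        return start, local

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(accumulate, list(bins)))

    # 3) Reduce the private buffers into the global grid in a separate pass.
    grid = [0.0] * n_cells
    for start, local in partials:
        for k, value in enumerate(local):
            grid[(start + k) % n_cells] += value
    return grid
```

No two workers ever write to the same buffer during step 2, so no atomic operations or locks are needed there; the only overlap between tiles is the single guard cell, which is resolved in the cheap reduction of step 3. On a GPU, such per-tile buffers could be placed in fast on-chip memory.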
Furthermore, PIConGPU is written in a hardware-agnostic way using a hierarchical system model, which the zero-overhead compile-time library Alpaka maps to the available compute resources. As a result, PIConGPU shows comparably high performance on all supported platforms (CPUs from all vendors, AMD and NVIDIA GPUs, and FPGAs).
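The idea of writing a kernel once against a hierarchical index and letting a backend decide how that hierarchy maps onto hardware can be sketched in a few lines of Python. The sketch is loosely modeled on the grid/block/thread/element levels described by Alpaka, but `run_hierarchical` and `work_division` are hypothetical helpers, not Alpaka's API: Alpaka is a C++ library that resolves this mapping at compile time, while this sketch simply loops serially.

```python
import math


def saxpy_kernel(idx, a, x, y, out):
    """Kernel body written only in terms of a global element index."""
    if idx < len(out):  # guard: the work division may cover more indices than needed
        out[idx] = a * x[idx] + y[idx]


def work_division(n, threads_per_block, elems_per_thread):
    """Number of blocks needed so that blocks * threads * elements >= n."""
    return math.ceil(n / (threads_per_block * elems_per_thread))


def run_hierarchical(kernel, n, threads_per_block, elems_per_thread, *args):
    """Hypothetical serial backend: walks block -> thread -> element and calls the kernel."""
    blocks = work_division(n, threads_per_block, elems_per_thread)
    for b in range(blocks):
        for t in range(threads_per_block):
            for e in range(elems_per_thread):
                idx = (b * threads_per_block + t) * elems_per_thread + e
                kernel(idx, *args)
```

Running the same `saxpy_kernel` with many threads of one element each (a GPU-like division) or with one thread of several elements (a CPU-like division, where the element loop could be vectorized) gives identical results; only the backend's mapping changes, not the kernel.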
No, because a single GPU does not have enough memory to simulate large physical systems. This makes it necessary to use more than one GPU and to distribute the simulated volume among the GPUs.
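Distributing the simulated volume can be illustrated with a minimal 1D slab decomposition in Python. It assumes the domain is cut along a single axis into contiguous slabs of nearly equal size, one per GPU; the helper names and the byte counts in the usage example are hypothetical and do not reflect PIConGPU's decomposition logic or memory requirements.

```python
def min_gpus(total_bytes, usable_bytes_per_gpu):
    """Smallest number of GPUs whose combined memory can hold the simulation.

    Uses integer ceiling division to avoid float rounding for large byte counts.
    """
    return -(-total_bytes // usable_bytes_per_gpu)


def decompose(n_cells, n_gpus):
    """Split n_cells into n_gpus contiguous slabs; sizes differ by at most one cell.

    Returns a list of (offset, size) pairs, one per GPU rank.
    """
    base, rem = divmod(n_cells, n_gpus)
    slabs, offset = [], 0
    for rank in range(n_gpus):
        size = base + (1 if rank < rem else 0)
        slabs.append((offset, size))
        offset += size
    return slabs


def owner_of(cell, slabs):
    """Rank of the GPU that stores a given global cell index."""
    for rank, (offset, size) in enumerate(slabs):
        if offset <= cell < offset + size:
            return rank
    raise ValueError(f"cell {cell} is outside the domain")
```

For example, a (hypothetical) 100-unit simulation on GPUs with 40 usable units each needs `min_gpus(100, 40) == 3` devices, and `decompose(10, 3)` gives the slabs `[(0, 4), (4, 3), (7, 3)]`. A particle crossing a slab boundary has to be handed to the neighboring rank, which is what makes data transfer between GPUs necessary.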
The problem with this approach is the data transfer between GPUs. Data first has to be copied from the GPU to the main memory of the computer housing it and then sent over the network to the computers housing the other GPUs.
This process normally takes a long time, and the GPUs have to wait for the transfer to finish before they can continue their computations.
We were able to solve this problem by interleaving the data transfer between GPUs with the computation on each GPU, so that the GPUs can execute the algorithmic steps continuously without interruption.
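The interleaving can be illustrated with a minimal Python sketch of a 3-point smoothing stencil on a periodic 1D array split across subdomains. A background thread stands in for the GPU-to-host copy and network transfer of boundary ("halo") values, while the interior cells, which need no remote data, are updated in the meantime; only the two border cells of each subdomain wait for the transfer. The stencil and helper names are illustrative assumptions, not PIConGPU's communication code.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_global(u):
    """Reference: 3-point average on the whole periodic array."""
    n = len(u)
    return [(u[i - 1] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def exchange_halos(parts):
    """Stand-in for the GPU -> host -> network -> host -> GPU transfer.

    Returns (left_neighbor_value, right_neighbor_value) for each subdomain on a ring.
    """
    k = len(parts)
    return [(parts[r - 1][-1], parts[(r + 1) % k][0]) for r in range(k)]


def smooth_overlapped(parts, pool):
    """One stencil step with the halo exchange overlapped by interior work.

    Assumes every subdomain holds at least two cells.
    """
    # 1) Launch the halo exchange asynchronously.
    halo_future = pool.submit(exchange_halos, parts)
    # 2) Meanwhile, update interior cells, which need no remote data.
    new_parts = []
    for p in parts:
        new = [0.0] * len(p)
        for i in range(1, len(p) - 1):
            new[i] = (p[i - 1] + p[i] + p[i + 1]) / 3.0
        new_parts.append(new)
    # 3) Only the border cells wait for the transfer to finish.
    halos = halo_future.result()
    for p, new, (left, right) in zip(parts, new_parts, halos):
        new[0] = (left + p[0] + p[1]) / 3.0
        new[-1] = (p[-2] + p[-1] + right) / 3.0
    return new_parts
```

Only the border updates remain on the critical path: if computing the interior takes at least as long as the transfer, the transfer cost is effectively hidden.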
We want to speed up simulations to reduce the time between the start of a simulation and the arrival of the final result. With GPUs, a simulation that would normally take a week can finish within a few hours. This enables digital twins of lab experiments that provide day-to-day simulation feedback within an experimental campaign, instead of weeks passing between an experiment and its simulation results.