The PEPC project (Pretty Efficient Parallel Coulomb Solver) is a public tree code that has been developed at Jülich Supercomputing Centre since the early 2000s. Our code is a non-recursive version of the Barnes-Hut algorithm, using a level-by-level approach to both tree construction and traversals.
The oct-tree method was originally introduced by Josh Barnes & Piet Hut in the mid 1980s to speed up astrophysical N-body simulations with long range interactions, see Nature 324, 446 (1986). Their idea was to use successively larger multipole-groupings of distant particles to reduce the computational effort in the force calculation from the usual O(N2) operations needed for brute-force summation to a more amenable O(N log N). Though mathematically less elegant than the Fast Multipole Method, the Barnes-Hut algorithm is well suited to dynamic, nonlinear problems, can be quickly adapted to numerous interaction kernels and combined with multiple-timestep integrators.
Simulation of interaction of laser radiation with neutral nanocluster with 3871 ions and electrons. Bottom left: spacefilling Hilbert curve used in the code for domain decomposition; Bottom right: Oct-tree domain decomposition represented by the so-called branch nodes.
The PEPC project (Pretty Efficient Parallel Coulomb Solver) is a public tree code that has been developed at Jülich Supercomputing Centre since the early 2000s. Our tree code is a non-recursive version of the Barnes-Hut algorithm, using a level-by-level approach to both tree construction and traversals. The parallel version is a hybrid MPI/pthreads implementation of the Warren-Salmon 'Hashed Oct-Tree' scheme, including several refinements of the tree traversal routine - the most challenging component in terms of scalability.
The code is structurally divided into three parts:
Due to this structure, the adaption to further interaction kernels as well as additional applications and experimental variations to the tree code algorithm can be implemented conveniently.
Structure of the tree code framework. Currently, PEPC supports three interaction-specific modules and several respective frontends for different applications ranging from plasma physics through vortex-particle methods to smooth particle hydrodynamics. Due to well-defined interfaces, symbolized by the grey blocks, the implementation of further interaction models as well as additional frontends is straightforward.
PEPC itself is written in Fortran 2003 with some C wrappers for POSIX functions. In addition to the main pthreads-parallelised tree traversal, a version based on OpenMP is available and there has also been a version using SMPSs. Besides the hybrid parallelisation, a number of improvements to the original Warren-Sallmon ‘hashed oct-tree’ scheme have been included and allow for an excellent scaling of the code on different architectures with up to 2,048,000,000 particles across up to 294,912 processors (at the then largest machine at JSC: JUGENE).
In addition, it has been adapted to > 32 parallel threads per MPI rank to take full advantage of the capabilities of the BlueGene/Q installation JUQUEEN that was available at JSC. As a result it showed great parallel scalability across the full machine and qualified for the High-Q Club.
Weak scaling and parallel efficiency of the parallel Barnes-Hut tree code PEPC across the full Blue Gene/Q installation JUQUEEN at JSC.