(PhD12) From Molecular Dynamics towards a Node-Level Auto-Tuning Library for N-Body Simulations
Performance Analysis and Optimization
Scientific Software Development
Time: Monday, June 25th, 1:45pm - 1:49pm
Location: Analog 1, 2
Description: Molecular Dynamics simulations are tools of great potential for fields such as Chemical or Biological Engineering. From a Computer Science point of view, building highly scalable and versatile simulation codes is an interesting but challenging task. Due to the nature of the problem, system characteristics have a drastic impact on time to solution and node-level performance, and can change the runtime by orders of magnitude. To make matters worse, these characteristics can also change at runtime, which requires adapting the solution strategy while the simulation is running.
Over the past years, ls1-mardyn has been developed at the Chair of Scientific Computing in Computer Science at the Technical University of Munich. It is a highly scalable code for molecular dynamics simulations of small, rigid, multi-centered molecules. The code is written in C++ with efficient vectorization and a hybrid OpenMP + MPI parallelization, which has been shown to scale excellently up to thousands of nodes. ls1-mardyn also features a reduced-memory mode, which made it possible to perform the largest particle simulation to date, containing about 2 * 10^13 molecules.
In every particle simulation, short-range pairwise interactions make up a significant portion of the overall runtime. While developing ls1-mardyn, it became obvious that there is no silver bullet that provides a highly scalable and efficient approach for every scenario. As an example, a decision needs to be made concerning the underlying algorithm of a simulation. Classic choices are either Verlet lists, which provide good computational efficiency, or Linked Cells, which excel at memory efficiency. Tree structures provide further interesting options. Other aspects that are difficult to implement efficiently are SIMD vectorization (data layout, intrinsics vs. compiler-generated code), OpenMP patterns, and optimization for accelerators like Xeon Phi. For this reason, we want to start the open-source project AutoPas: a node-level library for short-range pairwise interactions. The vision for the library is to be easy to use and to provide a powerful general-purpose base on which full N-Body simulations can be built. The library aims to be highly flexible by providing different particle containers, OpenMP parallelization patterns, and vectorization options.
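To make the Linked Cells idea concrete, here is a minimal sketch in Python. It is not taken from ls1-mardyn or AutoPas (both are C++ codes), and the function names `brute_force_pairs` and `linked_cells_pairs` are hypothetical. The sketch assumes cubic cells whose edge length equals the cutoff radius and a domain without periodic boundaries. Particles are sorted into cells, and distances are only checked between particles in the same or directly adjacent cells.

```python
import itertools
from collections import defaultdict


def _dist_sq(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_pairs(positions, cutoff):
    """Reference: check every pair (i, j) with i < j, i.e. O(N^2) distance checks."""
    cutoff_sq = cutoff * cutoff
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if _dist_sq(positions[i], positions[j]) < cutoff_sq
    }


def linked_cells_pairs(positions, cutoff):
    """Find the same pairs, but only compare particles in the same or adjacent cells."""
    cutoff_sq = cutoff * cutoff

    # Sort particle indices into cubic cells with edge length == cutoff (assumption).
    cells = defaultdict(list)
    for index, pos in enumerate(positions):
        cells[tuple(int(c // cutoff) for c in pos)].append(index)

    # A cell interacts with itself and its 26 direct neighbors.
    offsets = list(itertools.product((-1, 0, 1), repeat=3))

    pairs = set()
    for cell, members in cells.items():
        for offset in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            # .get() does not create empty cells, so the dict is not modified here.
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j and _dist_sq(positions[i], positions[j]) < cutoff_sq:
                        pairs.add((i, j))
    return pairs
```

Both functions return the same set of index pairs. A Verlet list would additionally cache the neighbors found this way, usually with an extra skin distance, and reuse them over several time steps, trading memory for fewer distance checks.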
Two of the core requirements to achieve these goals are a modular code design and auto-tuning. The first is necessary because different scenarios benefit from dedicated solution techniques, so for optimal flexibility these need to be easily exchangeable. The optimal combination of these techniques depends on many characteristics and can change at runtime. Furthermore, there are too many combinations of techniques to simply test them all in order to find the optimum empirically. Auto-tuning enables the code to find the optimal combination of the aforementioned techniques on its own and to adapt itself during runtime by periodically reevaluating the current state of the simulation. One approach is to create performance models for the characteristics that define the scenario and to empirically compare only those combinations that the performance model predicts to be the most efficient.
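The model-guided tuning loop described above could look roughly like the following Python sketch. It does not show the AutoPas API. The class `AutoTuner` and its parameters (`cost_model`, `shortlist_size`, `retune_interval`, `timer`) are hypothetical names chosen for illustration. The sketch assumes that every candidate computes the same result and can safely be run several times on the same state, so all shortlisted candidates are measured within a single tuning step.

```python
import time


class AutoTuner:
    """Periodically measures a model-selected shortlist and keeps the fastest candidate."""

    def __init__(self, candidates, cost_model, shortlist_size=2,
                 retune_interval=100, timer=time.perf_counter):
        if not candidates or shortlist_size < 1 or retune_interval < 1:
            raise ValueError("need candidates and positive shortlist/interval sizes")
        self.candidates = dict(candidates)    # name -> callable(state)
        self.cost_model = cost_model          # callable(name, state) -> predicted cost
        self.shortlist_size = shortlist_size
        self.retune_interval = retune_interval
        self.timer = timer                    # injectable for deterministic testing
        self.best = None
        self.iteration = 0

    def _tune(self, state):
        # Rank all candidates by predicted cost, then measure only the shortlist.
        ranked = sorted(self.candidates,
                        key=lambda name: self.cost_model(name, state))
        timings = {}
        result = None
        for name in ranked[: self.shortlist_size]:
            start = self.timer()
            result = self.candidates[name](state)
            timings[name] = self.timer() - start
        # Keep the candidate with the lowest measured time.
        self.best = min(timings, key=timings.get)
        return result

    def step(self, state):
        """Run one simulation step, re-tuning every `retune_interval` steps."""
        if self.iteration % self.retune_interval == 0:
            result = self._tune(state)
        else:
            result = self.candidates[self.best](state)
        self.iteration += 1
        return result
```

In a simulation, each candidate would stand for one combination of particle container, parallelization pattern, and vectorization option, and `step` would be called once per time step. The performance model only narrows down the search; the final choice is still based on measured runtime.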
Our current research focuses on expanding the library and exploring ways to employ performance modeling for auto-tuning.