(PhD10) Quantification of Load Balancing related Performance Optimizations
Performance Analysis and Optimization
Scientific Software Development
TimeMonday, June 25th1:37pm - 1:41pm
LocationAnalog 1, 2
DescriptionQuantification of load balancing related performance optimizations
While extensive research in the area of load balancing MPI applications broad up a wide range of metrics to describe and differentiate load imbalance as well as numerous methods and algorithms to improve load imbalances, the field still lacks criteria to specify the applications sensitivity to load imbalances and measures to quantify the actual costs and benefits of load balancing efforts.
Earlier performance analysis of two simulation codes showed that the implemented data structures, tailored to the highly tuned computational routines, provide little flexibility for load balancing (LB) and restrict the LB quality. Implementation and testing of alternative approaches (data structures and LB strategies) is unreasonable considering size and complexity of the source code. Application developers are left alone with the assessment and decision of expensive redesign issues in order to optimize application runtime.
The central idea of this thesis is the development a software engineering tool that facilitates MPI application developers to study the effects of alternative load balancing strategies and data structures for their problems without time-consuming code modifications to the original application or cumbersome analytical modeling. The aim is to simulate the application's behaviour and runtime based on abstract information of the application. This includes a description of the application pattern as well as the specification of the computational and communicational load per work item. The user represents the application schedule as a sequence of computation and communication phases. Each phase may have one or more weights associated with it to characterize static or dynamic work load per work item respectively. Computation load describes the time it takes to process a single item or element and communication load models the connectivity between items as well as their amount of data exchange. This information is provided by the user either from direct measurement within the application, utilization of a performance measurement tool, a hypothetical estimate or a combination of the above. This abstract application description decouples the mapping of items to processors from the original application and enables to apply different partitionings. For a given partitioning, the simulator executes the application schedule with its prescribed weights on the target architecture and allows for direct performance analysis.
One application of this load balancing simulator is to apply different load distributions (partitionings) to the two simulation codes and compare the resulting simulation timings. This quantification of potential performance optimization co-determines the decision for expensive software re-design. Further use cases target the multi-stage nature of scientific applications: the coupling of different solvers in APESmate and the handling of multi-level grids in APES. We investigate the effects of alternative data structures and processing strategies by modeling and simulation in our load balancing simulator.
This work implies detailed studies to understand and model the complex interplay between data structures, application pattern, partitioning and system parameters. Software developers need applicable means and models to early identify optimization potential in their codes and make informed choices about adaptions to load balancing strategy and data structure.