(PP04) The High-Q Club
Performance Analysis and Optimization
Time: Tuesday, June 26th, 3:15pm - 3:45pm
Description: This poster summarises knowledge gained from running and tuning highly-scalable applications on JUQUEEN, the IBM Blue Gene/Q at Jülich Supercomputing Centre. The ability to execute successfully on all 458,752 cores with up to 1.8 million processes or threads could qualify codes for the High-Q Club, which serves as a showcase for diverse codes, effectively defining a collection of the highest scaling codes on JUQUEEN. The intention was to encourage other application developers to invest in tuning and scaling their codes while identifying the necessary aspects for that goal. At the close of this era with the decommissioning of JUQUEEN in spring 2018, it is timely to compare the characteristics of the 32 High-Q Club member codes, considering their strong and/or weak scaling, exploitation of hardware threading, and whether/how intra-node multi-threading is employed combined with message-passing. Standard programming languages and MPI combined with multi-threading were found to be sufficient, and provided a straightforward migration path for application developers supported by scalable tools and optimised libraries, which have also delivered performance and scalability benefits on diverse HPC systems. Obstacles to scaling, such as inefficient use of limited compute-node memory and file I/O, are identified as key governing factors. Details are discussed in a review article [DOI: 10.14529/js180104]. Overall, the analysis provides guidance as to how applications may (need to) be designed or adapted to exploit expected exa-scale computer systems, although there are likely to be significant additional aspects such as processor heterogeneity and variability. We are considering how the High-Q Club should evolve for assessing extremely scalable applications on successor leadership computer systems, and invite suggestions from the HPC community.
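
The core and thread counts quoted in the description can be reproduced with a short calculation. The sketch below is illustrative only. The machine configuration it uses (28 racks of 1,024 compute nodes, 16 application cores per node, 4 hardware threads per core) is an assumption about JUQUEEN that the abstract does not state, and the function name `machine_totals` is hypothetical.

```python
# Assumed JUQUEEN configuration; these figures are NOT given in the abstract.
RACKS = 28                # assumed number of Blue Gene/Q racks
NODES_PER_RACK = 1024     # assumed compute nodes per rack
CORES_PER_NODE = 16       # assumed cores per node available to applications
HW_THREADS_PER_CORE = 4   # assumed hardware threads per core


def machine_totals(racks=RACKS,
                   nodes_per_rack=NODES_PER_RACK,
                   cores_per_node=CORES_PER_NODE,
                   hw_threads_per_core=HW_THREADS_PER_CORE):
    """Return (nodes, cores, hardware threads) for the full machine."""
    nodes = racks * nodes_per_rack
    cores = nodes * cores_per_node
    threads = cores * hw_threads_per_core
    return nodes, cores, threads


if __name__ == "__main__":
    nodes, cores, threads = machine_totals()
    print(f"nodes:   {nodes:,}")
    print(f"cores:   {cores:,}")
    print(f"threads: {threads:,} (about {threads / 1e6:.1f} million)")
```

Under these assumptions the core total matches the 458,752 quoted above, and using every hardware thread on every core gives the "up to 1.8 million processes or threads" figure.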