Hans Meuer Award Finalist 1: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs
Event Type
Hans Meuer Award Finalists
Research Paper
Parallel Algorithms
Parallel Applications
Performance Analysis and Optimization
Time
Monday, June 25th, 4:10pm - 4:50pm
Location
Panorama 1
Description
Chebyshev filter diagonalization is well established in quantum
chemistry and quantum physics to compute bulks of eigenvalues of
large sparse matrices. Choosing a block vector implementation, we
investigate optimization opportunities on the new class of
high-performance compute devices featuring both high-bandwidth and
low-bandwidth memory. We focus on the transparent access to the full
address space supported by both architectures under consideration:
Intel Xeon Phi "Knights Landing" and Nvidia "Pascal"/"Volta."
After a thorough performance analysis of the single-device
implementations using the roofline model, we propose two optimizations: (1)
Subspace blocking is applied for improved performance and data
access efficiency. We also show that it allows transparently
handling problems much larger than the high-bandwidth memory without
significant performance penalties. (2) Pipelining of communication
and computation phases of successive subspaces is implemented to
hide communication costs without extra memory traffic. As an
application scenario, we perform filter diagonalization studies for
topological quantum matter. Performance numbers on up to 2048 nodes
of the Oakforest-PACS and Piz Daint supercomputers are presented, achieving
beyond 500 Tflop/s for computing 100 inner eigenvalues of sparse
matrices of dimension 4 billion.
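
To make the core kernel concrete, the following is a minimal sketch of a Chebyshev polynomial filter applied to a block vector. It is not the authors' implementation, which targets optimized kernels on Knights Landing and GPUs. The names `spmmv` and `chebyshev_filter`, the row-wise sparse format, and the assumption that the matrix spectrum has already been scaled into [-1, 1] with the filter coefficients given are all illustrative choices. It relies only on the three-term recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x).

```python
def spmmv(rows, V):
    """Sparse matrix times block vector.

    rows[i] is a list of (column, value) pairs for row i (hypothetical format);
    V[i] holds the entries of all block-vector columns for row i.
    """
    nb = len(V[0])
    out = []
    for row in rows:
        acc = [0.0] * nb
        for j, a in row:
            vj = V[j]
            for b in range(nb):
                acc[b] += a * vj[b]
        out.append(acc)
    return out


def chebyshev_filter(rows, V, coeffs):
    """Return sum_k coeffs[k] * T_k(H) V.

    Assumes the spectrum of H (given by rows) is already scaled into [-1, 1].
    """
    n, nb = len(V), len(V[0])
    t_prev = [r[:] for r in V]  # T_0(H) V = V
    result = [[coeffs[0] * x for x in r] for r in t_prev]
    if len(coeffs) == 1:
        return result

    t_cur = spmmv(rows, V)  # T_1(H) V = H V
    for i in range(n):
        for b in range(nb):
            result[i][b] += coeffs[1] * t_cur[i][b]

    for c in coeffs[2:]:
        hv = spmmv(rows, t_cur)
        # Three-term recurrence: T_{k+1} V = 2 H T_k V - T_{k-1} V
        t_next = [[2.0 * hv[i][b] - t_prev[i][b] for b in range(nb)]
                  for i in range(n)]
        for i in range(n):
            for b in range(nb):
                result[i][b] += c * t_next[i][b]
        t_prev, t_cur = t_cur, t_next
    return result
```

Processing all columns of the block vector in one sweep over the matrix means each matrix entry is loaded once per polynomial degree rather than once per vector, which is the usual motivation for a block vector formulation on bandwidth-limited hardware.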