(PhD06) A First Principles Approach to Performance and Power Models for Contemporary Multi- and Many-Core Processors
Performance Analysis and Optimization
Time: Monday, June 25th, 1:21pm - 1:25pm
Description: The scientific objective of the research is to better understand performance,
power, and energy properties of contemporary HPC processors through analytical
modeling. The practical objective is to develop models, best practices, and
tools related to performance, power, and energy modeling and engineering.
Current results include the extension and refinement of the original ECM
model, as well as the development of quantitative power and energy models.
The ECM model was originally developed for Sandy Bridge processors. This
model was ported to and evaluated on the Intel Haswell, Broadwell, Skylake,
AMD Zen, and IBM POWER8 processors. This required modeling of the following
components/features: store-through and victim caches; IBM's Centaur memory
buffer chip; partial and full overlap of transfers in the cache/memory
hierarchy; streaming stores; cluster-on-die mode; separate clock domains (core
vs. Uncore frequency); impact of variable Uncore frequency on L3 and memory
bandwidths. Further, the memory bandwidth model was refined to take into
account reduced throughput with increasing bus utilization, fixing a problem
with multi-core performance estimates near the saturation point (where earlier
relative model errors of 20+% were reduced below 1%). Apart from an accurate
model providing upper performance bounds on supported microarchitectures,
the insights into the inner workings of these microarchitectures gained during
the investigations are of practical relevance to performance engineering.
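The ECM model's structure can be illustrated with a minimal sketch. The cycle counts and the fully non-overlapping composition used below are illustrative assumptions, not the calibrated values from the work; the actual overlap behavior is architecture-dependent, as the text notes.

```python
# Sketch of an ECM-style runtime prediction. All inputs are in core
# cycles per unit of work (e.g., one cache line); the numbers are made up.

def ecm_single_core(t_ol, t_nol, t_l1l2, t_l2l3, t_l3mem):
    """Single-core runtime assuming data transfers in the cache/memory
    hierarchy do not overlap with each other, while in-core execution
    (t_ol) overlaps with the whole transfer chain."""
    return max(t_ol, t_nol + t_l1l2 + t_l2l3 + t_l3mem)

def ecm_multi_core(n, t_single, t_l3mem):
    """Multi-core performance (work units per cycle) scales linearly
    with core count until the memory bandwidth bound is reached."""
    return min(n / t_single, 1.0 / t_l3mem)

# Example with hypothetical cycle counts:
t = ecm_single_core(t_ol=8, t_nol=2, t_l1l2=3, t_l2l3=4, t_l3mem=9)
# t = max(8, 2 + 3 + 4 + 9) = 18 cycles per cache line
perf4 = ecm_multi_core(4, t, 9)   # saturated at the memory bound, 1/9
```

At four cores, 4/18 already exceeds the memory-bound 1/9, so the estimate saturates; it is exactly this saturation region where the refined bandwidth model reduces the error from over 20% to below 1%.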
A quantitative power model for steady-state codes was developed that estimates
processor power consumption based on core count, core and Uncore frequencies.
Motivated by physical first principles, the model uses fit parameters to
describe the relationship of core and Uncore frequencies to dissipated
power; nevertheless, the model's analytic nature enables insights into how
different chip components interact with code and contribute to power
consumption. To the best of my knowledge, it is the only analytic model that
takes all relevant processor parameters into account (active cores, various
frequencies). Model quality is unprecedented (maximum relative model error
over all examined applications, processors and parameters was 4%). Also, it is
the only analytic model that works for both scalable and saturating codes. The
latter is addressed with the help of the ECM model: by relating multi- to
single-core performance estimates, the application's parallel efficiency is
calculated and per-core power consumption is dampened accordingly.
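The general shape of such a power model can be sketched as follows. The polynomial frequency dependence and all fit-parameter values here are placeholders chosen for illustration; the real parameters are fitted per processor from measurements.

```python
# Hedged sketch of a chip power model: baseline + Uncore-frequency term
# + per-core term, with the per-core contribution dampened by the
# parallel efficiency eps for saturating codes. All coefficients are
# hypothetical fit parameters, not values from the actual work.

def chip_power(n, f_core, f_uncore, eps,
               p_base=23.0,                 # static baseline [W]
               p_un1=1.5, p_un2=0.4,        # Uncore frequency terms
               p_c0=0.5, p_c1=0.8, p_c2=0.6):  # per-core terms
    """Estimate package power [W] from active core count n, core and
    Uncore frequencies [GHz], and parallel efficiency eps in (0, 1]."""
    uncore = p_un1 * f_uncore + p_un2 * f_uncore ** 2
    per_core = p_c0 + p_c1 * f_core + p_c2 * f_core ** 2
    return p_base + uncore + n * eps * per_core

# A scalable code (eps = 1.0) draws more power than a saturating one
# (eps < 1) at the same operating point:
p_scalable = chip_power(4, 2.0, 2.5, eps=1.0)
p_saturating = chip_power(4, 2.0, 2.5, eps=0.5)
```

For saturating codes, eps comes from the ECM model by relating the multi-core to the single-core performance estimate, as described above.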
By combining the performance and power models, an analytic energy model can be
constructed. The analytic nature of the energy model provides a point of
entry for analytic deductions: conclusions drawn from observations of
empirical data can be derived analytically from the model, thereby putting
them on a firm theoretical footing.
In addition to theoretical insights, the models can be used to identify
optimum performance, energy-to-solution, and energy-delay product (EDP),
along with their corresponding operating points. Determining the model
parameters requires only a few measurements
(six on processors with a single clock domain, nine on those with separate clock
domains). Assuming five-minute samples (to reach temperature equilibrium), it
takes less than an hour to determine model parameters vs. two weeks to sweep
all 3672 parameter combinations of a Xeon E5-2697 processor (18 cores, 12 and
17 different core and Uncore frequencies, respectively).
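Finding these operating points amounts to minimizing energy (E = P * T) or EDP (E * T) over the grid of core counts and frequencies. The sketch below uses stand-in runtime and power functions with invented coefficients; only the grid dimensions (18 cores, 12 core and 17 Uncore frequencies, 3672 combinations) come from the text.

```python
from itertools import product

def runtime(n, f_core, f_uncore):
    """Hypothetical runtime [s]: limited either by in-core throughput
    (scales with cores and core clock) or by memory bandwidth (scales
    with Uncore clock). Coefficients are illustrative only."""
    work = 1e9
    return work / min(n * 2e8 * f_core, 1.2e9 * f_uncore)

def power(n, f_core, f_uncore):
    """Hypothetical power model [W] with made-up fit parameters."""
    return 20.0 + 2.0 * f_uncore ** 2 + n * (0.5 + 1.0 * f_core ** 2)

# The Xeon E5-2697 sweep from the text: 18 core counts x 12 core
# frequencies x 17 Uncore frequencies = 3672 operating points.
ops = list(product(range(1, 19),
                   [round(1.2 + 0.1 * i, 1) for i in range(12)],
                   [round(1.2 + 0.1 * i, 1) for i in range(17)]))

energy = lambda op: power(*op) * runtime(*op)        # E = P * T
edp    = lambda op: energy(op) * runtime(*op)        # EDP = E * T

best_energy_op = min(ops, key=energy)
best_edp_op = min(ops, key=edp)
```

With the models in hand this minimization is instantaneous, which is the point of the comparison: under an hour of calibration measurements replaces a roughly two-week exhaustive sweep of all 3672 settings.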