Bright Computing: Scalable Accounting & Reporting for Compute Jobs
Time: Wednesday, June 27th, 12:40pm - 1pm
Description: HPC systems usually come with a significant price tag, so it is highly desirable to gain insight into how effectively their resources are being used. To make statements about efficiency, it is important to differentiate between the reservation of resources and actual resource usage (e.g. wall-clock time versus CPU time). In this talk we will describe the workings of an accounting & reporting engine that was introduced in the latest version of Bright Cluster Manager. HPC system administrators can use it to answer questions such as "Which users typically allocate resources that they don't use effectively?" or "How much power was consumed for running jobs of type X by user Y?". In particular, we will address how the system scales to large numbers of nodes and jobs, and how the PromQL query language provides flexibility in report generation.
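To make the distinction between reserved and used resources concrete, here is a minimal Python sketch of one common way to express CPU efficiency: CPU time consumed divided by the core-seconds reserved (cores multiplied by wall-clock time). It is not taken from Bright Cluster Manager; the `JobRecord` type, its field names, and the sample numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (not a Bright Cluster Manager type)."""
    user: str
    cores_reserved: int
    wall_seconds: float   # how long the reservation was held
    cpu_seconds: float    # CPU time actually consumed, summed over all cores


def cpu_efficiency(job: JobRecord) -> float:
    """Fraction of the reserved core-seconds that was actually used."""
    reserved = job.cores_reserved * job.wall_seconds
    return job.cpu_seconds / reserved if reserved > 0 else 0.0


def efficiency_by_user(jobs: list[JobRecord]) -> dict[str, float]:
    """Aggregate used and reserved core-seconds per user, then take the ratio."""
    used: dict[str, float] = {}
    reserved: dict[str, float] = {}
    for job in jobs:
        used[job.user] = used.get(job.user, 0.0) + job.cpu_seconds
        reserved[job.user] = (
            reserved.get(job.user, 0.0) + job.cores_reserved * job.wall_seconds
        )
    return {u: used[u] / reserved[u] for u in used if reserved[u] > 0}


# Made-up sample data: alice uses most of what she reserves, bob does not.
jobs = [
    JobRecord("alice", cores_reserved=4, wall_seconds=3600, cpu_seconds=12960),
    JobRecord("bob", cores_reserved=8, wall_seconds=1800, cpu_seconds=3600),
    JobRecord("bob", cores_reserved=2, wall_seconds=1000, cpu_seconds=1000),
]

for user, eff in sorted(efficiency_by_user(jobs).items(), key=lambda kv: kv[1]):
    print(f"{user}: {eff:.0%} of reserved core-seconds used")
```

Sorting users by this ratio is one simple way to approach the question of which users allocate resources they do not use effectively. Summing core-seconds before dividing, rather than averaging per-job ratios, weights large jobs more heavily than small ones.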