(RP02) Early Evaluation of a New Vector Processor SX-Aurora TSUBASA
Performance Analysis and Optimization
Time: Tuesday, June 26th, 8:30am - 10am
Description: A brand-new vector supercomputer, SX-Aurora TSUBASA, has been launched. It is built around a newly developed Vector Engine (VE) processor, which is designed to achieve high sustained performance through powerful vector processing and high memory bandwidth.
VE is equipped with eight vector cores, each of which executes 256 operations per vector instruction. With a peak performance of 307.2 GFlop/s per core at 1.6 GHz, the peak double-precision performance of VE reaches 2.45 TFlop/s.
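As a quick check on the arithmetic, the short sketch below derives the per-cycle throughput of one core and the aggregate VE peak using only the figures given above. The constant and function names are illustrative, and the flops-per-cycle value is derived here rather than stated in the abstract.

```python
# Sanity-check the VE peak-performance figures quoted in the abstract.
# Constant and function names are illustrative, not from the paper.

CORES_PER_VE = 8            # vector cores per Vector Engine
CORE_PEAK_GFLOPS = 307.2    # double-precision peak per core, GFlop/s
CLOCK_GHZ = 1.6             # core clock frequency, GHz


def flops_per_cycle(core_peak_gflops: float, clock_ghz: float) -> float:
    """Double-precision flops one core must complete per clock to reach its peak."""
    return core_peak_gflops / clock_ghz


def ve_peak_tflops(cores: int, core_peak_gflops: float) -> float:
    """Aggregate double-precision peak of one VE, in TFlop/s."""
    return cores * core_peak_gflops / 1000.0


if __name__ == "__main__":
    print(f"{flops_per_cycle(CORE_PEAK_GFLOPS, CLOCK_GHZ):.0f} flops/cycle per core")
    print(f"{ve_peak_tflops(CORES_PER_VE, CORE_PEAK_GFLOPS):.4f} TFlop/s per VE")
```

Each core must complete about 192 double-precision flops per cycle, and eight cores give 2.4576 TFlop/s, which the abstract rounds down to 2.45 TFlop/s.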
VE and six HBM2 memory modules are integrated using CoWoS (Chip on Wafer on Substrate) technology to provide both high memory bandwidth and large memory capacity, the world's first such integration. The memory provides 1.22 TB/s of bandwidth and 48 GB of capacity. This memory capability enables high computational efficiency, especially for memory-intensive applications.
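One way to make "memory-intensive" concrete is a roofline-style balance calculation: divide memory bandwidth by peak compute. The sketch below does this with the abstract's figures only. The roofline model is a standard lens chosen here for illustration, not one the abstract names; the per-module values assume bandwidth and capacity are split evenly across the six HBM2 modules, and all names are hypothetical.

```python
# Memory balance of one VE, derived only from figures in the abstract.
# Assumption: bandwidth and capacity are split evenly across the six HBM2
# modules. All names are illustrative.

HBM2_MODULES = 6
MEM_BW_TBS = 1.22                      # aggregate memory bandwidth, TB/s
MEM_CAPACITY_GB = 48.0                 # total memory capacity, GB
VE_PEAK_TFLOPS = 8 * 307.2 / 1000.0    # double-precision peak, TFlop/s


def bytes_per_flop(bw_tbs: float, peak_tflops: float) -> float:
    """Bytes of memory traffic available per peak flop (B/F ratio)."""
    return bw_tbs / peak_tflops


def ridge_point(bw_tbs: float, peak_tflops: float) -> float:
    """Roofline ridge point: the arithmetic intensity (flop/byte) below which
    a kernel is bandwidth-bound rather than compute-bound."""
    return peak_tflops / bw_tbs


def per_module(total: float, modules: int) -> float:
    """Average share of a resource per HBM2 module (even split assumed)."""
    return total / modules


if __name__ == "__main__":
    print(f"B/F ratio:   {bytes_per_flop(MEM_BW_TBS, VE_PEAK_TFLOPS):.2f} bytes/flop")
    print(f"Ridge point: {ridge_point(MEM_BW_TBS, VE_PEAK_TFLOPS):.2f} flop/byte")
    print(f"Per module:  {per_module(MEM_BW_TBS * 1000, HBM2_MODULES):.0f} GB/s, "
          f"{per_module(MEM_CAPACITY_GB, HBM2_MODULES):.0f} GB")
```

Under these assumptions VE offers roughly 0.5 bytes per peak flop, so kernels with an arithmetic intensity below about 2 flop/byte would be limited by memory bandwidth rather than by the vector units.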
Furthermore, SX-Aurora TSUBASA employs a new execution model. VE is connected via PCIe to an x86 server called the Vector Host (VH). Unlike the conventional accelerator execution model, the whole application is executed on VE, and system calls are automatically offloaded to VH. This new model improves usability and removes the need for special programming for SX-Aurora TSUBASA.
This paper examines the potential of SX-Aurora TSUBASA through performance evaluations. SX-Aurora TSUBASA achieves a STREAM bandwidth of about 987 GB/s, which is about 4.7 and 11.6 times higher than that of the previous vector processor SX-ACE and of the Xeon Gold 6126, respectively. This high sustained memory bandwidth yields performance improvements of 2.0 to 9.8 times on application kernels. These results indicate the high potential of SX-Aurora TSUBASA for accelerating memory-intensive applications.
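The sketch below puts the STREAM figure in context: it shows how a STREAM-style Triad bandwidth is usually counted, what fraction of the 1.22 TB/s theoretical bandwidth 987 GB/s represents, and the baseline bandwidths implied by the quoted ratios. Assumptions not stated in the abstract: that the reported number is the Triad kernel under the usual 24-bytes-per-iteration counting, and that the 4.7x and 11.6x ratios are rounded, so the back-calculated baselines are approximate and not measured values. Names are illustrative.

```python
# Derived views of the STREAM result quoted in the abstract.
# Assumptions: the figure is the Triad kernel counted with the usual STREAM
# convention (two loads + one store of 8-byte doubles per element, no
# write-allocate traffic), and the speedup ratios are rounded, so the
# back-calculated baselines are only approximate.

VE_STREAM_GBS = 987.0       # sustained STREAM bandwidth on VE, GB/s
VE_PEAK_BW_GBS = 1220.0     # theoretical HBM2 bandwidth (1.22 TB/s), GB/s
RATIO_VS = {"SX-ACE": 4.7, "Xeon Gold 6126": 11.6}


def triad_bandwidth_gbs(n: int, seconds: float, elem_bytes: int = 8) -> float:
    """STREAM-style Triad bandwidth for a[i] = b[i] + q * c[i] over n elements."""
    return 3 * elem_bytes * n / seconds / 1e9


def efficiency(sustained_gbs: float, peak_gbs: float) -> float:
    """Fraction of theoretical bandwidth actually sustained."""
    return sustained_gbs / peak_gbs


def implied_baseline_gbs(sustained_gbs: float, ratio: float) -> float:
    """Baseline bandwidth implied by a quoted 'N times higher' ratio."""
    return sustained_gbs / ratio


if __name__ == "__main__":
    print(f"VE STREAM efficiency: {efficiency(VE_STREAM_GBS, VE_PEAK_BW_GBS):.1%}")
    for name, ratio in RATIO_VS.items():
        print(f"Implied {name} bandwidth: ~{implied_baseline_gbs(VE_STREAM_GBS, ratio):.0f} GB/s")
```

By this arithmetic VE sustains roughly 81% of its theoretical memory bandwidth, and the quoted ratios correspond to roughly 210 GB/s for SX-ACE and 85 GB/s for the Xeon Gold 6126.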