Research Poster
(RP01) Can Unified-Memory support on Pascal and Volta GPUs enable Out-of-Core DNN Training?
Event Type
Research Poster
AI/Machine Learning/Deep Learning
Computer Architecture
HPC Accelerators
Programming Models & Languages
Time
Tuesday, June 26th, 8:30am - 10am
Description
Existing Deep Neural Network (DNN) training frameworks like Caffe and CNTK
cannot train large DNNs that do not fit in GPU memory without explicit memory
management schemes. In this poster, we propose OC-DNN, a novel
Out-of-Core DNN training framework that
exploits the new Unified Memory (UM) features (since CUDA 8) along with new
hardware mechanisms in Pascal and Volta GPUs. OC-DNN has two major
design components: 1) OC-Caffe, an enhanced version of Caffe that
exploits UM features such as asynchronous prefetching, managed page
migration, GPU-based page faults, and the cudaMemAdvise interface
to enable efficient out-of-core training for very large DNNs and/or DNNs that
require large batch sizes, and 2) an interception library that transparently
leverages these features for several other DNN frameworks without
any design changes. To the best of our knowledge, this is the first attempt to
design an out-of-core DNN training framework that exploits the CUDA UM interface in
tandem with the page migration and prefetching capabilities of Pascal/Volta GPUs to
handle memory-bound out-of-core DNNs with high performance and
high productivity. We provide a comprehensive performance characterization of
our designs. OC-Caffe provides performance comparable to Caffe for regular
DNNs. OC-Caffe-Opt is up to 1.9X faster than OC-Caffe-Naive and up
to 5X faster than optimized CPU-based training for out-of-core workloads.
OC-Caffe also allows scale-up (DGX-1) and scale-out on multi-GPU clusters.
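
The Unified Memory calls named in the description can be illustrated with a minimal sketch. This is not OC-Caffe code: the kernel `scale`, the buffer size, and the `CHECK` macro are hypothetical, and the call signatures assume the CUDA 8 through 12 runtime API (newer toolkits may differ). It also assumes a Pascal or newer GPU on a platform that supports concurrent managed access, which prefetching requires.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failing CUDA runtime call and exit main.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            return 1;                                                  \
        }                                                              \
    } while (0)

// Hypothetical kernel standing in for one layer's computation.
__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = static_cast<size_t>(1) << 24;  // illustrative size only
    const size_t bytes = n * sizeof(float);
    const int device = 0;
    CHECK(cudaSetDevice(device));

    // Managed allocation: one pointer valid on both host and device.
    // On Pascal/Volta, pages migrate on demand through GPU page faults.
    float* buf = nullptr;
    CHECK(cudaMallocManaged(&buf, bytes));
    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;  // first touched on the host

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Hint that the GPU is the preferred home for these pages.
    CHECK(cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, device));

    // Asynchronously migrate the pages ahead of the kernel, so the kernel
    // does not pay for them one page fault at a time.
    CHECK(cudaMemPrefetchAsync(buf, bytes, device, stream));

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    scale<<<blocks, threads, 0, stream>>>(buf, n, 2.0f);
    CHECK(cudaGetLastError());

    // Prefetch results back to host memory before the CPU reads them.
    CHECK(cudaMemPrefetchAsync(buf, bytes, cudaCpuDeviceId, stream));
    CHECK(cudaStreamSynchronize(stream));

    printf("buf[0] = %f\n", buf[0]);

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(buf));
    return 0;
}
```

Because managed allocations on these GPUs may exceed physical device memory, the same pattern extends in principle to buffers larger than the GPU can hold, which is the out-of-core case the poster targets.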